1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 | // SPDX-License-Identifier: GPL-2.0-or-later /* * Copyright (C) 2004, 2005 Oracle. All rights reserved. */ #include <linux/slab.h> #include <linux/kernel.h> #include <linux/module.h> #include <linux/configfs.h> #include "tcp.h" #include "nodemanager.h" #include "heartbeat.h" #include "masklog.h" #include "sys.h" /* for now we operate under the assertion that there can be only one * cluster active at a time. Changing this will require trickling * cluster references throughout where nodes are looked up */ struct o2nm_cluster *o2nm_single_cluster = NULL; static const char *o2nm_fence_method_desc[O2NM_FENCE_METHODS] = { "reset", /* O2NM_FENCE_RESET */ "panic", /* O2NM_FENCE_PANIC */ }; static inline void o2nm_lock_subsystem(void); static inline void o2nm_unlock_subsystem(void); struct o2nm_node *o2nm_get_node_by_num(u8 node_num) { struct o2nm_node *node = NULL; if (node_num >= O2NM_MAX_NODES || o2nm_single_cluster == NULL) goto out; read_lock(&o2nm_single_cluster->cl_nodes_lock); node = o2nm_single_cluster->cl_nodes[node_num]; if (node) config_item_get(&node->nd_item); read_unlock(&o2nm_single_cluster->cl_nodes_lock); out: return node; } EXPORT_SYMBOL_GPL(o2nm_get_node_by_num); int o2nm_configured_node_map(unsigned long *map, unsigned bytes) { struct o2nm_cluster *cluster = o2nm_single_cluster; BUG_ON(bytes < (sizeof(cluster->cl_nodes_bitmap))); if (cluster == NULL) return -EINVAL; read_lock(&cluster->cl_nodes_lock); bitmap_copy(map, cluster->cl_nodes_bitmap, O2NM_MAX_NODES); read_unlock(&cluster->cl_nodes_lock); return 0; } EXPORT_SYMBOL_GPL(o2nm_configured_node_map); static struct o2nm_node *o2nm_node_ip_tree_lookup(struct o2nm_cluster *cluster, __be32 ip_needle, struct rb_node ***ret_p, struct rb_node **ret_parent) { struct rb_node **p = &cluster->cl_node_ip_tree.rb_node; struct rb_node *parent = NULL; struct o2nm_node *node, *ret = NULL; while (*p) { int cmp; parent = *p; node = rb_entry(parent, struct o2nm_node, nd_ip_node); cmp = memcmp(&ip_needle, &node->nd_ipv4_address, sizeof(ip_needle)); if (cmp < 0) p = &(*p)->rb_left; else if (cmp > 0) p = &(*p)->rb_right; else { ret = node; break; } } if (ret_p != NULL) *ret_p = p; if (ret_parent != NULL) *ret_parent = parent; return ret; } struct o2nm_node *o2nm_get_node_by_ip(__be32 addr) { struct o2nm_node *node = NULL; struct o2nm_cluster *cluster = o2nm_single_cluster; if (cluster == NULL) goto out; read_lock(&cluster->cl_nodes_lock); node = o2nm_node_ip_tree_lookup(cluster, addr, NULL, NULL); if (node) config_item_get(&node->nd_item); read_unlock(&cluster->cl_nodes_lock); out: return node; } EXPORT_SYMBOL_GPL(o2nm_get_node_by_ip); void o2nm_node_put(struct o2nm_node *node) { config_item_put(&node->nd_item); } EXPORT_SYMBOL_GPL(o2nm_node_put); void o2nm_node_get(struct o2nm_node *node) { config_item_get(&node->nd_item); } EXPORT_SYMBOL_GPL(o2nm_node_get); u8 o2nm_this_node(void) { u8 node_num = O2NM_MAX_NODES; if (o2nm_single_cluster && o2nm_single_cluster->cl_has_local) node_num = o2nm_single_cluster->cl_local_node; return node_num; } EXPORT_SYMBOL_GPL(o2nm_this_node); /* node configfs bits */ static struct o2nm_cluster *to_o2nm_cluster(struct config_item *item) { return item ? container_of(to_config_group(item), struct o2nm_cluster, cl_group) : NULL; } static struct o2nm_node *to_o2nm_node(struct config_item *item) { return item ? container_of(item, struct o2nm_node, nd_item) : NULL; } static void o2nm_node_release(struct config_item *item) { struct o2nm_node *node = to_o2nm_node(item); kfree(node); } static ssize_t o2nm_node_num_show(struct config_item *item, char *page) { return sprintf(page, "%d\n", to_o2nm_node(item)->nd_num); } static struct o2nm_cluster *to_o2nm_cluster_from_node(struct o2nm_node *node) { /* through the first node_set .parent * mycluster/nodes/mynode == o2nm_cluster->o2nm_node_group->o2nm_node */ if (node->nd_item.ci_parent) return to_o2nm_cluster(node->nd_item.ci_parent->ci_parent); else return NULL; } enum { O2NM_NODE_ATTR_NUM = 0, O2NM_NODE_ATTR_PORT, O2NM_NODE_ATTR_ADDRESS, }; static ssize_t o2nm_node_num_store(struct config_item *item, const char *page, size_t count) { struct o2nm_node *node = to_o2nm_node(item); struct o2nm_cluster *cluster; unsigned long tmp; char *p = (char *)page; int ret = 0; tmp = simple_strtoul(p, &p, 0); if (!p || (*p && (*p != '\n'))) return -EINVAL; if (tmp >= O2NM_MAX_NODES) return -ERANGE; /* once we're in the cl_nodes tree networking can look us up by * node number and try to use our address and port attributes * to connect to this node.. make sure that they've been set * before writing the node attribute? */ if (!test_bit(O2NM_NODE_ATTR_ADDRESS, &node->nd_set_attributes) || !test_bit(O2NM_NODE_ATTR_PORT, &node->nd_set_attributes)) return -EINVAL; /* XXX */ o2nm_lock_subsystem(); cluster = to_o2nm_cluster_from_node(node); if (!cluster) { o2nm_unlock_subsystem(); return -EINVAL; } write_lock(&cluster->cl_nodes_lock); if (cluster->cl_nodes[tmp]) ret = -EEXIST; else if (test_and_set_bit(O2NM_NODE_ATTR_NUM, &node->nd_set_attributes)) ret = -EBUSY; else { cluster->cl_nodes[tmp] = node; node->nd_num = tmp; set_bit(tmp, cluster->cl_nodes_bitmap); } write_unlock(&cluster->cl_nodes_lock); o2nm_unlock_subsystem(); if (ret) return ret; return count; } static ssize_t o2nm_node_ipv4_port_show(struct config_item *item, char *page) { return sprintf(page, "%u\n", ntohs(to_o2nm_node(item)->nd_ipv4_port)); } static ssize_t o2nm_node_ipv4_port_store(struct config_item *item, const char *page, size_t count) { struct o2nm_node *node = to_o2nm_node(item); unsigned long tmp; char *p = (char *)page; tmp = simple_strtoul(p, &p, 0); if (!p || (*p && (*p != '\n'))) return -EINVAL; if (tmp == 0) return -EINVAL; if (tmp >= (u16)-1) return -ERANGE; if (test_and_set_bit(O2NM_NODE_ATTR_PORT, &node->nd_set_attributes)) return -EBUSY; node->nd_ipv4_port = htons(tmp); return count; } static ssize_t o2nm_node_ipv4_address_show(struct config_item *item, char *page) { return sprintf(page, "%pI4\n", &to_o2nm_node(item)->nd_ipv4_address); } static ssize_t o2nm_node_ipv4_address_store(struct config_item *item, const char *page, size_t count) { struct o2nm_node *node = to_o2nm_node(item); struct o2nm_cluster *cluster; int ret, i; struct rb_node **p, *parent; unsigned int octets[4]; __be32 ipv4_addr = 0; ret = sscanf(page, "%3u.%3u.%3u.%3u", &octets[3], &octets[2], &octets[1], &octets[0]); if (ret != 4) return -EINVAL; for (i = 0; i < ARRAY_SIZE(octets); i++) { if (octets[i] > 255) return -ERANGE; be32_add_cpu(&ipv4_addr, octets[i] << (i * 8)); } o2nm_lock_subsystem(); cluster = to_o2nm_cluster_from_node(node); if (!cluster) { o2nm_unlock_subsystem(); return -EINVAL; } ret = 0; write_lock(&cluster->cl_nodes_lock); if (o2nm_node_ip_tree_lookup(cluster, ipv4_addr, &p, &parent)) ret = -EEXIST; else if (test_and_set_bit(O2NM_NODE_ATTR_ADDRESS, &node->nd_set_attributes)) ret = -EBUSY; else { rb_link_node(&node->nd_ip_node, parent, p); rb_insert_color(&node->nd_ip_node, &cluster->cl_node_ip_tree); } write_unlock(&cluster->cl_nodes_lock); o2nm_unlock_subsystem(); if (ret) return ret; memcpy(&node->nd_ipv4_address, &ipv4_addr, sizeof(ipv4_addr)); return count; } static ssize_t o2nm_node_local_show(struct config_item *item, char *page) { return sprintf(page, "%d\n", to_o2nm_node(item)->nd_local); } static ssize_t o2nm_node_local_store(struct config_item *item, const char *page, size_t count) { struct o2nm_node *node = to_o2nm_node(item); struct o2nm_cluster *cluster; unsigned long tmp; char *p = (char *)page; ssize_t ret; tmp = simple_strtoul(p, &p, 0); if (!p || (*p && (*p != '\n'))) return -EINVAL; tmp = !!tmp; /* boolean of whether this node wants to be local */ /* setting local turns on networking rx for now so we require having * set everything else first */ if (!test_bit(O2NM_NODE_ATTR_ADDRESS, &node->nd_set_attributes) || !test_bit(O2NM_NODE_ATTR_NUM, &node->nd_set_attributes) || !test_bit(O2NM_NODE_ATTR_PORT, &node->nd_set_attributes)) return -EINVAL; /* XXX */ o2nm_lock_subsystem(); cluster = to_o2nm_cluster_from_node(node); if (!cluster) { ret = -EINVAL; goto out; } /* the only failure case is trying to set a new local node * when a different one is already set */ if (tmp && tmp == cluster->cl_has_local && cluster->cl_local_node != node->nd_num) { ret = -EBUSY; goto out; } /* bring up the rx thread if we're setting the new local node. */ if (tmp && !cluster->cl_has_local) { ret = o2net_start_listening(node); if (ret) goto out; } if (!tmp && cluster->cl_has_local && cluster->cl_local_node == node->nd_num) { o2net_stop_listening(node); cluster->cl_local_node = O2NM_INVALID_NODE_NUM; } node->nd_local = tmp; if (node->nd_local) { cluster->cl_has_local = tmp; cluster->cl_local_node = node->nd_num; } ret = count; out: o2nm_unlock_subsystem(); return ret; } CONFIGFS_ATTR(o2nm_node_, num); CONFIGFS_ATTR(o2nm_node_, ipv4_port); CONFIGFS_ATTR(o2nm_node_, ipv4_address); CONFIGFS_ATTR(o2nm_node_, local); static struct configfs_attribute *o2nm_node_attrs[] = { &o2nm_node_attr_num, &o2nm_node_attr_ipv4_port, &o2nm_node_attr_ipv4_address, &o2nm_node_attr_local, NULL, }; static struct configfs_item_operations o2nm_node_item_ops = { .release = o2nm_node_release, }; static const struct config_item_type o2nm_node_type = { .ct_item_ops = &o2nm_node_item_ops, .ct_attrs = o2nm_node_attrs, .ct_owner = THIS_MODULE, }; /* node set */ struct o2nm_node_group { struct config_group ns_group; /* some stuff? */ }; #if 0 static struct o2nm_node_group *to_o2nm_node_group(struct config_group *group) { return group ? container_of(group, struct o2nm_node_group, ns_group) : NULL; } #endif static ssize_t o2nm_cluster_attr_write(const char *page, ssize_t count, unsigned int *val) { unsigned long tmp; char *p = (char *)page; tmp = simple_strtoul(p, &p, 0); if (!p || (*p && (*p != '\n'))) return -EINVAL; if (tmp == 0) return -EINVAL; if (tmp >= (u32)-1) return -ERANGE; *val = tmp; return count; } static ssize_t o2nm_cluster_idle_timeout_ms_show(struct config_item *item, char *page) { return sprintf(page, "%u\n", to_o2nm_cluster(item)->cl_idle_timeout_ms); } static ssize_t o2nm_cluster_idle_timeout_ms_store(struct config_item *item, const char *page, size_t count) { struct o2nm_cluster *cluster = to_o2nm_cluster(item); ssize_t ret; unsigned int val; ret = o2nm_cluster_attr_write(page, count, &val); if (ret > 0) { if (cluster->cl_idle_timeout_ms != val && o2net_num_connected_peers()) { mlog(ML_NOTICE, "o2net: cannot change idle timeout after " "the first peer has agreed to it." " %d connected peers\n", o2net_num_connected_peers()); ret = -EINVAL; } else if (val <= cluster->cl_keepalive_delay_ms) { mlog(ML_NOTICE, "o2net: idle timeout must be larger " "than keepalive delay\n"); ret = -EINVAL; } else { cluster->cl_idle_timeout_ms = val; } } return ret; } static ssize_t o2nm_cluster_keepalive_delay_ms_show( struct config_item *item, char *page) { return sprintf(page, "%u\n", to_o2nm_cluster(item)->cl_keepalive_delay_ms); } static ssize_t o2nm_cluster_keepalive_delay_ms_store( struct config_item *item, const char *page, size_t count) { struct o2nm_cluster *cluster = to_o2nm_cluster(item); ssize_t ret; unsigned int val; ret = o2nm_cluster_attr_write(page, count, &val); if (ret > 0) { if (cluster->cl_keepalive_delay_ms != val && o2net_num_connected_peers()) { mlog(ML_NOTICE, "o2net: cannot change keepalive delay after" " the first peer has agreed to it." " %d connected peers\n", o2net_num_connected_peers()); ret = -EINVAL; } else if (val >= cluster->cl_idle_timeout_ms) { mlog(ML_NOTICE, "o2net: keepalive delay must be " "smaller than idle timeout\n"); ret = -EINVAL; } else { cluster->cl_keepalive_delay_ms = val; } } return ret; } static ssize_t o2nm_cluster_reconnect_delay_ms_show( struct config_item *item, char *page) { return sprintf(page, "%u\n", to_o2nm_cluster(item)->cl_reconnect_delay_ms); } static ssize_t o2nm_cluster_reconnect_delay_ms_store( struct config_item *item, const char *page, size_t count) { return o2nm_cluster_attr_write(page, count, &to_o2nm_cluster(item)->cl_reconnect_delay_ms); } static ssize_t o2nm_cluster_fence_method_show( struct config_item *item, char *page) { struct o2nm_cluster *cluster = to_o2nm_cluster(item); ssize_t ret = 0; if (cluster) ret = sprintf(page, "%s\n", o2nm_fence_method_desc[cluster->cl_fence_method]); return ret; } static ssize_t o2nm_cluster_fence_method_store( struct config_item *item, const char *page, size_t count) { unsigned int i; if (page[count - 1] != '\n') goto bail; for (i = 0; i < O2NM_FENCE_METHODS; ++i) { if (count != strlen(o2nm_fence_method_desc[i]) + 1) continue; if (strncasecmp(page, o2nm_fence_method_desc[i], count - 1)) continue; if (to_o2nm_cluster(item)->cl_fence_method != i) { printk(KERN_INFO "ocfs2: Changing fence method to %s\n", o2nm_fence_method_desc[i]); to_o2nm_cluster(item)->cl_fence_method = i; } return count; } bail: return -EINVAL; } CONFIGFS_ATTR(o2nm_cluster_, idle_timeout_ms); CONFIGFS_ATTR(o2nm_cluster_, keepalive_delay_ms); CONFIGFS_ATTR(o2nm_cluster_, reconnect_delay_ms); CONFIGFS_ATTR(o2nm_cluster_, fence_method); static struct configfs_attribute *o2nm_cluster_attrs[] = { &o2nm_cluster_attr_idle_timeout_ms, &o2nm_cluster_attr_keepalive_delay_ms, &o2nm_cluster_attr_reconnect_delay_ms, &o2nm_cluster_attr_fence_method, NULL, }; static struct config_item *o2nm_node_group_make_item(struct config_group *group, const char *name) { struct o2nm_node *node = NULL; if (strlen(name) > O2NM_MAX_NAME_LEN) return ERR_PTR(-ENAMETOOLONG); node = kzalloc(sizeof(struct o2nm_node), GFP_KERNEL); if (node == NULL) return ERR_PTR(-ENOMEM); strcpy(node->nd_name, name); /* use item.ci_namebuf instead? */ config_item_init_type_name(&node->nd_item, name, &o2nm_node_type); spin_lock_init(&node->nd_lock); mlog(ML_CLUSTER, "o2nm: Registering node %s\n", name); return &node->nd_item; } static void o2nm_node_group_drop_item(struct config_group *group, struct config_item *item) { struct o2nm_node *node = to_o2nm_node(item); struct o2nm_cluster *cluster = to_o2nm_cluster(group->cg_item.ci_parent); if (cluster->cl_nodes[node->nd_num] == node) { o2net_disconnect_node(node); if (cluster->cl_has_local && (cluster->cl_local_node == node->nd_num)) { cluster->cl_has_local = 0; cluster->cl_local_node = O2NM_INVALID_NODE_NUM; o2net_stop_listening(node); } } /* XXX call into net to stop this node from trading messages */ write_lock(&cluster->cl_nodes_lock); /* XXX sloppy */ if (node->nd_ipv4_address) rb_erase(&node->nd_ip_node, &cluster->cl_node_ip_tree); /* nd_num might be 0 if the node number hasn't been set.. */ if (cluster->cl_nodes[node->nd_num] == node) { cluster->cl_nodes[node->nd_num] = NULL; clear_bit(node->nd_num, cluster->cl_nodes_bitmap); } write_unlock(&cluster->cl_nodes_lock); mlog(ML_CLUSTER, "o2nm: Unregistered node %s\n", config_item_name(&node->nd_item)); config_item_put(item); } static struct configfs_group_operations o2nm_node_group_group_ops = { .make_item = o2nm_node_group_make_item, .drop_item = o2nm_node_group_drop_item, }; static const struct config_item_type o2nm_node_group_type = { .ct_group_ops = &o2nm_node_group_group_ops, .ct_owner = THIS_MODULE, }; /* cluster */ static void o2nm_cluster_release(struct config_item *item) { struct o2nm_cluster *cluster = to_o2nm_cluster(item); kfree(cluster); } static struct configfs_item_operations o2nm_cluster_item_ops = { .release = o2nm_cluster_release, }; static const struct config_item_type o2nm_cluster_type = { .ct_item_ops = &o2nm_cluster_item_ops, .ct_attrs = o2nm_cluster_attrs, .ct_owner = THIS_MODULE, }; /* cluster set */ struct o2nm_cluster_group { struct configfs_subsystem cs_subsys; /* some stuff? */ }; #if 0 static struct o2nm_cluster_group *to_o2nm_cluster_group(struct config_group *group) { return group ? container_of(to_configfs_subsystem(group), struct o2nm_cluster_group, cs_subsys) : NULL; } #endif static struct config_group *o2nm_cluster_group_make_group(struct config_group *group, const char *name) { struct o2nm_cluster *cluster = NULL; struct o2nm_node_group *ns = NULL; struct config_group *o2hb_group = NULL, *ret = NULL; /* this runs under the parent dir's i_rwsem; there can be only * one caller in here at a time */ if (o2nm_single_cluster) return ERR_PTR(-ENOSPC); cluster = kzalloc(sizeof(struct o2nm_cluster), GFP_KERNEL); ns = kzalloc(sizeof(struct o2nm_node_group), GFP_KERNEL); o2hb_group = o2hb_alloc_hb_set(); if (cluster == NULL || ns == NULL || o2hb_group == NULL) goto out; config_group_init_type_name(&cluster->cl_group, name, &o2nm_cluster_type); configfs_add_default_group(&ns->ns_group, &cluster->cl_group); config_group_init_type_name(&ns->ns_group, "node", &o2nm_node_group_type); configfs_add_default_group(o2hb_group, &cluster->cl_group); rwlock_init(&cluster->cl_nodes_lock); cluster->cl_node_ip_tree = RB_ROOT; cluster->cl_reconnect_delay_ms = O2NET_RECONNECT_DELAY_MS_DEFAULT; cluster->cl_idle_timeout_ms = O2NET_IDLE_TIMEOUT_MS_DEFAULT; cluster->cl_keepalive_delay_ms = O2NET_KEEPALIVE_DELAY_MS_DEFAULT; cluster->cl_fence_method = O2NM_FENCE_RESET; ret = &cluster->cl_group; o2nm_single_cluster = cluster; out: if (ret == NULL) { kfree(cluster); kfree(ns); o2hb_free_hb_set(o2hb_group); ret = ERR_PTR(-ENOMEM); } return ret; } static void o2nm_cluster_group_drop_item(struct config_group *group, struct config_item *item) { struct o2nm_cluster *cluster = to_o2nm_cluster(item); BUG_ON(o2nm_single_cluster != cluster); o2nm_single_cluster = NULL; configfs_remove_default_groups(&cluster->cl_group); config_item_put(item); } static struct configfs_group_operations o2nm_cluster_group_group_ops = { .make_group = o2nm_cluster_group_make_group, .drop_item = o2nm_cluster_group_drop_item, }; static const struct config_item_type o2nm_cluster_group_type = { .ct_group_ops = &o2nm_cluster_group_group_ops, .ct_owner = THIS_MODULE, }; static struct o2nm_cluster_group o2nm_cluster_group = { .cs_subsys = { .su_group = { .cg_item = { .ci_namebuf = "cluster", .ci_type = &o2nm_cluster_group_type, }, }, }, }; static inline void o2nm_lock_subsystem(void) { mutex_lock(&o2nm_cluster_group.cs_subsys.su_mutex); } static inline void o2nm_unlock_subsystem(void) { mutex_unlock(&o2nm_cluster_group.cs_subsys.su_mutex); } int o2nm_depend_item(struct config_item *item) { return configfs_depend_item(&o2nm_cluster_group.cs_subsys, item); } void o2nm_undepend_item(struct config_item *item) { configfs_undepend_item(item); } int o2nm_depend_this_node(void) { int ret = 0; struct o2nm_node *local_node; local_node = o2nm_get_node_by_num(o2nm_this_node()); if (!local_node) { ret = -EINVAL; goto out; } ret = o2nm_depend_item(&local_node->nd_item); o2nm_node_put(local_node); out: return ret; } void o2nm_undepend_this_node(void) { struct o2nm_node *local_node; local_node = o2nm_get_node_by_num(o2nm_this_node()); BUG_ON(!local_node); o2nm_undepend_item(&local_node->nd_item); o2nm_node_put(local_node); } static void __exit exit_o2nm(void) { /* XXX sync with hb callbacks and shut down hb? */ o2net_unregister_hb_callbacks(); configfs_unregister_subsystem(&o2nm_cluster_group.cs_subsys); o2cb_sys_shutdown(); o2net_exit(); o2hb_exit(); } static int __init init_o2nm(void) { int ret; o2hb_init(); ret = o2net_init(); if (ret) goto out_o2hb; ret = o2net_register_hb_callbacks(); if (ret) goto out_o2net; config_group_init(&o2nm_cluster_group.cs_subsys.su_group); mutex_init(&o2nm_cluster_group.cs_subsys.su_mutex); ret = configfs_register_subsystem(&o2nm_cluster_group.cs_subsys); if (ret) { printk(KERN_ERR "nodemanager: Registration returned %d\n", ret); goto out_callbacks; } ret = o2cb_sys_init(); if (!ret) goto out; configfs_unregister_subsystem(&o2nm_cluster_group.cs_subsys); out_callbacks: o2net_unregister_hb_callbacks(); out_o2net: o2net_exit(); out_o2hb: o2hb_exit(); out: return ret; } MODULE_AUTHOR("Oracle"); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("OCFS2 cluster management"); module_init(init_o2nm) module_exit(exit_o2nm) |
2 2 2 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 | // SPDX-License-Identifier: GPL-2.0-or-later /* * HID driver for some chicony "special" devices * * Copyright (c) 1999 Andreas Gal * Copyright (c) 2000-2005 Vojtech Pavlik <vojtech@suse.cz> * Copyright (c) 2005 Michael Haboustak <mike-@cinci.rr.com> for Concept2, Inc * Copyright (c) 2006-2007 Jiri Kosina * Copyright (c) 2007 Paul Walmsley * Copyright (c) 2008 Jiri Slaby */ /* */ #include <linux/device.h> #include <linux/input.h> #include <linux/hid.h> #include <linux/module.h> #include <linux/usb.h> #include "hid-ids.h" #define CH_WIRELESS_CTL_REPORT_ID 0x11 static int ch_report_wireless(struct hid_report *report, u8 *data, int size) { struct hid_device *hdev = report->device; struct input_dev *input; if (report->id != CH_WIRELESS_CTL_REPORT_ID || report->maxfield != 1) return 0; input = report->field[0]->hidinput->input; if (!input) { hid_warn(hdev, "can't find wireless radio control's input"); return 0; } input_report_key(input, KEY_RFKILL, 1); input_sync(input); input_report_key(input, KEY_RFKILL, 0); input_sync(input); return 1; } static int ch_raw_event(struct hid_device *hdev, struct hid_report *report, u8 *data, int size) { if (report->application == HID_GD_WIRELESS_RADIO_CTLS) return ch_report_wireless(report, data, size); return 0; } #define ch_map_key_clear(c) hid_map_usage_clear(hi, usage, bit, max, \ EV_KEY, (c)) static int ch_input_mapping(struct hid_device *hdev, struct hid_input *hi, struct hid_field *field, struct hid_usage *usage, unsigned long **bit, int *max) { if ((usage->hid & HID_USAGE_PAGE) != HID_UP_MSVENDOR) return 0; set_bit(EV_REP, hi->input->evbit); switch (usage->hid & HID_USAGE) { case 0xff01: ch_map_key_clear(BTN_1); break; case 0xff02: ch_map_key_clear(BTN_2); break; case 0xff03: ch_map_key_clear(BTN_3); break; case 0xff04: ch_map_key_clear(BTN_4); break; case 0xff05: ch_map_key_clear(BTN_5); break; case 0xff06: ch_map_key_clear(BTN_6); break; case 0xff07: ch_map_key_clear(BTN_7); break; case 0xff08: ch_map_key_clear(BTN_8); break; case 0xff09: ch_map_key_clear(BTN_9); break; case 0xff0a: ch_map_key_clear(BTN_A); break; case 0xff0b: ch_map_key_clear(BTN_B); break; case 0x00f1: ch_map_key_clear(KEY_WLAN); break; case 0x00f2: ch_map_key_clear(KEY_BRIGHTNESSDOWN); break; case 0x00f3: ch_map_key_clear(KEY_BRIGHTNESSUP); break; case 0x00f4: ch_map_key_clear(KEY_DISPLAY_OFF); break; case 0x00f7: ch_map_key_clear(KEY_CAMERA); break; case 0x00f8: ch_map_key_clear(KEY_PROG1); break; default: return 0; } return 1; } static const __u8 *ch_switch12_report_fixup(struct hid_device *hdev, __u8 *rdesc, unsigned int *rsize) { struct usb_interface *intf = to_usb_interface(hdev->dev.parent); if (intf->cur_altsetting->desc.bInterfaceNumber == 1) { /* Change usage maximum and logical maximum from 0x7fff to * 0x2fff, so they don't exceed HID_MAX_USAGES */ switch (hdev->product) { case USB_DEVICE_ID_CHICONY_ACER_SWITCH12: if (*rsize >= 128 && rdesc[64] == 0xff && rdesc[65] == 0x7f && rdesc[69] == 0xff && rdesc[70] == 0x7f) { hid_info(hdev, "Fixing up report descriptor\n"); rdesc[65] = rdesc[70] = 0x2f; } break; } } return rdesc; } static int ch_probe(struct hid_device *hdev, const struct hid_device_id *id) { int ret; if (!hid_is_usb(hdev)) return -EINVAL; hdev->quirks |= HID_QUIRK_INPUT_PER_APP; ret = hid_parse(hdev); if (ret) { hid_err(hdev, "Chicony hid parse failed: %d\n", ret); return ret; } ret = hid_hw_start(hdev, HID_CONNECT_DEFAULT); if (ret) { hid_err(hdev, "Chicony hw start failed: %d\n", ret); return ret; } return 0; } static const struct hid_device_id ch_devices[] = { { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, USB_DEVICE_ID_CHICONY_TACTICAL_PAD) }, { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, USB_DEVICE_ID_CHICONY_WIRELESS2) }, { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, USB_DEVICE_ID_CHICONY_WIRELESS3) }, { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, USB_DEVICE_ID_CHICONY_ACER_SWITCH12) }, { } }; MODULE_DEVICE_TABLE(hid, ch_devices); static struct hid_driver ch_driver = { .name = "chicony", .id_table = ch_devices, .report_fixup = ch_switch12_report_fixup, .input_mapping = ch_input_mapping, .probe = ch_probe, .raw_event = ch_raw_event, }; module_hid_driver(ch_driver); MODULE_DESCRIPTION("HID driver for some chicony \"special\" devices"); MODULE_LICENSE("GPL"); |
1 1 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 | // SPDX-License-Identifier: GPL-2.0-or-later /* * drivers/net/team/team_mode_activebackup.c - Active-backup mode for team * Copyright (c) 2011 Jiri Pirko <jpirko@redhat.com> */ #include <linux/kernel.h> #include <linux/types.h> #include <linux/module.h> #include <linux/init.h> #include <linux/errno.h> #include <linux/netdevice.h> #include <net/rtnetlink.h> #include <linux/if_team.h> struct ab_priv { struct team_port __rcu *active_port; struct team_option_inst_info *ap_opt_inst_info; }; static struct ab_priv *ab_priv(struct team *team) { return (struct ab_priv *) &team->mode_priv; } static rx_handler_result_t ab_receive(struct team *team, struct team_port *port, struct sk_buff *skb) { struct team_port *active_port; active_port = rcu_dereference(ab_priv(team)->active_port); if (active_port != port) return RX_HANDLER_EXACT; return RX_HANDLER_ANOTHER; } static bool ab_transmit(struct team *team, struct sk_buff *skb) { struct team_port *active_port; active_port = rcu_dereference_bh(ab_priv(team)->active_port); if (unlikely(!active_port)) goto drop; if (team_dev_queue_xmit(team, active_port, skb)) return false; return true; drop: dev_kfree_skb_any(skb); return false; } static void ab_port_leave(struct team *team, struct team_port *port) { if (ab_priv(team)->active_port == port) { RCU_INIT_POINTER(ab_priv(team)->active_port, NULL); team_option_inst_set_change(ab_priv(team)->ap_opt_inst_info); } } static void ab_active_port_init(struct team *team, struct team_option_inst_info *info) { ab_priv(team)->ap_opt_inst_info = info; } static void ab_active_port_get(struct team *team, struct team_gsetter_ctx *ctx) { struct team_port *active_port; active_port = rcu_dereference_protected(ab_priv(team)->active_port, lockdep_is_held(&team->lock)); if (active_port) ctx->data.u32_val = active_port->dev->ifindex; else ctx->data.u32_val = 0; } static int ab_active_port_set(struct team *team, struct team_gsetter_ctx *ctx) { struct team_port *port; list_for_each_entry(port, &team->port_list, list) { if (port->dev->ifindex == ctx->data.u32_val) { rcu_assign_pointer(ab_priv(team)->active_port, port); return 0; } } return -ENOENT; } static const struct team_option ab_options[] = { { .name = "activeport", .type = TEAM_OPTION_TYPE_U32, .init = ab_active_port_init, .getter = ab_active_port_get, .setter = ab_active_port_set, }, }; static int ab_init(struct team *team) { return team_options_register(team, ab_options, ARRAY_SIZE(ab_options)); } static void ab_exit(struct team *team) { team_options_unregister(team, ab_options, ARRAY_SIZE(ab_options)); } static const struct team_mode_ops ab_mode_ops = { .init = ab_init, .exit = ab_exit, .receive = ab_receive, .transmit = ab_transmit, .port_leave = ab_port_leave, }; static const struct team_mode ab_mode = { .kind = "activebackup", .owner = THIS_MODULE, .priv_size = sizeof(struct ab_priv), .ops = &ab_mode_ops, .lag_tx_type = NETDEV_LAG_TX_TYPE_ACTIVEBACKUP, }; static int __init ab_init_module(void) { return team_mode_register(&ab_mode); } static void __exit ab_cleanup_module(void) { team_mode_unregister(&ab_mode); } module_init(ab_init_module); module_exit(ab_cleanup_module); MODULE_LICENSE("GPL v2"); MODULE_AUTHOR("Jiri Pirko <jpirko@redhat.com>"); MODULE_DESCRIPTION("Active-backup mode for team"); MODULE_ALIAS_TEAM_MODE("activebackup"); |
1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 | // SPDX-License-Identifier: GPL-2.0-only /* * ebt_mark_m * * Authors: * Bart De Schuymer <bdschuym@pandora.be> * * July, 2002 * */ #include <linux/module.h> #include <linux/netfilter/x_tables.h> #include <linux/netfilter_bridge/ebtables.h> #include <linux/netfilter_bridge/ebt_mark_m.h> static bool ebt_mark_mt(const struct sk_buff *skb, struct xt_action_param *par) { const struct ebt_mark_m_info *info = par->matchinfo; if (info->bitmask & EBT_MARK_OR) return !!(skb->mark & info->mask) ^ info->invert; return ((skb->mark & info->mask) == info->mark) ^ info->invert; } static int ebt_mark_mt_check(const struct xt_mtchk_param *par) { const struct ebt_mark_m_info *info = par->matchinfo; if (info->bitmask & ~EBT_MARK_MASK) return -EINVAL; if ((info->bitmask & EBT_MARK_OR) && (info->bitmask & EBT_MARK_AND)) return -EINVAL; if (!info->bitmask) return -EINVAL; return 0; } #ifdef CONFIG_NETFILTER_XTABLES_COMPAT struct compat_ebt_mark_m_info { compat_ulong_t mark, mask; uint8_t invert, bitmask; }; static void mark_mt_compat_from_user(void *dst, const void *src) { const struct compat_ebt_mark_m_info *user = src; struct ebt_mark_m_info *kern = dst; kern->mark = user->mark; kern->mask = user->mask; kern->invert = user->invert; kern->bitmask = user->bitmask; } static int mark_mt_compat_to_user(void __user *dst, const void *src) { struct compat_ebt_mark_m_info __user *user = dst; const struct ebt_mark_m_info *kern = src; if (put_user(kern->mark, &user->mark) || put_user(kern->mask, &user->mask) || put_user(kern->invert, &user->invert) || put_user(kern->bitmask, &user->bitmask)) return -EFAULT; return 0; } #endif static struct xt_match ebt_mark_mt_reg __read_mostly = { .name = "mark_m", .revision = 0, .family = NFPROTO_BRIDGE, .match = ebt_mark_mt, .checkentry = ebt_mark_mt_check, .matchsize = sizeof(struct ebt_mark_m_info), #ifdef CONFIG_NETFILTER_XTABLES_COMPAT .compatsize = sizeof(struct compat_ebt_mark_m_info), .compat_from_user = mark_mt_compat_from_user, .compat_to_user = mark_mt_compat_to_user, #endif .me = THIS_MODULE, }; static int __init ebt_mark_m_init(void) { return xt_register_match(&ebt_mark_mt_reg); } static void __exit ebt_mark_m_fini(void) { xt_unregister_match(&ebt_mark_mt_reg); } module_init(ebt_mark_m_init); module_exit(ebt_mark_m_fini); MODULE_DESCRIPTION("Ebtables: Packet mark match"); MODULE_LICENSE("GPL"); |
48 48 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 | /* * Copyright (c) 2006, 2018 Oracle and/or its affiliates. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU * General Public License (GPL) Version 2, available from the file * COPYING in the main directory of this source tree, or the * OpenIB.org BSD license below: * * Redistribution and use in source and binary forms, with or * without modification, are permitted provided that the following * conditions are met: * * - Redistributions of source code must retain the above * copyright notice, this list of conditions and the following * disclaimer. * * - Redistributions in binary form must reproduce the above * copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials * provided with the distribution. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. * */ #include <linux/kernel.h> #include <linux/gfp.h> #include <linux/in.h> #include <net/tcp.h> #include <trace/events/sock.h> #include "rds.h" #include "tcp.h" void rds_tcp_keepalive(struct socket *sock) { /* values below based on xs_udp_default_timeout */ int keepidle = 5; /* send a probe 'keepidle' secs after last data */ int keepcnt = 5; /* number of unack'ed probes before declaring dead */ sock_set_keepalive(sock->sk); tcp_sock_set_keepcnt(sock->sk, keepcnt); tcp_sock_set_keepidle(sock->sk, keepidle); /* KEEPINTVL is the interval between successive probes. We follow * the model in xs_tcp_finish_connecting() and re-use keepidle. */ tcp_sock_set_keepintvl(sock->sk, keepidle); } /* rds_tcp_accept_one_path(): if accepting on cp_index > 0, make sure the * client's ipaddr < server's ipaddr. Otherwise, close the accepted * socket and force a reconneect from smaller -> larger ip addr. The reason * we special case cp_index 0 is to allow the rds probe ping itself to itself * get through efficiently. * Since reconnects are only initiated from the node with the numerically * smaller ip address, we recycle conns in RDS_CONN_ERROR on the passive side * by moving them to CONNECTING in this function. */ static struct rds_tcp_connection *rds_tcp_accept_one_path(struct rds_connection *conn) { int i; int npaths = max_t(int, 1, conn->c_npaths); /* for mprds, all paths MUST be initiated by the peer * with the smaller address. */ if (rds_addr_cmp(&conn->c_faddr, &conn->c_laddr) >= 0) { /* Make sure we initiate at least one path if this * has not already been done; rds_start_mprds() will * take care of additional paths, if necessary. */ if (npaths == 1) rds_conn_path_connect_if_down(&conn->c_path[0]); return NULL; } for (i = 0; i < npaths; i++) { struct rds_conn_path *cp = &conn->c_path[i]; if (rds_conn_path_transition(cp, RDS_CONN_DOWN, RDS_CONN_CONNECTING) || rds_conn_path_transition(cp, RDS_CONN_ERROR, RDS_CONN_CONNECTING)) { return cp->cp_transport_data; } } return NULL; } int rds_tcp_accept_one(struct socket *sock) { struct socket *new_sock = NULL; struct rds_connection *conn; int ret; struct inet_sock *inet; struct rds_tcp_connection *rs_tcp = NULL; int conn_state; struct rds_conn_path *cp; struct in6_addr *my_addr, *peer_addr; struct proto_accept_arg arg = { .flags = O_NONBLOCK, .kern = true, }; #if !IS_ENABLED(CONFIG_IPV6) struct in6_addr saddr, daddr; #endif int dev_if = 0; if (!sock) /* module unload or netns delete in progress */ return -ENETUNREACH; ret = sock_create_lite(sock->sk->sk_family, sock->sk->sk_type, sock->sk->sk_protocol, &new_sock); if (ret) goto out; ret = sock->ops->accept(sock, new_sock, &arg); if (ret < 0) goto out; /* sock_create_lite() does not get a hold on the owner module so we * need to do it here. Note that sock_release() uses sock->ops to * determine if it needs to decrement the reference count. So set * sock->ops after calling accept() in case that fails. And there's * no need to do try_module_get() as the listener should have a hold * already. */ new_sock->ops = sock->ops; __module_get(new_sock->ops->owner); rds_tcp_keepalive(new_sock); if (!rds_tcp_tune(new_sock)) { ret = -EINVAL; goto out; } inet = inet_sk(new_sock->sk); #if IS_ENABLED(CONFIG_IPV6) my_addr = &new_sock->sk->sk_v6_rcv_saddr; peer_addr = &new_sock->sk->sk_v6_daddr; #else ipv6_addr_set_v4mapped(inet->inet_saddr, &saddr); ipv6_addr_set_v4mapped(inet->inet_daddr, &daddr); my_addr = &saddr; peer_addr = &daddr; #endif rdsdebug("accepted family %d tcp %pI6c:%u -> %pI6c:%u\n", sock->sk->sk_family, my_addr, ntohs(inet->inet_sport), peer_addr, ntohs(inet->inet_dport)); #if IS_ENABLED(CONFIG_IPV6) /* sk_bound_dev_if is not set if the peer address is not link local * address. In this case, it happens that mcast_oif is set. So * just use it. */ if ((ipv6_addr_type(my_addr) & IPV6_ADDR_LINKLOCAL) && !(ipv6_addr_type(peer_addr) & IPV6_ADDR_LINKLOCAL)) { struct ipv6_pinfo *inet6; inet6 = inet6_sk(new_sock->sk); dev_if = READ_ONCE(inet6->mcast_oif); } else { dev_if = new_sock->sk->sk_bound_dev_if; } #endif if (!rds_tcp_laddr_check(sock_net(sock->sk), peer_addr, dev_if)) { /* local address connection is only allowed via loopback */ ret = -EOPNOTSUPP; goto out; } conn = rds_conn_create(sock_net(sock->sk), my_addr, peer_addr, &rds_tcp_transport, 0, GFP_KERNEL, dev_if); if (IS_ERR(conn)) { ret = PTR_ERR(conn); goto out; } /* An incoming SYN request came in, and TCP just accepted it. * * If the client reboots, this conn will need to be cleaned up. * rds_tcp_state_change() will do that cleanup */ rs_tcp = rds_tcp_accept_one_path(conn); if (!rs_tcp) goto rst_nsk; mutex_lock(&rs_tcp->t_conn_path_lock); cp = rs_tcp->t_cpath; conn_state = rds_conn_path_state(cp); WARN_ON(conn_state == RDS_CONN_UP); if (conn_state != RDS_CONN_CONNECTING && conn_state != RDS_CONN_ERROR) goto rst_nsk; if (rs_tcp->t_sock) { /* Duelling SYN has been handled in rds_tcp_accept_one() */ rds_tcp_reset_callbacks(new_sock, cp); /* rds_connect_path_complete() marks RDS_CONN_UP */ rds_connect_path_complete(cp, RDS_CONN_RESETTING); } else { rds_tcp_set_callbacks(new_sock, cp); rds_connect_path_complete(cp, RDS_CONN_CONNECTING); } new_sock = NULL; ret = 0; if (conn->c_npaths == 0) rds_send_ping(cp->cp_conn, cp->cp_index); goto out; rst_nsk: /* reset the newly returned accept sock and bail. * It is safe to set linger on new_sock because the RDS connection * has not been brought up on new_sock, so no RDS-level data could * be pending on it. By setting linger, we achieve the side-effect * of avoiding TIME_WAIT state on new_sock. */ sock_no_linger(new_sock->sk); kernel_sock_shutdown(new_sock, SHUT_RDWR); ret = 0; out: if (rs_tcp) mutex_unlock(&rs_tcp->t_conn_path_lock); if (new_sock) sock_release(new_sock); return ret; } void rds_tcp_listen_data_ready(struct sock *sk) { void (*ready)(struct sock *sk); trace_sk_data_ready(sk); rdsdebug("listen data ready sk %p\n", sk); read_lock_bh(&sk->sk_callback_lock); ready = sk->sk_user_data; if (!ready) { /* check for teardown race */ ready = sk->sk_data_ready; goto out; } /* * ->sk_data_ready is also called for a newly established child socket * before it has been accepted and the accepter has set up their * data_ready.. we only want to queue listen work for our listening * socket * * (*ready)() may be null if we are racing with netns delete, and * the listen socket is being torn down. */ if (sk->sk_state == TCP_LISTEN) rds_tcp_accept_work(sk); else ready = rds_tcp_listen_sock_def_readable(sock_net(sk)); out: read_unlock_bh(&sk->sk_callback_lock); if (ready) ready(sk); } struct socket *rds_tcp_listen_init(struct net *net, bool isv6) { struct socket *sock = NULL; struct sockaddr_storage ss; struct sockaddr_in6 *sin6; struct sockaddr_in *sin; int addr_len; int ret; ret = sock_create_kern(net, isv6 ? PF_INET6 : PF_INET, SOCK_STREAM, IPPROTO_TCP, &sock); if (ret < 0) { rdsdebug("could not create %s listener socket: %d\n", isv6 ? "IPv6" : "IPv4", ret); goto out; } sock->sk->sk_reuse = SK_CAN_REUSE; tcp_sock_set_nodelay(sock->sk); write_lock_bh(&sock->sk->sk_callback_lock); sock->sk->sk_user_data = sock->sk->sk_data_ready; sock->sk->sk_data_ready = rds_tcp_listen_data_ready; write_unlock_bh(&sock->sk->sk_callback_lock); if (isv6) { sin6 = (struct sockaddr_in6 *)&ss; sin6->sin6_family = PF_INET6; sin6->sin6_addr = in6addr_any; sin6->sin6_port = (__force u16)htons(RDS_TCP_PORT); sin6->sin6_scope_id = 0; sin6->sin6_flowinfo = 0; addr_len = sizeof(*sin6); } else { sin = (struct sockaddr_in *)&ss; sin->sin_family = PF_INET; sin->sin_addr.s_addr = INADDR_ANY; sin->sin_port = (__force u16)htons(RDS_TCP_PORT); addr_len = sizeof(*sin); } ret = kernel_bind(sock, (struct sockaddr *)&ss, addr_len); if (ret < 0) { rdsdebug("could not bind %s listener socket: %d\n", isv6 ? "IPv6" : "IPv4", ret); goto out; } ret = sock->ops->listen(sock, 64); if (ret < 0) goto out; return sock; out: if (sock) sock_release(sock); return NULL; } void rds_tcp_listen_stop(struct socket *sock, struct work_struct *acceptor) { struct sock *sk; if (!sock) return; sk = sock->sk; /* serialize with and prevent further callbacks */ lock_sock(sk); write_lock_bh(&sk->sk_callback_lock); if (sk->sk_user_data) { sk->sk_data_ready = sk->sk_user_data; sk->sk_user_data = NULL; } write_unlock_bh(&sk->sk_callback_lock); release_sock(sk); /* wait for accepts to stop and close the socket */ flush_workqueue(rds_wq); flush_work(acceptor); sock_release(sock); } |
270 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 | /* SPDX-License-Identifier: GPL-2.0 */ /* * Copyright (C) 1994 Linus Torvalds * * Pentium III FXSR, SSE support * General FPU state handling cleanups * Gareth Hughes <gareth@valinux.com>, May 2000 * x86-64 work by Andi Kleen 2002 */ #ifndef _ASM_X86_FPU_API_H #define _ASM_X86_FPU_API_H #include <linux/bottom_half.h> #include <asm/fpu/types.h> /* * Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It * disables preemption so be careful if you intend to use it for long periods * of time. * If you intend to use the FPU in irq/softirq you need to check first with * irq_fpu_usable() if it is possible. */ /* Kernel FPU states to initialize in kernel_fpu_begin_mask() */ #define KFPU_387 _BITUL(0) /* 387 state will be initialized */ #define KFPU_MXCSR _BITUL(1) /* MXCSR will be initialized */ extern void kernel_fpu_begin_mask(unsigned int kfpu_mask); extern void kernel_fpu_end(void); extern bool irq_fpu_usable(void); extern void fpregs_mark_activate(void); /* Code that is unaware of kernel_fpu_begin_mask() can use this */ static inline void kernel_fpu_begin(void) { #ifdef CONFIG_X86_64 /* * Any 64-bit code that uses 387 instructions must explicitly request * KFPU_387. */ kernel_fpu_begin_mask(KFPU_MXCSR); #else /* * 32-bit kernel code may use 387 operations as well as SSE2, etc, * as long as it checks that the CPU has the required capability. */ kernel_fpu_begin_mask(KFPU_387 | KFPU_MXCSR); #endif } /* * Use fpregs_lock() while editing CPU's FPU registers or fpu->fpstate. * A context switch will (and softirq might) save CPU's FPU registers to * fpu->fpstate.regs and set TIF_NEED_FPU_LOAD leaving CPU's FPU registers in * a random state. * * local_bh_disable() protects against both preemption and soft interrupts * on !RT kernels. * * On RT kernels local_bh_disable() is not sufficient because it only * serializes soft interrupt related sections via a local lock, but stays * preemptible. Disabling preemption is the right choice here as bottom * half processing is always in thread context on RT kernels so it * implicitly prevents bottom half processing as well. * * Disabling preemption also serializes against kernel_fpu_begin(). */ static inline void fpregs_lock(void) { if (!IS_ENABLED(CONFIG_PREEMPT_RT)) local_bh_disable(); else preempt_disable(); } static inline void fpregs_unlock(void) { if (!IS_ENABLED(CONFIG_PREEMPT_RT)) local_bh_enable(); else preempt_enable(); } /* * FPU state gets lazily restored before returning to userspace. So when in the * kernel, the valid FPU state may be kept in the buffer. This function will force * restore all the fpu state to the registers early if needed, and lock them from * being automatically saved/restored. Then FPU state can be modified safely in the * registers, before unlocking with fpregs_unlock(). */ void fpregs_lock_and_load(void); #ifdef CONFIG_X86_DEBUG_FPU extern void fpregs_assert_state_consistent(void); #else static inline void fpregs_assert_state_consistent(void) { } #endif /* * Load the task FPU state before returning to userspace. */ extern void switch_fpu_return(void); /* * Query the presence of one or more xfeatures. Works on any legacy CPU as well. * * If 'feature_name' is set then put a human-readable description of * the feature there as well - this can be used to print error (or success) * messages. */ extern int cpu_has_xfeatures(u64 xfeatures_mask, const char **feature_name); /* Trap handling */ extern int fpu__exception_code(struct fpu *fpu, int trap_nr); extern void fpu_sync_fpstate(struct fpu *fpu); extern void fpu_reset_from_exception_fixup(void); /* Boot, hotplug and resume */ extern void fpu__init_cpu(void); extern void fpu__init_system(void); extern void fpu__init_check_bugs(void); extern void fpu__resume_cpu(void); #ifdef CONFIG_MATH_EMULATION extern void fpstate_init_soft(struct swregs_state *soft); #else static inline void fpstate_init_soft(struct swregs_state *soft) {} #endif /* State tracking */ DECLARE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx); /* Process cleanup */ #ifdef CONFIG_X86_64 extern void fpstate_free(struct fpu *fpu); #else static inline void fpstate_free(struct fpu *fpu) { } #endif /* fpstate-related functions which are exported to KVM */ extern void fpstate_clear_xstate_component(struct fpstate *fps, unsigned int xfeature); extern u64 xstate_get_guest_group_perm(void); extern void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr); /* KVM specific functions */ extern bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu); extern void fpu_free_guest_fpstate(struct fpu_guest *gfpu); extern int fpu_swap_kvm_fpstate(struct fpu_guest *gfpu, bool enter_guest); extern int fpu_enable_guest_xfd_features(struct fpu_guest *guest_fpu, u64 xfeatures); #ifdef CONFIG_X86_64 extern void fpu_update_guest_xfd(struct fpu_guest *guest_fpu, u64 xfd); extern void fpu_sync_guest_vmexit_xfd_state(void); #else static inline void fpu_update_guest_xfd(struct fpu_guest *guest_fpu, u64 xfd) { } static inline void fpu_sync_guest_vmexit_xfd_state(void) { } #endif extern void fpu_copy_guest_fpstate_to_uabi(struct fpu_guest *gfpu, void *buf, unsigned int size, u64 xfeatures, u32 pkru); extern int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf, u64 xcr0, u32 *vpkru); static inline void fpstate_set_confidential(struct fpu_guest *gfpu) { gfpu->fpstate->is_confidential = true; } static inline bool fpstate_is_confidential(struct fpu_guest *gfpu) { return gfpu->fpstate->is_confidential; } /* prctl */ extern long fpu_xstate_prctl(int option, unsigned long arg2); extern void fpu_idle_fpregs(void); #endif /* _ASM_X86_FPU_API_H */ |
1 5 5 2 3 48 48 7 1 5 5 3 4 5 1 2 2 2 3 4 1 3 3 3 2 1 1 7 7 6 5 5 4 1 48 48 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 | // SPDX-License-Identifier: GPL-2.0-only /* * Copyright (c) 2015 Nicira, Inc. */ #include <linux/module.h> #include <linux/openvswitch.h> #include <linux/tcp.h> #include <linux/udp.h> #include <linux/sctp.h> #include <linux/static_key.h> #include <linux/string_helpers.h> #include <net/ip.h> #include <net/genetlink.h> #include <net/netfilter/nf_conntrack_core.h> #include <net/netfilter/nf_conntrack_count.h> #include <net/netfilter/nf_conntrack_helper.h> #include <net/netfilter/nf_conntrack_labels.h> #include <net/netfilter/nf_conntrack_seqadj.h> #include <net/netfilter/nf_conntrack_timeout.h> #include <net/netfilter/nf_conntrack_zones.h> #include <net/netfilter/ipv6/nf_defrag_ipv6.h> #include <net/ipv6_frag.h> #if IS_ENABLED(CONFIG_NF_NAT) #include <net/netfilter/nf_nat.h> #endif #include <net/netfilter/nf_conntrack_act_ct.h> #include "datapath.h" #include "drop.h" #include "conntrack.h" #include "flow.h" #include "flow_netlink.h" struct ovs_ct_len_tbl { int maxlen; int minlen; }; /* Metadata mark for masked write to conntrack mark */ struct md_mark { u32 value; u32 mask; }; /* Metadata label for masked write to conntrack label. */ struct md_labels { struct ovs_key_ct_labels value; struct ovs_key_ct_labels mask; }; enum ovs_ct_nat { OVS_CT_NAT = 1 << 0, /* NAT for committed connections only. */ OVS_CT_SRC_NAT = 1 << 1, /* Source NAT for NEW connections. */ OVS_CT_DST_NAT = 1 << 2, /* Destination NAT for NEW connections. */ }; /* Conntrack action context for execution. */ struct ovs_conntrack_info { struct nf_conntrack_helper *helper; struct nf_conntrack_zone zone; struct nf_conn *ct; u8 commit : 1; u8 nat : 3; /* enum ovs_ct_nat */ u8 force : 1; u8 have_eventmask : 1; u16 family; u32 eventmask; /* Mask of 1 << IPCT_*. */ struct md_mark mark; struct md_labels labels; char timeout[CTNL_TIMEOUT_NAME_MAX]; struct nf_ct_timeout *nf_ct_timeout; #if IS_ENABLED(CONFIG_NF_NAT) struct nf_nat_range2 range; /* Only present for SRC NAT and DST NAT. */ #endif }; #if IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT) #define OVS_CT_LIMIT_UNLIMITED 0 #define OVS_CT_LIMIT_DEFAULT OVS_CT_LIMIT_UNLIMITED #define CT_LIMIT_HASH_BUCKETS 512 static DEFINE_STATIC_KEY_FALSE(ovs_ct_limit_enabled); struct ovs_ct_limit { /* Elements in ovs_ct_limit_info->limits hash table */ struct hlist_node hlist_node; struct rcu_head rcu; u16 zone; u32 limit; }; struct ovs_ct_limit_info { u32 default_limit; struct hlist_head *limits; struct nf_conncount_data *data; }; static const struct nla_policy ct_limit_policy[OVS_CT_LIMIT_ATTR_MAX + 1] = { [OVS_CT_LIMIT_ATTR_ZONE_LIMIT] = { .type = NLA_NESTED, }, }; #endif static bool labels_nonzero(const struct ovs_key_ct_labels *labels); static void __ovs_ct_free_action(struct ovs_conntrack_info *ct_info); static u16 key_to_nfproto(const struct sw_flow_key *key) { switch (ntohs(key->eth.type)) { case ETH_P_IP: return NFPROTO_IPV4; case ETH_P_IPV6: return NFPROTO_IPV6; default: return NFPROTO_UNSPEC; } } /* Map SKB connection state into the values used by flow definition. */ static u8 ovs_ct_get_state(enum ip_conntrack_info ctinfo) { u8 ct_state = OVS_CS_F_TRACKED; switch (ctinfo) { case IP_CT_ESTABLISHED_REPLY: case IP_CT_RELATED_REPLY: ct_state |= OVS_CS_F_REPLY_DIR; break; default: break; } switch (ctinfo) { case IP_CT_ESTABLISHED: case IP_CT_ESTABLISHED_REPLY: ct_state |= OVS_CS_F_ESTABLISHED; break; case IP_CT_RELATED: case IP_CT_RELATED_REPLY: ct_state |= OVS_CS_F_RELATED; break; case IP_CT_NEW: ct_state |= OVS_CS_F_NEW; break; default: break; } return ct_state; } static u32 ovs_ct_get_mark(const struct nf_conn *ct) { #if IS_ENABLED(CONFIG_NF_CONNTRACK_MARK) return ct ? READ_ONCE(ct->mark) : 0; #else return 0; #endif } /* Guard against conntrack labels max size shrinking below 128 bits. */ #if NF_CT_LABELS_MAX_SIZE < 16 #error NF_CT_LABELS_MAX_SIZE must be at least 16 bytes #endif static void ovs_ct_get_labels(const struct nf_conn *ct, struct ovs_key_ct_labels *labels) { struct nf_conn_labels *cl = NULL; if (ct) { if (ct->master && !nf_ct_is_confirmed(ct)) ct = ct->master; cl = nf_ct_labels_find(ct); } if (cl) memcpy(labels, cl->bits, OVS_CT_LABELS_LEN); else memset(labels, 0, OVS_CT_LABELS_LEN); } static void __ovs_ct_update_key_orig_tp(struct sw_flow_key *key, const struct nf_conntrack_tuple *orig, u8 icmp_proto) { key->ct_orig_proto = orig->dst.protonum; if (orig->dst.protonum == icmp_proto) { key->ct.orig_tp.src = htons(orig->dst.u.icmp.type); key->ct.orig_tp.dst = htons(orig->dst.u.icmp.code); } else { key->ct.orig_tp.src = orig->src.u.all; key->ct.orig_tp.dst = orig->dst.u.all; } } static void __ovs_ct_update_key(struct sw_flow_key *key, u8 state, const struct nf_conntrack_zone *zone, const struct nf_conn *ct) { key->ct_state = state; key->ct_zone = zone->id; key->ct.mark = ovs_ct_get_mark(ct); ovs_ct_get_labels(ct, &key->ct.labels); if (ct) { const struct nf_conntrack_tuple *orig; /* Use the master if we have one. */ if (ct->master) ct = ct->master; orig = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple; /* IP version must match with the master connection. */ if (key->eth.type == htons(ETH_P_IP) && nf_ct_l3num(ct) == NFPROTO_IPV4) { key->ipv4.ct_orig.src = orig->src.u3.ip; key->ipv4.ct_orig.dst = orig->dst.u3.ip; __ovs_ct_update_key_orig_tp(key, orig, IPPROTO_ICMP); return; } else if (key->eth.type == htons(ETH_P_IPV6) && !sw_flow_key_is_nd(key) && nf_ct_l3num(ct) == NFPROTO_IPV6) { key->ipv6.ct_orig.src = orig->src.u3.in6; key->ipv6.ct_orig.dst = orig->dst.u3.in6; __ovs_ct_update_key_orig_tp(key, orig, NEXTHDR_ICMP); return; } } /* Clear 'ct_orig_proto' to mark the non-existence of conntrack * original direction key fields. */ key->ct_orig_proto = 0; } /* Update 'key' based on skb->_nfct. If 'post_ct' is true, then OVS has * previously sent the packet to conntrack via the ct action. If * 'keep_nat_flags' is true, the existing NAT flags retained, else they are * initialized from the connection status. */ static void ovs_ct_update_key(const struct sk_buff *skb, const struct ovs_conntrack_info *info, struct sw_flow_key *key, bool post_ct, bool keep_nat_flags) { const struct nf_conntrack_zone *zone = &nf_ct_zone_dflt; enum ip_conntrack_info ctinfo; struct nf_conn *ct; u8 state = 0; ct = nf_ct_get(skb, &ctinfo); if (ct) { state = ovs_ct_get_state(ctinfo); /* All unconfirmed entries are NEW connections. */ if (!nf_ct_is_confirmed(ct)) state |= OVS_CS_F_NEW; /* OVS persists the related flag for the duration of the * connection. */ if (ct->master) state |= OVS_CS_F_RELATED; if (keep_nat_flags) { state |= key->ct_state & OVS_CS_F_NAT_MASK; } else { if (ct->status & IPS_SRC_NAT) state |= OVS_CS_F_SRC_NAT; if (ct->status & IPS_DST_NAT) state |= OVS_CS_F_DST_NAT; } zone = nf_ct_zone(ct); } else if (post_ct) { state = OVS_CS_F_TRACKED | OVS_CS_F_INVALID; if (info) zone = &info->zone; } __ovs_ct_update_key(key, state, zone, ct); } /* This is called to initialize CT key fields possibly coming in from the local * stack. */ void ovs_ct_fill_key(const struct sk_buff *skb, struct sw_flow_key *key, bool post_ct) { ovs_ct_update_key(skb, NULL, key, post_ct, false); } int ovs_ct_put_key(const struct sw_flow_key *swkey, const struct sw_flow_key *output, struct sk_buff *skb) { if (nla_put_u32(skb, OVS_KEY_ATTR_CT_STATE, output->ct_state)) return -EMSGSIZE; if (IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES) && nla_put_u16(skb, OVS_KEY_ATTR_CT_ZONE, output->ct_zone)) return -EMSGSIZE; if (IS_ENABLED(CONFIG_NF_CONNTRACK_MARK) && nla_put_u32(skb, OVS_KEY_ATTR_CT_MARK, output->ct.mark)) return -EMSGSIZE; if (IS_ENABLED(CONFIG_NF_CONNTRACK_LABELS) && nla_put(skb, OVS_KEY_ATTR_CT_LABELS, sizeof(output->ct.labels), &output->ct.labels)) return -EMSGSIZE; if (swkey->ct_orig_proto) { if (swkey->eth.type == htons(ETH_P_IP)) { struct ovs_key_ct_tuple_ipv4 orig; memset(&orig, 0, sizeof(orig)); orig.ipv4_src = output->ipv4.ct_orig.src; orig.ipv4_dst = output->ipv4.ct_orig.dst; orig.src_port = output->ct.orig_tp.src; orig.dst_port = output->ct.orig_tp.dst; orig.ipv4_proto = output->ct_orig_proto; if (nla_put(skb, OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4, sizeof(orig), &orig)) return -EMSGSIZE; } else if (swkey->eth.type == htons(ETH_P_IPV6)) { struct ovs_key_ct_tuple_ipv6 orig; memset(&orig, 0, sizeof(orig)); memcpy(orig.ipv6_src, output->ipv6.ct_orig.src.s6_addr32, sizeof(orig.ipv6_src)); memcpy(orig.ipv6_dst, output->ipv6.ct_orig.dst.s6_addr32, sizeof(orig.ipv6_dst)); orig.src_port = output->ct.orig_tp.src; orig.dst_port = output->ct.orig_tp.dst; orig.ipv6_proto = output->ct_orig_proto; if (nla_put(skb, OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6, sizeof(orig), &orig)) return -EMSGSIZE; } } return 0; } static int ovs_ct_set_mark(struct nf_conn *ct, struct sw_flow_key *key, u32 ct_mark, u32 mask) { #if IS_ENABLED(CONFIG_NF_CONNTRACK_MARK) u32 new_mark; new_mark = ct_mark | (READ_ONCE(ct->mark) & ~(mask)); if (READ_ONCE(ct->mark) != new_mark) { WRITE_ONCE(ct->mark, new_mark); if (nf_ct_is_confirmed(ct)) nf_conntrack_event_cache(IPCT_MARK, ct); key->ct.mark = new_mark; } return 0; #else return -ENOTSUPP; #endif } static struct nf_conn_labels *ovs_ct_get_conn_labels(struct nf_conn *ct) { struct nf_conn_labels *cl; cl = nf_ct_labels_find(ct); if (!cl) { nf_ct_labels_ext_add(ct); cl = nf_ct_labels_find(ct); } return cl; } /* Initialize labels for a new, yet to be committed conntrack entry. Note that * since the new connection is not yet confirmed, and thus no-one else has * access to it's labels, we simply write them over. */ static int ovs_ct_init_labels(struct nf_conn *ct, struct sw_flow_key *key, const struct ovs_key_ct_labels *labels, const struct ovs_key_ct_labels *mask) { struct nf_conn_labels *cl, *master_cl; bool have_mask = labels_nonzero(mask); /* Inherit master's labels to the related connection? */ master_cl = ct->master ? nf_ct_labels_find(ct->master) : NULL; if (!master_cl && !have_mask) return 0; /* Nothing to do. */ cl = ovs_ct_get_conn_labels(ct); if (!cl) return -ENOSPC; /* Inherit the master's labels, if any. */ if (master_cl) *cl = *master_cl; if (have_mask) { u32 *dst = (u32 *)cl->bits; int i; for (i = 0; i < OVS_CT_LABELS_LEN_32; i++) dst[i] = (dst[i] & ~mask->ct_labels_32[i]) | (labels->ct_labels_32[i] & mask->ct_labels_32[i]); } /* Labels are included in the IPCTNL_MSG_CT_NEW event only if the * IPCT_LABEL bit is set in the event cache. */ nf_conntrack_event_cache(IPCT_LABEL, ct); memcpy(&key->ct.labels, cl->bits, OVS_CT_LABELS_LEN); return 0; } static int ovs_ct_set_labels(struct nf_conn *ct, struct sw_flow_key *key, const struct ovs_key_ct_labels *labels, const struct ovs_key_ct_labels *mask) { struct nf_conn_labels *cl; int err; cl = ovs_ct_get_conn_labels(ct); if (!cl) return -ENOSPC; err = nf_connlabels_replace(ct, labels->ct_labels_32, mask->ct_labels_32, OVS_CT_LABELS_LEN_32); if (err) return err; memcpy(&key->ct.labels, cl->bits, OVS_CT_LABELS_LEN); return 0; } static int ovs_ct_handle_fragments(struct net *net, struct sw_flow_key *key, u16 zone, int family, struct sk_buff *skb) { struct ovs_skb_cb ovs_cb = *OVS_CB(skb); int err; err = nf_ct_handle_fragments(net, skb, zone, family, &key->ip.proto, &ovs_cb.mru); if (err) return err; /* The key extracted from the fragment that completed this datagram * likely didn't have an L4 header, so regenerate it. */ ovs_flow_key_update_l3l4(skb, key); key->ip.frag = OVS_FRAG_TYPE_NONE; *OVS_CB(skb) = ovs_cb; return 0; } /* This replicates logic from nf_conntrack_core.c that is not exported. */ static enum ip_conntrack_info ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h) { const struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h); if (NF_CT_DIRECTION(h) == IP_CT_DIR_REPLY) return IP_CT_ESTABLISHED_REPLY; /* Once we've had two way comms, always ESTABLISHED. */ if (test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) return IP_CT_ESTABLISHED; if (test_bit(IPS_EXPECTED_BIT, &ct->status)) return IP_CT_RELATED; return IP_CT_NEW; } /* Find an existing connection which this packet belongs to without * re-attributing statistics or modifying the connection state. This allows an * skb->_nfct lost due to an upcall to be recovered during actions execution. * * Must be called with rcu_read_lock. * * On success, populates skb->_nfct and returns the connection. Returns NULL * if there is no existing entry. */ static struct nf_conn * ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone, u8 l3num, struct sk_buff *skb, bool natted) { struct nf_conntrack_tuple tuple; struct nf_conntrack_tuple_hash *h; struct nf_conn *ct; if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), l3num, net, &tuple)) { pr_debug("ovs_ct_find_existing: Can't get tuple\n"); return NULL; } /* Must invert the tuple if skb has been transformed by NAT. */ if (natted) { struct nf_conntrack_tuple inverse; if (!nf_ct_invert_tuple(&inverse, &tuple)) { pr_debug("ovs_ct_find_existing: Inversion failed!\n"); return NULL; } tuple = inverse; } /* look for tuple match */ h = nf_conntrack_find_get(net, zone, &tuple); if (!h) return NULL; /* Not found. */ ct = nf_ct_tuplehash_to_ctrack(h); /* Inverted packet tuple matches the reverse direction conntrack tuple, * select the other tuplehash to get the right 'ctinfo' bits for this * packet. */ if (natted) h = &ct->tuplehash[!h->tuple.dst.dir]; nf_ct_set(skb, ct, ovs_ct_get_info(h)); return ct; } static struct nf_conn *ovs_ct_executed(struct net *net, const struct sw_flow_key *key, const struct ovs_conntrack_info *info, struct sk_buff *skb, bool *ct_executed) { struct nf_conn *ct = NULL; /* If no ct, check if we have evidence that an existing conntrack entry * might be found for this skb. This happens when we lose a skb->_nfct * due to an upcall, or if the direction is being forced. If the * connection was not confirmed, it is not cached and needs to be run * through conntrack again. */ *ct_executed = (key->ct_state & OVS_CS_F_TRACKED) && !(key->ct_state & OVS_CS_F_INVALID) && (key->ct_zone == info->zone.id); if (*ct_executed || (!key->ct_state && info->force)) { ct = ovs_ct_find_existing(net, &info->zone, info->family, skb, !!(key->ct_state & OVS_CS_F_NAT_MASK)); } return ct; } /* Determine whether skb->_nfct is equal to the result of conntrack lookup. */ static bool skb_nfct_cached(struct net *net, const struct sw_flow_key *key, const struct ovs_conntrack_info *info, struct sk_buff *skb) { enum ip_conntrack_info ctinfo; struct nf_conn *ct; bool ct_executed = true; ct = nf_ct_get(skb, &ctinfo); if (!ct) ct = ovs_ct_executed(net, key, info, skb, &ct_executed); if (ct) nf_ct_get(skb, &ctinfo); else return false; if (!net_eq(net, read_pnet(&ct->ct_net))) return false; if (!nf_ct_zone_equal_any(info->ct, nf_ct_zone(ct))) return false; if (info->helper) { struct nf_conn_help *help; help = nf_ct_ext_find(ct, NF_CT_EXT_HELPER); if (help && rcu_access_pointer(help->helper) != info->helper) return false; } if (info->nf_ct_timeout) { struct nf_conn_timeout *timeout_ext; timeout_ext = nf_ct_timeout_find(ct); if (!timeout_ext || info->nf_ct_timeout != rcu_dereference(timeout_ext->timeout)) return false; } /* Force conntrack entry direction to the current packet? */ if (info->force && CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) { /* Delete the conntrack entry if confirmed, else just release * the reference. */ if (nf_ct_is_confirmed(ct)) nf_ct_delete(ct, 0, 0); nf_ct_put(ct); nf_ct_set(skb, NULL, 0); return false; } return ct_executed; } #if IS_ENABLED(CONFIG_NF_NAT) static void ovs_nat_update_key(struct sw_flow_key *key, const struct sk_buff *skb, enum nf_nat_manip_type maniptype) { if (maniptype == NF_NAT_MANIP_SRC) { __be16 src; key->ct_state |= OVS_CS_F_SRC_NAT; if (key->eth.type == htons(ETH_P_IP)) key->ipv4.addr.src = ip_hdr(skb)->saddr; else if (key->eth.type == htons(ETH_P_IPV6)) memcpy(&key->ipv6.addr.src, &ipv6_hdr(skb)->saddr, sizeof(key->ipv6.addr.src)); else return; if (key->ip.proto == IPPROTO_UDP) src = udp_hdr(skb)->source; else if (key->ip.proto == IPPROTO_TCP) src = tcp_hdr(skb)->source; else if (key->ip.proto == IPPROTO_SCTP) src = sctp_hdr(skb)->source; else return; key->tp.src = src; } else { __be16 dst; key->ct_state |= OVS_CS_F_DST_NAT; if (key->eth.type == htons(ETH_P_IP)) key->ipv4.addr.dst = ip_hdr(skb)->daddr; else if (key->eth.type == htons(ETH_P_IPV6)) memcpy(&key->ipv6.addr.dst, &ipv6_hdr(skb)->daddr, sizeof(key->ipv6.addr.dst)); else return; if (key->ip.proto == IPPROTO_UDP) dst = udp_hdr(skb)->dest; else if (key->ip.proto == IPPROTO_TCP) dst = tcp_hdr(skb)->dest; else if (key->ip.proto == IPPROTO_SCTP) dst = sctp_hdr(skb)->dest; else return; key->tp.dst = dst; } } /* Returns NF_DROP if the packet should be dropped, NF_ACCEPT otherwise. */ static int ovs_ct_nat(struct net *net, struct sw_flow_key *key, const struct ovs_conntrack_info *info, struct sk_buff *skb, struct nf_conn *ct, enum ip_conntrack_info ctinfo) { int err, action = 0; if (!(info->nat & OVS_CT_NAT)) return NF_ACCEPT; if (info->nat & OVS_CT_SRC_NAT) action |= BIT(NF_NAT_MANIP_SRC); if (info->nat & OVS_CT_DST_NAT) action |= BIT(NF_NAT_MANIP_DST); err = nf_ct_nat(skb, ct, ctinfo, &action, &info->range, info->commit); if (err != NF_ACCEPT) return err; if (action & BIT(NF_NAT_MANIP_SRC)) ovs_nat_update_key(key, skb, NF_NAT_MANIP_SRC); if (action & BIT(NF_NAT_MANIP_DST)) ovs_nat_update_key(key, skb, NF_NAT_MANIP_DST); return err; } #else /* !CONFIG_NF_NAT */ static int ovs_ct_nat(struct net *net, struct sw_flow_key *key, const struct ovs_conntrack_info *info, struct sk_buff *skb, struct nf_conn *ct, enum ip_conntrack_info ctinfo) { return NF_ACCEPT; } #endif static int verdict_to_errno(unsigned int verdict) { switch (verdict & NF_VERDICT_MASK) { case NF_ACCEPT: return 0; case NF_DROP: return -EINVAL; case NF_STOLEN: return -EINPROGRESS; default: break; } return -EINVAL; } /* Pass 'skb' through conntrack in 'net', using zone configured in 'info', if * not done already. Update key with new CT state after passing the packet * through conntrack. * Note that if the packet is deemed invalid by conntrack, skb->_nfct will be * set to NULL and 0 will be returned. */ static int __ovs_ct_lookup(struct net *net, struct sw_flow_key *key, const struct ovs_conntrack_info *info, struct sk_buff *skb) { /* If we are recirculating packets to match on conntrack fields and * committing with a separate conntrack action, then we don't need to * actually run the packet through conntrack twice unless it's for a * different zone. */ bool cached = skb_nfct_cached(net, key, info, skb); enum ip_conntrack_info ctinfo; struct nf_conn *ct; if (!cached) { struct nf_hook_state state = { .hook = NF_INET_PRE_ROUTING, .pf = info->family, .net = net, }; struct nf_conn *tmpl = info->ct; int err; /* Associate skb with specified zone. */ if (tmpl) { ct = nf_ct_get(skb, &ctinfo); nf_ct_put(ct); nf_conntrack_get(&tmpl->ct_general); nf_ct_set(skb, tmpl, IP_CT_NEW); } err = nf_conntrack_in(skb, &state); if (err != NF_ACCEPT) return verdict_to_errno(err); /* Clear CT state NAT flags to mark that we have not yet done * NAT after the nf_conntrack_in() call. We can actually clear * the whole state, as it will be re-initialized below. */ key->ct_state = 0; /* Update the key, but keep the NAT flags. */ ovs_ct_update_key(skb, info, key, true, true); } ct = nf_ct_get(skb, &ctinfo); if (ct) { bool add_helper = false; /* Packets starting a new connection must be NATted before the * helper, so that the helper knows about the NAT. We enforce * this by delaying both NAT and helper calls for unconfirmed * connections until the committing CT action. For later * packets NAT and Helper may be called in either order. * * NAT will be done only if the CT action has NAT, and only * once per packet (per zone), as guarded by the NAT bits in * the key->ct_state. */ if (info->nat && !(key->ct_state & OVS_CS_F_NAT_MASK) && (nf_ct_is_confirmed(ct) || info->commit)) { int err = ovs_ct_nat(net, key, info, skb, ct, ctinfo); err = verdict_to_errno(err); if (err) return err; } /* Userspace may decide to perform a ct lookup without a helper * specified followed by a (recirculate and) commit with one, * or attach a helper in a later commit. Therefore, for * connections which we will commit, we may need to attach * the helper here. */ if (!nf_ct_is_confirmed(ct) && info->commit && info->helper && !nfct_help(ct)) { int err = __nf_ct_try_assign_helper(ct, info->ct, GFP_ATOMIC); if (err) return err; add_helper = true; /* helper installed, add seqadj if NAT is required */ if (info->nat && !nfct_seqadj(ct)) { if (!nfct_seqadj_ext_add(ct)) return -EINVAL; } } /* Call the helper only if: * - nf_conntrack_in() was executed above ("!cached") or a * helper was just attached ("add_helper") for a confirmed * connection, or * - When committing an unconfirmed connection. */ if ((nf_ct_is_confirmed(ct) ? !cached || add_helper : info->commit)) { int err = nf_ct_helper(skb, ct, ctinfo, info->family); err = verdict_to_errno(err); if (err) return err; } if (nf_ct_protonum(ct) == IPPROTO_TCP && nf_ct_is_confirmed(ct) && nf_conntrack_tcp_established(ct)) { /* Be liberal for tcp packets so that out-of-window * packets are not marked invalid. */ nf_ct_set_tcp_be_liberal(ct); } nf_conn_act_ct_ext_fill(skb, ct, ctinfo); } return 0; } /* Lookup connection and read fields into key. */ static int ovs_ct_lookup(struct net *net, struct sw_flow_key *key, const struct ovs_conntrack_info *info, struct sk_buff *skb) { struct nf_conn *ct; int err; err = __ovs_ct_lookup(net, key, info, skb); if (err) return err; ct = (struct nf_conn *)skb_nfct(skb); if (ct) nf_ct_deliver_cached_events(ct); return 0; } static bool labels_nonzero(const struct ovs_key_ct_labels *labels) { size_t i; for (i = 0; i < OVS_CT_LABELS_LEN_32; i++) if (labels->ct_labels_32[i]) return true; return false; } #if IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT) static struct hlist_head *ct_limit_hash_bucket( const struct ovs_ct_limit_info *info, u16 zone) { return &info->limits[zone & (CT_LIMIT_HASH_BUCKETS - 1)]; } /* Call with ovs_mutex */ static void ct_limit_set(const struct ovs_ct_limit_info *info, struct ovs_ct_limit *new_ct_limit) { struct ovs_ct_limit *ct_limit; struct hlist_head *head; head = ct_limit_hash_bucket(info, new_ct_limit->zone); hlist_for_each_entry_rcu(ct_limit, head, hlist_node) { if (ct_limit->zone == new_ct_limit->zone) { hlist_replace_rcu(&ct_limit->hlist_node, &new_ct_limit->hlist_node); kfree_rcu(ct_limit, rcu); return; } } hlist_add_head_rcu(&new_ct_limit->hlist_node, head); } /* Call with ovs_mutex */ static void ct_limit_del(const struct ovs_ct_limit_info *info, u16 zone) { struct ovs_ct_limit *ct_limit; struct hlist_head *head; struct hlist_node *n; head = ct_limit_hash_bucket(info, zone); hlist_for_each_entry_safe(ct_limit, n, head, hlist_node) { if (ct_limit->zone == zone) { hlist_del_rcu(&ct_limit->hlist_node); kfree_rcu(ct_limit, rcu); return; } } } /* Call with RCU read lock */ static u32 ct_limit_get(const struct ovs_ct_limit_info *info, u16 zone) { struct ovs_ct_limit *ct_limit; struct hlist_head *head; head = ct_limit_hash_bucket(info, zone); hlist_for_each_entry_rcu(ct_limit, head, hlist_node) { if (ct_limit->zone == zone) return ct_limit->limit; } return info->default_limit; } static int ovs_ct_check_limit(struct net *net, const struct ovs_conntrack_info *info, const struct nf_conntrack_tuple *tuple) { struct ovs_net *ovs_net = net_generic(net, ovs_net_id); const struct ovs_ct_limit_info *ct_limit_info = ovs_net->ct_limit_info; u32 per_zone_limit, connections; u32 conncount_key; conncount_key = info->zone.id; per_zone_limit = ct_limit_get(ct_limit_info, info->zone.id); if (per_zone_limit == OVS_CT_LIMIT_UNLIMITED) return 0; connections = nf_conncount_count(net, ct_limit_info->data, &conncount_key, tuple, &info->zone); if (connections > per_zone_limit) return -ENOMEM; return 0; } #endif /* Lookup connection and confirm if unconfirmed. */ static int ovs_ct_commit(struct net *net, struct sw_flow_key *key, const struct ovs_conntrack_info *info, struct sk_buff *skb) { enum ip_conntrack_info ctinfo; struct nf_conn *ct; int err; err = __ovs_ct_lookup(net, key, info, skb); if (err) return err; /* The connection could be invalid, in which case this is a no-op.*/ ct = nf_ct_get(skb, &ctinfo); if (!ct) return 0; #if IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT) if (static_branch_unlikely(&ovs_ct_limit_enabled)) { if (!nf_ct_is_confirmed(ct)) { err = ovs_ct_check_limit(net, info, &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple); if (err) { net_warn_ratelimited("openvswitch: zone: %u " "exceeds conntrack limit\n", info->zone.id); return err; } } } #endif /* Set the conntrack event mask if given. NEW and DELETE events have * their own groups, but the NFNLGRP_CONNTRACK_UPDATE group listener * typically would receive many kinds of updates. Setting the event * mask allows those events to be filtered. The set event mask will * remain in effect for the lifetime of the connection unless changed * by a further CT action with both the commit flag and the eventmask * option. */ if (info->have_eventmask) { struct nf_conntrack_ecache *cache = nf_ct_ecache_find(ct); if (cache) cache->ctmask = info->eventmask; } /* Apply changes before confirming the connection so that the initial * conntrack NEW netlink event carries the values given in the CT * action. */ if (info->mark.mask) { err = ovs_ct_set_mark(ct, key, info->mark.value, info->mark.mask); if (err) return err; } if (!nf_ct_is_confirmed(ct)) { err = ovs_ct_init_labels(ct, key, &info->labels.value, &info->labels.mask); if (err) return err; nf_conn_act_ct_ext_add(skb, ct, ctinfo); } else if (IS_ENABLED(CONFIG_NF_CONNTRACK_LABELS) && labels_nonzero(&info->labels.mask)) { err = ovs_ct_set_labels(ct, key, &info->labels.value, &info->labels.mask); if (err) return err; } /* This will take care of sending queued events even if the connection * is already confirmed. */ err = nf_conntrack_confirm(skb); return verdict_to_errno(err); } /* Returns 0 on success, -EINPROGRESS if 'skb' is stolen, or other nonzero * value if 'skb' is freed. */ int ovs_ct_execute(struct net *net, struct sk_buff *skb, struct sw_flow_key *key, const struct ovs_conntrack_info *info) { int nh_ofs; int err; /* The conntrack module expects to be working at L3. */ nh_ofs = skb_network_offset(skb); skb_pull_rcsum(skb, nh_ofs); err = nf_ct_skb_network_trim(skb, info->family); if (err) { kfree_skb(skb); return err; } if (key->ip.frag != OVS_FRAG_TYPE_NONE) { err = ovs_ct_handle_fragments(net, key, info->zone.id, info->family, skb); if (err) return err; } if (info->commit) err = ovs_ct_commit(net, key, info, skb); else err = ovs_ct_lookup(net, key, info, skb); /* conntrack core returned NF_STOLEN */ if (err == -EINPROGRESS) return err; skb_push_rcsum(skb, nh_ofs); if (err) ovs_kfree_skb_reason(skb, OVS_DROP_CONNTRACK); return err; } int ovs_ct_clear(struct sk_buff *skb, struct sw_flow_key *key) { enum ip_conntrack_info ctinfo; struct nf_conn *ct; ct = nf_ct_get(skb, &ctinfo); nf_ct_put(ct); nf_ct_set(skb, NULL, IP_CT_UNTRACKED); if (key) ovs_ct_fill_key(skb, key, false); return 0; } #if IS_ENABLED(CONFIG_NF_NAT) static int parse_nat(const struct nlattr *attr, struct ovs_conntrack_info *info, bool log) { struct nlattr *a; int rem; bool have_ip_max = false; bool have_proto_max = false; bool ip_vers = (info->family == NFPROTO_IPV6); nla_for_each_nested(a, attr, rem) { static const int ovs_nat_attr_lens[OVS_NAT_ATTR_MAX + 1][2] = { [OVS_NAT_ATTR_SRC] = {0, 0}, [OVS_NAT_ATTR_DST] = {0, 0}, [OVS_NAT_ATTR_IP_MIN] = {sizeof(struct in_addr), sizeof(struct in6_addr)}, [OVS_NAT_ATTR_IP_MAX] = {sizeof(struct in_addr), sizeof(struct in6_addr)}, [OVS_NAT_ATTR_PROTO_MIN] = {sizeof(u16), sizeof(u16)}, [OVS_NAT_ATTR_PROTO_MAX] = {sizeof(u16), sizeof(u16)}, [OVS_NAT_ATTR_PERSISTENT] = {0, 0}, [OVS_NAT_ATTR_PROTO_HASH] = {0, 0}, [OVS_NAT_ATTR_PROTO_RANDOM] = {0, 0}, }; int type = nla_type(a); if (type > OVS_NAT_ATTR_MAX) { OVS_NLERR(log, "Unknown NAT attribute (type=%d, max=%d)", type, OVS_NAT_ATTR_MAX); return -EINVAL; } if (nla_len(a) != ovs_nat_attr_lens[type][ip_vers]) { OVS_NLERR(log, "NAT attribute type %d has unexpected length (%d != %d)", type, nla_len(a), ovs_nat_attr_lens[type][ip_vers]); return -EINVAL; } switch (type) { case OVS_NAT_ATTR_SRC: case OVS_NAT_ATTR_DST: if (info->nat) { OVS_NLERR(log, "Only one type of NAT may be specified"); return -ERANGE; } info->nat |= OVS_CT_NAT; info->nat |= ((type == OVS_NAT_ATTR_SRC) ? OVS_CT_SRC_NAT : OVS_CT_DST_NAT); break; case OVS_NAT_ATTR_IP_MIN: nla_memcpy(&info->range.min_addr, a, sizeof(info->range.min_addr)); info->range.flags |= NF_NAT_RANGE_MAP_IPS; break; case OVS_NAT_ATTR_IP_MAX: have_ip_max = true; nla_memcpy(&info->range.max_addr, a, sizeof(info->range.max_addr)); info->range.flags |= NF_NAT_RANGE_MAP_IPS; break; case OVS_NAT_ATTR_PROTO_MIN: info->range.min_proto.all = htons(nla_get_u16(a)); info->range.flags |= NF_NAT_RANGE_PROTO_SPECIFIED; break; case OVS_NAT_ATTR_PROTO_MAX: have_proto_max = true; info->range.max_proto.all = htons(nla_get_u16(a)); info->range.flags |= NF_NAT_RANGE_PROTO_SPECIFIED; break; case OVS_NAT_ATTR_PERSISTENT: info->range.flags |= NF_NAT_RANGE_PERSISTENT; break; case OVS_NAT_ATTR_PROTO_HASH: info->range.flags |= NF_NAT_RANGE_PROTO_RANDOM; break; case OVS_NAT_ATTR_PROTO_RANDOM: info->range.flags |= NF_NAT_RANGE_PROTO_RANDOM_FULLY; break; default: OVS_NLERR(log, "Unknown nat attribute (%d)", type); return -EINVAL; } } if (rem > 0) { OVS_NLERR(log, "NAT attribute has %d unknown bytes", rem); return -EINVAL; } if (!info->nat) { /* Do not allow flags if no type is given. */ if (info->range.flags) { OVS_NLERR(log, "NAT flags may be given only when NAT range (SRC or DST) is also specified." ); return -EINVAL; } info->nat = OVS_CT_NAT; /* NAT existing connections. */ } else if (!info->commit) { OVS_NLERR(log, "NAT attributes may be specified only when CT COMMIT flag is also specified." ); return -EINVAL; } /* Allow missing IP_MAX. */ if (info->range.flags & NF_NAT_RANGE_MAP_IPS && !have_ip_max) { memcpy(&info->range.max_addr, &info->range.min_addr, sizeof(info->range.max_addr)); } /* Allow missing PROTO_MAX. */ if (info->range.flags & NF_NAT_RANGE_PROTO_SPECIFIED && !have_proto_max) { info->range.max_proto.all = info->range.min_proto.all; } return 0; } #endif static const struct ovs_ct_len_tbl ovs_ct_attr_lens[OVS_CT_ATTR_MAX + 1] = { [OVS_CT_ATTR_COMMIT] = { .minlen = 0, .maxlen = 0 }, [OVS_CT_ATTR_FORCE_COMMIT] = { .minlen = 0, .maxlen = 0 }, [OVS_CT_ATTR_ZONE] = { .minlen = sizeof(u16), .maxlen = sizeof(u16) }, [OVS_CT_ATTR_MARK] = { .minlen = sizeof(struct md_mark), .maxlen = sizeof(struct md_mark) }, [OVS_CT_ATTR_LABELS] = { .minlen = sizeof(struct md_labels), .maxlen = sizeof(struct md_labels) }, [OVS_CT_ATTR_HELPER] = { .minlen = 1, .maxlen = NF_CT_HELPER_NAME_LEN }, #if IS_ENABLED(CONFIG_NF_NAT) /* NAT length is checked when parsing the nested attributes. */ [OVS_CT_ATTR_NAT] = { .minlen = 0, .maxlen = INT_MAX }, #endif [OVS_CT_ATTR_EVENTMASK] = { .minlen = sizeof(u32), .maxlen = sizeof(u32) }, [OVS_CT_ATTR_TIMEOUT] = { .minlen = 1, .maxlen = CTNL_TIMEOUT_NAME_MAX }, }; static int parse_ct(const struct nlattr *attr, struct ovs_conntrack_info *info, const char **helper, bool log) { struct nlattr *a; int rem; nla_for_each_nested(a, attr, rem) { int type = nla_type(a); int maxlen; int minlen; if (type > OVS_CT_ATTR_MAX) { OVS_NLERR(log, "Unknown conntrack attr (type=%d, max=%d)", type, OVS_CT_ATTR_MAX); return -EINVAL; } maxlen = ovs_ct_attr_lens[type].maxlen; minlen = ovs_ct_attr_lens[type].minlen; if (nla_len(a) < minlen || nla_len(a) > maxlen) { OVS_NLERR(log, "Conntrack attr type has unexpected length (type=%d, length=%d, expected=%d)", type, nla_len(a), maxlen); return -EINVAL; } switch (type) { case OVS_CT_ATTR_FORCE_COMMIT: info->force = true; fallthrough; case OVS_CT_ATTR_COMMIT: info->commit = true; break; #ifdef CONFIG_NF_CONNTRACK_ZONES case OVS_CT_ATTR_ZONE: info->zone.id = nla_get_u16(a); break; #endif #ifdef CONFIG_NF_CONNTRACK_MARK case OVS_CT_ATTR_MARK: { struct md_mark *mark = nla_data(a); if (!mark->mask) { OVS_NLERR(log, "ct_mark mask cannot be 0"); return -EINVAL; } info->mark = *mark; break; } #endif #ifdef CONFIG_NF_CONNTRACK_LABELS case OVS_CT_ATTR_LABELS: { struct md_labels *labels = nla_data(a); if (!labels_nonzero(&labels->mask)) { OVS_NLERR(log, "ct_labels mask cannot be 0"); return -EINVAL; } info->labels = *labels; break; } #endif case OVS_CT_ATTR_HELPER: *helper = nla_data(a); if (!string_is_terminated(*helper, nla_len(a))) { OVS_NLERR(log, "Invalid conntrack helper"); return -EINVAL; } break; #if IS_ENABLED(CONFIG_NF_NAT) case OVS_CT_ATTR_NAT: { int err = parse_nat(a, info, log); if (err) return err; break; } #endif case OVS_CT_ATTR_EVENTMASK: info->have_eventmask = true; info->eventmask = nla_get_u32(a); break; #ifdef CONFIG_NF_CONNTRACK_TIMEOUT case OVS_CT_ATTR_TIMEOUT: memcpy(info->timeout, nla_data(a), nla_len(a)); if (!string_is_terminated(info->timeout, nla_len(a))) { OVS_NLERR(log, "Invalid conntrack timeout"); return -EINVAL; } break; #endif default: OVS_NLERR(log, "Unknown conntrack attr (%d)", type); return -EINVAL; } } #ifdef CONFIG_NF_CONNTRACK_MARK if (!info->commit && info->mark.mask) { OVS_NLERR(log, "Setting conntrack mark requires 'commit' flag."); return -EINVAL; } #endif #ifdef CONFIG_NF_CONNTRACK_LABELS if (!info->commit && labels_nonzero(&info->labels.mask)) { OVS_NLERR(log, "Setting conntrack labels requires 'commit' flag."); return -EINVAL; } #endif if (rem > 0) { OVS_NLERR(log, "Conntrack attr has %d unknown bytes", rem); return -EINVAL; } return 0; } bool ovs_ct_verify(struct net *net, enum ovs_key_attr attr) { if (attr == OVS_KEY_ATTR_CT_STATE) return true; if (IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES) && attr == OVS_KEY_ATTR_CT_ZONE) return true; if (IS_ENABLED(CONFIG_NF_CONNTRACK_MARK) && attr == OVS_KEY_ATTR_CT_MARK) return true; if (IS_ENABLED(CONFIG_NF_CONNTRACK_LABELS) && attr == OVS_KEY_ATTR_CT_LABELS) return true; return false; } int ovs_ct_copy_action(struct net *net, const struct nlattr *attr, const struct sw_flow_key *key, struct sw_flow_actions **sfa, bool log) { unsigned int n_bits = sizeof(struct ovs_key_ct_labels) * BITS_PER_BYTE; struct ovs_conntrack_info ct_info; const char *helper = NULL; u16 family; int err; family = key_to_nfproto(key); if (family == NFPROTO_UNSPEC) { OVS_NLERR(log, "ct family unspecified"); return -EINVAL; } memset(&ct_info, 0, sizeof(ct_info)); ct_info.family = family; nf_ct_zone_init(&ct_info.zone, NF_CT_DEFAULT_ZONE_ID, NF_CT_DEFAULT_ZONE_DIR, 0); err = parse_ct(attr, &ct_info, &helper, log); if (err) return err; /* Set up template for tracking connections in specific zones. */ ct_info.ct = nf_ct_tmpl_alloc(net, &ct_info.zone, GFP_KERNEL); if (!ct_info.ct) { OVS_NLERR(log, "Failed to allocate conntrack template"); return -ENOMEM; } if (nf_connlabels_get(net, n_bits - 1)) { nf_ct_tmpl_free(ct_info.ct); OVS_NLERR(log, "Failed to set connlabel length"); return -EOPNOTSUPP; } if (ct_info.timeout[0]) { if (nf_ct_set_timeout(net, ct_info.ct, family, key->ip.proto, ct_info.timeout)) OVS_NLERR(log, "Failed to associated timeout policy '%s'", ct_info.timeout); else ct_info.nf_ct_timeout = rcu_dereference( nf_ct_timeout_find(ct_info.ct)->timeout); } if (helper) { err = nf_ct_add_helper(ct_info.ct, helper, ct_info.family, key->ip.proto, ct_info.nat, &ct_info.helper); if (err) { OVS_NLERR(log, "Failed to add %s helper %d", helper, err); goto err_free_ct; } } err = ovs_nla_add_action(sfa, OVS_ACTION_ATTR_CT, &ct_info, sizeof(ct_info), log); if (err) goto err_free_ct; if (ct_info.commit) __set_bit(IPS_CONFIRMED_BIT, &ct_info.ct->status); return 0; err_free_ct: __ovs_ct_free_action(&ct_info); return err; } #if IS_ENABLED(CONFIG_NF_NAT) static bool ovs_ct_nat_to_attr(const struct ovs_conntrack_info *info, struct sk_buff *skb) { struct nlattr *start; start = nla_nest_start_noflag(skb, OVS_CT_ATTR_NAT); if (!start) return false; if (info->nat & OVS_CT_SRC_NAT) { if (nla_put_flag(skb, OVS_NAT_ATTR_SRC)) return false; } else if (info->nat & OVS_CT_DST_NAT) { if (nla_put_flag(skb, OVS_NAT_ATTR_DST)) return false; } else { goto out; } if (info->range.flags & NF_NAT_RANGE_MAP_IPS) { if (IS_ENABLED(CONFIG_NF_NAT) && info->family == NFPROTO_IPV4) { if (nla_put_in_addr(skb, OVS_NAT_ATTR_IP_MIN, info->range.min_addr.ip) || (info->range.max_addr.ip != info->range.min_addr.ip && (nla_put_in_addr(skb, OVS_NAT_ATTR_IP_MAX, info->range.max_addr.ip)))) return false; } else if (IS_ENABLED(CONFIG_IPV6) && info->family == NFPROTO_IPV6) { if (nla_put_in6_addr(skb, OVS_NAT_ATTR_IP_MIN, &info->range.min_addr.in6) || (memcmp(&info->range.max_addr.in6, &info->range.min_addr.in6, sizeof(info->range.max_addr.in6)) && (nla_put_in6_addr(skb, OVS_NAT_ATTR_IP_MAX, &info->range.max_addr.in6)))) return false; } else { return false; } } if (info->range.flags & NF_NAT_RANGE_PROTO_SPECIFIED && (nla_put_u16(skb, OVS_NAT_ATTR_PROTO_MIN, ntohs(info->range.min_proto.all)) || (info->range.max_proto.all != info->range.min_proto.all && nla_put_u16(skb, OVS_NAT_ATTR_PROTO_MAX, ntohs(info->range.max_proto.all))))) return false; if (info->range.flags & NF_NAT_RANGE_PERSISTENT && nla_put_flag(skb, OVS_NAT_ATTR_PERSISTENT)) return false; if (info->range.flags & NF_NAT_RANGE_PROTO_RANDOM && nla_put_flag(skb, OVS_NAT_ATTR_PROTO_HASH)) return false; if (info->range.flags & NF_NAT_RANGE_PROTO_RANDOM_FULLY && nla_put_flag(skb, OVS_NAT_ATTR_PROTO_RANDOM)) return false; out: nla_nest_end(skb, start); return true; } #endif int ovs_ct_action_to_attr(const struct ovs_conntrack_info *ct_info, struct sk_buff *skb) { struct nlattr *start; start = nla_nest_start_noflag(skb, OVS_ACTION_ATTR_CT); if (!start) return -EMSGSIZE; if (ct_info->commit && nla_put_flag(skb, ct_info->force ? OVS_CT_ATTR_FORCE_COMMIT : OVS_CT_ATTR_COMMIT)) return -EMSGSIZE; if (IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES) && nla_put_u16(skb, OVS_CT_ATTR_ZONE, ct_info->zone.id)) return -EMSGSIZE; if (IS_ENABLED(CONFIG_NF_CONNTRACK_MARK) && ct_info->mark.mask && nla_put(skb, OVS_CT_ATTR_MARK, sizeof(ct_info->mark), &ct_info->mark)) return -EMSGSIZE; if (IS_ENABLED(CONFIG_NF_CONNTRACK_LABELS) && labels_nonzero(&ct_info->labels.mask) && nla_put(skb, OVS_CT_ATTR_LABELS, sizeof(ct_info->labels), &ct_info->labels)) return -EMSGSIZE; if (ct_info->helper) { if (nla_put_string(skb, OVS_CT_ATTR_HELPER, ct_info->helper->name)) return -EMSGSIZE; } if (ct_info->have_eventmask && nla_put_u32(skb, OVS_CT_ATTR_EVENTMASK, ct_info->eventmask)) return -EMSGSIZE; if (ct_info->timeout[0]) { if (nla_put_string(skb, OVS_CT_ATTR_TIMEOUT, ct_info->timeout)) return -EMSGSIZE; } #if IS_ENABLED(CONFIG_NF_NAT) if (ct_info->nat && !ovs_ct_nat_to_attr(ct_info, skb)) return -EMSGSIZE; #endif nla_nest_end(skb, start); return 0; } void ovs_ct_free_action(const struct nlattr *a) { struct ovs_conntrack_info *ct_info = nla_data(a); __ovs_ct_free_action(ct_info); } static void __ovs_ct_free_action(struct ovs_conntrack_info *ct_info) { if (ct_info->helper) { #if IS_ENABLED(CONFIG_NF_NAT) if (ct_info->nat) nf_nat_helper_put(ct_info->helper); #endif nf_conntrack_helper_put(ct_info->helper); } if (ct_info->ct) { if (ct_info->timeout[0]) nf_ct_destroy_timeout(ct_info->ct); nf_connlabels_put(nf_ct_net(ct_info->ct)); nf_ct_tmpl_free(ct_info->ct); } } #if IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT) static int ovs_ct_limit_init(struct net *net, struct ovs_net *ovs_net) { int i, err; ovs_net->ct_limit_info = kmalloc(sizeof(*ovs_net->ct_limit_info), GFP_KERNEL); if (!ovs_net->ct_limit_info) return -ENOMEM; ovs_net->ct_limit_info->default_limit = OVS_CT_LIMIT_DEFAULT; ovs_net->ct_limit_info->limits = kmalloc_array(CT_LIMIT_HASH_BUCKETS, sizeof(struct hlist_head), GFP_KERNEL); if (!ovs_net->ct_limit_info->limits) { kfree(ovs_net->ct_limit_info); return -ENOMEM; } for (i = 0; i < CT_LIMIT_HASH_BUCKETS; i++) INIT_HLIST_HEAD(&ovs_net->ct_limit_info->limits[i]); ovs_net->ct_limit_info->data = nf_conncount_init(net, sizeof(u32)); if (IS_ERR(ovs_net->ct_limit_info->data)) { err = PTR_ERR(ovs_net->ct_limit_info->data); kfree(ovs_net->ct_limit_info->limits); kfree(ovs_net->ct_limit_info); pr_err("openvswitch: failed to init nf_conncount %d\n", err); return err; } return 0; } static void ovs_ct_limit_exit(struct net *net, struct ovs_net *ovs_net) { const struct ovs_ct_limit_info *info = ovs_net->ct_limit_info; int i; nf_conncount_destroy(net, info->data); for (i = 0; i < CT_LIMIT_HASH_BUCKETS; ++i) { struct hlist_head *head = &info->limits[i]; struct ovs_ct_limit *ct_limit; struct hlist_node *next; hlist_for_each_entry_safe(ct_limit, next, head, hlist_node) kfree_rcu(ct_limit, rcu); } kfree(info->limits); kfree(info); } static struct sk_buff * ovs_ct_limit_cmd_reply_start(struct genl_info *info, u8 cmd, struct ovs_header **ovs_reply_header) { struct ovs_header *ovs_header = genl_info_userhdr(info); struct sk_buff *skb; skb = genlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); if (!skb) return ERR_PTR(-ENOMEM); *ovs_reply_header = genlmsg_put(skb, info->snd_portid, info->snd_seq, &dp_ct_limit_genl_family, 0, cmd); if (!*ovs_reply_header) { nlmsg_free(skb); return ERR_PTR(-EMSGSIZE); } (*ovs_reply_header)->dp_ifindex = ovs_header->dp_ifindex; return skb; } static bool check_zone_id(int zone_id, u16 *pzone) { if (zone_id >= 0 && zone_id <= 65535) { *pzone = (u16)zone_id; return true; } return false; } static int ovs_ct_limit_set_zone_limit(struct nlattr *nla_zone_limit, struct ovs_ct_limit_info *info) { struct ovs_zone_limit *zone_limit; int rem; u16 zone; rem = NLA_ALIGN(nla_len(nla_zone_limit)); zone_limit = (struct ovs_zone_limit *)nla_data(nla_zone_limit); while (rem >= sizeof(*zone_limit)) { if (unlikely(zone_limit->zone_id == OVS_ZONE_LIMIT_DEFAULT_ZONE)) { ovs_lock(); info->default_limit = zone_limit->limit; ovs_unlock(); } else if (unlikely(!check_zone_id( zone_limit->zone_id, &zone))) { OVS_NLERR(true, "zone id is out of range"); } else { struct ovs_ct_limit *ct_limit; ct_limit = kmalloc(sizeof(*ct_limit), GFP_KERNEL_ACCOUNT); if (!ct_limit) return -ENOMEM; ct_limit->zone = zone; ct_limit->limit = zone_limit->limit; ovs_lock(); ct_limit_set(info, ct_limit); ovs_unlock(); } rem -= NLA_ALIGN(sizeof(*zone_limit)); zone_limit = (struct ovs_zone_limit *)((u8 *)zone_limit + NLA_ALIGN(sizeof(*zone_limit))); } if (rem) OVS_NLERR(true, "set zone limit has %d unknown bytes", rem); return 0; } static int ovs_ct_limit_del_zone_limit(struct nlattr *nla_zone_limit, struct ovs_ct_limit_info *info) { struct ovs_zone_limit *zone_limit; int rem; u16 zone; rem = NLA_ALIGN(nla_len(nla_zone_limit)); zone_limit = (struct ovs_zone_limit *)nla_data(nla_zone_limit); while (rem >= sizeof(*zone_limit)) { if (unlikely(zone_limit->zone_id == OVS_ZONE_LIMIT_DEFAULT_ZONE)) { ovs_lock(); info->default_limit = OVS_CT_LIMIT_DEFAULT; ovs_unlock(); } else if (unlikely(!check_zone_id( zone_limit->zone_id, &zone))) { OVS_NLERR(true, "zone id is out of range"); } else { ovs_lock(); ct_limit_del(info, zone); ovs_unlock(); } rem -= NLA_ALIGN(sizeof(*zone_limit)); zone_limit = (struct ovs_zone_limit *)((u8 *)zone_limit + NLA_ALIGN(sizeof(*zone_limit))); } if (rem) OVS_NLERR(true, "del zone limit has %d unknown bytes", rem); return 0; } static int ovs_ct_limit_get_default_limit(struct ovs_ct_limit_info *info, struct sk_buff *reply) { struct ovs_zone_limit zone_limit = { .zone_id = OVS_ZONE_LIMIT_DEFAULT_ZONE, .limit = info->default_limit, }; return nla_put_nohdr(reply, sizeof(zone_limit), &zone_limit); } static int __ovs_ct_limit_get_zone_limit(struct net *net, struct nf_conncount_data *data, u16 zone_id, u32 limit, struct sk_buff *reply) { struct nf_conntrack_zone ct_zone; struct ovs_zone_limit zone_limit; u32 conncount_key = zone_id; zone_limit.zone_id = zone_id; zone_limit.limit = limit; nf_ct_zone_init(&ct_zone, zone_id, NF_CT_DEFAULT_ZONE_DIR, 0); zone_limit.count = nf_conncount_count(net, data, &conncount_key, NULL, &ct_zone); return nla_put_nohdr(reply, sizeof(zone_limit), &zone_limit); } static int ovs_ct_limit_get_zone_limit(struct net *net, struct nlattr *nla_zone_limit, struct ovs_ct_limit_info *info, struct sk_buff *reply) { struct ovs_zone_limit *zone_limit; int rem, err; u32 limit; u16 zone; rem = NLA_ALIGN(nla_len(nla_zone_limit)); zone_limit = (struct ovs_zone_limit *)nla_data(nla_zone_limit); while (rem >= sizeof(*zone_limit)) { if (unlikely(zone_limit->zone_id == OVS_ZONE_LIMIT_DEFAULT_ZONE)) { err = ovs_ct_limit_get_default_limit(info, reply); if (err) return err; } else if (unlikely(!check_zone_id(zone_limit->zone_id, &zone))) { OVS_NLERR(true, "zone id is out of range"); } else { rcu_read_lock(); limit = ct_limit_get(info, zone); rcu_read_unlock(); err = __ovs_ct_limit_get_zone_limit( net, info->data, zone, limit, reply); if (err) return err; } rem -= NLA_ALIGN(sizeof(*zone_limit)); zone_limit = (struct ovs_zone_limit *)((u8 *)zone_limit + NLA_ALIGN(sizeof(*zone_limit))); } if (rem) OVS_NLERR(true, "get zone limit has %d unknown bytes", rem); return 0; } static int ovs_ct_limit_get_all_zone_limit(struct net *net, struct ovs_ct_limit_info *info, struct sk_buff *reply) { struct ovs_ct_limit *ct_limit; struct hlist_head *head; int i, err = 0; err = ovs_ct_limit_get_default_limit(info, reply); if (err) return err; rcu_read_lock(); for (i = 0; i < CT_LIMIT_HASH_BUCKETS; ++i) { head = &info->limits[i]; hlist_for_each_entry_rcu(ct_limit, head, hlist_node) { err = __ovs_ct_limit_get_zone_limit(net, info->data, ct_limit->zone, ct_limit->limit, reply); if (err) goto exit_err; } } exit_err: rcu_read_unlock(); return err; } static int ovs_ct_limit_cmd_set(struct sk_buff *skb, struct genl_info *info) { struct nlattr **a = info->attrs; struct sk_buff *reply; struct ovs_header *ovs_reply_header; struct ovs_net *ovs_net = net_generic(sock_net(skb->sk), ovs_net_id); struct ovs_ct_limit_info *ct_limit_info = ovs_net->ct_limit_info; int err; reply = ovs_ct_limit_cmd_reply_start(info, OVS_CT_LIMIT_CMD_SET, &ovs_reply_header); if (IS_ERR(reply)) return PTR_ERR(reply); if (!a[OVS_CT_LIMIT_ATTR_ZONE_LIMIT]) { err = -EINVAL; goto exit_err; } err = ovs_ct_limit_set_zone_limit(a[OVS_CT_LIMIT_ATTR_ZONE_LIMIT], ct_limit_info); if (err) goto exit_err; static_branch_enable(&ovs_ct_limit_enabled); genlmsg_end(reply, ovs_reply_header); return genlmsg_reply(reply, info); exit_err: nlmsg_free(reply); return err; } static int ovs_ct_limit_cmd_del(struct sk_buff *skb, struct genl_info *info) { struct nlattr **a = info->attrs; struct sk_buff *reply; struct ovs_header *ovs_reply_header; struct ovs_net *ovs_net = net_generic(sock_net(skb->sk), ovs_net_id); struct ovs_ct_limit_info *ct_limit_info = ovs_net->ct_limit_info; int err; reply = ovs_ct_limit_cmd_reply_start(info, OVS_CT_LIMIT_CMD_DEL, &ovs_reply_header); if (IS_ERR(reply)) return PTR_ERR(reply); if (!a[OVS_CT_LIMIT_ATTR_ZONE_LIMIT]) { err = -EINVAL; goto exit_err; } err = ovs_ct_limit_del_zone_limit(a[OVS_CT_LIMIT_ATTR_ZONE_LIMIT], ct_limit_info); if (err) goto exit_err; genlmsg_end(reply, ovs_reply_header); return genlmsg_reply(reply, info); exit_err: nlmsg_free(reply); return err; } static int ovs_ct_limit_cmd_get(struct sk_buff *skb, struct genl_info *info) { struct nlattr **a = info->attrs; struct nlattr *nla_reply; struct sk_buff *reply; struct ovs_header *ovs_reply_header; struct net *net = sock_net(skb->sk); struct ovs_net *ovs_net = net_generic(net, ovs_net_id); struct ovs_ct_limit_info *ct_limit_info = ovs_net->ct_limit_info; int err; reply = ovs_ct_limit_cmd_reply_start(info, OVS_CT_LIMIT_CMD_GET, &ovs_reply_header); if (IS_ERR(reply)) return PTR_ERR(reply); nla_reply = nla_nest_start_noflag(reply, OVS_CT_LIMIT_ATTR_ZONE_LIMIT); if (!nla_reply) { err = -EMSGSIZE; goto exit_err; } if (a[OVS_CT_LIMIT_ATTR_ZONE_LIMIT]) { err = ovs_ct_limit_get_zone_limit( net, a[OVS_CT_LIMIT_ATTR_ZONE_LIMIT], ct_limit_info, reply); if (err) goto exit_err; } else { err = ovs_ct_limit_get_all_zone_limit(net, ct_limit_info, reply); if (err) goto exit_err; } nla_nest_end(reply, nla_reply); genlmsg_end(reply, ovs_reply_header); return genlmsg_reply(reply, info); exit_err: nlmsg_free(reply); return err; } static const struct genl_small_ops ct_limit_genl_ops[] = { { .cmd = OVS_CT_LIMIT_CMD_SET, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .flags = GENL_UNS_ADMIN_PERM, /* Requires CAP_NET_ADMIN * privilege. */ .doit = ovs_ct_limit_cmd_set, }, { .cmd = OVS_CT_LIMIT_CMD_DEL, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .flags = GENL_UNS_ADMIN_PERM, /* Requires CAP_NET_ADMIN * privilege. */ .doit = ovs_ct_limit_cmd_del, }, { .cmd = OVS_CT_LIMIT_CMD_GET, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .flags = 0, /* OK for unprivileged users. */ .doit = ovs_ct_limit_cmd_get, }, }; static const struct genl_multicast_group ovs_ct_limit_multicast_group = { .name = OVS_CT_LIMIT_MCGROUP, }; struct genl_family dp_ct_limit_genl_family __ro_after_init = { .hdrsize = sizeof(struct ovs_header), .name = OVS_CT_LIMIT_FAMILY, .version = OVS_CT_LIMIT_VERSION, .maxattr = OVS_CT_LIMIT_ATTR_MAX, .policy = ct_limit_policy, .netnsok = true, .parallel_ops = true, .small_ops = ct_limit_genl_ops, .n_small_ops = ARRAY_SIZE(ct_limit_genl_ops), .resv_start_op = OVS_CT_LIMIT_CMD_GET + 1, .mcgrps = &ovs_ct_limit_multicast_group, .n_mcgrps = 1, .module = THIS_MODULE, }; #endif int ovs_ct_init(struct net *net) { #if IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT) struct ovs_net *ovs_net = net_generic(net, ovs_net_id); return ovs_ct_limit_init(net, ovs_net); #else return 0; #endif } void ovs_ct_exit(struct net *net) { #if IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT) struct ovs_net *ovs_net = net_generic(net, ovs_net_id); ovs_ct_limit_exit(net, ovs_net); #endif } |
9 5 8 5 8 4 4 4 2 3 4 9 1 1 3 1 3 9 10 9 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 | // SPDX-License-Identifier: GPL-2.0+ /* * F81532/F81534 USB to Serial Ports Bridge * * F81532 => 2 Serial Ports * F81534 => 4 Serial Ports * * Copyright (C) 2016 Feature Integration Technology Inc., (Fintek) * Copyright (C) 2016 Tom Tsai (Tom_Tsai@fintek.com.tw) * Copyright (C) 2016 Peter Hong (Peter_Hong@fintek.com.tw) * * The F81532/F81534 had 1 control endpoint for setting, 1 endpoint bulk-out * for all serial port TX and 1 endpoint bulk-in for all serial port read in * (Read Data/MSR/LSR). * * Write URB is fixed with 512bytes, per serial port used 128Bytes. * It can be described by f81534_prepare_write_buffer() * * Read URB is 512Bytes max, per serial port used 128Bytes. * It can be described by f81534_process_read_urb() and maybe received with * 128x1,2,3,4 bytes. * */ #include <linux/slab.h> #include <linux/tty.h> #include <linux/tty_flip.h> #include <linux/usb.h> #include <linux/usb/serial.h> #include <linux/serial_reg.h> #include <linux/module.h> #include <linux/uaccess.h> /* Serial Port register Address */ #define F81534_UART_BASE_ADDRESS 0x1200 #define F81534_UART_OFFSET 0x10 #define F81534_DIVISOR_LSB_REG (0x00 + F81534_UART_BASE_ADDRESS) #define F81534_DIVISOR_MSB_REG (0x01 + F81534_UART_BASE_ADDRESS) #define F81534_INTERRUPT_ENABLE_REG (0x01 + F81534_UART_BASE_ADDRESS) #define F81534_FIFO_CONTROL_REG (0x02 + F81534_UART_BASE_ADDRESS) #define F81534_LINE_CONTROL_REG (0x03 + F81534_UART_BASE_ADDRESS) #define F81534_MODEM_CONTROL_REG (0x04 + F81534_UART_BASE_ADDRESS) #define F81534_LINE_STATUS_REG (0x05 + F81534_UART_BASE_ADDRESS) #define F81534_MODEM_STATUS_REG (0x06 + F81534_UART_BASE_ADDRESS) #define F81534_CLOCK_REG (0x08 + F81534_UART_BASE_ADDRESS) #define F81534_CONFIG1_REG (0x09 + F81534_UART_BASE_ADDRESS) #define F81534_DEF_CONF_ADDRESS_START 0x3000 #define F81534_DEF_CONF_SIZE 12 #define F81534_CUSTOM_ADDRESS_START 0x2f00 #define F81534_CUSTOM_DATA_SIZE 0x10 #define F81534_CUSTOM_NO_CUSTOM_DATA 0xff #define F81534_CUSTOM_VALID_TOKEN 0xf0 #define F81534_CONF_OFFSET 1 #define F81534_CONF_INIT_GPIO_OFFSET 4 #define F81534_CONF_WORK_GPIO_OFFSET 8 #define F81534_CONF_GPIO_SHUTDOWN 7 #define F81534_CONF_GPIO_RS232 1 #define F81534_MAX_DATA_BLOCK 64 #define F81534_MAX_BUS_RETRY 20 /* Default URB timeout for USB operations */ #define F81534_USB_MAX_RETRY 10 #define F81534_USB_TIMEOUT 2000 #define F81534_SET_GET_REGISTER 0xA0 #define F81534_NUM_PORT 4 #define F81534_UNUSED_PORT 0xff #define F81534_WRITE_BUFFER_SIZE 512 #define DRIVER_DESC "Fintek F81532/F81534" #define FINTEK_VENDOR_ID_1 0x1934 #define FINTEK_VENDOR_ID_2 0x2C42 #define FINTEK_DEVICE_ID 0x1202 #define F81534_MAX_TX_SIZE 124 #define F81534_MAX_RX_SIZE 124 #define F81534_RECEIVE_BLOCK_SIZE 128 #define F81534_MAX_RECEIVE_BLOCK_SIZE 512 #define F81534_TOKEN_RECEIVE 0x01 #define F81534_TOKEN_WRITE 0x02 #define F81534_TOKEN_TX_EMPTY 0x03 #define F81534_TOKEN_MSR_CHANGE 0x04 /* * We used interal SPI bus to access FLASH section. We must wait the SPI bus to * idle if we performed any command. * * SPI Bus status register: F81534_BUS_REG_STATUS * Bit 0/1 : BUSY * Bit 2 : IDLE */ #define F81534_BUS_BUSY (BIT(0) | BIT(1)) #define F81534_BUS_IDLE BIT(2) #define F81534_BUS_READ_DATA 0x1004 #define F81534_BUS_REG_STATUS 0x1003 #define F81534_BUS_REG_START 0x1002 #define F81534_BUS_REG_END 0x1001 #define F81534_CMD_READ 0x03 #define F81534_DEFAULT_BAUD_RATE 9600 #define F81534_PORT_CONF_RS232 0 #define F81534_PORT_CONF_RS485 BIT(0) #define F81534_PORT_CONF_RS485_INVERT (BIT(0) | BIT(1)) #define F81534_PORT_CONF_MODE_MASK GENMASK(1, 0) #define F81534_PORT_CONF_DISABLE_PORT BIT(3) #define F81534_PORT_CONF_NOT_EXIST_PORT BIT(7) #define F81534_PORT_UNAVAILABLE \ (F81534_PORT_CONF_DISABLE_PORT | F81534_PORT_CONF_NOT_EXIST_PORT) #define F81534_1X_RXTRIGGER 0xc3 #define F81534_8X_RXTRIGGER 0xcf /* * F81532/534 Clock registers (offset +08h) * * Bit0: UART Enable (always on) * Bit2-1: Clock source selector * 00: 1.846MHz. * 01: 18.46MHz. * 10: 24MHz. * 11: 14.77MHz. * Bit4: Auto direction(RTS) control (RTS pin Low when TX) * Bit5: Invert direction(RTS) when Bit4 enabled (RTS pin high when TX) */ #define F81534_UART_EN BIT(0) #define F81534_CLK_1_846_MHZ 0 #define F81534_CLK_18_46_MHZ BIT(1) #define F81534_CLK_24_MHZ BIT(2) #define F81534_CLK_14_77_MHZ (BIT(1) | BIT(2)) #define F81534_CLK_MASK GENMASK(2, 1) #define F81534_CLK_TX_DELAY_1BIT BIT(3) #define F81534_CLK_RS485_MODE BIT(4) #define F81534_CLK_RS485_INVERT BIT(5) static const struct usb_device_id f81534_id_table[] = { { USB_DEVICE(FINTEK_VENDOR_ID_1, FINTEK_DEVICE_ID) }, { USB_DEVICE(FINTEK_VENDOR_ID_2, FINTEK_DEVICE_ID) }, {} /* Terminating entry */ }; #define F81534_TX_EMPTY_BIT 0 struct f81534_serial_private { u8 conf_data[F81534_DEF_CONF_SIZE]; int tty_idx[F81534_NUM_PORT]; u8 setting_idx; int opened_port; struct mutex urb_mutex; }; struct f81534_port_private { struct mutex mcr_mutex; struct mutex lcr_mutex; struct work_struct lsr_work; struct usb_serial_port *port; unsigned long tx_empty; spinlock_t msr_lock; u32 baud_base; u8 shadow_mcr; u8 shadow_lcr; u8 shadow_msr; u8 shadow_clk; u8 phy_num; }; struct f81534_pin_data { const u16 reg_addr; const u8 reg_mask; }; struct f81534_port_out_pin { struct f81534_pin_data pin[3]; }; /* Pin output value for M2/M1/M0(SD) */ static const struct f81534_port_out_pin f81534_port_out_pins[] = { { { { 0x2ae8, BIT(7) }, { 0x2a90, BIT(5) }, { 0x2a90, BIT(4) } } }, { { { 0x2ae8, BIT(6) }, { 0x2ae8, BIT(0) }, { 0x2ae8, BIT(3) } } }, { { { 0x2a90, BIT(0) }, { 0x2ae8, BIT(2) }, { 0x2a80, BIT(6) } } }, { { { 0x2a90, BIT(3) }, { 0x2a90, BIT(2) }, { 0x2a90, BIT(1) } } }, }; static u32 const baudrate_table[] = { 115200, 921600, 1152000, 1500000 }; static u8 const clock_table[] = { F81534_CLK_1_846_MHZ, F81534_CLK_14_77_MHZ, F81534_CLK_18_46_MHZ, F81534_CLK_24_MHZ }; static int f81534_logic_to_phy_port(struct usb_serial *serial, struct usb_serial_port *port) { struct f81534_serial_private *serial_priv = usb_get_serial_data(port->serial); int count = 0; int i; for (i = 0; i < F81534_NUM_PORT; ++i) { if (serial_priv->conf_data[i] & F81534_PORT_UNAVAILABLE) continue; if (port->port_number == count) return i; ++count; } return -ENODEV; } static int f81534_set_register(struct usb_serial *serial, u16 reg, u8 data) { struct usb_interface *interface = serial->interface; struct usb_device *dev = serial->dev; size_t count = F81534_USB_MAX_RETRY; int status; u8 *tmp; tmp = kmalloc(sizeof(u8), GFP_KERNEL); if (!tmp) return -ENOMEM; *tmp = data; /* * Our device maybe not reply when heavily loading, We'll retry for * F81534_USB_MAX_RETRY times. */ while (count--) { status = usb_control_msg(dev, usb_sndctrlpipe(dev, 0), F81534_SET_GET_REGISTER, USB_TYPE_VENDOR | USB_DIR_OUT, reg, 0, tmp, sizeof(u8), F81534_USB_TIMEOUT); if (status == sizeof(u8)) { status = 0; break; } } if (status < 0) { dev_err(&interface->dev, "%s: reg: %x data: %x failed: %d\n", __func__, reg, data, status); } kfree(tmp); return status; } static int f81534_get_register(struct usb_serial *serial, u16 reg, u8 *data) { struct usb_interface *interface = serial->interface; struct usb_device *dev = serial->dev; size_t count = F81534_USB_MAX_RETRY; int status; u8 *tmp; tmp = kmalloc(sizeof(u8), GFP_KERNEL); if (!tmp) return -ENOMEM; /* * Our device maybe not reply when heavily loading, We'll retry for * F81534_USB_MAX_RETRY times. */ while (count--) { status = usb_control_msg(dev, usb_rcvctrlpipe(dev, 0), F81534_SET_GET_REGISTER, USB_TYPE_VENDOR | USB_DIR_IN, reg, 0, tmp, sizeof(u8), F81534_USB_TIMEOUT); if (status > 0) { status = 0; break; } else if (status == 0) { status = -EIO; } } if (status < 0) { dev_err(&interface->dev, "%s: reg: %x failed: %d\n", __func__, reg, status); goto end; } *data = *tmp; end: kfree(tmp); return status; } static int f81534_set_mask_register(struct usb_serial *serial, u16 reg, u8 mask, u8 data) { int status; u8 tmp; status = f81534_get_register(serial, reg, &tmp); if (status) return status; tmp &= ~mask; tmp |= (mask & data); return f81534_set_register(serial, reg, tmp); } static int f81534_set_phy_port_register(struct usb_serial *serial, int phy, u16 reg, u8 data) { return f81534_set_register(serial, reg + F81534_UART_OFFSET * phy, data); } static int f81534_get_phy_port_register(struct usb_serial *serial, int phy, u16 reg, u8 *data) { return f81534_get_register(serial, reg + F81534_UART_OFFSET * phy, data); } static int f81534_set_port_register(struct usb_serial_port *port, u16 reg, u8 data) { struct f81534_port_private *port_priv = usb_get_serial_port_data(port); return f81534_set_register(port->serial, reg + port_priv->phy_num * F81534_UART_OFFSET, data); } static int f81534_get_port_register(struct usb_serial_port *port, u16 reg, u8 *data) { struct f81534_port_private *port_priv = usb_get_serial_port_data(port); return f81534_get_register(port->serial, reg + port_priv->phy_num * F81534_UART_OFFSET, data); } /* * If we try to access the internal flash via SPI bus, we should check the bus * status for every command. e.g., F81534_BUS_REG_START/F81534_BUS_REG_END */ static int f81534_wait_for_spi_idle(struct usb_serial *serial) { size_t count = F81534_MAX_BUS_RETRY; u8 tmp; int status; do { status = f81534_get_register(serial, F81534_BUS_REG_STATUS, &tmp); if (status) return status; if (tmp & F81534_BUS_BUSY) continue; if (tmp & F81534_BUS_IDLE) break; } while (--count); if (!count) { dev_err(&serial->interface->dev, "%s: timed out waiting for idle SPI bus\n", __func__); return -EIO; } return f81534_set_register(serial, F81534_BUS_REG_STATUS, tmp & ~F81534_BUS_IDLE); } static int f81534_get_spi_register(struct usb_serial *serial, u16 reg, u8 *data) { int status; status = f81534_get_register(serial, reg, data); if (status) return status; return f81534_wait_for_spi_idle(serial); } static int f81534_set_spi_register(struct usb_serial *serial, u16 reg, u8 data) { int status; status = f81534_set_register(serial, reg, data); if (status) return status; return f81534_wait_for_spi_idle(serial); } static int f81534_read_flash(struct usb_serial *serial, u32 address, size_t size, u8 *buf) { u8 tmp_buf[F81534_MAX_DATA_BLOCK]; size_t block = 0; size_t read_size; size_t count; int status; int offset; u16 reg_tmp; status = f81534_set_spi_register(serial, F81534_BUS_REG_START, F81534_CMD_READ); if (status) return status; status = f81534_set_spi_register(serial, F81534_BUS_REG_START, (address >> 16) & 0xff); if (status) return status; status = f81534_set_spi_register(serial, F81534_BUS_REG_START, (address >> 8) & 0xff); if (status) return status; status = f81534_set_spi_register(serial, F81534_BUS_REG_START, (address >> 0) & 0xff); if (status) return status; /* Continuous read mode */ do { read_size = min_t(size_t, F81534_MAX_DATA_BLOCK, size); for (count = 0; count < read_size; ++count) { /* To write F81534_BUS_REG_END when final byte */ if (size <= F81534_MAX_DATA_BLOCK && read_size == count + 1) reg_tmp = F81534_BUS_REG_END; else reg_tmp = F81534_BUS_REG_START; /* * Dummy code, force IC to generate a read pulse, the * set of value 0xf1 is dont care (any value is ok) */ status = f81534_set_spi_register(serial, reg_tmp, 0xf1); if (status) return status; status = f81534_get_spi_register(serial, F81534_BUS_READ_DATA, &tmp_buf[count]); if (status) return status; offset = count + block * F81534_MAX_DATA_BLOCK; buf[offset] = tmp_buf[count]; } size -= read_size; ++block; } while (size); return 0; } static void f81534_prepare_write_buffer(struct usb_serial_port *port, u8 *buf) { struct f81534_port_private *port_priv = usb_get_serial_port_data(port); int phy_num = port_priv->phy_num; u8 tx_len; int i; /* * The block layout is fixed with 4x128 Bytes, per 128 Bytes a port. * index 0: port phy idx (e.g., 0,1,2,3) * index 1: only F81534_TOKEN_WRITE * index 2: serial TX out length * index 3: fix to 0 * index 4~127: serial out data block */ for (i = 0; i < F81534_NUM_PORT; ++i) { buf[i * F81534_RECEIVE_BLOCK_SIZE] = i; buf[i * F81534_RECEIVE_BLOCK_SIZE + 1] = F81534_TOKEN_WRITE; buf[i * F81534_RECEIVE_BLOCK_SIZE + 2] = 0; buf[i * F81534_RECEIVE_BLOCK_SIZE + 3] = 0; } tx_len = kfifo_out_locked(&port->write_fifo, &buf[phy_num * F81534_RECEIVE_BLOCK_SIZE + 4], F81534_MAX_TX_SIZE, &port->lock); buf[phy_num * F81534_RECEIVE_BLOCK_SIZE + 2] = tx_len; } static int f81534_submit_writer(struct usb_serial_port *port, gfp_t mem_flags) { struct f81534_port_private *port_priv = usb_get_serial_port_data(port); struct urb *urb; unsigned long flags; int result; /* Check is any data in write_fifo */ spin_lock_irqsave(&port->lock, flags); if (kfifo_is_empty(&port->write_fifo)) { spin_unlock_irqrestore(&port->lock, flags); return 0; } spin_unlock_irqrestore(&port->lock, flags); /* Check H/W is TXEMPTY */ if (!test_and_clear_bit(F81534_TX_EMPTY_BIT, &port_priv->tx_empty)) return 0; urb = port->write_urbs[0]; f81534_prepare_write_buffer(port, port->bulk_out_buffers[0]); urb->transfer_buffer_length = F81534_WRITE_BUFFER_SIZE; result = usb_submit_urb(urb, mem_flags); if (result) { set_bit(F81534_TX_EMPTY_BIT, &port_priv->tx_empty); dev_err(&port->dev, "%s: submit failed: %d\n", __func__, result); return result; } usb_serial_port_softint(port); return 0; } static u32 f81534_calc_baud_divisor(u32 baudrate, u32 clockrate) { /* Round to nearest divisor */ return DIV_ROUND_CLOSEST(clockrate, baudrate); } static int f81534_find_clk(u32 baudrate) { int idx; for (idx = 0; idx < ARRAY_SIZE(baudrate_table); ++idx) { if (baudrate <= baudrate_table[idx] && baudrate_table[idx] % baudrate == 0) return idx; } return -EINVAL; } static int f81534_set_port_config(struct usb_serial_port *port, struct tty_struct *tty, u32 baudrate, u32 old_baudrate, u8 lcr) { struct f81534_port_private *port_priv = usb_get_serial_port_data(port); u32 divisor; int status; int i; int idx; u8 value; u32 baud_list[] = {baudrate, old_baudrate, F81534_DEFAULT_BAUD_RATE}; for (i = 0; i < ARRAY_SIZE(baud_list); ++i) { baudrate = baud_list[i]; if (baudrate == 0) { tty_encode_baud_rate(tty, 0, 0); return 0; } idx = f81534_find_clk(baudrate); if (idx >= 0) { tty_encode_baud_rate(tty, baudrate, baudrate); break; } } if (idx < 0) return -EINVAL; port_priv->baud_base = baudrate_table[idx]; port_priv->shadow_clk &= ~F81534_CLK_MASK; port_priv->shadow_clk |= clock_table[idx]; status = f81534_set_port_register(port, F81534_CLOCK_REG, port_priv->shadow_clk); if (status) { dev_err(&port->dev, "CLOCK_REG setting failed\n"); return status; } if (baudrate <= 1200) value = F81534_1X_RXTRIGGER; /* 128 FIFO & TL: 1x */ else value = F81534_8X_RXTRIGGER; /* 128 FIFO & TL: 8x */ status = f81534_set_port_register(port, F81534_CONFIG1_REG, value); if (status) { dev_err(&port->dev, "%s: CONFIG1 setting failed\n", __func__); return status; } if (baudrate <= 1200) value = UART_FCR_TRIGGER_1 | UART_FCR_ENABLE_FIFO; /* TL: 1 */ else value = UART_FCR_TRIGGER_8 | UART_FCR_ENABLE_FIFO; /* TL: 8 */ status = f81534_set_port_register(port, F81534_FIFO_CONTROL_REG, value); if (status) { dev_err(&port->dev, "%s: FCR setting failed\n", __func__); return status; } divisor = f81534_calc_baud_divisor(baudrate, port_priv->baud_base); mutex_lock(&port_priv->lcr_mutex); value = UART_LCR_DLAB; status = f81534_set_port_register(port, F81534_LINE_CONTROL_REG, value); if (status) { dev_err(&port->dev, "%s: set LCR failed\n", __func__); goto out_unlock; } value = divisor & 0xff; status = f81534_set_port_register(port, F81534_DIVISOR_LSB_REG, value); if (status) { dev_err(&port->dev, "%s: set DLAB LSB failed\n", __func__); goto out_unlock; } value = (divisor >> 8) & 0xff; status = f81534_set_port_register(port, F81534_DIVISOR_MSB_REG, value); if (status) { dev_err(&port->dev, "%s: set DLAB MSB failed\n", __func__); goto out_unlock; } value = lcr | (port_priv->shadow_lcr & UART_LCR_SBC); status = f81534_set_port_register(port, F81534_LINE_CONTROL_REG, value); if (status) { dev_err(&port->dev, "%s: set LCR failed\n", __func__); goto out_unlock; } port_priv->shadow_lcr = value; out_unlock: mutex_unlock(&port_priv->lcr_mutex); return status; } static int f81534_break_ctl(struct tty_struct *tty, int break_state) { struct usb_serial_port *port = tty->driver_data; struct f81534_port_private *port_priv = usb_get_serial_port_data(port); int status; mutex_lock(&port_priv->lcr_mutex); if (break_state) port_priv->shadow_lcr |= UART_LCR_SBC; else port_priv->shadow_lcr &= ~UART_LCR_SBC; status = f81534_set_port_register(port, F81534_LINE_CONTROL_REG, port_priv->shadow_lcr); if (status) dev_err(&port->dev, "set break failed: %d\n", status); mutex_unlock(&port_priv->lcr_mutex); return status; } static int f81534_update_mctrl(struct usb_serial_port *port, unsigned int set, unsigned int clear) { struct f81534_port_private *port_priv = usb_get_serial_port_data(port); int status; u8 tmp; if (((set | clear) & (TIOCM_DTR | TIOCM_RTS)) == 0) return 0; /* no change */ mutex_lock(&port_priv->mcr_mutex); /* 'Set' takes precedence over 'Clear' */ clear &= ~set; /* Always enable UART_MCR_OUT2 */ tmp = UART_MCR_OUT2 | port_priv->shadow_mcr; if (clear & TIOCM_DTR) tmp &= ~UART_MCR_DTR; if (clear & TIOCM_RTS) tmp &= ~UART_MCR_RTS; if (set & TIOCM_DTR) tmp |= UART_MCR_DTR; if (set & TIOCM_RTS) tmp |= UART_MCR_RTS; status = f81534_set_port_register(port, F81534_MODEM_CONTROL_REG, tmp); if (status < 0) { dev_err(&port->dev, "%s: MCR write failed\n", __func__); mutex_unlock(&port_priv->mcr_mutex); return status; } port_priv->shadow_mcr = tmp; mutex_unlock(&port_priv->mcr_mutex); return 0; } /* * This function will search the data area with token F81534_CUSTOM_VALID_TOKEN * for latest configuration index. If nothing found * (*index = F81534_CUSTOM_NO_CUSTOM_DATA), We'll load default configure in * F81534_DEF_CONF_ADDRESS_START section. * * Due to we only use block0 to save data, so *index should be 0 or * F81534_CUSTOM_NO_CUSTOM_DATA. */ static int f81534_find_config_idx(struct usb_serial *serial, u8 *index) { u8 tmp; int status; status = f81534_read_flash(serial, F81534_CUSTOM_ADDRESS_START, 1, &tmp); if (status) { dev_err(&serial->interface->dev, "%s: read failed: %d\n", __func__, status); return status; } /* We'll use the custom data when the data is valid. */ if (tmp == F81534_CUSTOM_VALID_TOKEN) *index = 0; else *index = F81534_CUSTOM_NO_CUSTOM_DATA; return 0; } /* * The F81532/534 will not report serial port to USB serial subsystem when * H/W DCD/DSR/CTS/RI/RX pin connected to ground. * * To detect RX pin status, we'll enable MCR interal loopback, disable it and * delayed for 60ms. It connected to ground If LSR register report UART_LSR_BI. */ static bool f81534_check_port_hw_disabled(struct usb_serial *serial, int phy) { int status; u8 old_mcr; u8 msr; u8 lsr; u8 msr_mask; msr_mask = UART_MSR_DCD | UART_MSR_RI | UART_MSR_DSR | UART_MSR_CTS; status = f81534_get_phy_port_register(serial, phy, F81534_MODEM_STATUS_REG, &msr); if (status) return false; if ((msr & msr_mask) != msr_mask) return false; status = f81534_set_phy_port_register(serial, phy, F81534_FIFO_CONTROL_REG, UART_FCR_ENABLE_FIFO | UART_FCR_CLEAR_RCVR | UART_FCR_CLEAR_XMIT); if (status) return false; status = f81534_get_phy_port_register(serial, phy, F81534_MODEM_CONTROL_REG, &old_mcr); if (status) return false; status = f81534_set_phy_port_register(serial, phy, F81534_MODEM_CONTROL_REG, UART_MCR_LOOP); if (status) return false; status = f81534_set_phy_port_register(serial, phy, F81534_MODEM_CONTROL_REG, 0x0); if (status) return false; msleep(60); status = f81534_get_phy_port_register(serial, phy, F81534_LINE_STATUS_REG, &lsr); if (status) return false; status = f81534_set_phy_port_register(serial, phy, F81534_MODEM_CONTROL_REG, old_mcr); if (status) return false; if ((lsr & UART_LSR_BI) == UART_LSR_BI) return true; return false; } /* * We had 2 generation of F81532/534 IC. All has an internal storage. * * 1st is pure USB-to-TTL RS232 IC and designed for 4 ports only, no any * internal data will used. All mode and gpio control should manually set * by AP or Driver and all storage space value are 0xff. The * f81534_calc_num_ports() will run to final we marked as "oldest version" * for this IC. * * 2rd is designed to more generic to use any transceiver and this is our * mass production type. We'll save data in F81534_CUSTOM_ADDRESS_START * (0x2f00) with 9bytes. The 1st byte is a indicater. If the token is * F81534_CUSTOM_VALID_TOKEN(0xf0), the IC is 2nd gen type, the following * 4bytes save port mode (0:RS232/1:RS485 Invert/2:RS485), and the last * 4bytes save GPIO state(value from 0~7 to represent 3 GPIO output pin). * The f81534_calc_num_ports() will run to "new style" with checking * F81534_PORT_UNAVAILABLE section. */ static int f81534_calc_num_ports(struct usb_serial *serial, struct usb_serial_endpoints *epds) { struct f81534_serial_private *serial_priv; struct device *dev = &serial->interface->dev; int size_bulk_in = usb_endpoint_maxp(epds->bulk_in[0]); int size_bulk_out = usb_endpoint_maxp(epds->bulk_out[0]); u8 num_port = 0; int index = 0; int status; int i; if (size_bulk_out != F81534_WRITE_BUFFER_SIZE || size_bulk_in != F81534_MAX_RECEIVE_BLOCK_SIZE) { dev_err(dev, "unsupported endpoint max packet size\n"); return -ENODEV; } serial_priv = devm_kzalloc(&serial->interface->dev, sizeof(*serial_priv), GFP_KERNEL); if (!serial_priv) return -ENOMEM; usb_set_serial_data(serial, serial_priv); mutex_init(&serial_priv->urb_mutex); /* Check had custom setting */ status = f81534_find_config_idx(serial, &serial_priv->setting_idx); if (status) { dev_err(&serial->interface->dev, "%s: find idx failed: %d\n", __func__, status); return status; } /* * We'll read custom data only when data available, otherwise we'll * read default value instead. */ if (serial_priv->setting_idx != F81534_CUSTOM_NO_CUSTOM_DATA) { status = f81534_read_flash(serial, F81534_CUSTOM_ADDRESS_START + F81534_CONF_OFFSET, sizeof(serial_priv->conf_data), serial_priv->conf_data); if (status) { dev_err(&serial->interface->dev, "%s: get custom data failed: %d\n", __func__, status); return status; } dev_dbg(&serial->interface->dev, "%s: read config from block: %d\n", __func__, serial_priv->setting_idx); } else { /* Read default board setting */ status = f81534_read_flash(serial, F81534_DEF_CONF_ADDRESS_START, sizeof(serial_priv->conf_data), serial_priv->conf_data); if (status) { dev_err(&serial->interface->dev, "%s: read failed: %d\n", __func__, status); return status; } dev_dbg(&serial->interface->dev, "%s: read default config\n", __func__); } /* New style, find all possible ports */ for (i = 0; i < F81534_NUM_PORT; ++i) { if (f81534_check_port_hw_disabled(serial, i)) serial_priv->conf_data[i] |= F81534_PORT_UNAVAILABLE; if (serial_priv->conf_data[i] & F81534_PORT_UNAVAILABLE) continue; ++num_port; } if (!num_port) { dev_warn(&serial->interface->dev, "no config found, assuming 4 ports\n"); num_port = 4; /* Nothing found, oldest version IC */ } /* Assign phy-to-logic mapping */ for (i = 0; i < F81534_NUM_PORT; ++i) { if (serial_priv->conf_data[i] & F81534_PORT_UNAVAILABLE) continue; serial_priv->tty_idx[i] = index++; dev_dbg(&serial->interface->dev, "%s: phy_num: %d, tty_idx: %d\n", __func__, i, serial_priv->tty_idx[i]); } /* * Setup bulk-out endpoint multiplexing. All ports share the same * bulk-out endpoint. */ BUILD_BUG_ON(ARRAY_SIZE(epds->bulk_out) < F81534_NUM_PORT); for (i = 1; i < num_port; ++i) epds->bulk_out[i] = epds->bulk_out[0]; epds->num_bulk_out = num_port; return num_port; } static void f81534_set_termios(struct tty_struct *tty, struct usb_serial_port *port, const struct ktermios *old_termios) { u8 new_lcr = 0; int status; u32 baud; u32 old_baud; if (C_BAUD(tty) == B0) f81534_update_mctrl(port, 0, TIOCM_DTR | TIOCM_RTS); else if (old_termios && (old_termios->c_cflag & CBAUD) == B0) f81534_update_mctrl(port, TIOCM_DTR | TIOCM_RTS, 0); if (C_PARENB(tty)) { new_lcr |= UART_LCR_PARITY; if (!C_PARODD(tty)) new_lcr |= UART_LCR_EPAR; if (C_CMSPAR(tty)) new_lcr |= UART_LCR_SPAR; } if (C_CSTOPB(tty)) new_lcr |= UART_LCR_STOP; new_lcr |= UART_LCR_WLEN(tty_get_char_size(tty->termios.c_cflag)); baud = tty_get_baud_rate(tty); if (!baud) return; if (old_termios) old_baud = tty_termios_baud_rate(old_termios); else old_baud = F81534_DEFAULT_BAUD_RATE; dev_dbg(&port->dev, "%s: baud: %d\n", __func__, baud); status = f81534_set_port_config(port, tty, baud, old_baud, new_lcr); if (status < 0) { dev_err(&port->dev, "%s: set port config failed: %d\n", __func__, status); } } static int f81534_submit_read_urb(struct usb_serial *serial, gfp_t flags) { return usb_serial_generic_submit_read_urbs(serial->port[0], flags); } static void f81534_msr_changed(struct usb_serial_port *port, u8 msr) { struct f81534_port_private *port_priv = usb_get_serial_port_data(port); struct tty_struct *tty; unsigned long flags; u8 old_msr; if (!(msr & UART_MSR_ANY_DELTA)) return; spin_lock_irqsave(&port_priv->msr_lock, flags); old_msr = port_priv->shadow_msr; port_priv->shadow_msr = msr; spin_unlock_irqrestore(&port_priv->msr_lock, flags); dev_dbg(&port->dev, "%s: MSR from %02x to %02x\n", __func__, old_msr, msr); /* Update input line counters */ if (msr & UART_MSR_DCTS) port->icount.cts++; if (msr & UART_MSR_DDSR) port->icount.dsr++; if (msr & UART_MSR_DDCD) port->icount.dcd++; if (msr & UART_MSR_TERI) port->icount.rng++; wake_up_interruptible(&port->port.delta_msr_wait); if (!(msr & UART_MSR_DDCD)) return; dev_dbg(&port->dev, "%s: DCD Changed: phy_num: %d from %x to %x\n", __func__, port_priv->phy_num, old_msr, msr); tty = tty_port_tty_get(&port->port); if (!tty) return; usb_serial_handle_dcd_change(port, tty, msr & UART_MSR_DCD); tty_kref_put(tty); } static int f81534_read_msr(struct usb_serial_port *port) { struct f81534_port_private *port_priv = usb_get_serial_port_data(port); unsigned long flags; int status; u8 msr; /* Get MSR initial value */ status = f81534_get_port_register(port, F81534_MODEM_STATUS_REG, &msr); if (status) return status; /* Force update current state */ spin_lock_irqsave(&port_priv->msr_lock, flags); port_priv->shadow_msr = msr; spin_unlock_irqrestore(&port_priv->msr_lock, flags); return 0; } static int f81534_open(struct tty_struct *tty, struct usb_serial_port *port) { struct f81534_serial_private *serial_priv = usb_get_serial_data(port->serial); struct f81534_port_private *port_priv = usb_get_serial_port_data(port); int status; status = f81534_set_port_register(port, F81534_FIFO_CONTROL_REG, UART_FCR_ENABLE_FIFO | UART_FCR_CLEAR_RCVR | UART_FCR_CLEAR_XMIT); if (status) { dev_err(&port->dev, "%s: Clear FIFO failed: %d\n", __func__, status); return status; } if (tty) f81534_set_termios(tty, port, NULL); status = f81534_read_msr(port); if (status) return status; mutex_lock(&serial_priv->urb_mutex); /* Submit Read URBs for first port opened */ if (!serial_priv->opened_port) { status = f81534_submit_read_urb(port->serial, GFP_KERNEL); if (status) goto exit; } serial_priv->opened_port++; exit: mutex_unlock(&serial_priv->urb_mutex); set_bit(F81534_TX_EMPTY_BIT, &port_priv->tx_empty); return status; } static void f81534_close(struct usb_serial_port *port) { struct f81534_serial_private *serial_priv = usb_get_serial_data(port->serial); struct usb_serial_port *port0 = port->serial->port[0]; unsigned long flags; size_t i; usb_kill_urb(port->write_urbs[0]); spin_lock_irqsave(&port->lock, flags); kfifo_reset_out(&port->write_fifo); spin_unlock_irqrestore(&port->lock, flags); /* Kill Read URBs when final port closed */ mutex_lock(&serial_priv->urb_mutex); serial_priv->opened_port--; if (!serial_priv->opened_port) { for (i = 0; i < ARRAY_SIZE(port0->read_urbs); ++i) usb_kill_urb(port0->read_urbs[i]); } mutex_unlock(&serial_priv->urb_mutex); } static void f81534_get_serial_info(struct tty_struct *tty, struct serial_struct *ss) { struct usb_serial_port *port = tty->driver_data; struct f81534_port_private *port_priv; port_priv = usb_get_serial_port_data(port); ss->baud_base = port_priv->baud_base; } static void f81534_process_per_serial_block(struct usb_serial_port *port, u8 *data) { struct f81534_port_private *port_priv = usb_get_serial_port_data(port); int phy_num = data[0]; size_t read_size = 0; size_t i; char tty_flag; int status; u8 lsr; /* * The block layout is 128 Bytes * index 0: port phy idx (e.g., 0,1,2,3), * index 1: It's could be * F81534_TOKEN_RECEIVE * F81534_TOKEN_TX_EMPTY * F81534_TOKEN_MSR_CHANGE * index 2: serial in size (data+lsr, must be even) * meaningful for F81534_TOKEN_RECEIVE only * index 3: current MSR with this device * index 4~127: serial in data block (data+lsr, must be even) */ switch (data[1]) { case F81534_TOKEN_TX_EMPTY: set_bit(F81534_TX_EMPTY_BIT, &port_priv->tx_empty); /* Try to submit writer */ status = f81534_submit_writer(port, GFP_ATOMIC); if (status) dev_err(&port->dev, "%s: submit failed\n", __func__); return; case F81534_TOKEN_MSR_CHANGE: f81534_msr_changed(port, data[3]); return; case F81534_TOKEN_RECEIVE: read_size = data[2]; if (read_size > F81534_MAX_RX_SIZE) { dev_err(&port->dev, "%s: phy: %d read_size: %zu larger than: %d\n", __func__, phy_num, read_size, F81534_MAX_RX_SIZE); return; } break; default: dev_warn(&port->dev, "%s: unknown token: %02x\n", __func__, data[1]); return; } for (i = 4; i < 4 + read_size; i += 2) { tty_flag = TTY_NORMAL; lsr = data[i + 1]; if (lsr & UART_LSR_BRK_ERROR_BITS) { if (lsr & UART_LSR_BI) { tty_flag = TTY_BREAK; port->icount.brk++; usb_serial_handle_break(port); } else if (lsr & UART_LSR_PE) { tty_flag = TTY_PARITY; port->icount.parity++; } else if (lsr & UART_LSR_FE) { tty_flag = TTY_FRAME; port->icount.frame++; } if (lsr & UART_LSR_OE) { port->icount.overrun++; tty_insert_flip_char(&port->port, 0, TTY_OVERRUN); } schedule_work(&port_priv->lsr_work); } if (port->sysrq) { if (usb_serial_handle_sysrq_char(port, data[i])) continue; } tty_insert_flip_char(&port->port, data[i], tty_flag); } tty_flip_buffer_push(&port->port); } static void f81534_process_read_urb(struct urb *urb) { struct f81534_serial_private *serial_priv; struct usb_serial_port *port; struct usb_serial *serial; u8 *buf; int phy_port_num; int tty_port_num; size_t i; if (!urb->actual_length || urb->actual_length % F81534_RECEIVE_BLOCK_SIZE) { return; } port = urb->context; serial = port->serial; buf = urb->transfer_buffer; serial_priv = usb_get_serial_data(serial); for (i = 0; i < urb->actual_length; i += F81534_RECEIVE_BLOCK_SIZE) { phy_port_num = buf[i]; if (phy_port_num >= F81534_NUM_PORT) { dev_err(&port->dev, "%s: phy_port_num: %d larger than: %d\n", __func__, phy_port_num, F81534_NUM_PORT); continue; } tty_port_num = serial_priv->tty_idx[phy_port_num]; port = serial->port[tty_port_num]; if (tty_port_initialized(&port->port)) f81534_process_per_serial_block(port, &buf[i]); } } static void f81534_write_usb_callback(struct urb *urb) { struct usb_serial_port *port = urb->context; switch (urb->status) { case 0: break; case -ENOENT: case -ECONNRESET: case -ESHUTDOWN: dev_dbg(&port->dev, "%s - urb stopped: %d\n", __func__, urb->status); return; case -EPIPE: dev_err(&port->dev, "%s - urb stopped: %d\n", __func__, urb->status); return; default: dev_dbg(&port->dev, "%s - nonzero urb status: %d\n", __func__, urb->status); break; } } static void f81534_lsr_worker(struct work_struct *work) { struct f81534_port_private *port_priv; struct usb_serial_port *port; int status; u8 tmp; port_priv = container_of(work, struct f81534_port_private, lsr_work); port = port_priv->port; status = f81534_get_port_register(port, F81534_LINE_STATUS_REG, &tmp); if (status) dev_warn(&port->dev, "read LSR failed: %d\n", status); } static int f81534_set_port_output_pin(struct usb_serial_port *port) { struct f81534_serial_private *serial_priv; struct f81534_port_private *port_priv; struct usb_serial *serial; const struct f81534_port_out_pin *pins; int status; int i; u8 value; u8 idx; serial = port->serial; serial_priv = usb_get_serial_data(serial); port_priv = usb_get_serial_port_data(port); idx = F81534_CONF_INIT_GPIO_OFFSET + port_priv->phy_num; value = serial_priv->conf_data[idx]; if (value >= F81534_CONF_GPIO_SHUTDOWN) { /* * Newer IC configure will make transceiver in shutdown mode on * initial power on. We need enable it before using UARTs. */ idx = F81534_CONF_WORK_GPIO_OFFSET + port_priv->phy_num; value = serial_priv->conf_data[idx]; if (value >= F81534_CONF_GPIO_SHUTDOWN) value = F81534_CONF_GPIO_RS232; } pins = &f81534_port_out_pins[port_priv->phy_num]; for (i = 0; i < ARRAY_SIZE(pins->pin); ++i) { status = f81534_set_mask_register(serial, pins->pin[i].reg_addr, pins->pin[i].reg_mask, value & BIT(i) ? pins->pin[i].reg_mask : 0); if (status) return status; } dev_dbg(&port->dev, "Output pin (M0/M1/M2): %d\n", value); return 0; } static int f81534_port_probe(struct usb_serial_port *port) { struct f81534_serial_private *serial_priv; struct f81534_port_private *port_priv; int ret; u8 value; serial_priv = usb_get_serial_data(port->serial); port_priv = devm_kzalloc(&port->dev, sizeof(*port_priv), GFP_KERNEL); if (!port_priv) return -ENOMEM; /* * We'll make tx frame error when baud rate from 384~500kps. So we'll * delay all tx data frame with 1bit. */ port_priv->shadow_clk = F81534_UART_EN | F81534_CLK_TX_DELAY_1BIT; spin_lock_init(&port_priv->msr_lock); mutex_init(&port_priv->mcr_mutex); mutex_init(&port_priv->lcr_mutex); INIT_WORK(&port_priv->lsr_work, f81534_lsr_worker); /* Assign logic-to-phy mapping */ ret = f81534_logic_to_phy_port(port->serial, port); if (ret < 0) return ret; port_priv->phy_num = ret; port_priv->port = port; usb_set_serial_port_data(port, port_priv); dev_dbg(&port->dev, "%s: port_number: %d, phy_num: %d\n", __func__, port->port_number, port_priv->phy_num); /* * The F81532/534 will hang-up when enable LSR interrupt in IER and * occur data overrun. So we'll disable the LSR interrupt in probe() * and submit the LSR worker to clear LSR state when reported LSR error * bit with bulk-in data in f81534_process_per_serial_block(). */ ret = f81534_set_port_register(port, F81534_INTERRUPT_ENABLE_REG, UART_IER_RDI | UART_IER_THRI | UART_IER_MSI); if (ret) return ret; value = serial_priv->conf_data[port_priv->phy_num]; switch (value & F81534_PORT_CONF_MODE_MASK) { case F81534_PORT_CONF_RS485_INVERT: port_priv->shadow_clk |= F81534_CLK_RS485_MODE | F81534_CLK_RS485_INVERT; dev_dbg(&port->dev, "RS485 invert mode\n"); break; case F81534_PORT_CONF_RS485: port_priv->shadow_clk |= F81534_CLK_RS485_MODE; dev_dbg(&port->dev, "RS485 mode\n"); break; default: case F81534_PORT_CONF_RS232: dev_dbg(&port->dev, "RS232 mode\n"); break; } return f81534_set_port_output_pin(port); } static void f81534_port_remove(struct usb_serial_port *port) { struct f81534_port_private *port_priv = usb_get_serial_port_data(port); flush_work(&port_priv->lsr_work); } static int f81534_tiocmget(struct tty_struct *tty) { struct usb_serial_port *port = tty->driver_data; struct f81534_port_private *port_priv = usb_get_serial_port_data(port); int status; int r; u8 msr; u8 mcr; /* Read current MSR from device */ status = f81534_get_port_register(port, F81534_MODEM_STATUS_REG, &msr); if (status) return status; mutex_lock(&port_priv->mcr_mutex); mcr = port_priv->shadow_mcr; mutex_unlock(&port_priv->mcr_mutex); r = (mcr & UART_MCR_DTR ? TIOCM_DTR : 0) | (mcr & UART_MCR_RTS ? TIOCM_RTS : 0) | (msr & UART_MSR_CTS ? TIOCM_CTS : 0) | (msr & UART_MSR_DCD ? TIOCM_CAR : 0) | (msr & UART_MSR_RI ? TIOCM_RI : 0) | (msr & UART_MSR_DSR ? TIOCM_DSR : 0); return r; } static int f81534_tiocmset(struct tty_struct *tty, unsigned int set, unsigned int clear) { struct usb_serial_port *port = tty->driver_data; return f81534_update_mctrl(port, set, clear); } static void f81534_dtr_rts(struct usb_serial_port *port, int on) { if (on) f81534_update_mctrl(port, TIOCM_DTR | TIOCM_RTS, 0); else f81534_update_mctrl(port, 0, TIOCM_DTR | TIOCM_RTS); } static int f81534_write(struct tty_struct *tty, struct usb_serial_port *port, const u8 *buf, int count) { int bytes_out, status; if (!count) return 0; bytes_out = kfifo_in_locked(&port->write_fifo, buf, count, &port->lock); status = f81534_submit_writer(port, GFP_ATOMIC); if (status) { dev_err(&port->dev, "%s: submit failed\n", __func__); return status; } return bytes_out; } static bool f81534_tx_empty(struct usb_serial_port *port) { struct f81534_port_private *port_priv = usb_get_serial_port_data(port); return test_bit(F81534_TX_EMPTY_BIT, &port_priv->tx_empty); } static int f81534_resume(struct usb_serial *serial) { struct f81534_serial_private *serial_priv = usb_get_serial_data(serial); struct usb_serial_port *port; int error = 0; int status; size_t i; /* * We'll register port 0 bulkin when port had opened, It'll take all * port received data, MSR register change and TX_EMPTY information. */ mutex_lock(&serial_priv->urb_mutex); if (serial_priv->opened_port) { status = f81534_submit_read_urb(serial, GFP_NOIO); if (status) { mutex_unlock(&serial_priv->urb_mutex); return status; } } mutex_unlock(&serial_priv->urb_mutex); for (i = 0; i < serial->num_ports; i++) { port = serial->port[i]; if (!tty_port_initialized(&port->port)) continue; status = f81534_submit_writer(port, GFP_NOIO); if (status) { dev_err(&port->dev, "%s: submit failed\n", __func__); ++error; } } if (error) return -EIO; return 0; } static struct usb_serial_driver f81534_device = { .driver = { .name = "f81534", }, .description = DRIVER_DESC, .id_table = f81534_id_table, .num_bulk_in = 1, .num_bulk_out = 1, .open = f81534_open, .close = f81534_close, .write = f81534_write, .tx_empty = f81534_tx_empty, .calc_num_ports = f81534_calc_num_ports, .port_probe = f81534_port_probe, .port_remove = f81534_port_remove, .break_ctl = f81534_break_ctl, .dtr_rts = f81534_dtr_rts, .process_read_urb = f81534_process_read_urb, .get_serial = f81534_get_serial_info, .tiocmget = f81534_tiocmget, .tiocmset = f81534_tiocmset, .write_bulk_callback = f81534_write_usb_callback, .set_termios = f81534_set_termios, .resume = f81534_resume, }; static struct usb_serial_driver *const serial_drivers[] = { &f81534_device, NULL }; module_usb_serial_driver(serial_drivers, f81534_id_table); MODULE_DEVICE_TABLE(usb, f81534_id_table); MODULE_DESCRIPTION(DRIVER_DESC); MODULE_AUTHOR("Peter Hong <Peter_Hong@fintek.com.tw>"); MODULE_AUTHOR("Tom Tsai <Tom_Tsai@fintek.com.tw>"); MODULE_LICENSE("GPL"); |
1 1 1 1 1 1 1 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 | // SPDX-License-Identifier: GPL-2.0-only /* Industrial I/O event handling * * Copyright (c) 2008 Jonathan Cameron * * Based on elements of hwmon and input subsystems. */ #include <linux/anon_inodes.h> #include <linux/device.h> #include <linux/fs.h> #include <linux/kernel.h> #include <linux/kfifo.h> #include <linux/module.h> #include <linux/poll.h> #include <linux/sched.h> #include <linux/slab.h> #include <linux/uaccess.h> #include <linux/wait.h> #include <linux/iio/iio.h> #include <linux/iio/iio-opaque.h> #include "iio_core.h" #include <linux/iio/sysfs.h> #include <linux/iio/events.h> /** * struct iio_event_interface - chrdev interface for an event line * @wait: wait queue to allow blocking reads of events * @det_events: list of detected events * @dev_attr_list: list of event interface sysfs attribute * @flags: file operations related flags including busy flag. * @group: event interface sysfs attribute group * @read_lock: lock to protect kfifo read operations * @ioctl_handler: handler for event ioctl() calls */ struct iio_event_interface { wait_queue_head_t wait; DECLARE_KFIFO(det_events, struct iio_event_data, 16); struct list_head dev_attr_list; unsigned long flags; struct attribute_group group; struct mutex read_lock; struct iio_ioctl_handler ioctl_handler; }; bool iio_event_enabled(const struct iio_event_interface *ev_int) { return !!test_bit(IIO_BUSY_BIT_POS, &ev_int->flags); } /** * iio_push_event() - try to add event to the list for userspace reading * @indio_dev: IIO device structure * @ev_code: What event * @timestamp: When the event occurred * * Note: The caller must make sure that this function is not running * concurrently for the same indio_dev more than once. * * This function may be safely used as soon as a valid reference to iio_dev has * been obtained via iio_device_alloc(), but any events that are submitted * before iio_device_register() has successfully completed will be silently * discarded. **/ int iio_push_event(struct iio_dev *indio_dev, u64 ev_code, s64 timestamp) { struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev); struct iio_event_interface *ev_int = iio_dev_opaque->event_interface; struct iio_event_data ev; int copied; if (!ev_int) return 0; /* Does anyone care? */ if (iio_event_enabled(ev_int)) { ev.id = ev_code; ev.timestamp = timestamp; copied = kfifo_put(&ev_int->det_events, ev); if (copied != 0) wake_up_poll(&ev_int->wait, EPOLLIN); } return 0; } EXPORT_SYMBOL(iio_push_event); /** * iio_event_poll() - poll the event queue to find out if it has data * @filep: File structure pointer to identify the device * @wait: Poll table pointer to add the wait queue on * * Return: (EPOLLIN | EPOLLRDNORM) if data is available for reading * or a negative error code on failure */ static __poll_t iio_event_poll(struct file *filep, struct poll_table_struct *wait) { struct iio_dev *indio_dev = filep->private_data; struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev); struct iio_event_interface *ev_int = iio_dev_opaque->event_interface; __poll_t events = 0; if (!indio_dev->info) return events; poll_wait(filep, &ev_int->wait, wait); if (!kfifo_is_empty(&ev_int->det_events)) events = EPOLLIN | EPOLLRDNORM; return events; } static ssize_t iio_event_chrdev_read(struct file *filep, char __user *buf, size_t count, loff_t *f_ps) { struct iio_dev *indio_dev = filep->private_data; struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev); struct iio_event_interface *ev_int = iio_dev_opaque->event_interface; unsigned int copied; int ret; if (!indio_dev->info) return -ENODEV; if (count < sizeof(struct iio_event_data)) return -EINVAL; do { if (kfifo_is_empty(&ev_int->det_events)) { if (filep->f_flags & O_NONBLOCK) return -EAGAIN; ret = wait_event_interruptible(ev_int->wait, !kfifo_is_empty(&ev_int->det_events) || indio_dev->info == NULL); if (ret) return ret; if (indio_dev->info == NULL) return -ENODEV; } if (mutex_lock_interruptible(&ev_int->read_lock)) return -ERESTARTSYS; ret = kfifo_to_user(&ev_int->det_events, buf, count, &copied); mutex_unlock(&ev_int->read_lock); if (ret) return ret; /* * If we couldn't read anything from the fifo (a different * thread might have been faster) we either return -EAGAIN if * the file descriptor is non-blocking, otherwise we go back to * sleep and wait for more data to arrive. */ if (copied == 0 && (filep->f_flags & O_NONBLOCK)) return -EAGAIN; } while (copied == 0); return copied; } static int iio_event_chrdev_release(struct inode *inode, struct file *filep) { struct iio_dev *indio_dev = filep->private_data; struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev); struct iio_event_interface *ev_int = iio_dev_opaque->event_interface; clear_bit(IIO_BUSY_BIT_POS, &ev_int->flags); iio_device_put(indio_dev); return 0; } static const struct file_operations iio_event_chrdev_fileops = { .read = iio_event_chrdev_read, .poll = iio_event_poll, .release = iio_event_chrdev_release, .owner = THIS_MODULE, .llseek = noop_llseek, }; static int iio_event_getfd(struct iio_dev *indio_dev) { struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev); struct iio_event_interface *ev_int = iio_dev_opaque->event_interface; int fd; if (ev_int == NULL) return -ENODEV; fd = mutex_lock_interruptible(&iio_dev_opaque->mlock); if (fd) return fd; if (test_and_set_bit(IIO_BUSY_BIT_POS, &ev_int->flags)) { fd = -EBUSY; goto unlock; } iio_device_get(indio_dev); fd = anon_inode_getfd("iio:event", &iio_event_chrdev_fileops, indio_dev, O_RDONLY | O_CLOEXEC); if (fd < 0) { clear_bit(IIO_BUSY_BIT_POS, &ev_int->flags); iio_device_put(indio_dev); } else { kfifo_reset_out(&ev_int->det_events); } unlock: mutex_unlock(&iio_dev_opaque->mlock); return fd; } static const char * const iio_ev_type_text[] = { [IIO_EV_TYPE_THRESH] = "thresh", [IIO_EV_TYPE_MAG] = "mag", [IIO_EV_TYPE_ROC] = "roc", [IIO_EV_TYPE_THRESH_ADAPTIVE] = "thresh_adaptive", [IIO_EV_TYPE_MAG_ADAPTIVE] = "mag_adaptive", [IIO_EV_TYPE_CHANGE] = "change", [IIO_EV_TYPE_MAG_REFERENCED] = "mag_referenced", [IIO_EV_TYPE_GESTURE] = "gesture", }; static const char * const iio_ev_dir_text[] = { [IIO_EV_DIR_EITHER] = "either", [IIO_EV_DIR_RISING] = "rising", [IIO_EV_DIR_FALLING] = "falling", [IIO_EV_DIR_SINGLETAP] = "singletap", [IIO_EV_DIR_DOUBLETAP] = "doubletap", }; static const char * const iio_ev_info_text[] = { [IIO_EV_INFO_ENABLE] = "en", [IIO_EV_INFO_VALUE] = "value", [IIO_EV_INFO_HYSTERESIS] = "hysteresis", [IIO_EV_INFO_PERIOD] = "period", [IIO_EV_INFO_HIGH_PASS_FILTER_3DB] = "high_pass_filter_3db", [IIO_EV_INFO_LOW_PASS_FILTER_3DB] = "low_pass_filter_3db", [IIO_EV_INFO_TIMEOUT] = "timeout", [IIO_EV_INFO_RESET_TIMEOUT] = "reset_timeout", [IIO_EV_INFO_TAP2_MIN_DELAY] = "tap2_min_delay", [IIO_EV_INFO_RUNNING_PERIOD] = "runningperiod", [IIO_EV_INFO_RUNNING_COUNT] = "runningcount", }; static enum iio_event_direction iio_ev_attr_dir(struct iio_dev_attr *attr) { return attr->c->event_spec[attr->address & 0xffff].dir; } static enum iio_event_type iio_ev_attr_type(struct iio_dev_attr *attr) { return attr->c->event_spec[attr->address & 0xffff].type; } static enum iio_event_info iio_ev_attr_info(struct iio_dev_attr *attr) { return (attr->address >> 16) & 0xffff; } static ssize_t iio_ev_state_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) { struct iio_dev *indio_dev = dev_to_iio_dev(dev); struct iio_dev_attr *this_attr = to_iio_dev_attr(attr); int ret; bool val; ret = kstrtobool(buf, &val); if (ret < 0) return ret; if (!indio_dev->info->write_event_config) return -EINVAL; ret = indio_dev->info->write_event_config(indio_dev, this_attr->c, iio_ev_attr_type(this_attr), iio_ev_attr_dir(this_attr), val); return (ret < 0) ? ret : len; } static ssize_t iio_ev_state_show(struct device *dev, struct device_attribute *attr, char *buf) { struct iio_dev *indio_dev = dev_to_iio_dev(dev); struct iio_dev_attr *this_attr = to_iio_dev_attr(attr); int val; if (!indio_dev->info->read_event_config) return -EINVAL; val = indio_dev->info->read_event_config(indio_dev, this_attr->c, iio_ev_attr_type(this_attr), iio_ev_attr_dir(this_attr)); if (val < 0) return val; else return sysfs_emit(buf, "%d\n", val); } static ssize_t iio_ev_value_show(struct device *dev, struct device_attribute *attr, char *buf) { struct iio_dev *indio_dev = dev_to_iio_dev(dev); struct iio_dev_attr *this_attr = to_iio_dev_attr(attr); int val, val2, val_arr[2]; int ret; if (!indio_dev->info->read_event_value) return -EINVAL; ret = indio_dev->info->read_event_value(indio_dev, this_attr->c, iio_ev_attr_type(this_attr), iio_ev_attr_dir(this_attr), iio_ev_attr_info(this_attr), &val, &val2); if (ret < 0) return ret; val_arr[0] = val; val_arr[1] = val2; return iio_format_value(buf, ret, 2, val_arr); } static ssize_t iio_ev_value_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) { struct iio_dev *indio_dev = dev_to_iio_dev(dev); struct iio_dev_attr *this_attr = to_iio_dev_attr(attr); int val, val2; int ret; if (!indio_dev->info->write_event_value) return -EINVAL; ret = iio_str_to_fixpoint(buf, 100000, &val, &val2); if (ret) return ret; ret = indio_dev->info->write_event_value(indio_dev, this_attr->c, iio_ev_attr_type(this_attr), iio_ev_attr_dir(this_attr), iio_ev_attr_info(this_attr), val, val2); if (ret < 0) return ret; return len; } static ssize_t iio_ev_label_show(struct device *dev, struct device_attribute *attr, char *buf) { struct iio_dev *indio_dev = dev_to_iio_dev(dev); struct iio_dev_attr *this_attr = to_iio_dev_attr(attr); if (indio_dev->info->read_event_label) return indio_dev->info->read_event_label(indio_dev, this_attr->c, iio_ev_attr_type(this_attr), iio_ev_attr_dir(this_attr), buf); return -EINVAL; } static int iio_device_add_event(struct iio_dev *indio_dev, const struct iio_chan_spec *chan, unsigned int spec_index, enum iio_event_type type, enum iio_event_direction dir, enum iio_shared_by shared_by, const unsigned long *mask) { struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev); ssize_t (*show)(struct device *dev, struct device_attribute *attr, char *buf); ssize_t (*store)(struct device *dev, struct device_attribute *attr, const char *buf, size_t len); unsigned int attrcount = 0; unsigned int i; char *postfix; int ret; for_each_set_bit(i, mask, sizeof(*mask)*8) { if (i >= ARRAY_SIZE(iio_ev_info_text)) return -EINVAL; if (dir != IIO_EV_DIR_NONE) postfix = kasprintf(GFP_KERNEL, "%s_%s_%s", iio_ev_type_text[type], iio_ev_dir_text[dir], iio_ev_info_text[i]); else postfix = kasprintf(GFP_KERNEL, "%s_%s", iio_ev_type_text[type], iio_ev_info_text[i]); if (postfix == NULL) return -ENOMEM; if (i == IIO_EV_INFO_ENABLE) { show = iio_ev_state_show; store = iio_ev_state_store; } else { show = iio_ev_value_show; store = iio_ev_value_store; } ret = __iio_add_chan_devattr(postfix, chan, show, store, (i << 16) | spec_index, shared_by, &indio_dev->dev, NULL, &iio_dev_opaque->event_interface->dev_attr_list); kfree(postfix); if ((ret == -EBUSY) && (shared_by != IIO_SEPARATE)) continue; if (ret) return ret; attrcount++; } return attrcount; } static int iio_device_add_event_label(struct iio_dev *indio_dev, const struct iio_chan_spec *chan, unsigned int spec_index, enum iio_event_type type, enum iio_event_direction dir) { struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev); char *postfix; int ret; if (!indio_dev->info->read_event_label) return 0; if (dir != IIO_EV_DIR_NONE) postfix = kasprintf(GFP_KERNEL, "%s_%s_label", iio_ev_type_text[type], iio_ev_dir_text[dir]); else postfix = kasprintf(GFP_KERNEL, "%s_label", iio_ev_type_text[type]); if (postfix == NULL) return -ENOMEM; ret = __iio_add_chan_devattr(postfix, chan, &iio_ev_label_show, NULL, spec_index, IIO_SEPARATE, &indio_dev->dev, NULL, &iio_dev_opaque->event_interface->dev_attr_list); kfree(postfix); if (ret < 0) return ret; return 1; } static int iio_device_add_event_sysfs(struct iio_dev *indio_dev, struct iio_chan_spec const *chan) { int ret = 0, i, attrcount = 0; enum iio_event_direction dir; enum iio_event_type type; for (i = 0; i < chan->num_event_specs; i++) { type = chan->event_spec[i].type; dir = chan->event_spec[i].dir; ret = iio_device_add_event(indio_dev, chan, i, type, dir, IIO_SEPARATE, &chan->event_spec[i].mask_separate); if (ret < 0) return ret; attrcount += ret; ret = iio_device_add_event(indio_dev, chan, i, type, dir, IIO_SHARED_BY_TYPE, &chan->event_spec[i].mask_shared_by_type); if (ret < 0) return ret; attrcount += ret; ret = iio_device_add_event(indio_dev, chan, i, type, dir, IIO_SHARED_BY_DIR, &chan->event_spec[i].mask_shared_by_dir); if (ret < 0) return ret; attrcount += ret; ret = iio_device_add_event(indio_dev, chan, i, type, dir, IIO_SHARED_BY_ALL, &chan->event_spec[i].mask_shared_by_all); if (ret < 0) return ret; attrcount += ret; ret = iio_device_add_event_label(indio_dev, chan, i, type, dir); if (ret < 0) return ret; attrcount += ret; } ret = attrcount; return ret; } static inline int __iio_add_event_config_attrs(struct iio_dev *indio_dev) { int j, ret, attrcount = 0; /* Dynamically created from the channels array */ for (j = 0; j < indio_dev->num_channels; j++) { ret = iio_device_add_event_sysfs(indio_dev, &indio_dev->channels[j]); if (ret < 0) return ret; attrcount += ret; } return attrcount; } static bool iio_check_for_dynamic_events(struct iio_dev *indio_dev) { int j; for (j = 0; j < indio_dev->num_channels; j++) { if (indio_dev->channels[j].num_event_specs != 0) return true; } return false; } static void iio_setup_ev_int(struct iio_event_interface *ev_int) { INIT_KFIFO(ev_int->det_events); init_waitqueue_head(&ev_int->wait); mutex_init(&ev_int->read_lock); } static long iio_event_ioctl(struct iio_dev *indio_dev, struct file *filp, unsigned int cmd, unsigned long arg) { int __user *ip = (int __user *)arg; int fd; if (cmd == IIO_GET_EVENT_FD_IOCTL) { fd = iio_event_getfd(indio_dev); if (fd < 0) return fd; if (copy_to_user(ip, &fd, sizeof(fd))) return -EFAULT; return 0; } return IIO_IOCTL_UNHANDLED; } static const char *iio_event_group_name = "events"; int iio_device_register_eventset(struct iio_dev *indio_dev) { struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev); struct iio_event_interface *ev_int; struct iio_dev_attr *p; int ret = 0, attrcount_orig = 0, attrcount, attrn; struct attribute **attr; if (!(indio_dev->info->event_attrs || iio_check_for_dynamic_events(indio_dev))) return 0; ev_int = kzalloc(sizeof(*ev_int), GFP_KERNEL); if (!ev_int) return -ENOMEM; iio_dev_opaque->event_interface = ev_int; INIT_LIST_HEAD(&ev_int->dev_attr_list); iio_setup_ev_int(ev_int); if (indio_dev->info->event_attrs != NULL) { attr = indio_dev->info->event_attrs->attrs; while (*attr++ != NULL) attrcount_orig++; } attrcount = attrcount_orig; if (indio_dev->channels) { ret = __iio_add_event_config_attrs(indio_dev); if (ret < 0) goto error_free_setup_event_lines; attrcount += ret; } ev_int->group.name = iio_event_group_name; ev_int->group.attrs = kcalloc(attrcount + 1, sizeof(ev_int->group.attrs[0]), GFP_KERNEL); if (ev_int->group.attrs == NULL) { ret = -ENOMEM; goto error_free_setup_event_lines; } if (indio_dev->info->event_attrs) memcpy(ev_int->group.attrs, indio_dev->info->event_attrs->attrs, sizeof(ev_int->group.attrs[0]) * attrcount_orig); attrn = attrcount_orig; /* Add all elements from the list. */ list_for_each_entry(p, &ev_int->dev_attr_list, l) ev_int->group.attrs[attrn++] = &p->dev_attr.attr; ret = iio_device_register_sysfs_group(indio_dev, &ev_int->group); if (ret) goto error_free_group_attrs; ev_int->ioctl_handler.ioctl = iio_event_ioctl; iio_device_ioctl_handler_register(&iio_dev_opaque->indio_dev, &ev_int->ioctl_handler); return 0; error_free_group_attrs: kfree(ev_int->group.attrs); error_free_setup_event_lines: iio_free_chan_devattr_list(&ev_int->dev_attr_list); kfree(ev_int); iio_dev_opaque->event_interface = NULL; return ret; } /** * iio_device_wakeup_eventset - Wakes up the event waitqueue * @indio_dev: The IIO device * * Wakes up the event waitqueue used for poll() and blocking read(). * Should usually be called when the device is unregistered. */ void iio_device_wakeup_eventset(struct iio_dev *indio_dev) { struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev); if (iio_dev_opaque->event_interface == NULL) return; wake_up(&iio_dev_opaque->event_interface->wait); } void iio_device_unregister_eventset(struct iio_dev *indio_dev) { struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev); struct iio_event_interface *ev_int = iio_dev_opaque->event_interface; if (ev_int == NULL) return; iio_device_ioctl_handler_unregister(&ev_int->ioctl_handler); iio_free_chan_devattr_list(&ev_int->dev_attr_list); kfree(ev_int->group.attrs); kfree(ev_int); iio_dev_opaque->event_interface = NULL; } |
3 17 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 | /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _RDS_RDS_H #define _RDS_RDS_H #include <net/sock.h> #include <linux/scatterlist.h> #include <linux/highmem.h> #include <rdma/rdma_cm.h> #include <linux/mutex.h> #include <linux/rds.h> #include <linux/rhashtable.h> #include <linux/refcount.h> #include <linux/in6.h> #include "info.h" /* * RDS Network protocol version */ #define RDS_PROTOCOL_3_0 0x0300 #define RDS_PROTOCOL_3_1 0x0301 #define RDS_PROTOCOL_4_0 0x0400 #define RDS_PROTOCOL_4_1 0x0401 #define RDS_PROTOCOL_VERSION RDS_PROTOCOL_3_1 #define RDS_PROTOCOL_MAJOR(v) ((v) >> 8) #define RDS_PROTOCOL_MINOR(v) ((v) & 255) #define RDS_PROTOCOL(maj, min) (((maj) << 8) | min) #define RDS_PROTOCOL_COMPAT_VERSION RDS_PROTOCOL_3_1 /* The following ports, 16385, 18634, 18635, are registered with IANA as * the ports to be used for RDS over TCP and UDP. Currently, only RDS over * TCP and RDS over IB/RDMA are implemented. 18634 is the historical value * used for the RDMA_CM listener port. RDS/TCP uses port 16385. After * IPv6 work, RDMA_CM also uses 16385 as the listener port. 18634 is kept * to ensure compatibility with older RDS modules. Those ports are defined * in each transport's header file. */ #define RDS_PORT 18634 #ifdef ATOMIC64_INIT #define KERNEL_HAS_ATOMIC64 #endif #ifdef RDS_DEBUG #define rdsdebug(fmt, args...) pr_debug("%s(): " fmt, __func__ , ##args) #else /* sigh, pr_debug() causes unused variable warnings */ static inline __printf(1, 2) void rdsdebug(char *fmt, ...) { } #endif #define RDS_FRAG_SHIFT 12 #define RDS_FRAG_SIZE ((unsigned int)(1 << RDS_FRAG_SHIFT)) /* Used to limit both RDMA and non-RDMA RDS message to 1MB */ #define RDS_MAX_MSG_SIZE ((unsigned int)(1 << 20)) #define RDS_CONG_MAP_BYTES (65536 / 8) #define RDS_CONG_MAP_PAGES (PAGE_ALIGN(RDS_CONG_MAP_BYTES) / PAGE_SIZE) #define RDS_CONG_MAP_PAGE_BITS (PAGE_SIZE * 8) struct rds_cong_map { struct rb_node m_rb_node; struct in6_addr m_addr; wait_queue_head_t m_waitq; struct list_head m_conn_list; unsigned long m_page_addrs[RDS_CONG_MAP_PAGES]; }; /* * This is how we will track the connection state: * A connection is always in one of the following * states. Updates to the state are atomic and imply * a memory barrier. */ enum { RDS_CONN_DOWN = 0, RDS_CONN_CONNECTING, RDS_CONN_DISCONNECTING, RDS_CONN_UP, RDS_CONN_RESETTING, RDS_CONN_ERROR, }; /* Bits for c_flags */ #define RDS_LL_SEND_FULL 0 #define RDS_RECONNECT_PENDING 1 #define RDS_IN_XMIT 2 #define RDS_RECV_REFILL 3 #define RDS_DESTROY_PENDING 4 /* Max number of multipaths per RDS connection. Must be a power of 2 */ #define RDS_MPATH_WORKERS 8 #define RDS_MPATH_HASH(rs, n) (jhash_1word((rs)->rs_bound_port, \ (rs)->rs_hash_initval) & ((n) - 1)) #define IS_CANONICAL(laddr, faddr) (htonl(laddr) < htonl(faddr)) /* Per mpath connection state */ struct rds_conn_path { struct rds_connection *cp_conn; struct rds_message *cp_xmit_rm; unsigned long cp_xmit_sg; unsigned int cp_xmit_hdr_off; unsigned int cp_xmit_data_off; unsigned int cp_xmit_atomic_sent; unsigned int cp_xmit_rdma_sent; unsigned int cp_xmit_data_sent; spinlock_t cp_lock; /* protect msg queues */ u64 cp_next_tx_seq; struct list_head cp_send_queue; struct list_head cp_retrans; u64 cp_next_rx_seq; void *cp_transport_data; atomic_t cp_state; unsigned long cp_send_gen; unsigned long cp_flags; unsigned long cp_reconnect_jiffies; struct delayed_work cp_send_w; struct delayed_work cp_recv_w; struct delayed_work cp_conn_w; struct work_struct cp_down_w; struct mutex cp_cm_lock; /* protect cp_state & cm */ wait_queue_head_t cp_waitq; unsigned int cp_unacked_packets; unsigned int cp_unacked_bytes; unsigned int cp_index; }; /* One rds_connection per RDS address pair */ struct rds_connection { struct hlist_node c_hash_node; struct in6_addr c_laddr; struct in6_addr c_faddr; int c_dev_if; /* ifindex used for this conn */ int c_bound_if; /* ifindex of c_laddr */ unsigned int c_loopback:1, c_isv6:1, c_ping_triggered:1, c_pad_to_32:29; int c_npaths; struct rds_connection *c_passive; struct rds_transport *c_trans; struct rds_cong_map *c_lcong; struct rds_cong_map *c_fcong; /* Protocol version */ unsigned int c_proposed_version; unsigned int c_version; possible_net_t c_net; /* TOS */ u8 c_tos; struct list_head c_map_item; unsigned long c_map_queued; struct rds_conn_path *c_path; wait_queue_head_t c_hs_waitq; /* handshake waitq */ u32 c_my_gen_num; u32 c_peer_gen_num; }; static inline struct net *rds_conn_net(struct rds_connection *conn) { return read_pnet(&conn->c_net); } static inline void rds_conn_net_set(struct rds_connection *conn, struct net *net) { write_pnet(&conn->c_net, net); } #define RDS_FLAG_CONG_BITMAP 0x01 #define RDS_FLAG_ACK_REQUIRED 0x02 #define RDS_FLAG_RETRANSMITTED 0x04 #define RDS_MAX_ADV_CREDIT 255 /* RDS_FLAG_PROBE_PORT is the reserved sport used for sending a ping * probe to exchange control information before establishing a connection. * Currently the control information that is exchanged is the number of * supported paths. If the peer is a legacy (older kernel revision) peer, * it would return a pong message without additional control information * that would then alert the sender that the peer was an older rev. */ #define RDS_FLAG_PROBE_PORT 1 #define RDS_HS_PROBE(sport, dport) \ ((sport == RDS_FLAG_PROBE_PORT && dport == 0) || \ (sport == 0 && dport == RDS_FLAG_PROBE_PORT)) /* * Maximum space available for extension headers. */ #define RDS_HEADER_EXT_SPACE 16 struct rds_header { __be64 h_sequence; __be64 h_ack; __be32 h_len; __be16 h_sport; __be16 h_dport; u8 h_flags; u8 h_credit; u8 h_padding[4]; __sum16 h_csum; u8 h_exthdr[RDS_HEADER_EXT_SPACE]; }; /* * Reserved - indicates end of extensions */ #define RDS_EXTHDR_NONE 0 /* * This extension header is included in the very * first message that is sent on a new connection, * and identifies the protocol level. This will help * rolling updates if a future change requires breaking * the protocol. * NB: This is no longer true for IB, where we do a version * negotiation during the connection setup phase (protocol * version information is included in the RDMA CM private data). */ #define RDS_EXTHDR_VERSION 1 struct rds_ext_header_version { __be32 h_version; }; /* * This extension header is included in the RDS message * chasing an RDMA operation. */ #define RDS_EXTHDR_RDMA 2 struct rds_ext_header_rdma { __be32 h_rdma_rkey; }; /* * This extension header tells the peer about the * destination <R_Key,offset> of the requested RDMA * operation. */ #define RDS_EXTHDR_RDMA_DEST 3 struct rds_ext_header_rdma_dest { __be32 h_rdma_rkey; __be32 h_rdma_offset; }; /* Extension header announcing number of paths. * Implicit length = 2 bytes. */ #define RDS_EXTHDR_NPATHS 5 #define RDS_EXTHDR_GEN_NUM 6 #define __RDS_EXTHDR_MAX 16 /* for now */ #define RDS_RX_MAX_TRACES (RDS_MSG_RX_DGRAM_TRACE_MAX + 1) #define RDS_MSG_RX_HDR 0 #define RDS_MSG_RX_START 1 #define RDS_MSG_RX_END 2 #define RDS_MSG_RX_CMSG 3 /* The following values are whitelisted for usercopy */ struct rds_inc_usercopy { rds_rdma_cookie_t rdma_cookie; ktime_t rx_tstamp; }; struct rds_incoming { refcount_t i_refcount; struct list_head i_item; struct rds_connection *i_conn; struct rds_conn_path *i_conn_path; struct rds_header i_hdr; unsigned long i_rx_jiffies; struct in6_addr i_saddr; struct rds_inc_usercopy i_usercopy; u64 i_rx_lat_trace[RDS_RX_MAX_TRACES]; }; struct rds_mr { struct rb_node r_rb_node; struct kref r_kref; u32 r_key; /* A copy of the creation flags */ unsigned int r_use_once:1; unsigned int r_invalidate:1; unsigned int r_write:1; struct rds_sock *r_sock; /* back pointer to the socket that owns us */ struct rds_transport *r_trans; void *r_trans_private; }; static inline rds_rdma_cookie_t rds_rdma_make_cookie(u32 r_key, u32 offset) { return r_key | (((u64) offset) << 32); } static inline u32 rds_rdma_cookie_key(rds_rdma_cookie_t cookie) { return cookie; } static inline u32 rds_rdma_cookie_offset(rds_rdma_cookie_t cookie) { return cookie >> 32; } /* atomic operation types */ #define RDS_ATOMIC_TYPE_CSWP 0 #define RDS_ATOMIC_TYPE_FADD 1 /* * m_sock_item and m_conn_item are on lists that are serialized under * conn->c_lock. m_sock_item has additional meaning in that once it is empty * the message will not be put back on the retransmit list after being sent. * messages that are canceled while being sent rely on this. * * m_inc is used by loopback so that it can pass an incoming message straight * back up into the rx path. It embeds a wire header which is also used by * the send path, which is kind of awkward. * * m_sock_item indicates the message's presence on a socket's send or receive * queue. m_rs will point to that socket. * * m_daddr is used by cancellation to prune messages to a given destination. * * The RDS_MSG_ON_SOCK and RDS_MSG_ON_CONN flags are used to avoid lock * nesting. As paths iterate over messages on a sock, or conn, they must * also lock the conn, or sock, to remove the message from those lists too. * Testing the flag to determine if the message is still on the lists lets * us avoid testing the list_head directly. That means each path can use * the message's list_head to keep it on a local list while juggling locks * without confusing the other path. * * m_ack_seq is an optional field set by transports who need a different * sequence number range to invalidate. They can use this in a callback * that they pass to rds_send_drop_acked() to see if each message has been * acked. The HAS_ACK_SEQ flag can be used to detect messages which haven't * had ack_seq set yet. */ #define RDS_MSG_ON_SOCK 1 #define RDS_MSG_ON_CONN 2 #define RDS_MSG_HAS_ACK_SEQ 3 #define RDS_MSG_ACK_REQUIRED 4 #define RDS_MSG_RETRANSMITTED 5 #define RDS_MSG_MAPPED 6 #define RDS_MSG_PAGEVEC 7 #define RDS_MSG_FLUSH 8 struct rds_znotifier { struct mmpin z_mmp; u32 z_cookie; }; struct rds_msg_zcopy_info { struct list_head rs_zcookie_next; union { struct rds_znotifier znotif; struct rds_zcopy_cookies zcookies; }; }; struct rds_msg_zcopy_queue { struct list_head zcookie_head; spinlock_t lock; /* protects zcookie_head queue */ }; static inline void rds_message_zcopy_queue_init(struct rds_msg_zcopy_queue *q) { spin_lock_init(&q->lock); INIT_LIST_HEAD(&q->zcookie_head); } struct rds_iov_vector { struct rds_iovec *iov; int len; }; struct rds_iov_vector_arr { struct rds_iov_vector *vec; int len; int indx; int incr; }; struct rds_message { refcount_t m_refcount; struct list_head m_sock_item; struct list_head m_conn_item; struct rds_incoming m_inc; u64 m_ack_seq; struct in6_addr m_daddr; unsigned long m_flags; /* Never access m_rs without holding m_rs_lock. * Lock nesting is * rm->m_rs_lock * -> rs->rs_lock */ spinlock_t m_rs_lock; wait_queue_head_t m_flush_wait; struct rds_sock *m_rs; /* cookie to send to remote, in rds header */ rds_rdma_cookie_t m_rdma_cookie; unsigned int m_used_sgs; unsigned int m_total_sgs; void *m_final_op; struct { struct rm_atomic_op { int op_type; union { struct { uint64_t compare; uint64_t swap; uint64_t compare_mask; uint64_t swap_mask; } op_m_cswp; struct { uint64_t add; uint64_t nocarry_mask; } op_m_fadd; }; u32 op_rkey; u64 op_remote_addr; unsigned int op_notify:1; unsigned int op_recverr:1; unsigned int op_mapped:1; unsigned int op_silent:1; unsigned int op_active:1; struct scatterlist *op_sg; struct rds_notifier *op_notifier; struct rds_mr *op_rdma_mr; } atomic; struct rm_rdma_op { u32 op_rkey; u64 op_remote_addr; unsigned int op_write:1; unsigned int op_fence:1; unsigned int op_notify:1; unsigned int op_recverr:1; unsigned int op_mapped:1; unsigned int op_silent:1; unsigned int op_active:1; unsigned int op_bytes; unsigned int op_nents; unsigned int op_count; struct scatterlist *op_sg; struct rds_notifier *op_notifier; struct rds_mr *op_rdma_mr; u64 op_odp_addr; struct rds_mr *op_odp_mr; } rdma; struct rm_data_op { unsigned int op_active:1; unsigned int op_nents; unsigned int op_count; unsigned int op_dmasg; unsigned int op_dmaoff; struct rds_znotifier *op_mmp_znotifier; struct scatterlist *op_sg; } data; }; struct rds_conn_path *m_conn_path; }; /* * The RDS notifier is used (optionally) to tell the application about * completed RDMA operations. Rather than keeping the whole rds message * around on the queue, we allocate a small notifier that is put on the * socket's notifier_list. Notifications are delivered to the application * through control messages. */ struct rds_notifier { struct list_head n_list; uint64_t n_user_token; int n_status; }; /* Available as part of RDS core, so doesn't need to participate * in get_preferred transport etc */ #define RDS_TRANS_LOOP 3 /** * struct rds_transport - transport specific behavioural hooks * * @xmit: .xmit is called by rds_send_xmit() to tell the transport to send * part of a message. The caller serializes on the send_sem so this * doesn't need to be reentrant for a given conn. The header must be * sent before the data payload. .xmit must be prepared to send a * message with no data payload. .xmit should return the number of * bytes that were sent down the connection, including header bytes. * Returning 0 tells the caller that it doesn't need to perform any * additional work now. This is usually the case when the transport has * filled the sending queue for its connection and will handle * triggering the rds thread to continue the send when space becomes * available. Returning -EAGAIN tells the caller to retry the send * immediately. Returning -ENOMEM tells the caller to retry the send at * some point in the future. * * @conn_shutdown: conn_shutdown stops traffic on the given connection. Once * it returns the connection can not call rds_recv_incoming(). * This will only be called once after conn_connect returns * non-zero success and will The caller serializes this with * the send and connecting paths (xmit_* and conn_*). The * transport is responsible for other serialization, including * rds_recv_incoming(). This is called in process context but * should try hard not to block. */ struct rds_transport { char t_name[TRANSNAMSIZ]; struct list_head t_item; struct module *t_owner; unsigned int t_prefer_loopback:1, t_mp_capable:1; unsigned int t_type; int (*laddr_check)(struct net *net, const struct in6_addr *addr, __u32 scope_id); int (*conn_alloc)(struct rds_connection *conn, gfp_t gfp); void (*conn_free)(void *data); int (*conn_path_connect)(struct rds_conn_path *cp); void (*conn_path_shutdown)(struct rds_conn_path *conn); void (*xmit_path_prepare)(struct rds_conn_path *cp); void (*xmit_path_complete)(struct rds_conn_path *cp); int (*xmit)(struct rds_connection *conn, struct rds_message *rm, unsigned int hdr_off, unsigned int sg, unsigned int off); int (*xmit_rdma)(struct rds_connection *conn, struct rm_rdma_op *op); int (*xmit_atomic)(struct rds_connection *conn, struct rm_atomic_op *op); int (*recv_path)(struct rds_conn_path *cp); int (*inc_copy_to_user)(struct rds_incoming *inc, struct iov_iter *to); void (*inc_free)(struct rds_incoming *inc); int (*cm_handle_connect)(struct rdma_cm_id *cm_id, struct rdma_cm_event *event, bool isv6); int (*cm_initiate_connect)(struct rdma_cm_id *cm_id, bool isv6); void (*cm_connect_complete)(struct rds_connection *conn, struct rdma_cm_event *event); unsigned int (*stats_info_copy)(struct rds_info_iterator *iter, unsigned int avail); void (*exit)(void); void *(*get_mr)(struct scatterlist *sg, unsigned long nr_sg, struct rds_sock *rs, u32 *key_ret, struct rds_connection *conn, u64 start, u64 length, int need_odp); void (*sync_mr)(void *trans_private, int direction); void (*free_mr)(void *trans_private, int invalidate); void (*flush_mrs)(void); bool (*t_unloading)(struct rds_connection *conn); u8 (*get_tos_map)(u8 tos); }; /* Bind hash table key length. It is the sum of the size of a struct * in6_addr, a scope_id and a port. */ #define RDS_BOUND_KEY_LEN \ (sizeof(struct in6_addr) + sizeof(__u32) + sizeof(__be16)) struct rds_sock { struct sock rs_sk; u64 rs_user_addr; u64 rs_user_bytes; /* * bound_addr used for both incoming and outgoing, no INADDR_ANY * support. */ struct rhash_head rs_bound_node; u8 rs_bound_key[RDS_BOUND_KEY_LEN]; struct sockaddr_in6 rs_bound_sin6; #define rs_bound_addr rs_bound_sin6.sin6_addr #define rs_bound_addr_v4 rs_bound_sin6.sin6_addr.s6_addr32[3] #define rs_bound_port rs_bound_sin6.sin6_port #define rs_bound_scope_id rs_bound_sin6.sin6_scope_id struct in6_addr rs_conn_addr; #define rs_conn_addr_v4 rs_conn_addr.s6_addr32[3] __be16 rs_conn_port; struct rds_transport *rs_transport; /* * rds_sendmsg caches the conn it used the last time around. * This helps avoid costly lookups. */ struct rds_connection *rs_conn; /* flag indicating we were congested or not */ int rs_congested; /* seen congestion (ENOBUFS) when sending? */ int rs_seen_congestion; /* rs_lock protects all these adjacent members before the newline */ spinlock_t rs_lock; struct list_head rs_send_queue; u32 rs_snd_bytes; int rs_rcv_bytes; struct list_head rs_notify_queue; /* currently used for failed RDMAs */ /* Congestion wake_up. If rs_cong_monitor is set, we use cong_mask * to decide whether the application should be woken up. * If not set, we use rs_cong_track to find out whether a cong map * update arrived. */ uint64_t rs_cong_mask; uint64_t rs_cong_notify; struct list_head rs_cong_list; unsigned long rs_cong_track; /* * rs_recv_lock protects the receive queue, and is * used to serialize with rds_release. */ rwlock_t rs_recv_lock; struct list_head rs_recv_queue; /* just for stats reporting */ struct list_head rs_item; /* these have their own lock */ spinlock_t rs_rdma_lock; struct rb_root rs_rdma_keys; /* Socket options - in case there will be more */ unsigned char rs_recverr, rs_cong_monitor; u32 rs_hash_initval; /* Socket receive path trace points*/ u8 rs_rx_traces; u8 rs_rx_trace[RDS_MSG_RX_DGRAM_TRACE_MAX]; struct rds_msg_zcopy_queue rs_zcookie_queue; u8 rs_tos; }; static inline struct rds_sock *rds_sk_to_rs(const struct sock *sk) { return container_of(sk, struct rds_sock, rs_sk); } static inline struct sock *rds_rs_to_sk(struct rds_sock *rs) { return &rs->rs_sk; } /* * The stack assigns sk_sndbuf and sk_rcvbuf to twice the specified value * to account for overhead. We don't account for overhead, we just apply * the number of payload bytes to the specified value. */ static inline int rds_sk_sndbuf(struct rds_sock *rs) { return rds_rs_to_sk(rs)->sk_sndbuf / 2; } static inline int rds_sk_rcvbuf(struct rds_sock *rs) { return rds_rs_to_sk(rs)->sk_rcvbuf / 2; } struct rds_statistics { uint64_t s_conn_reset; uint64_t s_recv_drop_bad_checksum; uint64_t s_recv_drop_old_seq; uint64_t s_recv_drop_no_sock; uint64_t s_recv_drop_dead_sock; uint64_t s_recv_deliver_raced; uint64_t s_recv_delivered; uint64_t s_recv_queued; uint64_t s_recv_immediate_retry; uint64_t s_recv_delayed_retry; uint64_t s_recv_ack_required; uint64_t s_recv_rdma_bytes; uint64_t s_recv_ping; uint64_t s_send_queue_empty; uint64_t s_send_queue_full; uint64_t s_send_lock_contention; uint64_t s_send_lock_queue_raced; uint64_t s_send_immediate_retry; uint64_t s_send_delayed_retry; uint64_t s_send_drop_acked; uint64_t s_send_ack_required; uint64_t s_send_queued; uint64_t s_send_rdma; uint64_t s_send_rdma_bytes; uint64_t s_send_pong; uint64_t s_page_remainder_hit; uint64_t s_page_remainder_miss; uint64_t s_copy_to_user; uint64_t s_copy_from_user; uint64_t s_cong_update_queued; uint64_t s_cong_update_received; uint64_t s_cong_send_error; uint64_t s_cong_send_blocked; uint64_t s_recv_bytes_added_to_socket; uint64_t s_recv_bytes_removed_from_socket; uint64_t s_send_stuck_rm; }; /* af_rds.c */ void rds_sock_addref(struct rds_sock *rs); void rds_sock_put(struct rds_sock *rs); void rds_wake_sk_sleep(struct rds_sock *rs); static inline void __rds_wake_sk_sleep(struct sock *sk) { wait_queue_head_t *waitq = sk_sleep(sk); if (!sock_flag(sk, SOCK_DEAD) && waitq) wake_up(waitq); } extern wait_queue_head_t rds_poll_waitq; /* bind.c */ int rds_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len); void rds_remove_bound(struct rds_sock *rs); struct rds_sock *rds_find_bound(const struct in6_addr *addr, __be16 port, __u32 scope_id); int rds_bind_lock_init(void); void rds_bind_lock_destroy(void); /* cong.c */ int rds_cong_get_maps(struct rds_connection *conn); void rds_cong_add_conn(struct rds_connection *conn); void rds_cong_remove_conn(struct rds_connection *conn); void rds_cong_set_bit(struct rds_cong_map *map, __be16 port); void rds_cong_clear_bit(struct rds_cong_map *map, __be16 port); int rds_cong_wait(struct rds_cong_map *map, __be16 port, int nonblock, struct rds_sock *rs); void rds_cong_queue_updates(struct rds_cong_map *map); void rds_cong_map_updated(struct rds_cong_map *map, uint64_t); int rds_cong_updated_since(unsigned long *recent); void rds_cong_add_socket(struct rds_sock *); void rds_cong_remove_socket(struct rds_sock *); void rds_cong_exit(void); struct rds_message *rds_cong_update_alloc(struct rds_connection *conn); /* connection.c */ extern u32 rds_gen_num; int rds_conn_init(void); void rds_conn_exit(void); struct rds_connection *rds_conn_create(struct net *net, const struct in6_addr *laddr, const struct in6_addr *faddr, struct rds_transport *trans, u8 tos, gfp_t gfp, int dev_if); struct rds_connection *rds_conn_create_outgoing(struct net *net, const struct in6_addr *laddr, const struct in6_addr *faddr, struct rds_transport *trans, u8 tos, gfp_t gfp, int dev_if); void rds_conn_shutdown(struct rds_conn_path *cpath); void rds_conn_destroy(struct rds_connection *conn); void rds_conn_drop(struct rds_connection *conn); void rds_conn_path_drop(struct rds_conn_path *cpath, bool destroy); void rds_conn_connect_if_down(struct rds_connection *conn); void rds_conn_path_connect_if_down(struct rds_conn_path *cp); void rds_check_all_paths(struct rds_connection *conn); void rds_for_each_conn_info(struct socket *sock, unsigned int len, struct rds_info_iterator *iter, struct rds_info_lengths *lens, int (*visitor)(struct rds_connection *, void *), u64 *buffer, size_t item_len); __printf(2, 3) void __rds_conn_path_error(struct rds_conn_path *cp, const char *, ...); #define rds_conn_path_error(cp, fmt...) \ __rds_conn_path_error(cp, KERN_WARNING "RDS: " fmt) static inline int rds_conn_path_transition(struct rds_conn_path *cp, int old, int new) { return atomic_cmpxchg(&cp->cp_state, old, new) == old; } static inline int rds_conn_transition(struct rds_connection *conn, int old, int new) { WARN_ON(conn->c_trans->t_mp_capable); return rds_conn_path_transition(&conn->c_path[0], old, new); } static inline int rds_conn_path_state(struct rds_conn_path *cp) { return atomic_read(&cp->cp_state); } static inline int rds_conn_state(struct rds_connection *conn) { WARN_ON(conn->c_trans->t_mp_capable); return rds_conn_path_state(&conn->c_path[0]); } static inline int rds_conn_path_up(struct rds_conn_path *cp) { return atomic_read(&cp->cp_state) == RDS_CONN_UP; } static inline int rds_conn_path_down(struct rds_conn_path *cp) { return atomic_read(&cp->cp_state) == RDS_CONN_DOWN; } static inline int rds_conn_up(struct rds_connection *conn) { WARN_ON(conn->c_trans->t_mp_capable); return rds_conn_path_up(&conn->c_path[0]); } static inline int rds_conn_path_connecting(struct rds_conn_path *cp) { return atomic_read(&cp->cp_state) == RDS_CONN_CONNECTING; } static inline int rds_conn_connecting(struct rds_connection *conn) { WARN_ON(conn->c_trans->t_mp_capable); return rds_conn_path_connecting(&conn->c_path[0]); } /* message.c */ struct rds_message *rds_message_alloc(unsigned int nents, gfp_t gfp); struct scatterlist *rds_message_alloc_sgs(struct rds_message *rm, int nents); int rds_message_copy_from_user(struct rds_message *rm, struct iov_iter *from, bool zcopy); struct rds_message *rds_message_map_pages(unsigned long *page_addrs, unsigned int total_len); void rds_message_populate_header(struct rds_header *hdr, __be16 sport, __be16 dport, u64 seq); int rds_message_add_extension(struct rds_header *hdr, unsigned int type, const void *data, unsigned int len); int rds_message_next_extension(struct rds_header *hdr, unsigned int *pos, void *buf, unsigned int *buflen); int rds_message_add_rdma_dest_extension(struct rds_header *hdr, u32 r_key, u32 offset); int rds_message_inc_copy_to_user(struct rds_incoming *inc, struct iov_iter *to); void rds_message_addref(struct rds_message *rm); void rds_message_put(struct rds_message *rm); void rds_message_wait(struct rds_message *rm); void rds_message_unmapped(struct rds_message *rm); void rds_notify_msg_zcopy_purge(struct rds_msg_zcopy_queue *info); static inline void rds_message_make_checksum(struct rds_header *hdr) { hdr->h_csum = 0; hdr->h_csum = ip_fast_csum((void *) hdr, sizeof(*hdr) >> 2); } static inline int rds_message_verify_checksum(const struct rds_header *hdr) { return !hdr->h_csum || ip_fast_csum((void *) hdr, sizeof(*hdr) >> 2) == 0; } /* page.c */ int rds_page_remainder_alloc(struct scatterlist *scat, unsigned long bytes, gfp_t gfp); void rds_page_exit(void); /* recv.c */ void rds_inc_init(struct rds_incoming *inc, struct rds_connection *conn, struct in6_addr *saddr); void rds_inc_path_init(struct rds_incoming *inc, struct rds_conn_path *conn, struct in6_addr *saddr); void rds_inc_put(struct rds_incoming *inc); void rds_recv_incoming(struct rds_connection *conn, struct in6_addr *saddr, struct in6_addr *daddr, struct rds_incoming *inc, gfp_t gfp); int rds_recvmsg(struct socket *sock, struct msghdr *msg, size_t size, int msg_flags); void rds_clear_recv_queue(struct rds_sock *rs); int rds_notify_queue_get(struct rds_sock *rs, struct msghdr *msg); void rds_inc_info_copy(struct rds_incoming *inc, struct rds_info_iterator *iter, __be32 saddr, __be32 daddr, int flip); void rds6_inc_info_copy(struct rds_incoming *inc, struct rds_info_iterator *iter, struct in6_addr *saddr, struct in6_addr *daddr, int flip); /* send.c */ int rds_sendmsg(struct socket *sock, struct msghdr *msg, size_t payload_len); void rds_send_path_reset(struct rds_conn_path *conn); int rds_send_xmit(struct rds_conn_path *cp); struct sockaddr_in; void rds_send_drop_to(struct rds_sock *rs, struct sockaddr_in6 *dest); typedef int (*is_acked_func)(struct rds_message *rm, uint64_t ack); void rds_send_drop_acked(struct rds_connection *conn, u64 ack, is_acked_func is_acked); void rds_send_path_drop_acked(struct rds_conn_path *cp, u64 ack, is_acked_func is_acked); void rds_send_ping(struct rds_connection *conn, int cp_index); int rds_send_pong(struct rds_conn_path *cp, __be16 dport); /* rdma.c */ void rds_rdma_unuse(struct rds_sock *rs, u32 r_key, int force); int rds_get_mr(struct rds_sock *rs, sockptr_t optval, int optlen); int rds_get_mr_for_dest(struct rds_sock *rs, sockptr_t optval, int optlen); int rds_free_mr(struct rds_sock *rs, sockptr_t optval, int optlen); void rds_rdma_drop_keys(struct rds_sock *rs); int rds_rdma_extra_size(struct rds_rdma_args *args, struct rds_iov_vector *iov); int rds_cmsg_rdma_dest(struct rds_sock *rs, struct rds_message *rm, struct cmsghdr *cmsg); int rds_cmsg_rdma_args(struct rds_sock *rs, struct rds_message *rm, struct cmsghdr *cmsg, struct rds_iov_vector *vec); int rds_cmsg_rdma_map(struct rds_sock *rs, struct rds_message *rm, struct cmsghdr *cmsg); void rds_rdma_free_op(struct rm_rdma_op *ro); void rds_atomic_free_op(struct rm_atomic_op *ao); void rds_rdma_send_complete(struct rds_message *rm, int wc_status); void rds_atomic_send_complete(struct rds_message *rm, int wc_status); int rds_cmsg_atomic(struct rds_sock *rs, struct rds_message *rm, struct cmsghdr *cmsg); void __rds_put_mr_final(struct kref *kref); static inline bool rds_destroy_pending(struct rds_connection *conn) { return !check_net(rds_conn_net(conn)) || (conn->c_trans->t_unloading && conn->c_trans->t_unloading(conn)); } enum { ODP_NOT_NEEDED, ODP_ZEROBASED, ODP_VIRTUAL }; /* stats.c */ DECLARE_PER_CPU_SHARED_ALIGNED(struct rds_statistics, rds_stats); #define rds_stats_inc_which(which, member) do { \ per_cpu(which, get_cpu()).member++; \ put_cpu(); \ } while (0) #define rds_stats_inc(member) rds_stats_inc_which(rds_stats, member) #define rds_stats_add_which(which, member, count) do { \ per_cpu(which, get_cpu()).member += count; \ put_cpu(); \ } while (0) #define rds_stats_add(member, count) rds_stats_add_which(rds_stats, member, count) int rds_stats_init(void); void rds_stats_exit(void); void rds_stats_info_copy(struct rds_info_iterator *iter, uint64_t *values, const char *const *names, size_t nr); /* sysctl.c */ int rds_sysctl_init(void); void rds_sysctl_exit(void); extern unsigned long rds_sysctl_sndbuf_min; extern unsigned long rds_sysctl_sndbuf_default; extern unsigned long rds_sysctl_sndbuf_max; extern unsigned long rds_sysctl_reconnect_min_jiffies; extern unsigned long rds_sysctl_reconnect_max_jiffies; extern unsigned int rds_sysctl_max_unacked_packets; extern unsigned int rds_sysctl_max_unacked_bytes; extern unsigned int rds_sysctl_ping_enable; extern unsigned long rds_sysctl_trace_flags; extern unsigned int rds_sysctl_trace_level; /* threads.c */ int rds_threads_init(void); void rds_threads_exit(void); extern struct workqueue_struct *rds_wq; void rds_queue_reconnect(struct rds_conn_path *cp); void rds_connect_worker(struct work_struct *); void rds_shutdown_worker(struct work_struct *); void rds_send_worker(struct work_struct *); void rds_recv_worker(struct work_struct *); void rds_connect_path_complete(struct rds_conn_path *conn, int curr); void rds_connect_complete(struct rds_connection *conn); int rds_addr_cmp(const struct in6_addr *a1, const struct in6_addr *a2); /* transport.c */ void rds_trans_register(struct rds_transport *trans); void rds_trans_unregister(struct rds_transport *trans); struct rds_transport *rds_trans_get_preferred(struct net *net, const struct in6_addr *addr, __u32 scope_id); void rds_trans_put(struct rds_transport *trans); unsigned int rds_trans_stats_info_copy(struct rds_info_iterator *iter, unsigned int avail); struct rds_transport *rds_trans_get(int t_type); #endif |
12 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 | /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_SCHED_WAKE_Q_H #define _LINUX_SCHED_WAKE_Q_H /* * Wake-queues are lists of tasks with a pending wakeup, whose * callers have already marked the task as woken internally, * and can thus carry on. A common use case is being able to * do the wakeups once the corresponding user lock as been * released. * * We hold reference to each task in the list across the wakeup, * thus guaranteeing that the memory is still valid by the time * the actual wakeups are performed in wake_up_q(). * * One per task suffices, because there's never a need for a task to be * in two wake queues simultaneously; it is forbidden to abandon a task * in a wake queue (a call to wake_up_q() _must_ follow), so if a task is * already in a wake queue, the wakeup will happen soon and the second * waker can just skip it. * * The DEFINE_WAKE_Q macro declares and initializes the list head. * wake_up_q() does NOT reinitialize the list; it's expected to be * called near the end of a function. Otherwise, the list can be * re-initialized for later re-use by wake_q_init(). * * NOTE that this can cause spurious wakeups. schedule() callers * must ensure the call is done inside a loop, confirming that the * wakeup condition has in fact occurred. * * NOTE that there is no guarantee the wakeup will happen any later than the * wake_q_add() location. Therefore task must be ready to be woken at the * location of the wake_q_add(). */ #include <linux/sched.h> struct wake_q_head { struct wake_q_node *first; struct wake_q_node **lastp; }; #define WAKE_Q_TAIL ((struct wake_q_node *) 0x01) #define WAKE_Q_HEAD_INITIALIZER(name) \ { WAKE_Q_TAIL, &name.first } #define DEFINE_WAKE_Q(name) \ struct wake_q_head name = WAKE_Q_HEAD_INITIALIZER(name) static inline void wake_q_init(struct wake_q_head *head) { head->first = WAKE_Q_TAIL; head->lastp = &head->first; } static inline bool wake_q_empty(struct wake_q_head *head) { return head->first == WAKE_Q_TAIL; } extern void wake_q_add(struct wake_q_head *head, struct task_struct *task); extern void wake_q_add_safe(struct wake_q_head *head, struct task_struct *task); extern void wake_up_q(struct wake_q_head *head); /* Spin unlock helpers to unlock and call wake_up_q with preempt disabled */ static inline void raw_spin_unlock_wake(raw_spinlock_t *lock, struct wake_q_head *wake_q) { guard(preempt)(); raw_spin_unlock(lock); if (wake_q) { wake_up_q(wake_q); wake_q_init(wake_q); } } static inline void raw_spin_unlock_irq_wake(raw_spinlock_t *lock, struct wake_q_head *wake_q) { guard(preempt)(); raw_spin_unlock_irq(lock); if (wake_q) { wake_up_q(wake_q); wake_q_init(wake_q); } } static inline void raw_spin_unlock_irqrestore_wake(raw_spinlock_t *lock, unsigned long flags, struct wake_q_head *wake_q) { guard(preempt)(); raw_spin_unlock_irqrestore(lock, flags); if (wake_q) { wake_up_q(wake_q); wake_q_init(wake_q); } } #endif /* _LINUX_SCHED_WAKE_Q_H */ |
6 1 1 4 4 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 | // SPDX-License-Identifier: GPL-2.0-only /* * Copyright (c) 2006 Patrick McHardy <kaber@trash.net> */ #include <linux/module.h> #include <linux/init.h> #include <linux/skbuff.h> #include <linux/netfilter/x_tables.h> #include <linux/netfilter/xt_NFLOG.h> #include <net/netfilter/nf_log.h> MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>"); MODULE_DESCRIPTION("Xtables: packet logging to netlink using NFLOG"); MODULE_LICENSE("GPL"); MODULE_ALIAS("ipt_NFLOG"); MODULE_ALIAS("ip6t_NFLOG"); static unsigned int nflog_tg(struct sk_buff *skb, const struct xt_action_param *par) { const struct xt_nflog_info *info = par->targinfo; struct net *net = xt_net(par); struct nf_loginfo li; li.type = NF_LOG_TYPE_ULOG; li.u.ulog.copy_len = info->len; li.u.ulog.group = info->group; li.u.ulog.qthreshold = info->threshold; li.u.ulog.flags = 0; if (info->flags & XT_NFLOG_F_COPY_LEN) li.u.ulog.flags |= NF_LOG_F_COPY_LEN; nf_log_packet(net, xt_family(par), xt_hooknum(par), skb, xt_in(par), xt_out(par), &li, "%s", info->prefix); return XT_CONTINUE; } static int nflog_tg_check(const struct xt_tgchk_param *par) { const struct xt_nflog_info *info = par->targinfo; int ret; if (info->flags & ~XT_NFLOG_MASK) return -EINVAL; if (info->prefix[sizeof(info->prefix) - 1] != '\0') return -EINVAL; ret = nf_logger_find_get(par->family, NF_LOG_TYPE_ULOG); if (ret != 0 && !par->nft_compat) { request_module("%s", "nfnetlink_log"); ret = nf_logger_find_get(par->family, NF_LOG_TYPE_ULOG); } return ret; } static void nflog_tg_destroy(const struct xt_tgdtor_param *par) { nf_logger_put(par->family, NF_LOG_TYPE_ULOG); } static struct xt_target nflog_tg_reg[] __read_mostly = { { .name = "NFLOG", .revision = 0, .family = NFPROTO_IPV4, .checkentry = nflog_tg_check, .destroy = nflog_tg_destroy, .target = nflog_tg, .targetsize = sizeof(struct xt_nflog_info), .me = THIS_MODULE, }, #if IS_ENABLED(CONFIG_IP6_NF_IPTABLES) { .name = "NFLOG", .revision = 0, .family = NFPROTO_IPV6, .checkentry = nflog_tg_check, .destroy = nflog_tg_destroy, .target = nflog_tg, .targetsize = sizeof(struct xt_nflog_info), .me = THIS_MODULE, }, #endif }; static int __init nflog_tg_init(void) { return xt_register_targets(nflog_tg_reg, ARRAY_SIZE(nflog_tg_reg)); } static void __exit nflog_tg_exit(void) { xt_unregister_targets(nflog_tg_reg, ARRAY_SIZE(nflog_tg_reg)); } module_init(nflog_tg_init); module_exit(nflog_tg_exit); MODULE_SOFTDEP("pre: nfnetlink_log"); |
469 131 1 331 313 342 312 2 4 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 | /* SPDX-License-Identifier: GPL-2.0 */ #ifndef __KVM_X86_LAPIC_H #define __KVM_X86_LAPIC_H #include <kvm/iodev.h> #include <linux/kvm_host.h> #include "hyperv.h" #include "smm.h" #define KVM_APIC_INIT 0 #define KVM_APIC_SIPI 1 #define APIC_SHORT_MASK 0xc0000 #define APIC_DEST_NOSHORT 0x0 #define APIC_DEST_MASK 0x800 #define APIC_BUS_CYCLE_NS_DEFAULT 1 #define APIC_BROADCAST 0xFF #define X2APIC_BROADCAST 0xFFFFFFFFul enum lapic_mode { LAPIC_MODE_DISABLED = 0, LAPIC_MODE_INVALID = X2APIC_ENABLE, LAPIC_MODE_XAPIC = MSR_IA32_APICBASE_ENABLE, LAPIC_MODE_X2APIC = MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE, }; enum lapic_lvt_entry { LVT_TIMER, LVT_THERMAL_MONITOR, LVT_PERFORMANCE_COUNTER, LVT_LINT0, LVT_LINT1, LVT_ERROR, LVT_CMCI, KVM_APIC_MAX_NR_LVT_ENTRIES, }; #define APIC_LVTx(x) ((x) == LVT_CMCI ? APIC_LVTCMCI : APIC_LVTT + 0x10 * (x)) struct kvm_timer { struct hrtimer timer; s64 period; /* unit: ns */ ktime_t target_expiration; u32 timer_mode; u32 timer_mode_mask; u64 tscdeadline; u64 expired_tscdeadline; u32 timer_advance_ns; atomic_t pending; /* accumulated triggered timers */ bool hv_timer_in_use; }; struct kvm_lapic { unsigned long base_address; struct kvm_io_device dev; struct kvm_timer lapic_timer; u32 divide_count; struct kvm_vcpu *vcpu; bool apicv_active; bool sw_enabled; bool irr_pending; bool lvt0_in_nmi_mode; /* Number of bits set in ISR. */ s16 isr_count; /* The highest vector set in ISR; if -1 - invalid, must scan ISR. */ int highest_isr_cache; /** * APIC register page. The layout matches the register layout seen by * the guest 1:1, because it is accessed by the vmx microcode. * Note: Only one register, the TPR, is used by the microcode. */ void *regs; gpa_t vapic_addr; struct gfn_to_hva_cache vapic_cache; unsigned long pending_events; unsigned int sipi_vector; int nr_lvt_entries; }; struct dest_map; int kvm_create_lapic(struct kvm_vcpu *vcpu); void kvm_free_lapic(struct kvm_vcpu *vcpu); int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu); void kvm_apic_ack_interrupt(struct kvm_vcpu *vcpu, int vector); int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu); int kvm_apic_accept_events(struct kvm_vcpu *vcpu); void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event); u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu); void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8); void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu); void kvm_apic_set_version(struct kvm_vcpu *vcpu); void kvm_apic_after_set_mcg_cap(struct kvm_vcpu *vcpu); bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source, int shorthand, unsigned int dest, int dest_mode); int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2); void kvm_apic_clear_irr(struct kvm_vcpu *vcpu, int vec); bool __kvm_apic_update_irr(u32 *pir, void *regs, int *max_irr); bool kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir, int *max_irr); void kvm_apic_update_ppr(struct kvm_vcpu *vcpu); int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq, struct dest_map *dest_map); int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type); void kvm_apic_update_apicv(struct kvm_vcpu *vcpu); int kvm_alloc_apic_access_page(struct kvm *kvm); void kvm_inhibit_apic_access_page(struct kvm_vcpu *vcpu); bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src, struct kvm_lapic_irq *irq, int *r, struct dest_map *dest_map); void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high); int kvm_apic_set_base(struct kvm_vcpu *vcpu, u64 value, bool host_initiated); int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s); int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s); int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu); u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu); void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data); void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset); void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector); int kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr); void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu); void kvm_lapic_sync_to_vapic(struct kvm_vcpu *vcpu); int kvm_x2apic_icr_write(struct kvm_lapic *apic, u64 data); int kvm_x2apic_msr_write(struct kvm_vcpu *vcpu, u32 msr, u64 data); int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data); int kvm_hv_vapic_msr_write(struct kvm_vcpu *vcpu, u32 msr, u64 data); int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data); int kvm_lapic_set_pv_eoi(struct kvm_vcpu *vcpu, u64 data, unsigned long len); void kvm_lapic_exit(void); u64 kvm_lapic_readable_reg_mask(struct kvm_lapic *apic); #define VEC_POS(v) ((v) & (32 - 1)) #define REG_POS(v) (((v) >> 5) << 4) static inline void kvm_lapic_clear_vector(int vec, void *bitmap) { clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec)); } static inline void kvm_lapic_set_vector(int vec, void *bitmap) { set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec)); } static inline void kvm_lapic_set_irr(int vec, struct kvm_lapic *apic) { kvm_lapic_set_vector(vec, apic->regs + APIC_IRR); /* * irr_pending must be true if any interrupt is pending; set it after * APIC_IRR to avoid race with apic_clear_irr */ apic->irr_pending = true; } static inline u32 __kvm_lapic_get_reg(char *regs, int reg_off) { return *((u32 *) (regs + reg_off)); } static inline u32 kvm_lapic_get_reg(struct kvm_lapic *apic, int reg_off) { return __kvm_lapic_get_reg(apic->regs, reg_off); } DECLARE_STATIC_KEY_FALSE(kvm_has_noapic_vcpu); static inline bool lapic_in_kernel(struct kvm_vcpu *vcpu) { if (static_branch_unlikely(&kvm_has_noapic_vcpu)) return vcpu->arch.apic; return true; } extern struct static_key_false_deferred apic_hw_disabled; static inline bool kvm_apic_hw_enabled(struct kvm_lapic *apic) { if (static_branch_unlikely(&apic_hw_disabled.key)) return apic->vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE; return true; } extern struct static_key_false_deferred apic_sw_disabled; static inline bool kvm_apic_sw_enabled(struct kvm_lapic *apic) { if (static_branch_unlikely(&apic_sw_disabled.key)) return apic->sw_enabled; return true; } static inline bool kvm_apic_present(struct kvm_vcpu *vcpu) { return lapic_in_kernel(vcpu) && kvm_apic_hw_enabled(vcpu->arch.apic); } static inline int kvm_lapic_enabled(struct kvm_vcpu *vcpu) { return kvm_apic_present(vcpu) && kvm_apic_sw_enabled(vcpu->arch.apic); } static inline int apic_x2apic_mode(struct kvm_lapic *apic) { return apic->vcpu->arch.apic_base & X2APIC_ENABLE; } static inline bool kvm_vcpu_apicv_active(struct kvm_vcpu *vcpu) { return lapic_in_kernel(vcpu) && vcpu->arch.apic->apicv_active; } static inline bool kvm_apic_has_pending_init_or_sipi(struct kvm_vcpu *vcpu) { return lapic_in_kernel(vcpu) && vcpu->arch.apic->pending_events; } static inline bool kvm_apic_init_sipi_allowed(struct kvm_vcpu *vcpu) { return !is_smm(vcpu) && !kvm_x86_call(apic_init_signal_blocked)(vcpu); } static inline bool kvm_lowest_prio_delivery(struct kvm_lapic_irq *irq) { return (irq->delivery_mode == APIC_DM_LOWEST || irq->msi_redir_hint); } static inline int kvm_lapic_latched_init(struct kvm_vcpu *vcpu) { return lapic_in_kernel(vcpu) && test_bit(KVM_APIC_INIT, &vcpu->arch.apic->pending_events); } bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector); void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu); void kvm_bitmap_or_dest_vcpus(struct kvm *kvm, struct kvm_lapic_irq *irq, unsigned long *vcpu_bitmap); bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq, struct kvm_vcpu **dest_vcpu); int kvm_vector_to_index(u32 vector, u32 dest_vcpus, const unsigned long *bitmap, u32 bitmap_size); void kvm_lapic_switch_to_sw_timer(struct kvm_vcpu *vcpu); void kvm_lapic_switch_to_hv_timer(struct kvm_vcpu *vcpu); void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu); bool kvm_lapic_hv_timer_in_use(struct kvm_vcpu *vcpu); void kvm_lapic_restart_hv_timer(struct kvm_vcpu *vcpu); bool kvm_can_use_hv_timer(struct kvm_vcpu *vcpu); static inline enum lapic_mode kvm_apic_mode(u64 apic_base) { return apic_base & (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE); } static inline enum lapic_mode kvm_get_apic_mode(struct kvm_vcpu *vcpu) { return kvm_apic_mode(vcpu->arch.apic_base); } static inline u8 kvm_xapic_id(struct kvm_lapic *apic) { return kvm_lapic_get_reg(apic, APIC_ID) >> 24; } #endif |
46 249 188 6 4759 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 | /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_TIMEKEEPING_H #define _LINUX_TIMEKEEPING_H #include <linux/errno.h> #include <linux/clocksource_ids.h> #include <linux/ktime.h> /* Included from linux/ktime.h */ void timekeeping_init(void); extern int timekeeping_suspended; /* Architecture timer tick functions: */ extern void legacy_timer_tick(unsigned long ticks); /* * Get and set timeofday */ extern int do_settimeofday64(const struct timespec64 *ts); extern int do_sys_settimeofday64(const struct timespec64 *tv, const struct timezone *tz); /* * ktime_get() family - read the current time in a multitude of ways. * * The default time reference is CLOCK_MONOTONIC, starting at * boot time but not counting the time spent in suspend. * For other references, use the functions with "real", "clocktai", * "boottime" and "raw" suffixes. * * To get the time in a different format, use the ones with * "ns", "ts64" and "seconds" suffix. * * See Documentation/core-api/timekeeping.rst for more details. */ /* * timespec64 based interfaces */ extern void ktime_get_raw_ts64(struct timespec64 *ts); extern void ktime_get_ts64(struct timespec64 *ts); extern void ktime_get_real_ts64(struct timespec64 *tv); extern void ktime_get_coarse_ts64(struct timespec64 *ts); extern void ktime_get_coarse_real_ts64(struct timespec64 *ts); /* Multigrain timestamp interfaces */ extern void ktime_get_coarse_real_ts64_mg(struct timespec64 *ts); extern void ktime_get_real_ts64_mg(struct timespec64 *ts); extern unsigned long timekeeping_get_mg_floor_swaps(void); void getboottime64(struct timespec64 *ts); /* * time64_t base interfaces */ extern time64_t ktime_get_seconds(void); extern time64_t __ktime_get_real_seconds(void); extern time64_t ktime_get_real_seconds(void); /* * ktime_t based interfaces */ enum tk_offsets { TK_OFFS_REAL, TK_OFFS_BOOT, TK_OFFS_TAI, TK_OFFS_MAX, }; extern ktime_t ktime_get(void); extern ktime_t ktime_get_with_offset(enum tk_offsets offs); extern ktime_t ktime_get_coarse_with_offset(enum tk_offsets offs); extern ktime_t ktime_mono_to_any(ktime_t tmono, enum tk_offsets offs); extern ktime_t ktime_get_raw(void); extern u32 ktime_get_resolution_ns(void); /** * ktime_get_real - get the real (wall-) time in ktime_t format * * Returns: real (wall) time in ktime_t format */ static inline ktime_t ktime_get_real(void) { return ktime_get_with_offset(TK_OFFS_REAL); } static inline ktime_t ktime_get_coarse_real(void) { return ktime_get_coarse_with_offset(TK_OFFS_REAL); } /** * ktime_get_boottime - Get monotonic time since boot in ktime_t format * * This is similar to CLOCK_MONTONIC/ktime_get, but also includes the * time spent in suspend. * * Returns: monotonic time since boot in ktime_t format */ static inline ktime_t ktime_get_boottime(void) { return ktime_get_with_offset(TK_OFFS_BOOT); } static inline ktime_t ktime_get_coarse_boottime(void) { return ktime_get_coarse_with_offset(TK_OFFS_BOOT); } /** * ktime_get_clocktai - Get the TAI time of day in ktime_t format * * Returns: the TAI time of day in ktime_t format */ static inline ktime_t ktime_get_clocktai(void) { return ktime_get_with_offset(TK_OFFS_TAI); } static inline ktime_t ktime_get_coarse_clocktai(void) { return ktime_get_coarse_with_offset(TK_OFFS_TAI); } static inline ktime_t ktime_get_coarse(void) { struct timespec64 ts; ktime_get_coarse_ts64(&ts); return timespec64_to_ktime(ts); } static inline u64 ktime_get_coarse_ns(void) { return ktime_to_ns(ktime_get_coarse()); } static inline u64 ktime_get_coarse_real_ns(void) { return ktime_to_ns(ktime_get_coarse_real()); } static inline u64 ktime_get_coarse_boottime_ns(void) { return ktime_to_ns(ktime_get_coarse_boottime()); } static inline u64 ktime_get_coarse_clocktai_ns(void) { return ktime_to_ns(ktime_get_coarse_clocktai()); } /** * ktime_mono_to_real - Convert monotonic time to clock realtime * @mono: monotonic time to convert * * Returns: time converted to realtime clock */ static inline ktime_t ktime_mono_to_real(ktime_t mono) { return ktime_mono_to_any(mono, TK_OFFS_REAL); } /** * ktime_get_ns - Get the current time in nanoseconds * * Returns: current time converted to nanoseconds */ static inline u64 ktime_get_ns(void) { return ktime_to_ns(ktime_get()); } /** * ktime_get_real_ns - Get the current real/wall time in nanoseconds * * Returns: current real time converted to nanoseconds */ static inline u64 ktime_get_real_ns(void) { return ktime_to_ns(ktime_get_real()); } /** * ktime_get_boottime_ns - Get the monotonic time since boot in nanoseconds * * Returns: current boottime converted to nanoseconds */ static inline u64 ktime_get_boottime_ns(void) { return ktime_to_ns(ktime_get_boottime()); } /** * ktime_get_clocktai_ns - Get the current TAI time of day in nanoseconds * * Returns: current TAI time converted to nanoseconds */ static inline u64 ktime_get_clocktai_ns(void) { return ktime_to_ns(ktime_get_clocktai()); } /** * ktime_get_raw_ns - Get the raw monotonic time in nanoseconds * * Returns: current raw monotonic time converted to nanoseconds */ static inline u64 ktime_get_raw_ns(void) { return ktime_to_ns(ktime_get_raw()); } extern u64 ktime_get_mono_fast_ns(void); extern u64 ktime_get_raw_fast_ns(void); extern u64 ktime_get_boot_fast_ns(void); extern u64 ktime_get_tai_fast_ns(void); extern u64 ktime_get_real_fast_ns(void); /* * timespec64/time64_t interfaces utilizing the ktime based ones * for API completeness, these could be implemented more efficiently * if needed. */ static inline void ktime_get_boottime_ts64(struct timespec64 *ts) { *ts = ktime_to_timespec64(ktime_get_boottime()); } static inline void ktime_get_coarse_boottime_ts64(struct timespec64 *ts) { *ts = ktime_to_timespec64(ktime_get_coarse_boottime()); } static inline time64_t ktime_get_boottime_seconds(void) { return ktime_divns(ktime_get_coarse_boottime(), NSEC_PER_SEC); } static inline void ktime_get_clocktai_ts64(struct timespec64 *ts) { *ts = ktime_to_timespec64(ktime_get_clocktai()); } static inline void ktime_get_coarse_clocktai_ts64(struct timespec64 *ts) { *ts = ktime_to_timespec64(ktime_get_coarse_clocktai()); } static inline time64_t ktime_get_clocktai_seconds(void) { return ktime_divns(ktime_get_coarse_clocktai(), NSEC_PER_SEC); } /* * RTC specific */ extern bool timekeeping_rtc_skipsuspend(void); extern bool timekeeping_rtc_skipresume(void); extern void timekeeping_inject_sleeptime64(const struct timespec64 *delta); /** * struct system_time_snapshot - simultaneous raw/real time capture with * counter value * @cycles: Clocksource counter value to produce the system times * @real: Realtime system time * @boot: Boot time * @raw: Monotonic raw system time * @cs_id: Clocksource ID * @clock_was_set_seq: The sequence number of clock-was-set events * @cs_was_changed_seq: The sequence number of clocksource change events */ struct system_time_snapshot { u64 cycles; ktime_t real; ktime_t boot; ktime_t raw; enum clocksource_ids cs_id; unsigned int clock_was_set_seq; u8 cs_was_changed_seq; }; /** * struct system_device_crosststamp - system/device cross-timestamp * (synchronized capture) * @device: Device time * @sys_realtime: Realtime simultaneous with device time * @sys_monoraw: Monotonic raw simultaneous with device time */ struct system_device_crosststamp { ktime_t device; ktime_t sys_realtime; ktime_t sys_monoraw; }; /** * struct system_counterval_t - system counter value with the ID of the * corresponding clocksource * @cycles: System counter value * @cs_id: Clocksource ID corresponding to system counter value. Used by * timekeeping code to verify comparability of two cycle values. * The default ID, CSID_GENERIC, does not identify a specific * clocksource. * @use_nsecs: @cycles is in nanoseconds. */ struct system_counterval_t { u64 cycles; enum clocksource_ids cs_id; bool use_nsecs; }; extern bool ktime_real_to_base_clock(ktime_t treal, enum clocksource_ids base_id, u64 *cycles); extern bool timekeeping_clocksource_has_base(enum clocksource_ids id); /* * Get cross timestamp between system clock and device clock */ extern int get_device_system_crosststamp( int (*get_time_fn)(ktime_t *device_time, struct system_counterval_t *system_counterval, void *ctx), void *ctx, struct system_time_snapshot *history, struct system_device_crosststamp *xtstamp); /* * Simultaneously snapshot realtime and monotonic raw clocks */ extern void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot); /* * Persistent clock related interfaces */ extern int persistent_clock_is_local; extern void read_persistent_clock64(struct timespec64 *ts); void read_persistent_wall_and_boot_offset(struct timespec64 *wall_clock, struct timespec64 *boot_offset); #ifdef CONFIG_GENERIC_CMOS_UPDATE extern int update_persistent_clock64(struct timespec64 now); #endif #endif |
129 19 1 2 2 104 119 119 2 1 3 3 9 1 1 1 1 4 96 65 4 64 3 3 3 59 2 2 1 5 32 1 1 19 47 4 4 8 6 24 6 15 3 16 10 1 1 1 2 9 1 1 11 15 3 28 4 24 36 2 23 57 2 8 44 4 7 41 6 6 36 36 35 33 1 2 2 1 17 15 4 28 28 44 4 62 5 33 24 22 41 11 11 1 10 5 3 1 1 99 99 99 3 95 1 4 80 87 3 1 84 9 9 19 19 14 14 14 14 14 4 93 95 95 94 94 19 7 3 2 2 5 5 2 2 4 4 2 3 7 5 5 11 4 6 1 9 1 1 4 3 3 4 1 1 5 5 1 8 1 7 2 5 2 2 6 2 8 4 4 4 4 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 | // SPDX-License-Identifier: GPL-2.0-or-later /* * V4L2 controls framework uAPI implementation: * * Copyright (C) 2010-2021 Hans Verkuil <hverkuil-cisco@xs4all.nl> */ #define pr_fmt(fmt) "v4l2-ctrls: " fmt #include <linux/export.h> #include <linux/mm.h> #include <linux/slab.h> #include <media/v4l2-ctrls.h> #include <media/v4l2-dev.h> #include <media/v4l2-device.h> #include <media/v4l2-event.h> #include <media/v4l2-ioctl.h> #include "v4l2-ctrls-priv.h" /* Internal temporary helper struct, one for each v4l2_ext_control */ struct v4l2_ctrl_helper { /* Pointer to the control reference of the master control */ struct v4l2_ctrl_ref *mref; /* The control ref corresponding to the v4l2_ext_control ID field. */ struct v4l2_ctrl_ref *ref; /* * v4l2_ext_control index of the next control belonging to the * same cluster, or 0 if there isn't any. */ u32 next; }; /* * Helper functions to copy control payload data from kernel space to * user space and vice versa. */ /* Helper function: copy the given control value back to the caller */ static int ptr_to_user(struct v4l2_ext_control *c, struct v4l2_ctrl *ctrl, union v4l2_ctrl_ptr ptr) { u32 len; if (ctrl->is_ptr && !ctrl->is_string) return copy_to_user(c->ptr, ptr.p_const, c->size) ? -EFAULT : 0; switch (ctrl->type) { case V4L2_CTRL_TYPE_STRING: len = strlen(ptr.p_char); if (c->size < len + 1) { c->size = ctrl->elem_size; return -ENOSPC; } return copy_to_user(c->string, ptr.p_char, len + 1) ? -EFAULT : 0; case V4L2_CTRL_TYPE_INTEGER64: c->value64 = *ptr.p_s64; break; default: c->value = *ptr.p_s32; break; } return 0; } /* Helper function: copy the current control value back to the caller */ static int cur_to_user(struct v4l2_ext_control *c, struct v4l2_ctrl *ctrl) { return ptr_to_user(c, ctrl, ctrl->p_cur); } /* Helper function: copy the new control value back to the caller */ static int new_to_user(struct v4l2_ext_control *c, struct v4l2_ctrl *ctrl) { return ptr_to_user(c, ctrl, ctrl->p_new); } /* Helper function: copy the request value back to the caller */ static int req_to_user(struct v4l2_ext_control *c, struct v4l2_ctrl_ref *ref) { return ptr_to_user(c, ref->ctrl, ref->p_req); } /* Helper function: copy the initial control value back to the caller */ static int def_to_user(struct v4l2_ext_control *c, struct v4l2_ctrl *ctrl) { ctrl->type_ops->init(ctrl, 0, ctrl->p_new); return ptr_to_user(c, ctrl, ctrl->p_new); } /* Helper function: copy the caller-provider value as the new control value */ static int user_to_new(struct v4l2_ext_control *c, struct v4l2_ctrl *ctrl) { int ret; u32 size; ctrl->is_new = 0; if (ctrl->is_dyn_array && c->size > ctrl->p_array_alloc_elems * ctrl->elem_size) { void *old = ctrl->p_array; void *tmp = kvzalloc(2 * c->size, GFP_KERNEL); if (!tmp) return -ENOMEM; memcpy(tmp, ctrl->p_new.p, ctrl->elems * ctrl->elem_size); memcpy(tmp + c->size, ctrl->p_cur.p, ctrl->elems * ctrl->elem_size); ctrl->p_new.p = tmp; ctrl->p_cur.p = tmp + c->size; ctrl->p_array = tmp; ctrl->p_array_alloc_elems = c->size / ctrl->elem_size; kvfree(old); } if (ctrl->is_ptr && !ctrl->is_string) { unsigned int elems = c->size / ctrl->elem_size; if (copy_from_user(ctrl->p_new.p, c->ptr, c->size)) return -EFAULT; ctrl->is_new = 1; if (ctrl->is_dyn_array) ctrl->new_elems = elems; else if (ctrl->is_array) ctrl->type_ops->init(ctrl, elems, ctrl->p_new); return 0; } switch (ctrl->type) { case V4L2_CTRL_TYPE_INTEGER64: *ctrl->p_new.p_s64 = c->value64; break; case V4L2_CTRL_TYPE_STRING: size = c->size; if (size == 0) return -ERANGE; if (size > ctrl->maximum + 1) size = ctrl->maximum + 1; ret = copy_from_user(ctrl->p_new.p_char, c->string, size) ? -EFAULT : 0; if (!ret) { char last = ctrl->p_new.p_char[size - 1]; ctrl->p_new.p_char[size - 1] = 0; /* * If the string was longer than ctrl->maximum, * then return an error. */ if (strlen(ctrl->p_new.p_char) == ctrl->maximum && last) return -ERANGE; ctrl->is_new = 1; } return ret; default: *ctrl->p_new.p_s32 = c->value; break; } ctrl->is_new = 1; return 0; } /* * VIDIOC_G/TRY/S_EXT_CTRLS implementation */ /* * Some general notes on the atomic requirements of VIDIOC_G/TRY/S_EXT_CTRLS: * * It is not a fully atomic operation, just best-effort only. After all, if * multiple controls have to be set through multiple i2c writes (for example) * then some initial writes may succeed while others fail. Thus leaving the * system in an inconsistent state. The question is how much effort you are * willing to spend on trying to make something atomic that really isn't. * * From the point of view of an application the main requirement is that * when you call VIDIOC_S_EXT_CTRLS and some values are invalid then an * error should be returned without actually affecting any controls. * * If all the values are correct, then it is acceptable to just give up * in case of low-level errors. * * It is important though that the application can tell when only a partial * configuration was done. The way we do that is through the error_idx field * of struct v4l2_ext_controls: if that is equal to the count field then no * controls were affected. Otherwise all controls before that index were * successful in performing their 'get' or 'set' operation, the control at * the given index failed, and you don't know what happened with the controls * after the failed one. Since if they were part of a control cluster they * could have been successfully processed (if a cluster member was encountered * at index < error_idx), they could have failed (if a cluster member was at * error_idx), or they may not have been processed yet (if the first cluster * member appeared after error_idx). * * It is all fairly theoretical, though. In practice all you can do is to * bail out. If error_idx == count, then it is an application bug. If * error_idx < count then it is only an application bug if the error code was * EBUSY. That usually means that something started streaming just when you * tried to set the controls. In all other cases it is a driver/hardware * problem and all you can do is to retry or bail out. * * Note that these rules do not apply to VIDIOC_TRY_EXT_CTRLS: since that * never modifies controls the error_idx is just set to whatever control * has an invalid value. */ /* * Prepare for the extended g/s/try functions. * Find the controls in the control array and do some basic checks. */ static int prepare_ext_ctrls(struct v4l2_ctrl_handler *hdl, struct v4l2_ext_controls *cs, struct v4l2_ctrl_helper *helpers, struct video_device *vdev, bool get) { struct v4l2_ctrl_helper *h; bool have_clusters = false; u32 i; for (i = 0, h = helpers; i < cs->count; i++, h++) { struct v4l2_ext_control *c = &cs->controls[i]; struct v4l2_ctrl_ref *ref; struct v4l2_ctrl *ctrl; u32 id = c->id & V4L2_CTRL_ID_MASK; cs->error_idx = i; if (cs->which && cs->which != V4L2_CTRL_WHICH_DEF_VAL && cs->which != V4L2_CTRL_WHICH_REQUEST_VAL && V4L2_CTRL_ID2WHICH(id) != cs->which) { dprintk(vdev, "invalid which 0x%x or control id 0x%x\n", cs->which, id); return -EINVAL; } /* * Old-style private controls are not allowed for * extended controls. */ if (id >= V4L2_CID_PRIVATE_BASE) { dprintk(vdev, "old-style private controls not allowed\n"); return -EINVAL; } ref = find_ref_lock(hdl, id); if (!ref) { dprintk(vdev, "cannot find control id 0x%x\n", id); return -EINVAL; } h->ref = ref; ctrl = ref->ctrl; if (ctrl->flags & V4L2_CTRL_FLAG_DISABLED) { dprintk(vdev, "control id 0x%x is disabled\n", id); return -EINVAL; } if (ctrl->cluster[0]->ncontrols > 1) have_clusters = true; if (ctrl->cluster[0] != ctrl) ref = find_ref_lock(hdl, ctrl->cluster[0]->id); if (ctrl->is_dyn_array) { unsigned int max_size = ctrl->dims[0] * ctrl->elem_size; unsigned int tot_size = ctrl->elem_size; if (cs->which == V4L2_CTRL_WHICH_REQUEST_VAL) tot_size *= ref->p_req_elems; else tot_size *= ctrl->elems; c->size = ctrl->elem_size * (c->size / ctrl->elem_size); if (get) { if (c->size < tot_size) { c->size = tot_size; return -ENOSPC; } c->size = tot_size; } else { if (c->size > max_size) { c->size = max_size; return -ENOSPC; } if (!c->size) return -EFAULT; } } else if (ctrl->is_ptr && !ctrl->is_string) { unsigned int tot_size = ctrl->elems * ctrl->elem_size; if (c->size < tot_size) { /* * In the get case the application first * queries to obtain the size of the control. */ if (get) { c->size = tot_size; return -ENOSPC; } dprintk(vdev, "pointer control id 0x%x size too small, %d bytes but %d bytes needed\n", id, c->size, tot_size); return -EFAULT; } c->size = tot_size; } /* Store the ref to the master control of the cluster */ h->mref = ref; /* * Initially set next to 0, meaning that there is no other * control in this helper array belonging to the same * cluster. */ h->next = 0; } /* * We are done if there were no controls that belong to a multi- * control cluster. */ if (!have_clusters) return 0; /* * The code below figures out in O(n) time which controls in the list * belong to the same cluster. */ /* This has to be done with the handler lock taken. */ mutex_lock(hdl->lock); /* First zero the helper field in the master control references */ for (i = 0; i < cs->count; i++) helpers[i].mref->helper = NULL; for (i = 0, h = helpers; i < cs->count; i++, h++) { struct v4l2_ctrl_ref *mref = h->mref; /* * If the mref->helper is set, then it points to an earlier * helper that belongs to the same cluster. */ if (mref->helper) { /* * Set the next field of mref->helper to the current * index: this means that the earlier helper now * points to the next helper in the same cluster. */ mref->helper->next = i; /* * mref should be set only for the first helper in the * cluster, clear the others. */ h->mref = NULL; } /* Point the mref helper to the current helper struct. */ mref->helper = h; } mutex_unlock(hdl->lock); return 0; } /* * Handles the corner case where cs->count == 0. It checks whether the * specified control class exists. If that class ID is 0, then it checks * whether there are any controls at all. */ static int class_check(struct v4l2_ctrl_handler *hdl, u32 which) { if (which == 0 || which == V4L2_CTRL_WHICH_DEF_VAL || which == V4L2_CTRL_WHICH_REQUEST_VAL) return 0; return find_ref_lock(hdl, which | 1) ? 0 : -EINVAL; } /* * Get extended controls. Allocates the helpers array if needed. * * Note that v4l2_g_ext_ctrls_common() with 'which' set to * V4L2_CTRL_WHICH_REQUEST_VAL is only called if the request was * completed, and in that case p_req_valid is true for all controls. */ int v4l2_g_ext_ctrls_common(struct v4l2_ctrl_handler *hdl, struct v4l2_ext_controls *cs, struct video_device *vdev) { struct v4l2_ctrl_helper helper[4]; struct v4l2_ctrl_helper *helpers = helper; int ret; int i, j; bool is_default, is_request; is_default = (cs->which == V4L2_CTRL_WHICH_DEF_VAL); is_request = (cs->which == V4L2_CTRL_WHICH_REQUEST_VAL); cs->error_idx = cs->count; cs->which = V4L2_CTRL_ID2WHICH(cs->which); if (!hdl) return -EINVAL; if (cs->count == 0) return class_check(hdl, cs->which); if (cs->count > ARRAY_SIZE(helper)) { helpers = kvmalloc_array(cs->count, sizeof(helper[0]), GFP_KERNEL); if (!helpers) return -ENOMEM; } ret = prepare_ext_ctrls(hdl, cs, helpers, vdev, true); cs->error_idx = cs->count; for (i = 0; !ret && i < cs->count; i++) if (helpers[i].ref->ctrl->flags & V4L2_CTRL_FLAG_WRITE_ONLY) ret = -EACCES; for (i = 0; !ret && i < cs->count; i++) { struct v4l2_ctrl *master; bool is_volatile = false; u32 idx = i; if (!helpers[i].mref) continue; master = helpers[i].mref->ctrl; cs->error_idx = i; v4l2_ctrl_lock(master); /* * g_volatile_ctrl will update the new control values. * This makes no sense for V4L2_CTRL_WHICH_DEF_VAL and * V4L2_CTRL_WHICH_REQUEST_VAL. In the case of requests * it is v4l2_ctrl_request_complete() that copies the * volatile controls at the time of request completion * to the request, so you don't want to do that again. */ if (!is_default && !is_request && ((master->flags & V4L2_CTRL_FLAG_VOLATILE) || (master->has_volatiles && !is_cur_manual(master)))) { for (j = 0; j < master->ncontrols; j++) cur_to_new(master->cluster[j]); ret = call_op(master, g_volatile_ctrl); is_volatile = true; } if (ret) { v4l2_ctrl_unlock(master); break; } /* * Copy the default value (if is_default is true), the * request value (if is_request is true and p_req is valid), * the new volatile value (if is_volatile is true) or the * current value. */ do { struct v4l2_ctrl_ref *ref = helpers[idx].ref; if (is_default) ret = def_to_user(cs->controls + idx, ref->ctrl); else if (is_request && ref->p_req_array_enomem) ret = -ENOMEM; else if (is_request && ref->p_req_valid) ret = req_to_user(cs->controls + idx, ref); else if (is_volatile) ret = new_to_user(cs->controls + idx, ref->ctrl); else ret = cur_to_user(cs->controls + idx, ref->ctrl); idx = helpers[idx].next; } while (!ret && idx); v4l2_ctrl_unlock(master); } if (cs->count > ARRAY_SIZE(helper)) kvfree(helpers); return ret; } int v4l2_g_ext_ctrls(struct v4l2_ctrl_handler *hdl, struct video_device *vdev, struct media_device *mdev, struct v4l2_ext_controls *cs) { if (cs->which == V4L2_CTRL_WHICH_REQUEST_VAL) return v4l2_g_ext_ctrls_request(hdl, vdev, mdev, cs); return v4l2_g_ext_ctrls_common(hdl, cs, vdev); } EXPORT_SYMBOL(v4l2_g_ext_ctrls); /* Validate a new control */ static int validate_new(const struct v4l2_ctrl *ctrl, union v4l2_ctrl_ptr p_new) { return ctrl->type_ops->validate(ctrl, p_new); } /* Validate controls. */ static int validate_ctrls(struct v4l2_ext_controls *cs, struct v4l2_ctrl_helper *helpers, struct video_device *vdev, bool set) { unsigned int i; int ret = 0; cs->error_idx = cs->count; for (i = 0; i < cs->count; i++) { struct v4l2_ctrl *ctrl = helpers[i].ref->ctrl; union v4l2_ctrl_ptr p_new; cs->error_idx = i; if (ctrl->flags & V4L2_CTRL_FLAG_READ_ONLY) { dprintk(vdev, "control id 0x%x is read-only\n", ctrl->id); return -EACCES; } /* * This test is also done in try_set_control_cluster() which * is called in atomic context, so that has the final say, * but it makes sense to do an up-front check as well. Once * an error occurs in try_set_control_cluster() some other * controls may have been set already and we want to do a * best-effort to avoid that. */ if (set && (ctrl->flags & V4L2_CTRL_FLAG_GRABBED)) { dprintk(vdev, "control id 0x%x is grabbed, cannot set\n", ctrl->id); return -EBUSY; } /* * Skip validation for now if the payload needs to be copied * from userspace into kernelspace. We'll validate those later. */ if (ctrl->is_ptr) continue; if (ctrl->type == V4L2_CTRL_TYPE_INTEGER64) p_new.p_s64 = &cs->controls[i].value64; else p_new.p_s32 = &cs->controls[i].value; ret = validate_new(ctrl, p_new); if (ret) return ret; } return 0; } /* Try or try-and-set controls */ int try_set_ext_ctrls_common(struct v4l2_fh *fh, struct v4l2_ctrl_handler *hdl, struct v4l2_ext_controls *cs, struct video_device *vdev, bool set) { struct v4l2_ctrl_helper helper[4]; struct v4l2_ctrl_helper *helpers = helper; unsigned int i, j; int ret; cs->error_idx = cs->count; /* Default value cannot be changed */ if (cs->which == V4L2_CTRL_WHICH_DEF_VAL) { dprintk(vdev, "%s: cannot change default value\n", video_device_node_name(vdev)); return -EINVAL; } cs->which = V4L2_CTRL_ID2WHICH(cs->which); if (!hdl) { dprintk(vdev, "%s: invalid null control handler\n", video_device_node_name(vdev)); return -EINVAL; } if (cs->count == 0) return class_check(hdl, cs->which); if (cs->count > ARRAY_SIZE(helper)) { helpers = kvmalloc_array(cs->count, sizeof(helper[0]), GFP_KERNEL); if (!helpers) return -ENOMEM; } ret = prepare_ext_ctrls(hdl, cs, helpers, vdev, false); if (!ret) ret = validate_ctrls(cs, helpers, vdev, set); if (ret && set) cs->error_idx = cs->count; for (i = 0; !ret && i < cs->count; i++) { struct v4l2_ctrl *master; u32 idx = i; if (!helpers[i].mref) continue; cs->error_idx = i; master = helpers[i].mref->ctrl; v4l2_ctrl_lock(master); /* Reset the 'is_new' flags of the cluster */ for (j = 0; j < master->ncontrols; j++) if (master->cluster[j]) master->cluster[j]->is_new = 0; /* * For volatile autoclusters that are currently in auto mode * we need to discover if it will be set to manual mode. * If so, then we have to copy the current volatile values * first since those will become the new manual values (which * may be overwritten by explicit new values from this set * of controls). */ if (master->is_auto && master->has_volatiles && !is_cur_manual(master)) { /* Pick an initial non-manual value */ s32 new_auto_val = master->manual_mode_value + 1; u32 tmp_idx = idx; do { /* * Check if the auto control is part of the * list, and remember the new value. */ if (helpers[tmp_idx].ref->ctrl == master) new_auto_val = cs->controls[tmp_idx].value; tmp_idx = helpers[tmp_idx].next; } while (tmp_idx); /* * If the new value == the manual value, then copy * the current volatile values. */ if (new_auto_val == master->manual_mode_value) update_from_auto_cluster(master); } /* * Copy the new caller-supplied control values. * user_to_new() sets 'is_new' to 1. */ do { struct v4l2_ctrl *ctrl = helpers[idx].ref->ctrl; ret = user_to_new(cs->controls + idx, ctrl); if (!ret && ctrl->is_ptr) { ret = validate_new(ctrl, ctrl->p_new); if (ret) dprintk(vdev, "failed to validate control %s (%d)\n", v4l2_ctrl_get_name(ctrl->id), ret); } idx = helpers[idx].next; } while (!ret && idx); if (!ret) ret = try_or_set_cluster(fh, master, !hdl->req_obj.req && set, 0); if (!ret && hdl->req_obj.req && set) { for (j = 0; j < master->ncontrols; j++) { struct v4l2_ctrl_ref *ref = find_ref(hdl, master->cluster[j]->id); new_to_req(ref); } } /* Copy the new values back to userspace. */ if (!ret) { idx = i; do { ret = new_to_user(cs->controls + idx, helpers[idx].ref->ctrl); idx = helpers[idx].next; } while (!ret && idx); } v4l2_ctrl_unlock(master); } if (cs->count > ARRAY_SIZE(helper)) kvfree(helpers); return ret; } static int try_set_ext_ctrls(struct v4l2_fh *fh, struct v4l2_ctrl_handler *hdl, struct video_device *vdev, struct media_device *mdev, struct v4l2_ext_controls *cs, bool set) { int ret; if (cs->which == V4L2_CTRL_WHICH_REQUEST_VAL) return try_set_ext_ctrls_request(fh, hdl, vdev, mdev, cs, set); ret = try_set_ext_ctrls_common(fh, hdl, cs, vdev, set); if (ret) dprintk(vdev, "%s: try_set_ext_ctrls_common failed (%d)\n", video_device_node_name(vdev), ret); return ret; } int v4l2_try_ext_ctrls(struct v4l2_ctrl_handler *hdl, struct video_device *vdev, struct media_device *mdev, struct v4l2_ext_controls *cs) { return try_set_ext_ctrls(NULL, hdl, vdev, mdev, cs, false); } EXPORT_SYMBOL(v4l2_try_ext_ctrls); int v4l2_s_ext_ctrls(struct v4l2_fh *fh, struct v4l2_ctrl_handler *hdl, struct video_device *vdev, struct media_device *mdev, struct v4l2_ext_controls *cs) { return try_set_ext_ctrls(fh, hdl, vdev, mdev, cs, true); } EXPORT_SYMBOL(v4l2_s_ext_ctrls); /* * VIDIOC_G/S_CTRL implementation */ /* Helper function to get a single control */ static int get_ctrl(struct v4l2_ctrl *ctrl, struct v4l2_ext_control *c) { struct v4l2_ctrl *master = ctrl->cluster[0]; int ret = 0; int i; /* Compound controls are not supported. The new_to_user() and * cur_to_user() calls below would need to be modified not to access * userspace memory when called from get_ctrl(). */ if (!ctrl->is_int && ctrl->type != V4L2_CTRL_TYPE_INTEGER64) return -EINVAL; if (ctrl->flags & V4L2_CTRL_FLAG_WRITE_ONLY) return -EACCES; v4l2_ctrl_lock(master); /* g_volatile_ctrl will update the current control values */ if (ctrl->flags & V4L2_CTRL_FLAG_VOLATILE) { for (i = 0; i < master->ncontrols; i++) cur_to_new(master->cluster[i]); ret = call_op(master, g_volatile_ctrl); if (!ret) ret = new_to_user(c, ctrl); } else { ret = cur_to_user(c, ctrl); } v4l2_ctrl_unlock(master); return ret; } int v4l2_g_ctrl(struct v4l2_ctrl_handler *hdl, struct v4l2_control *control) { struct v4l2_ctrl *ctrl = v4l2_ctrl_find(hdl, control->id); struct v4l2_ext_control c; int ret; if (!ctrl || !ctrl->is_int) return -EINVAL; ret = get_ctrl(ctrl, &c); if (!ret) control->value = c.value; return ret; } EXPORT_SYMBOL(v4l2_g_ctrl); /* Helper function for VIDIOC_S_CTRL compatibility */ static int set_ctrl(struct v4l2_fh *fh, struct v4l2_ctrl *ctrl, u32 ch_flags) { struct v4l2_ctrl *master = ctrl->cluster[0]; int ret; int i; /* Reset the 'is_new' flags of the cluster */ for (i = 0; i < master->ncontrols; i++) if (master->cluster[i]) master->cluster[i]->is_new = 0; ret = validate_new(ctrl, ctrl->p_new); if (ret) return ret; /* * For autoclusters with volatiles that are switched from auto to * manual mode we have to update the current volatile values since * those will become the initial manual values after such a switch. */ if (master->is_auto && master->has_volatiles && ctrl == master && !is_cur_manual(master) && ctrl->val == master->manual_mode_value) update_from_auto_cluster(master); ctrl->is_new = 1; return try_or_set_cluster(fh, master, true, ch_flags); } /* Helper function for VIDIOC_S_CTRL compatibility */ static int set_ctrl_lock(struct v4l2_fh *fh, struct v4l2_ctrl *ctrl, struct v4l2_ext_control *c) { int ret; v4l2_ctrl_lock(ctrl); ret = user_to_new(c, ctrl); if (!ret) ret = set_ctrl(fh, ctrl, 0); if (!ret) ret = cur_to_user(c, ctrl); v4l2_ctrl_unlock(ctrl); return ret; } int v4l2_s_ctrl(struct v4l2_fh *fh, struct v4l2_ctrl_handler *hdl, struct v4l2_control *control) { struct v4l2_ctrl *ctrl = v4l2_ctrl_find(hdl, control->id); struct v4l2_ext_control c = { control->id }; int ret; if (!ctrl || !ctrl->is_int) return -EINVAL; if (ctrl->flags & V4L2_CTRL_FLAG_READ_ONLY) return -EACCES; c.value = control->value; ret = set_ctrl_lock(fh, ctrl, &c); control->value = c.value; return ret; } EXPORT_SYMBOL(v4l2_s_ctrl); /* * Helper functions for drivers to get/set controls. */ s32 v4l2_ctrl_g_ctrl(struct v4l2_ctrl *ctrl) { struct v4l2_ext_control c; /* It's a driver bug if this happens. */ if (WARN_ON(!ctrl->is_int)) return 0; c.value = 0; get_ctrl(ctrl, &c); return c.value; } EXPORT_SYMBOL(v4l2_ctrl_g_ctrl); s64 v4l2_ctrl_g_ctrl_int64(struct v4l2_ctrl *ctrl) { struct v4l2_ext_control c; /* It's a driver bug if this happens. */ if (WARN_ON(ctrl->is_ptr || ctrl->type != V4L2_CTRL_TYPE_INTEGER64)) return 0; c.value64 = 0; get_ctrl(ctrl, &c); return c.value64; } EXPORT_SYMBOL(v4l2_ctrl_g_ctrl_int64); int __v4l2_ctrl_s_ctrl(struct v4l2_ctrl *ctrl, s32 val) { lockdep_assert_held(ctrl->handler->lock); /* It's a driver bug if this happens. */ if (WARN_ON(!ctrl->is_int)) return -EINVAL; ctrl->val = val; return set_ctrl(NULL, ctrl, 0); } EXPORT_SYMBOL(__v4l2_ctrl_s_ctrl); int __v4l2_ctrl_s_ctrl_int64(struct v4l2_ctrl *ctrl, s64 val) { lockdep_assert_held(ctrl->handler->lock); /* It's a driver bug if this happens. */ if (WARN_ON(ctrl->is_ptr || ctrl->type != V4L2_CTRL_TYPE_INTEGER64)) return -EINVAL; *ctrl->p_new.p_s64 = val; return set_ctrl(NULL, ctrl, 0); } EXPORT_SYMBOL(__v4l2_ctrl_s_ctrl_int64); int __v4l2_ctrl_s_ctrl_string(struct v4l2_ctrl *ctrl, const char *s) { lockdep_assert_held(ctrl->handler->lock); /* It's a driver bug if this happens. */ if (WARN_ON(ctrl->type != V4L2_CTRL_TYPE_STRING)) return -EINVAL; strscpy(ctrl->p_new.p_char, s, ctrl->maximum + 1); return set_ctrl(NULL, ctrl, 0); } EXPORT_SYMBOL(__v4l2_ctrl_s_ctrl_string); int __v4l2_ctrl_s_ctrl_compound(struct v4l2_ctrl *ctrl, enum v4l2_ctrl_type type, const void *p) { lockdep_assert_held(ctrl->handler->lock); /* It's a driver bug if this happens. */ if (WARN_ON(ctrl->type != type)) return -EINVAL; /* Setting dynamic arrays is not (yet?) supported. */ if (WARN_ON(ctrl->is_dyn_array)) return -EINVAL; memcpy(ctrl->p_new.p, p, ctrl->elems * ctrl->elem_size); return set_ctrl(NULL, ctrl, 0); } EXPORT_SYMBOL(__v4l2_ctrl_s_ctrl_compound); /* * Modify the range of a control. */ int __v4l2_ctrl_modify_range(struct v4l2_ctrl *ctrl, s64 min, s64 max, u64 step, s64 def) { bool value_changed; bool range_changed = false; int ret; lockdep_assert_held(ctrl->handler->lock); switch (ctrl->type) { case V4L2_CTRL_TYPE_INTEGER: case V4L2_CTRL_TYPE_INTEGER64: case V4L2_CTRL_TYPE_BOOLEAN: case V4L2_CTRL_TYPE_MENU: case V4L2_CTRL_TYPE_INTEGER_MENU: case V4L2_CTRL_TYPE_BITMASK: case V4L2_CTRL_TYPE_U8: case V4L2_CTRL_TYPE_U16: case V4L2_CTRL_TYPE_U32: if (ctrl->is_array) return -EINVAL; ret = check_range(ctrl->type, min, max, step, def); if (ret) return ret; break; default: return -EINVAL; } if (ctrl->minimum != min || ctrl->maximum != max || ctrl->step != step || ctrl->default_value != def) { range_changed = true; ctrl->minimum = min; ctrl->maximum = max; ctrl->step = step; ctrl->default_value = def; } cur_to_new(ctrl); if (validate_new(ctrl, ctrl->p_new)) { if (ctrl->type == V4L2_CTRL_TYPE_INTEGER64) *ctrl->p_new.p_s64 = def; else *ctrl->p_new.p_s32 = def; } if (ctrl->type == V4L2_CTRL_TYPE_INTEGER64) value_changed = *ctrl->p_new.p_s64 != *ctrl->p_cur.p_s64; else value_changed = *ctrl->p_new.p_s32 != *ctrl->p_cur.p_s32; if (value_changed) ret = set_ctrl(NULL, ctrl, V4L2_EVENT_CTRL_CH_RANGE); else if (range_changed) send_event(NULL, ctrl, V4L2_EVENT_CTRL_CH_RANGE); return ret; } EXPORT_SYMBOL(__v4l2_ctrl_modify_range); int __v4l2_ctrl_modify_dimensions(struct v4l2_ctrl *ctrl, u32 dims[V4L2_CTRL_MAX_DIMS]) { unsigned int elems = 1; unsigned int i; void *p_array; lockdep_assert_held(ctrl->handler->lock); if (!ctrl->is_array || ctrl->is_dyn_array) return -EINVAL; for (i = 0; i < ctrl->nr_of_dims; i++) elems *= dims[i]; if (elems == 0) return -EINVAL; p_array = kvzalloc(2 * elems * ctrl->elem_size, GFP_KERNEL); if (!p_array) return -ENOMEM; kvfree(ctrl->p_array); ctrl->p_array_alloc_elems = elems; ctrl->elems = elems; ctrl->new_elems = elems; ctrl->p_array = p_array; ctrl->p_new.p = p_array; ctrl->p_cur.p = p_array + elems * ctrl->elem_size; for (i = 0; i < ctrl->nr_of_dims; i++) ctrl->dims[i] = dims[i]; ctrl->type_ops->init(ctrl, 0, ctrl->p_cur); cur_to_new(ctrl); send_event(NULL, ctrl, V4L2_EVENT_CTRL_CH_VALUE | V4L2_EVENT_CTRL_CH_DIMENSIONS); return 0; } EXPORT_SYMBOL(__v4l2_ctrl_modify_dimensions); /* Implement VIDIOC_QUERY_EXT_CTRL */ int v4l2_query_ext_ctrl(struct v4l2_ctrl_handler *hdl, struct v4l2_query_ext_ctrl *qc) { const unsigned int next_flags = V4L2_CTRL_FLAG_NEXT_CTRL | V4L2_CTRL_FLAG_NEXT_COMPOUND; u32 id = qc->id & V4L2_CTRL_ID_MASK; struct v4l2_ctrl_ref *ref; struct v4l2_ctrl *ctrl; if (!hdl) return -EINVAL; mutex_lock(hdl->lock); /* Try to find it */ ref = find_ref(hdl, id); if ((qc->id & next_flags) && !list_empty(&hdl->ctrl_refs)) { bool is_compound; /* Match any control that is not hidden */ unsigned int mask = 1; bool match = false; if ((qc->id & next_flags) == V4L2_CTRL_FLAG_NEXT_COMPOUND) { /* Match any hidden control */ match = true; } else if ((qc->id & next_flags) == next_flags) { /* Match any control, compound or not */ mask = 0; } /* Find the next control with ID > qc->id */ /* Did we reach the end of the control list? */ if (id >= node2id(hdl->ctrl_refs.prev)) { ref = NULL; /* Yes, so there is no next control */ } else if (ref) { struct v4l2_ctrl_ref *pos = ref; /* * We found a control with the given ID, so just get * the next valid one in the list. */ ref = NULL; list_for_each_entry_continue(pos, &hdl->ctrl_refs, node) { is_compound = pos->ctrl->is_array || pos->ctrl->type >= V4L2_CTRL_COMPOUND_TYPES; if (id < pos->ctrl->id && (is_compound & mask) == match) { ref = pos; break; } } } else { struct v4l2_ctrl_ref *pos; /* * No control with the given ID exists, so start * searching for the next largest ID. We know there * is one, otherwise the first 'if' above would have * been true. */ list_for_each_entry(pos, &hdl->ctrl_refs, node) { is_compound = pos->ctrl->is_array || pos->ctrl->type >= V4L2_CTRL_COMPOUND_TYPES; if (id < pos->ctrl->id && (is_compound & mask) == match) { ref = pos; break; } } } } mutex_unlock(hdl->lock); if (!ref) return -EINVAL; ctrl = ref->ctrl; memset(qc, 0, sizeof(*qc)); if (id >= V4L2_CID_PRIVATE_BASE) qc->id = id; else qc->id = ctrl->id; strscpy(qc->name, ctrl->name, sizeof(qc->name)); qc->flags = user_flags(ctrl); qc->type = ctrl->type; qc->elem_size = ctrl->elem_size; qc->elems = ctrl->elems; qc->nr_of_dims = ctrl->nr_of_dims; memcpy(qc->dims, ctrl->dims, qc->nr_of_dims * sizeof(qc->dims[0])); qc->minimum = ctrl->minimum; qc->maximum = ctrl->maximum; qc->default_value = ctrl->default_value; if (ctrl->type == V4L2_CTRL_TYPE_MENU || ctrl->type == V4L2_CTRL_TYPE_INTEGER_MENU) qc->step = 1; else qc->step = ctrl->step; return 0; } EXPORT_SYMBOL(v4l2_query_ext_ctrl); /* Implement VIDIOC_QUERYCTRL */ int v4l2_queryctrl(struct v4l2_ctrl_handler *hdl, struct v4l2_queryctrl *qc) { struct v4l2_query_ext_ctrl qec = { qc->id }; int rc; rc = v4l2_query_ext_ctrl(hdl, &qec); if (rc) return rc; qc->id = qec.id; qc->type = qec.type; qc->flags = qec.flags; strscpy(qc->name, qec.name, sizeof(qc->name)); switch (qc->type) { case V4L2_CTRL_TYPE_INTEGER: case V4L2_CTRL_TYPE_BOOLEAN: case V4L2_CTRL_TYPE_MENU: case V4L2_CTRL_TYPE_INTEGER_MENU: case V4L2_CTRL_TYPE_STRING: case V4L2_CTRL_TYPE_BITMASK: qc->minimum = qec.minimum; qc->maximum = qec.maximum; qc->step = qec.step; qc->default_value = qec.default_value; break; default: qc->minimum = 0; qc->maximum = 0; qc->step = 0; qc->default_value = 0; break; } return 0; } EXPORT_SYMBOL(v4l2_queryctrl); /* Implement VIDIOC_QUERYMENU */ int v4l2_querymenu(struct v4l2_ctrl_handler *hdl, struct v4l2_querymenu *qm) { struct v4l2_ctrl *ctrl; u32 i = qm->index; ctrl = v4l2_ctrl_find(hdl, qm->id); if (!ctrl) return -EINVAL; qm->reserved = 0; /* Sanity checks */ switch (ctrl->type) { case V4L2_CTRL_TYPE_MENU: if (!ctrl->qmenu) return -EINVAL; break; case V4L2_CTRL_TYPE_INTEGER_MENU: if (!ctrl->qmenu_int) return -EINVAL; break; default: return -EINVAL; } if (i < ctrl->minimum || i > ctrl->maximum) return -EINVAL; /* Use mask to see if this menu item should be skipped */ if (i < BITS_PER_LONG_LONG && (ctrl->menu_skip_mask & BIT_ULL(i))) return -EINVAL; /* Empty menu items should also be skipped */ if (ctrl->type == V4L2_CTRL_TYPE_MENU) { if (!ctrl->qmenu[i] || ctrl->qmenu[i][0] == '\0') return -EINVAL; strscpy(qm->name, ctrl->qmenu[i], sizeof(qm->name)); } else { qm->value = ctrl->qmenu_int[i]; } return 0; } EXPORT_SYMBOL(v4l2_querymenu); /* * VIDIOC_LOG_STATUS helpers */ int v4l2_ctrl_log_status(struct file *file, void *fh) { struct video_device *vfd = video_devdata(file); struct v4l2_fh *vfh = file->private_data; if (test_bit(V4L2_FL_USES_V4L2_FH, &vfd->flags) && vfd->v4l2_dev) v4l2_ctrl_handler_log_status(vfh->ctrl_handler, vfd->v4l2_dev->name); return 0; } EXPORT_SYMBOL(v4l2_ctrl_log_status); int v4l2_ctrl_subdev_log_status(struct v4l2_subdev *sd) { v4l2_ctrl_handler_log_status(sd->ctrl_handler, sd->name); return 0; } EXPORT_SYMBOL(v4l2_ctrl_subdev_log_status); /* * VIDIOC_(UN)SUBSCRIBE_EVENT implementation */ static int v4l2_ctrl_add_event(struct v4l2_subscribed_event *sev, unsigned int elems) { struct v4l2_ctrl *ctrl = v4l2_ctrl_find(sev->fh->ctrl_handler, sev->id); if (!ctrl) return -EINVAL; v4l2_ctrl_lock(ctrl); list_add_tail(&sev->node, &ctrl->ev_subs); if (ctrl->type != V4L2_CTRL_TYPE_CTRL_CLASS && (sev->flags & V4L2_EVENT_SUB_FL_SEND_INITIAL)) send_initial_event(sev->fh, ctrl); v4l2_ctrl_unlock(ctrl); return 0; } static void v4l2_ctrl_del_event(struct v4l2_subscribed_event *sev) { struct v4l2_ctrl *ctrl = v4l2_ctrl_find(sev->fh->ctrl_handler, sev->id); if (!ctrl) return; v4l2_ctrl_lock(ctrl); list_del(&sev->node); v4l2_ctrl_unlock(ctrl); } void v4l2_ctrl_replace(struct v4l2_event *old, const struct v4l2_event *new) { u32 old_changes = old->u.ctrl.changes; old->u.ctrl = new->u.ctrl; old->u.ctrl.changes |= old_changes; } EXPORT_SYMBOL(v4l2_ctrl_replace); void v4l2_ctrl_merge(const struct v4l2_event *old, struct v4l2_event *new) { new->u.ctrl.changes |= old->u.ctrl.changes; } EXPORT_SYMBOL(v4l2_ctrl_merge); const struct v4l2_subscribed_event_ops v4l2_ctrl_sub_ev_ops = { .add = v4l2_ctrl_add_event, .del = v4l2_ctrl_del_event, .replace = v4l2_ctrl_replace, .merge = v4l2_ctrl_merge, }; EXPORT_SYMBOL(v4l2_ctrl_sub_ev_ops); int v4l2_ctrl_subscribe_event(struct v4l2_fh *fh, const struct v4l2_event_subscription *sub) { if (sub->type == V4L2_EVENT_CTRL) return v4l2_event_subscribe(fh, sub, 0, &v4l2_ctrl_sub_ev_ops); return -EINVAL; } EXPORT_SYMBOL(v4l2_ctrl_subscribe_event); int v4l2_ctrl_subdev_subscribe_event(struct v4l2_subdev *sd, struct v4l2_fh *fh, struct v4l2_event_subscription *sub) { if (!sd->ctrl_handler) return -EINVAL; return v4l2_ctrl_subscribe_event(fh, sub); } EXPORT_SYMBOL(v4l2_ctrl_subdev_subscribe_event); /* * poll helper */ __poll_t v4l2_ctrl_poll(struct file *file, struct poll_table_struct *wait) { struct v4l2_fh *fh = file->private_data; poll_wait(file, &fh->wait, wait); if (v4l2_event_pending(fh)) return EPOLLPRI; return 0; } EXPORT_SYMBOL(v4l2_ctrl_poll); |
10 10 7 3 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 | // SPDX-License-Identifier: GPL-2.0-only /* * Copyright (C) 2005-2006 Micronas USA Inc. */ #include <linux/module.h> #include <linux/delay.h> #include <linux/sched.h> #include <linux/spinlock.h> #include <linux/unistd.h> #include <linux/time.h> #include <linux/mm.h> #include <linux/vmalloc.h> #include <linux/device.h> #include <linux/i2c.h> #include <linux/firmware.h> #include <linux/mutex.h> #include <linux/uaccess.h> #include <linux/slab.h> #include <linux/videodev2.h> #include <media/tuner.h> #include <media/v4l2-common.h> #include <media/v4l2-event.h> #include "go7007-priv.h" /* * Wait for an interrupt to be delivered from the GO7007SB and return * the associated value and data. * * Must be called with the hw_lock held. */ int go7007_read_interrupt(struct go7007 *go, u16 *value, u16 *data) { go->interrupt_available = 0; go->hpi_ops->read_interrupt(go); if (wait_event_timeout(go->interrupt_waitq, go->interrupt_available, 5*HZ) < 0) { v4l2_err(&go->v4l2_dev, "timeout waiting for read interrupt\n"); return -1; } if (!go->interrupt_available) return -1; go->interrupt_available = 0; *value = go->interrupt_value & 0xfffe; *data = go->interrupt_data; return 0; } EXPORT_SYMBOL(go7007_read_interrupt); /* * Read a register/address on the GO7007SB. * * Must be called with the hw_lock held. */ int go7007_read_addr(struct go7007 *go, u16 addr, u16 *data) { int count = 100; u16 value; if (go7007_write_interrupt(go, 0x0010, addr) < 0) return -EIO; while (count-- > 0) { if (go7007_read_interrupt(go, &value, data) == 0 && value == 0xa000) return 0; } return -EIO; } EXPORT_SYMBOL(go7007_read_addr); /* * Send the boot firmware to the encoder, which just wakes it up and lets * us talk to the GPIO pins and on-board I2C adapter. * * Must be called with the hw_lock held. */ static int go7007_load_encoder(struct go7007 *go) { const struct firmware *fw_entry; char fw_name[] = "go7007/go7007fw.bin"; void *bounce; int fw_len; u16 intr_val, intr_data; if (go->boot_fw == NULL) { if (request_firmware(&fw_entry, fw_name, go->dev)) { v4l2_err(go, "unable to load firmware from file \"%s\"\n", fw_name); return -1; } if (fw_entry->size < 16 || memcmp(fw_entry->data, "WISGO7007FW", 11)) { v4l2_err(go, "file \"%s\" does not appear to be go7007 firmware\n", fw_name); release_firmware(fw_entry); return -1; } fw_len = fw_entry->size - 16; bounce = kmemdup(fw_entry->data + 16, fw_len, GFP_KERNEL); if (bounce == NULL) { v4l2_err(go, "unable to allocate %d bytes for firmware transfer\n", fw_len); release_firmware(fw_entry); return -1; } release_firmware(fw_entry); go->boot_fw_len = fw_len; go->boot_fw = bounce; } if (go7007_interface_reset(go) < 0 || go7007_send_firmware(go, go->boot_fw, go->boot_fw_len) < 0 || go7007_read_interrupt(go, &intr_val, &intr_data) < 0 || (intr_val & ~0x1) != 0x5a5a) { v4l2_err(go, "error transferring firmware\n"); kfree(go->boot_fw); go->boot_fw = NULL; return -1; } return 0; } MODULE_FIRMWARE("go7007/go7007fw.bin"); /* * Boot the encoder and register the I2C adapter if requested. Do the * minimum initialization necessary, since the board-specific code may * still need to probe the board ID. * * Must NOT be called with the hw_lock held. */ int go7007_boot_encoder(struct go7007 *go, int init_i2c) { int ret; mutex_lock(&go->hw_lock); ret = go7007_load_encoder(go); mutex_unlock(&go->hw_lock); if (ret < 0) return -1; if (!init_i2c) return 0; if (go7007_i2c_init(go) < 0) return -1; go->i2c_adapter_online = 1; return 0; } EXPORT_SYMBOL(go7007_boot_encoder); /* * Configure any hardware-related registers in the GO7007, such as GPIO * pins and bus parameters, which are board-specific. This assumes * the boot firmware has already been downloaded. * * Must be called with the hw_lock held. */ static int go7007_init_encoder(struct go7007 *go) { if (go->board_info->audio_flags & GO7007_AUDIO_I2S_MASTER) { go7007_write_addr(go, 0x1000, 0x0811); go7007_write_addr(go, 0x1000, 0x0c11); } switch (go->board_id) { case GO7007_BOARDID_MATRIX_REV: /* Set GPIO pin 0 to be an output (audio clock control) */ go7007_write_addr(go, 0x3c82, 0x0001); go7007_write_addr(go, 0x3c80, 0x00fe); break; case GO7007_BOARDID_ADLINK_MPG24: /* set GPIO5 to be an output, currently low */ go7007_write_addr(go, 0x3c82, 0x0000); go7007_write_addr(go, 0x3c80, 0x00df); break; case GO7007_BOARDID_ADS_USBAV_709: /* GPIO pin 0: audio clock control */ /* pin 2: TW9906 reset */ /* pin 3: capture LED */ go7007_write_addr(go, 0x3c82, 0x000d); go7007_write_addr(go, 0x3c80, 0x00f2); break; } return 0; } /* * Send the boot firmware to the GO7007 and configure the registers. This * is the only way to stop the encoder once it has started streaming video. * * Must be called with the hw_lock held. */ int go7007_reset_encoder(struct go7007 *go) { if (go7007_load_encoder(go) < 0) return -1; return go7007_init_encoder(go); } /* * Attempt to instantiate an I2C client by ID, probably loading a module. */ static int init_i2c_module(struct i2c_adapter *adapter, const struct go_i2c *const i2c) { struct go7007 *go = i2c_get_adapdata(adapter); struct v4l2_device *v4l2_dev = &go->v4l2_dev; struct v4l2_subdev *sd; struct i2c_board_info info; memset(&info, 0, sizeof(info)); strscpy(info.type, i2c->type, sizeof(info.type)); info.addr = i2c->addr; info.flags = i2c->flags; sd = v4l2_i2c_new_subdev_board(v4l2_dev, adapter, &info, NULL); if (sd) { if (i2c->is_video) go->sd_video = sd; if (i2c->is_audio) go->sd_audio = sd; return 0; } pr_info("go7007: probing for module i2c:%s failed\n", i2c->type); return -EINVAL; } /* * Detach and unregister the encoder. The go7007 struct won't be freed * until v4l2 finishes releasing its resources and all associated fds are * closed by applications. */ static void go7007_remove(struct v4l2_device *v4l2_dev) { struct go7007 *go = container_of(v4l2_dev, struct go7007, v4l2_dev); v4l2_device_unregister(v4l2_dev); if (go->hpi_ops->release) go->hpi_ops->release(go); if (go->i2c_adapter_online) { i2c_del_adapter(&go->i2c_adapter); go->i2c_adapter_online = 0; } kfree(go->boot_fw); go7007_v4l2_remove(go); kfree(go); } /* * Finalize the GO7007 hardware setup, register the on-board I2C adapter * (if used on this board), load the I2C client driver for the sensor * (SAA7115 or whatever) and other devices, and register the ALSA and V4L2 * interfaces. * * Must NOT be called with the hw_lock held. */ int go7007_register_encoder(struct go7007 *go, unsigned num_i2c_devs) { int i, ret; dev_info(go->dev, "go7007: registering new %s\n", go->name); go->v4l2_dev.release = go7007_remove; ret = v4l2_device_register(go->dev, &go->v4l2_dev); if (ret < 0) return ret; mutex_lock(&go->hw_lock); ret = go7007_init_encoder(go); mutex_unlock(&go->hw_lock); if (ret < 0) return ret; ret = go7007_v4l2_ctrl_init(go); if (ret < 0) return ret; if (!go->i2c_adapter_online && go->board_info->flags & GO7007_BOARD_USE_ONBOARD_I2C) { ret = go7007_i2c_init(go); if (ret < 0) return ret; go->i2c_adapter_online = 1; } if (go->i2c_adapter_online) { if (go->board_id == GO7007_BOARDID_ADS_USBAV_709) { /* Reset the TW9906 */ go7007_write_addr(go, 0x3c82, 0x0009); msleep(50); go7007_write_addr(go, 0x3c82, 0x000d); } for (i = 0; i < num_i2c_devs; ++i) init_i2c_module(&go->i2c_adapter, &go->board_info->i2c_devs[i]); if (go->tuner_type >= 0) { struct tuner_setup setup = { .addr = ADDR_UNSET, .type = go->tuner_type, .mode_mask = T_ANALOG_TV, }; v4l2_device_call_all(&go->v4l2_dev, 0, tuner, s_type_addr, &setup); } if (go->board_id == GO7007_BOARDID_ADLINK_MPG24) v4l2_subdev_call(go->sd_video, video, s_routing, 0, 0, go->channel_number + 1); } ret = go7007_v4l2_init(go); if (ret < 0) return ret; if (go->board_info->flags & GO7007_BOARD_HAS_AUDIO) { go->audio_enabled = 1; go7007_snd_init(go); } return 0; } EXPORT_SYMBOL(go7007_register_encoder); /* * Send the encode firmware to the encoder, which will cause it * to immediately start delivering the video and audio streams. * * Must be called with the hw_lock held. */ int go7007_start_encoder(struct go7007 *go) { u8 *fw; int fw_len, rv = 0, i, x, y; u16 intr_val, intr_data; go->modet_enable = 0; for (i = 0; i < 4; i++) go->modet[i].enable = 0; switch (v4l2_ctrl_g_ctrl(go->modet_mode)) { case V4L2_DETECT_MD_MODE_GLOBAL: memset(go->modet_map, 0, sizeof(go->modet_map)); go->modet[0].enable = 1; go->modet_enable = 1; break; case V4L2_DETECT_MD_MODE_REGION_GRID: for (y = 0; y < go->height / 16; y++) { for (x = 0; x < go->width / 16; x++) { int idx = y * go->width / 16 + x; go->modet[go->modet_map[idx]].enable = 1; } } go->modet_enable = 1; break; } if (go->dvd_mode) go->modet_enable = 0; if (go7007_construct_fw_image(go, &fw, &fw_len) < 0) return -1; if (go7007_send_firmware(go, fw, fw_len) < 0 || go7007_read_interrupt(go, &intr_val, &intr_data) < 0) { v4l2_err(&go->v4l2_dev, "error transferring firmware\n"); rv = -1; goto start_error; } go->state = STATE_DATA; go->parse_length = 0; go->seen_frame = 0; if (go7007_stream_start(go) < 0) { v4l2_err(&go->v4l2_dev, "error starting stream transfer\n"); rv = -1; goto start_error; } start_error: kfree(fw); return rv; } /* * Store a byte in the current video buffer, if there is one. */ static inline void store_byte(struct go7007_buffer *vb, u8 byte) { if (vb && vb->vb.vb2_buf.planes[0].bytesused < GO7007_BUF_SIZE) { u8 *ptr = vb2_plane_vaddr(&vb->vb.vb2_buf, 0); ptr[vb->vb.vb2_buf.planes[0].bytesused++] = byte; } } static void go7007_set_motion_regions(struct go7007 *go, struct go7007_buffer *vb, u32 motion_regions) { if (motion_regions != go->modet_event_status) { struct v4l2_event ev = { .type = V4L2_EVENT_MOTION_DET, .u.motion_det = { .flags = V4L2_EVENT_MD_FL_HAVE_FRAME_SEQ, .frame_sequence = vb->vb.sequence, .region_mask = motion_regions, }, }; v4l2_event_queue(&go->vdev, &ev); go->modet_event_status = motion_regions; } } /* * Determine regions with motion and send a motion detection event * in case of changes. */ static void go7007_motion_regions(struct go7007 *go, struct go7007_buffer *vb) { u32 *bytesused = &vb->vb.vb2_buf.planes[0].bytesused; unsigned motion[4] = { 0, 0, 0, 0 }; u32 motion_regions = 0; unsigned stride = (go->width + 7) >> 3; unsigned x, y; int i; for (i = 0; i < 216; ++i) store_byte(vb, go->active_map[i]); for (y = 0; y < go->height / 16; y++) { for (x = 0; x < go->width / 16; x++) { if (!(go->active_map[y * stride + (x >> 3)] & (1 << (x & 7)))) continue; motion[go->modet_map[y * (go->width / 16) + x]]++; } } motion_regions = ((motion[0] > 0) << 0) | ((motion[1] > 0) << 1) | ((motion[2] > 0) << 2) | ((motion[3] > 0) << 3); *bytesused -= 216; go7007_set_motion_regions(go, vb, motion_regions); } /* * Deliver the last video buffer and get a new one to start writing to. */ static struct go7007_buffer *frame_boundary(struct go7007 *go, struct go7007_buffer *vb) { u32 *bytesused; struct go7007_buffer *vb_tmp = NULL; unsigned long flags; if (vb == NULL) { spin_lock_irqsave(&go->spinlock, flags); if (!list_empty(&go->vidq_active)) vb = go->active_buf = list_first_entry(&go->vidq_active, struct go7007_buffer, list); spin_unlock_irqrestore(&go->spinlock, flags); go->next_seq++; return vb; } bytesused = &vb->vb.vb2_buf.planes[0].bytesused; vb->vb.sequence = go->next_seq++; if (vb->modet_active && *bytesused + 216 < GO7007_BUF_SIZE) go7007_motion_regions(go, vb); else go7007_set_motion_regions(go, vb, 0); vb->vb.vb2_buf.timestamp = ktime_get_ns(); vb_tmp = vb; spin_lock_irqsave(&go->spinlock, flags); list_del(&vb->list); if (list_empty(&go->vidq_active)) vb = NULL; else vb = list_first_entry(&go->vidq_active, struct go7007_buffer, list); go->active_buf = vb; spin_unlock_irqrestore(&go->spinlock, flags); vb2_buffer_done(&vb_tmp->vb.vb2_buf, VB2_BUF_STATE_DONE); return vb; } static void write_bitmap_word(struct go7007 *go) { int x, y, i, stride = ((go->width >> 4) + 7) >> 3; for (i = 0; i < 16; ++i) { y = (((go->parse_length - 1) << 3) + i) / (go->width >> 4); x = (((go->parse_length - 1) << 3) + i) % (go->width >> 4); if (stride * y + (x >> 3) < sizeof(go->active_map)) go->active_map[stride * y + (x >> 3)] |= (go->modet_word & 1) << (x & 0x7); go->modet_word >>= 1; } } /* * Parse a chunk of the video stream into frames. The frames are not * delimited by the hardware, so we have to parse the frame boundaries * based on the type of video stream we're receiving. */ void go7007_parse_video_stream(struct go7007 *go, u8 *buf, int length) { struct go7007_buffer *vb = go->active_buf; int i, seq_start_code = -1, gop_start_code = -1, frame_start_code = -1; switch (go->format) { case V4L2_PIX_FMT_MPEG4: seq_start_code = 0xB0; gop_start_code = 0xB3; frame_start_code = 0xB6; break; case V4L2_PIX_FMT_MPEG1: case V4L2_PIX_FMT_MPEG2: seq_start_code = 0xB3; gop_start_code = 0xB8; frame_start_code = 0x00; break; } for (i = 0; i < length; ++i) { if (vb && vb->vb.vb2_buf.planes[0].bytesused >= GO7007_BUF_SIZE - 3) { v4l2_info(&go->v4l2_dev, "dropping oversized frame\n"); vb2_set_plane_payload(&vb->vb.vb2_buf, 0, 0); vb->frame_offset = 0; vb->modet_active = 0; vb = go->active_buf = NULL; } switch (go->state) { case STATE_DATA: switch (buf[i]) { case 0x00: go->state = STATE_00; break; case 0xFF: go->state = STATE_FF; break; default: store_byte(vb, buf[i]); break; } break; case STATE_00: switch (buf[i]) { case 0x00: go->state = STATE_00_00; break; case 0xFF: store_byte(vb, 0x00); go->state = STATE_FF; break; default: store_byte(vb, 0x00); store_byte(vb, buf[i]); go->state = STATE_DATA; break; } break; case STATE_00_00: switch (buf[i]) { case 0x00: store_byte(vb, 0x00); /* go->state remains STATE_00_00 */ break; case 0x01: go->state = STATE_00_00_01; break; case 0xFF: store_byte(vb, 0x00); store_byte(vb, 0x00); go->state = STATE_FF; break; default: store_byte(vb, 0x00); store_byte(vb, 0x00); store_byte(vb, buf[i]); go->state = STATE_DATA; break; } break; case STATE_00_00_01: if (buf[i] == 0xF8 && go->modet_enable == 0) { /* MODET start code, but MODET not enabled */ store_byte(vb, 0x00); store_byte(vb, 0x00); store_byte(vb, 0x01); store_byte(vb, 0xF8); go->state = STATE_DATA; break; } /* If this is the start of a new MPEG frame, * get a new buffer */ if ((go->format == V4L2_PIX_FMT_MPEG1 || go->format == V4L2_PIX_FMT_MPEG2 || go->format == V4L2_PIX_FMT_MPEG4) && (buf[i] == seq_start_code || buf[i] == gop_start_code || buf[i] == frame_start_code)) { if (vb == NULL || go->seen_frame) vb = frame_boundary(go, vb); go->seen_frame = buf[i] == frame_start_code; if (vb && go->seen_frame) vb->frame_offset = vb->vb.vb2_buf.planes[0].bytesused; } /* Handle any special chunk types, or just write the * start code to the (potentially new) buffer */ switch (buf[i]) { case 0xF5: /* timestamp */ go->parse_length = 12; go->state = STATE_UNPARSED; break; case 0xF6: /* vbi */ go->state = STATE_VBI_LEN_A; break; case 0xF8: /* MD map */ go->parse_length = 0; memset(go->active_map, 0, sizeof(go->active_map)); go->state = STATE_MODET_MAP; break; case 0xFF: /* Potential JPEG start code */ store_byte(vb, 0x00); store_byte(vb, 0x00); store_byte(vb, 0x01); go->state = STATE_FF; break; default: store_byte(vb, 0x00); store_byte(vb, 0x00); store_byte(vb, 0x01); store_byte(vb, buf[i]); go->state = STATE_DATA; break; } break; case STATE_FF: switch (buf[i]) { case 0x00: store_byte(vb, 0xFF); go->state = STATE_00; break; case 0xFF: store_byte(vb, 0xFF); /* go->state remains STATE_FF */ break; case 0xD8: if (go->format == V4L2_PIX_FMT_MJPEG) vb = frame_boundary(go, vb); fallthrough; default: store_byte(vb, 0xFF); store_byte(vb, buf[i]); go->state = STATE_DATA; break; } break; case STATE_VBI_LEN_A: go->parse_length = buf[i] << 8; go->state = STATE_VBI_LEN_B; break; case STATE_VBI_LEN_B: go->parse_length |= buf[i]; if (go->parse_length > 0) go->state = STATE_UNPARSED; else go->state = STATE_DATA; break; case STATE_MODET_MAP: if (go->parse_length < 204) { if (go->parse_length & 1) { go->modet_word |= buf[i]; write_bitmap_word(go); } else go->modet_word = buf[i] << 8; } else if (go->parse_length == 207 && vb) { vb->modet_active = buf[i]; } if (++go->parse_length == 208) go->state = STATE_DATA; break; case STATE_UNPARSED: if (--go->parse_length == 0) go->state = STATE_DATA; break; } } } EXPORT_SYMBOL(go7007_parse_video_stream); /* * Allocate a new go7007 struct. Used by the hardware-specific probe. */ struct go7007 *go7007_alloc(const struct go7007_board_info *board, struct device *dev) { struct go7007 *go; go = kzalloc(sizeof(struct go7007), GFP_KERNEL); if (go == NULL) return NULL; go->dev = dev; go->board_info = board; go->tuner_type = -1; mutex_init(&go->hw_lock); init_waitqueue_head(&go->frame_waitq); spin_lock_init(&go->spinlock); go->status = STATUS_INIT; init_waitqueue_head(&go->interrupt_waitq); go7007_update_board(go); go->format = V4L2_PIX_FMT_MJPEG; go->bitrate = 1500000; go->fps_scale = 1; go->aspect_ratio = GO7007_RATIO_1_1; return go; } EXPORT_SYMBOL(go7007_alloc); void go7007_update_board(struct go7007 *go) { const struct go7007_board_info *board = go->board_info; if (board->sensor_flags & GO7007_SENSOR_TV) { go->standard = GO7007_STD_NTSC; go->std = V4L2_STD_NTSC_M; go->width = 720; go->height = 480; go->sensor_framerate = 30000; } else { go->standard = GO7007_STD_OTHER; go->width = board->sensor_width; go->height = board->sensor_height; go->sensor_framerate = board->sensor_framerate; } go->encoder_v_offset = board->sensor_v_offset; go->encoder_h_offset = board->sensor_h_offset; } EXPORT_SYMBOL(go7007_update_board); MODULE_DESCRIPTION("WIS GO7007 MPEG encoder support"); MODULE_LICENSE("GPL v2"); |
3 6 9 4 6 832 335 553 549 96 4 631 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 | /* SPDX-License-Identifier: GPL-2.0 */ /* * Generic nexthop implementation * * Copyright (c) 2017-19 Cumulus Networks * Copyright (c) 2017-19 David Ahern <dsa@cumulusnetworks.com> */ #ifndef __LINUX_NEXTHOP_H #define __LINUX_NEXTHOP_H #include <linux/netdevice.h> #include <linux/notifier.h> #include <linux/route.h> #include <linux/types.h> #include <net/ip_fib.h> #include <net/ip6_fib.h> #include <net/netlink.h> #define NEXTHOP_VALID_USER_FLAGS RTNH_F_ONLINK struct nexthop; struct nh_config { u32 nh_id; u8 nh_family; u8 nh_protocol; u8 nh_blackhole; u8 nh_fdb; u32 nh_flags; int nh_ifindex; struct net_device *dev; union { __be32 ipv4; struct in6_addr ipv6; } gw; struct nlattr *nh_grp; u16 nh_grp_type; u16 nh_grp_res_num_buckets; unsigned long nh_grp_res_idle_timer; unsigned long nh_grp_res_unbalanced_timer; bool nh_grp_res_has_num_buckets; bool nh_grp_res_has_idle_timer; bool nh_grp_res_has_unbalanced_timer; bool nh_hw_stats; struct nlattr *nh_encap; u16 nh_encap_type; u32 nlflags; struct nl_info nlinfo; }; struct nh_info { struct hlist_node dev_hash; /* entry on netns devhash */ struct nexthop *nh_parent; u8 family; bool reject_nh; bool fdb_nh; union { struct fib_nh_common fib_nhc; struct fib_nh fib_nh; struct fib6_nh fib6_nh; }; }; struct nh_res_bucket { struct nh_grp_entry __rcu *nh_entry; atomic_long_t used_time; unsigned long migrated_time; bool occupied; u8 nh_flags; }; struct nh_res_table { struct net *net; u32 nhg_id; struct delayed_work upkeep_dw; /* List of NHGEs that have too few buckets ("uw" for underweight). * Reclaimed buckets will be given to entries in this list. */ struct list_head uw_nh_entries; unsigned long unbalanced_since; u32 idle_timer; u32 unbalanced_timer; u16 num_nh_buckets; struct nh_res_bucket nh_buckets[] __counted_by(num_nh_buckets); }; struct nh_grp_entry_stats { u64_stats_t packets; struct u64_stats_sync syncp; }; struct nh_grp_entry { struct nexthop *nh; struct nh_grp_entry_stats __percpu *stats; u16 weight; union { struct { atomic_t upper_bound; } hthr; struct { /* Member on uw_nh_entries. */ struct list_head uw_nh_entry; u16 count_buckets; u16 wants_buckets; } res; }; struct list_head nh_list; struct nexthop *nh_parent; /* nexthop of group with this entry */ u64 packets_hw; }; struct nh_group { struct nh_group *spare; /* spare group for removals */ u16 num_nh; bool is_multipath; bool hash_threshold; bool resilient; bool fdb_nh; bool has_v4; bool hw_stats; struct nh_res_table __rcu *res_table; struct nh_grp_entry nh_entries[] __counted_by(num_nh); }; struct nexthop { struct rb_node rb_node; /* entry on netns rbtree */ struct list_head fi_list; /* v4 entries using nh */ struct list_head f6i_list; /* v6 entries using nh */ struct list_head fdb_list; /* fdb entries using this nh */ struct list_head grp_list; /* nh group entries using this nh */ struct net *net; u32 id; u8 protocol; /* app managing this nh */ u8 nh_flags; bool is_group; refcount_t refcnt; struct rcu_head rcu; union { struct nh_info __rcu *nh_info; struct nh_group __rcu *nh_grp; }; }; enum nexthop_event_type { NEXTHOP_EVENT_DEL, NEXTHOP_EVENT_REPLACE, NEXTHOP_EVENT_RES_TABLE_PRE_REPLACE, NEXTHOP_EVENT_BUCKET_REPLACE, NEXTHOP_EVENT_HW_STATS_REPORT_DELTA, }; enum nh_notifier_info_type { NH_NOTIFIER_INFO_TYPE_SINGLE, NH_NOTIFIER_INFO_TYPE_GRP, NH_NOTIFIER_INFO_TYPE_RES_TABLE, NH_NOTIFIER_INFO_TYPE_RES_BUCKET, NH_NOTIFIER_INFO_TYPE_GRP_HW_STATS, }; struct nh_notifier_single_info { struct net_device *dev; u8 gw_family; union { __be32 ipv4; struct in6_addr ipv6; }; u32 id; u8 is_reject:1, is_fdb:1, has_encap:1; }; struct nh_notifier_grp_entry_info { u16 weight; struct nh_notifier_single_info nh; }; struct nh_notifier_grp_info { u16 num_nh; bool is_fdb; bool hw_stats; struct nh_notifier_grp_entry_info nh_entries[] __counted_by(num_nh); }; struct nh_notifier_res_bucket_info { u16 bucket_index; unsigned int idle_timer_ms; bool force; struct nh_notifier_single_info old_nh; struct nh_notifier_single_info new_nh; }; struct nh_notifier_res_table_info { u16 num_nh_buckets; bool hw_stats; struct nh_notifier_single_info nhs[] __counted_by(num_nh_buckets); }; struct nh_notifier_grp_hw_stats_entry_info { u32 id; u64 packets; }; struct nh_notifier_grp_hw_stats_info { u16 num_nh; bool hw_stats_used; struct nh_notifier_grp_hw_stats_entry_info stats[] __counted_by(num_nh); }; struct nh_notifier_info { struct net *net; struct netlink_ext_ack *extack; u32 id; enum nh_notifier_info_type type; union { struct nh_notifier_single_info *nh; struct nh_notifier_grp_info *nh_grp; struct nh_notifier_res_table_info *nh_res_table; struct nh_notifier_res_bucket_info *nh_res_bucket; struct nh_notifier_grp_hw_stats_info *nh_grp_hw_stats; }; }; int register_nexthop_notifier(struct net *net, struct notifier_block *nb, struct netlink_ext_ack *extack); int __unregister_nexthop_notifier(struct net *net, struct notifier_block *nb); int unregister_nexthop_notifier(struct net *net, struct notifier_block *nb); void nexthop_set_hw_flags(struct net *net, u32 id, bool offload, bool trap); void nexthop_bucket_set_hw_flags(struct net *net, u32 id, u16 bucket_index, bool offload, bool trap); void nexthop_res_grp_activity_update(struct net *net, u32 id, u16 num_buckets, unsigned long *activity); void nh_grp_hw_stats_report_delta(struct nh_notifier_grp_hw_stats_info *info, unsigned int nh_idx, u64 delta_packets); /* caller is holding rcu or rtnl; no reference taken to nexthop */ struct nexthop *nexthop_find_by_id(struct net *net, u32 id); void nexthop_free_rcu(struct rcu_head *head); static inline bool nexthop_get(struct nexthop *nh) { return refcount_inc_not_zero(&nh->refcnt); } static inline void nexthop_put(struct nexthop *nh) { if (refcount_dec_and_test(&nh->refcnt)) call_rcu_hurry(&nh->rcu, nexthop_free_rcu); } static inline bool nexthop_cmp(const struct nexthop *nh1, const struct nexthop *nh2) { return nh1 == nh2; } static inline bool nexthop_is_fdb(const struct nexthop *nh) { if (nh->is_group) { const struct nh_group *nh_grp; nh_grp = rcu_dereference_rtnl(nh->nh_grp); return nh_grp->fdb_nh; } else { const struct nh_info *nhi; nhi = rcu_dereference_rtnl(nh->nh_info); return nhi->fdb_nh; } } static inline bool nexthop_has_v4(const struct nexthop *nh) { if (nh->is_group) { struct nh_group *nh_grp; nh_grp = rcu_dereference_rtnl(nh->nh_grp); return nh_grp->has_v4; } return false; } static inline bool nexthop_is_multipath(const struct nexthop *nh) { if (nh->is_group) { struct nh_group *nh_grp; nh_grp = rcu_dereference_rtnl(nh->nh_grp); return nh_grp->is_multipath; } return false; } struct nexthop *nexthop_select_path(struct nexthop *nh, int hash); static inline unsigned int nexthop_num_path(const struct nexthop *nh) { unsigned int rc = 1; if (nh->is_group) { struct nh_group *nh_grp; nh_grp = rcu_dereference_rtnl(nh->nh_grp); if (nh_grp->is_multipath) rc = nh_grp->num_nh; } return rc; } static inline struct nexthop *nexthop_mpath_select(const struct nh_group *nhg, int nhsel) { /* for_nexthops macros in fib_semantics.c grabs a pointer to * the nexthop before checking nhsel */ if (nhsel >= nhg->num_nh) return NULL; return nhg->nh_entries[nhsel].nh; } static inline int nexthop_mpath_fill_node(struct sk_buff *skb, struct nexthop *nh, u8 rt_family) { struct nh_group *nhg = rcu_dereference_rtnl(nh->nh_grp); int i; for (i = 0; i < nhg->num_nh; i++) { struct nexthop *nhe = nhg->nh_entries[i].nh; struct nh_info *nhi = rcu_dereference_rtnl(nhe->nh_info); struct fib_nh_common *nhc = &nhi->fib_nhc; int weight = nhg->nh_entries[i].weight; if (fib_add_nexthop(skb, nhc, weight, rt_family, 0) < 0) return -EMSGSIZE; } return 0; } /* called with rcu lock */ static inline bool nexthop_is_blackhole(const struct nexthop *nh) { const struct nh_info *nhi; if (nh->is_group) { struct nh_group *nh_grp; nh_grp = rcu_dereference_rtnl(nh->nh_grp); if (nh_grp->num_nh > 1) return false; nh = nh_grp->nh_entries[0].nh; } nhi = rcu_dereference_rtnl(nh->nh_info); return nhi->reject_nh; } static inline void nexthop_path_fib_result(struct fib_result *res, int hash) { struct nh_info *nhi; struct nexthop *nh; nh = nexthop_select_path(res->fi->nh, hash); nhi = rcu_dereference(nh->nh_info); res->nhc = &nhi->fib_nhc; } /* called with rcu read lock or rtnl held */ static inline struct fib_nh_common *nexthop_fib_nhc(struct nexthop *nh, int nhsel) { struct nh_info *nhi; BUILD_BUG_ON(offsetof(struct fib_nh, nh_common) != 0); BUILD_BUG_ON(offsetof(struct fib6_nh, nh_common) != 0); if (nh->is_group) { struct nh_group *nh_grp; nh_grp = rcu_dereference_rtnl(nh->nh_grp); if (nh_grp->is_multipath) { nh = nexthop_mpath_select(nh_grp, nhsel); if (!nh) return NULL; } } nhi = rcu_dereference_rtnl(nh->nh_info); return &nhi->fib_nhc; } /* called from fib_table_lookup with rcu_lock */ static inline struct fib_nh_common *nexthop_get_nhc_lookup(const struct nexthop *nh, int fib_flags, const struct flowi4 *flp, int *nhsel) { struct nh_info *nhi; if (nh->is_group) { struct nh_group *nhg = rcu_dereference(nh->nh_grp); int i; for (i = 0; i < nhg->num_nh; i++) { struct nexthop *nhe = nhg->nh_entries[i].nh; nhi = rcu_dereference(nhe->nh_info); if (fib_lookup_good_nhc(&nhi->fib_nhc, fib_flags, flp)) { *nhsel = i; return &nhi->fib_nhc; } } } else { nhi = rcu_dereference(nh->nh_info); if (fib_lookup_good_nhc(&nhi->fib_nhc, fib_flags, flp)) { *nhsel = 0; return &nhi->fib_nhc; } } return NULL; } static inline bool nexthop_uses_dev(const struct nexthop *nh, const struct net_device *dev) { struct nh_info *nhi; if (nh->is_group) { struct nh_group *nhg = rcu_dereference(nh->nh_grp); int i; for (i = 0; i < nhg->num_nh; i++) { struct nexthop *nhe = nhg->nh_entries[i].nh; nhi = rcu_dereference(nhe->nh_info); if (nhc_l3mdev_matches_dev(&nhi->fib_nhc, dev)) return true; } } else { nhi = rcu_dereference(nh->nh_info); if (nhc_l3mdev_matches_dev(&nhi->fib_nhc, dev)) return true; } return false; } static inline unsigned int fib_info_num_path(const struct fib_info *fi) { if (unlikely(fi->nh)) return nexthop_num_path(fi->nh); return fi->fib_nhs; } int fib_check_nexthop(struct nexthop *nh, u8 scope, struct netlink_ext_ack *extack); static inline struct fib_nh_common *fib_info_nhc(struct fib_info *fi, int nhsel) { if (unlikely(fi->nh)) return nexthop_fib_nhc(fi->nh, nhsel); return &fi->fib_nh[nhsel].nh_common; } /* only used when fib_nh is built into fib_info */ static inline struct fib_nh *fib_info_nh(struct fib_info *fi, int nhsel) { WARN_ON(fi->nh); return &fi->fib_nh[nhsel]; } /* * IPv6 variants */ int fib6_check_nexthop(struct nexthop *nh, struct fib6_config *cfg, struct netlink_ext_ack *extack); /* Caller should either hold rcu_read_lock(), or RTNL. */ static inline struct fib6_nh *nexthop_fib6_nh(struct nexthop *nh) { struct nh_info *nhi; if (nh->is_group) { struct nh_group *nh_grp; nh_grp = rcu_dereference_rtnl(nh->nh_grp); nh = nexthop_mpath_select(nh_grp, 0); if (!nh) return NULL; } nhi = rcu_dereference_rtnl(nh->nh_info); if (nhi->family == AF_INET6) return &nhi->fib6_nh; return NULL; } static inline struct net_device *fib6_info_nh_dev(struct fib6_info *f6i) { struct fib6_nh *fib6_nh; fib6_nh = f6i->nh ? nexthop_fib6_nh(f6i->nh) : f6i->fib6_nh; return fib6_nh->fib_nh_dev; } static inline void nexthop_path_fib6_result(struct fib6_result *res, int hash) { struct nexthop *nh = res->f6i->nh; struct nh_info *nhi; nh = nexthop_select_path(nh, hash); nhi = rcu_dereference_rtnl(nh->nh_info); if (nhi->reject_nh) { res->fib6_type = RTN_BLACKHOLE; res->fib6_flags |= RTF_REJECT; res->nh = nexthop_fib6_nh(nh); } else { res->nh = &nhi->fib6_nh; } } int nexthop_for_each_fib6_nh(struct nexthop *nh, int (*cb)(struct fib6_nh *nh, void *arg), void *arg); static inline int nexthop_get_family(struct nexthop *nh) { struct nh_info *nhi = rcu_dereference_rtnl(nh->nh_info); return nhi->family; } static inline struct fib_nh_common *nexthop_fdb_nhc(struct nexthop *nh) { struct nh_info *nhi = rcu_dereference_rtnl(nh->nh_info); return &nhi->fib_nhc; } static inline struct fib_nh_common *nexthop_path_fdb_result(struct nexthop *nh, int hash) { struct nh_info *nhi; struct nexthop *nhp; nhp = nexthop_select_path(nh, hash); if (unlikely(!nhp)) return NULL; nhi = rcu_dereference(nhp->nh_info); return &nhi->fib_nhc; } #endif |
1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 | // SPDX-License-Identifier: GPL-2.0-only /* * linux/kernel/reboot.c * * Copyright (C) 2013 Linus Torvalds */ #define pr_fmt(fmt) "reboot: " fmt #include <linux/atomic.h> #include <linux/ctype.h> #include <linux/export.h> #include <linux/kexec.h> #include <linux/kmod.h> #include <linux/kmsg_dump.h> #include <linux/reboot.h> #include <linux/suspend.h> #include <linux/syscalls.h> #include <linux/syscore_ops.h> #include <linux/uaccess.h> /* * this indicates whether you can reboot with ctrl-alt-del: the default is yes */ static int C_A_D = 1; struct pid *cad_pid; EXPORT_SYMBOL(cad_pid); #if defined(CONFIG_ARM) #define DEFAULT_REBOOT_MODE = REBOOT_HARD #else #define DEFAULT_REBOOT_MODE #endif enum reboot_mode reboot_mode DEFAULT_REBOOT_MODE; EXPORT_SYMBOL_GPL(reboot_mode); enum reboot_mode panic_reboot_mode = REBOOT_UNDEFINED; /* * This variable is used privately to keep track of whether or not * reboot_type is still set to its default value (i.e., reboot= hasn't * been set on the command line). This is needed so that we can * suppress DMI scanning for reboot quirks. Without it, it's * impossible to override a faulty reboot quirk without recompiling. */ int reboot_default = 1; int reboot_cpu; enum reboot_type reboot_type = BOOT_ACPI; int reboot_force; struct sys_off_handler { struct notifier_block nb; int (*sys_off_cb)(struct sys_off_data *data); void *cb_data; enum sys_off_mode mode; bool blocking; void *list; struct device *dev; }; /* * This variable is used to indicate if a halt was initiated instead of a * reboot when the reboot call was invoked with LINUX_REBOOT_CMD_POWER_OFF, but * the system cannot be powered off. This allowes kernel_halt() to notify users * of that. */ static bool poweroff_fallback_to_halt; /* * Temporary stub that prevents linkage failure while we're in process * of removing all uses of legacy pm_power_off() around the kernel. */ void __weak (*pm_power_off)(void); /* * Notifier list for kernel code which wants to be called * at shutdown. This is used to stop any idling DMA operations * and the like. */ static BLOCKING_NOTIFIER_HEAD(reboot_notifier_list); /** * emergency_restart - reboot the system * * Without shutting down any hardware or taking any locks * reboot the system. This is called when we know we are in * trouble so this is our best effort to reboot. This is * safe to call in interrupt context. */ void emergency_restart(void) { kmsg_dump(KMSG_DUMP_EMERG); system_state = SYSTEM_RESTART; machine_emergency_restart(); } EXPORT_SYMBOL_GPL(emergency_restart); void kernel_restart_prepare(char *cmd) { blocking_notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd); system_state = SYSTEM_RESTART; usermodehelper_disable(); device_shutdown(); } /** * register_reboot_notifier - Register function to be called at reboot time * @nb: Info about notifier function to be called * * Registers a function with the list of functions * to be called at reboot time. * * Currently always returns zero, as blocking_notifier_chain_register() * always returns zero. */ int register_reboot_notifier(struct notifier_block *nb) { return blocking_notifier_chain_register(&reboot_notifier_list, nb); } EXPORT_SYMBOL(register_reboot_notifier); /** * unregister_reboot_notifier - Unregister previously registered reboot notifier * @nb: Hook to be unregistered * * Unregisters a previously registered reboot * notifier function. * * Returns zero on success, or %-ENOENT on failure. */ int unregister_reboot_notifier(struct notifier_block *nb) { return blocking_notifier_chain_unregister(&reboot_notifier_list, nb); } EXPORT_SYMBOL(unregister_reboot_notifier); static void devm_unregister_reboot_notifier(struct device *dev, void *res) { WARN_ON(unregister_reboot_notifier(*(struct notifier_block **)res)); } int devm_register_reboot_notifier(struct device *dev, struct notifier_block *nb) { struct notifier_block **rcnb; int ret; rcnb = devres_alloc(devm_unregister_reboot_notifier, sizeof(*rcnb), GFP_KERNEL); if (!rcnb) return -ENOMEM; ret = register_reboot_notifier(nb); if (!ret) { *rcnb = nb; devres_add(dev, rcnb); } else { devres_free(rcnb); } return ret; } EXPORT_SYMBOL(devm_register_reboot_notifier); /* * Notifier list for kernel code which wants to be called * to restart the system. */ static ATOMIC_NOTIFIER_HEAD(restart_handler_list); /** * register_restart_handler - Register function to be called to reset * the system * @nb: Info about handler function to be called * @nb->priority: Handler priority. Handlers should follow the * following guidelines for setting priorities. * 0: Restart handler of last resort, * with limited restart capabilities * 128: Default restart handler; use if no other * restart handler is expected to be available, * and/or if restart functionality is * sufficient to restart the entire system * 255: Highest priority restart handler, will * preempt all other restart handlers * * Registers a function with code to be called to restart the * system. * * Registered functions will be called from machine_restart as last * step of the restart sequence (if the architecture specific * machine_restart function calls do_kernel_restart - see below * for details). * Registered functions are expected to restart the system immediately. * If more than one function is registered, the restart handler priority * selects which function will be called first. * * Restart handlers are expected to be registered from non-architecture * code, typically from drivers. A typical use case would be a system * where restart functionality is provided through a watchdog. Multiple * restart handlers may exist; for example, one restart handler might * restart the entire system, while another only restarts the CPU. * In such cases, the restart handler which only restarts part of the * hardware is expected to register with low priority to ensure that * it only runs if no other means to restart the system is available. * * Currently always returns zero, as atomic_notifier_chain_register() * always returns zero. */ int register_restart_handler(struct notifier_block *nb) { return atomic_notifier_chain_register(&restart_handler_list, nb); } EXPORT_SYMBOL(register_restart_handler); /** * unregister_restart_handler - Unregister previously registered * restart handler * @nb: Hook to be unregistered * * Unregisters a previously registered restart handler function. * * Returns zero on success, or %-ENOENT on failure. */ int unregister_restart_handler(struct notifier_block *nb) { return atomic_notifier_chain_unregister(&restart_handler_list, nb); } EXPORT_SYMBOL(unregister_restart_handler); /** * do_kernel_restart - Execute kernel restart handler call chain * * Calls functions registered with register_restart_handler. * * Expected to be called from machine_restart as last step of the restart * sequence. * * Restarts the system immediately if a restart handler function has been * registered. Otherwise does nothing. */ void do_kernel_restart(char *cmd) { atomic_notifier_call_chain(&restart_handler_list, reboot_mode, cmd); } void migrate_to_reboot_cpu(void) { /* The boot cpu is always logical cpu 0 */ int cpu = reboot_cpu; cpu_hotplug_disable(); /* Make certain the cpu I'm about to reboot on is online */ if (!cpu_online(cpu)) cpu = cpumask_first(cpu_online_mask); /* Prevent races with other tasks migrating this task */ current->flags |= PF_NO_SETAFFINITY; /* Make certain I only run on the appropriate processor */ set_cpus_allowed_ptr(current, cpumask_of(cpu)); } /* * Notifier list for kernel code which wants to be called * to prepare system for restart. */ static BLOCKING_NOTIFIER_HEAD(restart_prep_handler_list); static void do_kernel_restart_prepare(void) { blocking_notifier_call_chain(&restart_prep_handler_list, 0, NULL); } /** * kernel_restart - reboot the system * @cmd: pointer to buffer containing command to execute for restart * or %NULL * * Shutdown everything and perform a clean reboot. * This is not safe to call in interrupt context. */ void kernel_restart(char *cmd) { kernel_restart_prepare(cmd); do_kernel_restart_prepare(); migrate_to_reboot_cpu(); syscore_shutdown(); if (!cmd) pr_emerg("Restarting system\n"); else pr_emerg("Restarting system with command '%s'\n", cmd); kmsg_dump(KMSG_DUMP_SHUTDOWN); machine_restart(cmd); } EXPORT_SYMBOL_GPL(kernel_restart); static void kernel_shutdown_prepare(enum system_states state) { blocking_notifier_call_chain(&reboot_notifier_list, (state == SYSTEM_HALT) ? SYS_HALT : SYS_POWER_OFF, NULL); system_state = state; usermodehelper_disable(); device_shutdown(); } /** * kernel_halt - halt the system * * Shutdown everything and perform a clean system halt. */ void kernel_halt(void) { kernel_shutdown_prepare(SYSTEM_HALT); migrate_to_reboot_cpu(); syscore_shutdown(); if (poweroff_fallback_to_halt) pr_emerg("Power off not available: System halted instead\n"); else pr_emerg("System halted\n"); kmsg_dump(KMSG_DUMP_SHUTDOWN); machine_halt(); } EXPORT_SYMBOL_GPL(kernel_halt); /* * Notifier list for kernel code which wants to be called * to prepare system for power off. */ static BLOCKING_NOTIFIER_HEAD(power_off_prep_handler_list); /* * Notifier list for kernel code which wants to be called * to power off system. */ static ATOMIC_NOTIFIER_HEAD(power_off_handler_list); static int sys_off_notify(struct notifier_block *nb, unsigned long mode, void *cmd) { struct sys_off_handler *handler; struct sys_off_data data = {}; handler = container_of(nb, struct sys_off_handler, nb); data.cb_data = handler->cb_data; data.mode = mode; data.cmd = cmd; data.dev = handler->dev; return handler->sys_off_cb(&data); } static struct sys_off_handler platform_sys_off_handler; static struct sys_off_handler *alloc_sys_off_handler(int priority) { struct sys_off_handler *handler; gfp_t flags; /* * Platforms like m68k can't allocate sys_off handler dynamically * at the early boot time because memory allocator isn't available yet. */ if (priority == SYS_OFF_PRIO_PLATFORM) { handler = &platform_sys_off_handler; if (handler->cb_data) return ERR_PTR(-EBUSY); } else { if (system_state > SYSTEM_RUNNING) flags = GFP_ATOMIC; else flags = GFP_KERNEL; handler = kzalloc(sizeof(*handler), flags); if (!handler) return ERR_PTR(-ENOMEM); } return handler; } static void free_sys_off_handler(struct sys_off_handler *handler) { if (handler == &platform_sys_off_handler) memset(handler, 0, sizeof(*handler)); else kfree(handler); } /** * register_sys_off_handler - Register sys-off handler * @mode: Sys-off mode * @priority: Handler priority * @callback: Callback function * @cb_data: Callback argument * * Registers system power-off or restart handler that will be invoked * at the step corresponding to the given sys-off mode. Handler's callback * should return NOTIFY_DONE to permit execution of the next handler in * the call chain or NOTIFY_STOP to break the chain (in error case for * example). * * Multiple handlers can be registered at the default priority level. * * Only one handler can be registered at the non-default priority level, * otherwise ERR_PTR(-EBUSY) is returned. * * Returns a new instance of struct sys_off_handler on success, or * an ERR_PTR()-encoded error code otherwise. */ struct sys_off_handler * register_sys_off_handler(enum sys_off_mode mode, int priority, int (*callback)(struct sys_off_data *data), void *cb_data) { struct sys_off_handler *handler; int err; handler = alloc_sys_off_handler(priority); if (IS_ERR(handler)) return handler; switch (mode) { case SYS_OFF_MODE_POWER_OFF_PREPARE: handler->list = &power_off_prep_handler_list; handler->blocking = true; break; case SYS_OFF_MODE_POWER_OFF: handler->list = &power_off_handler_list; break; case SYS_OFF_MODE_RESTART_PREPARE: handler->list = &restart_prep_handler_list; handler->blocking = true; break; case SYS_OFF_MODE_RESTART: handler->list = &restart_handler_list; break; default: free_sys_off_handler(handler); return ERR_PTR(-EINVAL); } handler->nb.notifier_call = sys_off_notify; handler->nb.priority = priority; handler->sys_off_cb = callback; handler->cb_data = cb_data; handler->mode = mode; if (handler->blocking) { if (priority == SYS_OFF_PRIO_DEFAULT) err = blocking_notifier_chain_register(handler->list, &handler->nb); else err = blocking_notifier_chain_register_unique_prio(handler->list, &handler->nb); } else { if (priority == SYS_OFF_PRIO_DEFAULT) err = atomic_notifier_chain_register(handler->list, &handler->nb); else err = atomic_notifier_chain_register_unique_prio(handler->list, &handler->nb); } if (err) { free_sys_off_handler(handler); return ERR_PTR(err); } return handler; } EXPORT_SYMBOL_GPL(register_sys_off_handler); /** * unregister_sys_off_handler - Unregister sys-off handler * @handler: Sys-off handler * * Unregisters given sys-off handler. */ void unregister_sys_off_handler(struct sys_off_handler *handler) { int err; if (IS_ERR_OR_NULL(handler)) return; if (handler->blocking) err = blocking_notifier_chain_unregister(handler->list, &handler->nb); else err = atomic_notifier_chain_unregister(handler->list, &handler->nb); /* sanity check, shall never happen */ WARN_ON(err); free_sys_off_handler(handler); } EXPORT_SYMBOL_GPL(unregister_sys_off_handler); static void devm_unregister_sys_off_handler(void *data) { struct sys_off_handler *handler = data; unregister_sys_off_handler(handler); } /** * devm_register_sys_off_handler - Register sys-off handler * @dev: Device that registers handler * @mode: Sys-off mode * @priority: Handler priority * @callback: Callback function * @cb_data: Callback argument * * Registers resource-managed sys-off handler. * * Returns zero on success, or error code on failure. */ int devm_register_sys_off_handler(struct device *dev, enum sys_off_mode mode, int priority, int (*callback)(struct sys_off_data *data), void *cb_data) { struct sys_off_handler *handler; handler = register_sys_off_handler(mode, priority, callback, cb_data); if (IS_ERR(handler)) return PTR_ERR(handler); handler->dev = dev; return devm_add_action_or_reset(dev, devm_unregister_sys_off_handler, handler); } EXPORT_SYMBOL_GPL(devm_register_sys_off_handler); /** * devm_register_power_off_handler - Register power-off handler * @dev: Device that registers callback * @callback: Callback function * @cb_data: Callback's argument * * Registers resource-managed sys-off handler with a default priority * and using power-off mode. * * Returns zero on success, or error code on failure. */ int devm_register_power_off_handler(struct device *dev, int (*callback)(struct sys_off_data *data), void *cb_data) { return devm_register_sys_off_handler(dev, SYS_OFF_MODE_POWER_OFF, SYS_OFF_PRIO_DEFAULT, callback, cb_data); } EXPORT_SYMBOL_GPL(devm_register_power_off_handler); /** * devm_register_restart_handler - Register restart handler * @dev: Device that registers callback * @callback: Callback function * @cb_data: Callback's argument * * Registers resource-managed sys-off handler with a default priority * and using restart mode. * * Returns zero on success, or error code on failure. */ int devm_register_restart_handler(struct device *dev, int (*callback)(struct sys_off_data *data), void *cb_data) { return devm_register_sys_off_handler(dev, SYS_OFF_MODE_RESTART, SYS_OFF_PRIO_DEFAULT, callback, cb_data); } EXPORT_SYMBOL_GPL(devm_register_restart_handler); static struct sys_off_handler *platform_power_off_handler; static int platform_power_off_notify(struct sys_off_data *data) { void (*platform_power_power_off_cb)(void) = data->cb_data; platform_power_power_off_cb(); return NOTIFY_DONE; } /** * register_platform_power_off - Register platform-level power-off callback * @power_off: Power-off callback * * Registers power-off callback that will be called as last step * of the power-off sequence. This callback is expected to be invoked * for the last resort. Only one platform power-off callback is allowed * to be registered at a time. * * Returns zero on success, or error code on failure. */ int register_platform_power_off(void (*power_off)(void)) { struct sys_off_handler *handler; handler = register_sys_off_handler(SYS_OFF_MODE_POWER_OFF, SYS_OFF_PRIO_PLATFORM, platform_power_off_notify, power_off); if (IS_ERR(handler)) return PTR_ERR(handler); platform_power_off_handler = handler; return 0; } EXPORT_SYMBOL_GPL(register_platform_power_off); /** * unregister_platform_power_off - Unregister platform-level power-off callback * @power_off: Power-off callback * * Unregisters previously registered platform power-off callback. */ void unregister_platform_power_off(void (*power_off)(void)) { if (platform_power_off_handler && platform_power_off_handler->cb_data == power_off) { unregister_sys_off_handler(platform_power_off_handler); platform_power_off_handler = NULL; } } EXPORT_SYMBOL_GPL(unregister_platform_power_off); static int legacy_pm_power_off(struct sys_off_data *data) { if (pm_power_off) pm_power_off(); return NOTIFY_DONE; } static void do_kernel_power_off_prepare(void) { blocking_notifier_call_chain(&power_off_prep_handler_list, 0, NULL); } /** * do_kernel_power_off - Execute kernel power-off handler call chain * * Expected to be called as last step of the power-off sequence. * * Powers off the system immediately if a power-off handler function has * been registered. Otherwise does nothing. */ void do_kernel_power_off(void) { struct sys_off_handler *sys_off = NULL; /* * Register sys-off handlers for legacy PM callback. This allows * legacy PM callbacks temporary co-exist with the new sys-off API. * * TODO: Remove legacy handlers once all legacy PM users will be * switched to the sys-off based APIs. */ if (pm_power_off) sys_off = register_sys_off_handler(SYS_OFF_MODE_POWER_OFF, SYS_OFF_PRIO_DEFAULT, legacy_pm_power_off, NULL); atomic_notifier_call_chain(&power_off_handler_list, 0, NULL); unregister_sys_off_handler(sys_off); } /** * kernel_can_power_off - check whether system can be powered off * * Returns true if power-off handler is registered and system can be * powered off, false otherwise. */ bool kernel_can_power_off(void) { return !atomic_notifier_call_chain_is_empty(&power_off_handler_list) || pm_power_off; } EXPORT_SYMBOL_GPL(kernel_can_power_off); /** * kernel_power_off - power_off the system * * Shutdown everything and perform a clean system power_off. */ void kernel_power_off(void) { kernel_shutdown_prepare(SYSTEM_POWER_OFF); do_kernel_power_off_prepare(); migrate_to_reboot_cpu(); syscore_shutdown(); pr_emerg("Power down\n"); kmsg_dump(KMSG_DUMP_SHUTDOWN); machine_power_off(); } EXPORT_SYMBOL_GPL(kernel_power_off); DEFINE_MUTEX(system_transition_mutex); /* * Reboot system call: for obvious reasons only root may call it, * and even root needs to set up some magic numbers in the registers * so that some mistake won't make this reboot the whole machine. * You can also set the meaning of the ctrl-alt-del-key here. * * reboot doesn't sync: do that yourself before calling this. */ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd, void __user *, arg) { struct pid_namespace *pid_ns = task_active_pid_ns(current); char buffer[256]; int ret = 0; /* We only trust the superuser with rebooting the system. */ if (!ns_capable(pid_ns->user_ns, CAP_SYS_BOOT)) return -EPERM; /* For safety, we require "magic" arguments. */ if (magic1 != LINUX_REBOOT_MAGIC1 || (magic2 != LINUX_REBOOT_MAGIC2 && magic2 != LINUX_REBOOT_MAGIC2A && magic2 != LINUX_REBOOT_MAGIC2B && magic2 != LINUX_REBOOT_MAGIC2C)) return -EINVAL; /* * If pid namespaces are enabled and the current task is in a child * pid_namespace, the command is handled by reboot_pid_ns() which will * call do_exit(). */ ret = reboot_pid_ns(pid_ns, cmd); if (ret) return ret; /* Instead of trying to make the power_off code look like * halt when pm_power_off is not set do it the easy way. */ if ((cmd == LINUX_REBOOT_CMD_POWER_OFF) && !kernel_can_power_off()) { poweroff_fallback_to_halt = true; cmd = LINUX_REBOOT_CMD_HALT; } mutex_lock(&system_transition_mutex); switch (cmd) { case LINUX_REBOOT_CMD_RESTART: kernel_restart(NULL); break; case LINUX_REBOOT_CMD_CAD_ON: C_A_D = 1; break; case LINUX_REBOOT_CMD_CAD_OFF: C_A_D = 0; break; case LINUX_REBOOT_CMD_HALT: kernel_halt(); do_exit(0); case LINUX_REBOOT_CMD_POWER_OFF: kernel_power_off(); do_exit(0); break; case LINUX_REBOOT_CMD_RESTART2: ret = strncpy_from_user(&buffer[0], arg, sizeof(buffer) - 1); if (ret < 0) { ret = -EFAULT; break; } buffer[sizeof(buffer) - 1] = '\0'; kernel_restart(buffer); break; #ifdef CONFIG_KEXEC_CORE case LINUX_REBOOT_CMD_KEXEC: ret = kernel_kexec(); break; #endif #ifdef CONFIG_HIBERNATION case LINUX_REBOOT_CMD_SW_SUSPEND: ret = hibernate(); break; #endif default: ret = -EINVAL; break; } mutex_unlock(&system_transition_mutex); return ret; } static void deferred_cad(struct work_struct *dummy) { kernel_restart(NULL); } /* * This function gets called by ctrl-alt-del - ie the keyboard interrupt. * As it's called within an interrupt, it may NOT sync: the only choice * is whether to reboot at once, or just ignore the ctrl-alt-del. */ void ctrl_alt_del(void) { static DECLARE_WORK(cad_work, deferred_cad); if (C_A_D) schedule_work(&cad_work); else kill_cad_pid(SIGINT, 1); } #define POWEROFF_CMD_PATH_LEN 256 static char poweroff_cmd[POWEROFF_CMD_PATH_LEN] = "/sbin/poweroff"; static const char reboot_cmd[] = "/sbin/reboot"; static int run_cmd(const char *cmd) { char **argv; static char *envp[] = { "HOME=/", "PATH=/sbin:/bin:/usr/sbin:/usr/bin", NULL }; int ret; argv = argv_split(GFP_KERNEL, cmd, NULL); if (argv) { ret = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_EXEC); argv_free(argv); } else { ret = -ENOMEM; } return ret; } static int __orderly_reboot(void) { int ret; ret = run_cmd(reboot_cmd); if (ret) { pr_warn("Failed to start orderly reboot: forcing the issue\n"); emergency_sync(); kernel_restart(NULL); } return ret; } static int __orderly_poweroff(bool force) { int ret; ret = run_cmd(poweroff_cmd); if (ret && force) { pr_warn("Failed to start orderly shutdown: forcing the issue\n"); /* * I guess this should try to kick off some daemon to sync and * poweroff asap. Or not even bother syncing if we're doing an * emergency shutdown? */ emergency_sync(); kernel_power_off(); } return ret; } static bool poweroff_force; static void poweroff_work_func(struct work_struct *work) { __orderly_poweroff(poweroff_force); } static DECLARE_WORK(poweroff_work, poweroff_work_func); /** * orderly_poweroff - Trigger an orderly system poweroff * @force: force poweroff if command execution fails * * This may be called from any context to trigger a system shutdown. * If the orderly shutdown fails, it will force an immediate shutdown. */ void orderly_poweroff(bool force) { if (force) /* do not override the pending "true" */ poweroff_force = true; schedule_work(&poweroff_work); } EXPORT_SYMBOL_GPL(orderly_poweroff); static void reboot_work_func(struct work_struct *work) { __orderly_reboot(); } static DECLARE_WORK(reboot_work, reboot_work_func); /** * orderly_reboot - Trigger an orderly system reboot * * This may be called from any context to trigger a system reboot. * If the orderly reboot fails, it will force an immediate reboot. */ void orderly_reboot(void) { schedule_work(&reboot_work); } EXPORT_SYMBOL_GPL(orderly_reboot); /** * hw_failure_emergency_poweroff_func - emergency poweroff work after a known delay * @work: work_struct associated with the emergency poweroff function * * This function is called in very critical situations to force * a kernel poweroff after a configurable timeout value. */ static void hw_failure_emergency_poweroff_func(struct work_struct *work) { /* * We have reached here after the emergency shutdown waiting period has * expired. This means orderly_poweroff has not been able to shut off * the system for some reason. * * Try to shut down the system immediately using kernel_power_off * if populated */ pr_emerg("Hardware protection timed-out. Trying forced poweroff\n"); kernel_power_off(); /* * Worst of the worst case trigger emergency restart */ pr_emerg("Hardware protection shutdown failed. Trying emergency restart\n"); emergency_restart(); } static DECLARE_DELAYED_WORK(hw_failure_emergency_poweroff_work, hw_failure_emergency_poweroff_func); /** * hw_failure_emergency_poweroff - Trigger an emergency system poweroff * * This may be called from any critical situation to trigger a system shutdown * after a given period of time. If time is negative this is not scheduled. */ static void hw_failure_emergency_poweroff(int poweroff_delay_ms) { if (poweroff_delay_ms <= 0) return; schedule_delayed_work(&hw_failure_emergency_poweroff_work, msecs_to_jiffies(poweroff_delay_ms)); } /** * __hw_protection_shutdown - Trigger an emergency system shutdown or reboot * * @reason: Reason of emergency shutdown or reboot to be printed. * @ms_until_forced: Time to wait for orderly shutdown or reboot before * triggering it. Negative value disables the forced * shutdown or reboot. * @shutdown: If true, indicates that a shutdown will happen * after the critical tempeature is reached. * If false, indicates that a reboot will happen * after the critical tempeature is reached. * * Initiate an emergency system shutdown or reboot in order to protect * hardware from further damage. Usage examples include a thermal protection. * NOTE: The request is ignored if protection shutdown or reboot is already * pending even if the previous request has given a large timeout for forced * shutdown/reboot. */ void __hw_protection_shutdown(const char *reason, int ms_until_forced, bool shutdown) { static atomic_t allow_proceed = ATOMIC_INIT(1); pr_emerg("HARDWARE PROTECTION shutdown (%s)\n", reason); /* Shutdown should be initiated only once. */ if (!atomic_dec_and_test(&allow_proceed)) return; /* * Queue a backup emergency shutdown in the event of * orderly_poweroff failure */ hw_failure_emergency_poweroff(ms_until_forced); if (shutdown) orderly_poweroff(true); else orderly_reboot(); } EXPORT_SYMBOL_GPL(__hw_protection_shutdown); static int __init reboot_setup(char *str) { for (;;) { enum reboot_mode *mode; /* * Having anything passed on the command line via * reboot= will cause us to disable DMI checking * below. */ reboot_default = 0; if (!strncmp(str, "panic_", 6)) { mode = &panic_reboot_mode; str += 6; } else { mode = &reboot_mode; } switch (*str) { case 'w': *mode = REBOOT_WARM; break; case 'c': *mode = REBOOT_COLD; break; case 'h': *mode = REBOOT_HARD; break; case 's': /* * reboot_cpu is s[mp]#### with #### being the processor * to be used for rebooting. Skip 's' or 'smp' prefix. */ str += str[1] == 'm' && str[2] == 'p' ? 3 : 1; if (isdigit(str[0])) { int cpu = simple_strtoul(str, NULL, 0); if (cpu >= num_possible_cpus()) { pr_err("Ignoring the CPU number in reboot= option. " "CPU %d exceeds possible cpu number %d\n", cpu, num_possible_cpus()); break; } reboot_cpu = cpu; } else *mode = REBOOT_SOFT; break; case 'g': *mode = REBOOT_GPIO; break; case 'b': case 'a': case 'k': case 't': case 'e': case 'p': reboot_type = *str; break; case 'f': reboot_force = 1; break; } str = strchr(str, ','); if (str) str++; else break; } return 1; } __setup("reboot=", reboot_setup); #ifdef CONFIG_SYSFS #define REBOOT_COLD_STR "cold" #define REBOOT_WARM_STR "warm" #define REBOOT_HARD_STR "hard" #define REBOOT_SOFT_STR "soft" #define REBOOT_GPIO_STR "gpio" #define REBOOT_UNDEFINED_STR "undefined" #define BOOT_TRIPLE_STR "triple" #define BOOT_KBD_STR "kbd" #define BOOT_BIOS_STR "bios" #define BOOT_ACPI_STR "acpi" #define BOOT_EFI_STR "efi" #define BOOT_PCI_STR "pci" static ssize_t mode_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf) { const char *val; switch (reboot_mode) { case REBOOT_COLD: val = REBOOT_COLD_STR; break; case REBOOT_WARM: val = REBOOT_WARM_STR; break; case REBOOT_HARD: val = REBOOT_HARD_STR; break; case REBOOT_SOFT: val = REBOOT_SOFT_STR; break; case REBOOT_GPIO: val = REBOOT_GPIO_STR; break; default: val = REBOOT_UNDEFINED_STR; } return sysfs_emit(buf, "%s\n", val); } static ssize_t mode_store(struct kobject *kobj, struct kobj_attribute *attr, const char *buf, size_t count) { if (!capable(CAP_SYS_BOOT)) return -EPERM; if (!strncmp(buf, REBOOT_COLD_STR, strlen(REBOOT_COLD_STR))) reboot_mode = REBOOT_COLD; else if (!strncmp(buf, REBOOT_WARM_STR, strlen(REBOOT_WARM_STR))) reboot_mode = REBOOT_WARM; else if (!strncmp(buf, REBOOT_HARD_STR, strlen(REBOOT_HARD_STR))) reboot_mode = REBOOT_HARD; else if (!strncmp(buf, REBOOT_SOFT_STR, strlen(REBOOT_SOFT_STR))) reboot_mode = REBOOT_SOFT; else if (!strncmp(buf, REBOOT_GPIO_STR, strlen(REBOOT_GPIO_STR))) reboot_mode = REBOOT_GPIO; else return -EINVAL; reboot_default = 0; return count; } static struct kobj_attribute reboot_mode_attr = __ATTR_RW(mode); #ifdef CONFIG_X86 static ssize_t force_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf) { return sysfs_emit(buf, "%d\n", reboot_force); } static ssize_t force_store(struct kobject *kobj, struct kobj_attribute *attr, const char *buf, size_t count) { bool res; if (!capable(CAP_SYS_BOOT)) return -EPERM; if (kstrtobool(buf, &res)) return -EINVAL; reboot_default = 0; reboot_force = res; return count; } static struct kobj_attribute reboot_force_attr = __ATTR_RW(force); static ssize_t type_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf) { const char *val; switch (reboot_type) { case BOOT_TRIPLE: val = BOOT_TRIPLE_STR; break; case BOOT_KBD: val = BOOT_KBD_STR; break; case BOOT_BIOS: val = BOOT_BIOS_STR; break; case BOOT_ACPI: val = BOOT_ACPI_STR; break; case BOOT_EFI: val = BOOT_EFI_STR; break; case BOOT_CF9_FORCE: val = BOOT_PCI_STR; break; default: val = REBOOT_UNDEFINED_STR; } return sysfs_emit(buf, "%s\n", val); } static ssize_t type_store(struct kobject *kobj, struct kobj_attribute *attr, const char *buf, size_t count) { if (!capable(CAP_SYS_BOOT)) return -EPERM; if (!strncmp(buf, BOOT_TRIPLE_STR, strlen(BOOT_TRIPLE_STR))) reboot_type = BOOT_TRIPLE; else if (!strncmp(buf, BOOT_KBD_STR, strlen(BOOT_KBD_STR))) reboot_type = BOOT_KBD; else if (!strncmp(buf, BOOT_BIOS_STR, strlen(BOOT_BIOS_STR))) reboot_type = BOOT_BIOS; else if (!strncmp(buf, BOOT_ACPI_STR, strlen(BOOT_ACPI_STR))) reboot_type = BOOT_ACPI; else if (!strncmp(buf, BOOT_EFI_STR, strlen(BOOT_EFI_STR))) reboot_type = BOOT_EFI; else if (!strncmp(buf, BOOT_PCI_STR, strlen(BOOT_PCI_STR))) reboot_type = BOOT_CF9_FORCE; else return -EINVAL; reboot_default = 0; return count; } static struct kobj_attribute reboot_type_attr = __ATTR_RW(type); #endif #ifdef CONFIG_SMP static ssize_t cpu_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf) { return sysfs_emit(buf, "%d\n", reboot_cpu); } static ssize_t cpu_store(struct kobject *kobj, struct kobj_attribute *attr, const char *buf, size_t count) { unsigned int cpunum; int rc; if (!capable(CAP_SYS_BOOT)) return -EPERM; rc = kstrtouint(buf, 0, &cpunum); if (rc) return rc; if (cpunum >= num_possible_cpus()) return -ERANGE; reboot_default = 0; reboot_cpu = cpunum; return count; } static struct kobj_attribute reboot_cpu_attr = __ATTR_RW(cpu); #endif static struct attribute *reboot_attrs[] = { &reboot_mode_attr.attr, #ifdef CONFIG_X86 &reboot_force_attr.attr, &reboot_type_attr.attr, #endif #ifdef CONFIG_SMP &reboot_cpu_attr.attr, #endif NULL, }; #ifdef CONFIG_SYSCTL static struct ctl_table kern_reboot_table[] = { { .procname = "poweroff_cmd", .data = &poweroff_cmd, .maxlen = POWEROFF_CMD_PATH_LEN, .mode = 0644, .proc_handler = proc_dostring, }, { .procname = "ctrl-alt-del", .data = &C_A_D, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, }; static void __init kernel_reboot_sysctls_init(void) { register_sysctl_init("kernel", kern_reboot_table); } #else #define kernel_reboot_sysctls_init() do { } while (0) #endif /* CONFIG_SYSCTL */ static const struct attribute_group reboot_attr_group = { .attrs = reboot_attrs, }; static int __init reboot_ksysfs_init(void) { struct kobject *reboot_kobj; int ret; reboot_kobj = kobject_create_and_add("reboot", kernel_kobj); if (!reboot_kobj) return -ENOMEM; ret = sysfs_create_group(reboot_kobj, &reboot_attr_group); if (ret) { kobject_put(reboot_kobj); return ret; } kernel_reboot_sysctls_init(); return 0; } late_initcall(reboot_ksysfs_init); #endif |
17 1 16 1 70 62 1 1 1 1 1 63 7 56 16 10 62 3 3 2 3 4 46 44 2 67 70 11 95 87 56 68 5 11 5 1 16 70 70 84 16 100 13 87 88 99 13 88 9 100 100 100 100 100 99 105 12 100 3 3 89 7 89 26 20 11 22 50 2 87 87 78 15 85 5 98 1 1 1 97 96 1 4 1 89 28 4 90 89 1 88 88 3 96 96 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 | // SPDX-License-Identifier: GPL-2.0 /* * mm/mprotect.c * * (C) Copyright 1994 Linus Torvalds * (C) Copyright 2002 Christoph Hellwig * * Address space accounting code <alan@lxorguk.ukuu.org.uk> * (C) Copyright 2002 Red Hat Inc, All Rights Reserved */ #include <linux/pagewalk.h> #include <linux/hugetlb.h> #include <linux/shm.h> #include <linux/mman.h> #include <linux/fs.h> #include <linux/highmem.h> #include <linux/security.h> #include <linux/mempolicy.h> #include <linux/personality.h> #include <linux/syscalls.h> #include <linux/swap.h> #include <linux/swapops.h> #include <linux/mmu_notifier.h> #include <linux/migrate.h> #include <linux/perf_event.h> #include <linux/pkeys.h> #include <linux/ksm.h> #include <linux/uaccess.h> #include <linux/mm_inline.h> #include <linux/pgtable.h> #include <linux/sched/sysctl.h> #include <linux/userfaultfd_k.h> #include <linux/memory-tiers.h> #include <uapi/linux/mman.h> #include <asm/cacheflush.h> #include <asm/mmu_context.h> #include <asm/tlbflush.h> #include <asm/tlb.h> #include "internal.h" bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr, pte_t pte) { struct page *page; if (WARN_ON_ONCE(!(vma->vm_flags & VM_WRITE))) return false; /* Don't touch entries that are not even readable. */ if (pte_protnone(pte)) return false; /* Do we need write faults for softdirty tracking? */ if (pte_needs_soft_dirty_wp(vma, pte)) return false; /* Do we need write faults for uffd-wp tracking? */ if (userfaultfd_pte_wp(vma, pte)) return false; if (!(vma->vm_flags & VM_SHARED)) { /* * Writable MAP_PRIVATE mapping: We can only special-case on * exclusive anonymous pages, because we know that our * write-fault handler similarly would map them writable without * any additional checks while holding the PT lock. */ page = vm_normal_page(vma, addr, pte); return page && PageAnon(page) && PageAnonExclusive(page); } VM_WARN_ON_ONCE(is_zero_pfn(pte_pfn(pte)) && pte_dirty(pte)); /* * Writable MAP_SHARED mapping: "clean" might indicate that the FS still * needs a real write-fault for writenotify * (see vma_wants_writenotify()). If "dirty", the assumption is that the * FS was already notified and we can simply mark the PTE writable * just like the write-fault handler would do. */ return pte_dirty(pte); } static long change_pte_range(struct mmu_gather *tlb, struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr, unsigned long end, pgprot_t newprot, unsigned long cp_flags) { pte_t *pte, oldpte; spinlock_t *ptl; long pages = 0; int target_node = NUMA_NO_NODE; bool prot_numa = cp_flags & MM_CP_PROT_NUMA; bool uffd_wp = cp_flags & MM_CP_UFFD_WP; bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE; tlb_change_page_size(tlb, PAGE_SIZE); pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); if (!pte) return -EAGAIN; /* Get target node for single threaded private VMAs */ if (prot_numa && !(vma->vm_flags & VM_SHARED) && atomic_read(&vma->vm_mm->mm_users) == 1) target_node = numa_node_id(); flush_tlb_batched_pending(vma->vm_mm); arch_enter_lazy_mmu_mode(); do { oldpte = ptep_get(pte); if (pte_present(oldpte)) { pte_t ptent; /* * Avoid trapping faults against the zero or KSM * pages. See similar comment in change_huge_pmd. */ if (prot_numa) { struct folio *folio; int nid; bool toptier; /* Avoid TLB flush if possible */ if (pte_protnone(oldpte)) continue; folio = vm_normal_folio(vma, addr, oldpte); if (!folio || folio_is_zone_device(folio) || folio_test_ksm(folio)) continue; /* Also skip shared copy-on-write pages */ if (is_cow_mapping(vma->vm_flags) && (folio_maybe_dma_pinned(folio) || folio_likely_mapped_shared(folio))) continue; /* * While migration can move some dirty pages, * it cannot move them all from MIGRATE_ASYNC * context. */ if (folio_is_file_lru(folio) && folio_test_dirty(folio)) continue; /* * Don't mess with PTEs if page is already on the node * a single-threaded process is running on. */ nid = folio_nid(folio); if (target_node == nid) continue; toptier = node_is_toptier(nid); /* * Skip scanning top tier node if normal numa * balancing is disabled */ if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) && toptier) continue; if (folio_use_access_time(folio)) folio_xchg_access_time(folio, jiffies_to_msecs(jiffies)); } oldpte = ptep_modify_prot_start(vma, addr, pte); ptent = pte_modify(oldpte, newprot); if (uffd_wp) ptent = pte_mkuffd_wp(ptent); else if (uffd_wp_resolve) ptent = pte_clear_uffd_wp(ptent); /* * In some writable, shared mappings, we might want * to catch actual write access -- see * vma_wants_writenotify(). * * In all writable, private mappings, we have to * properly handle COW. * * In both cases, we can sometimes still change PTEs * writable and avoid the write-fault handler, for * example, if a PTE is already dirty and no other * COW or special handling is required. */ if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) && !pte_write(ptent) && can_change_pte_writable(vma, addr, ptent)) ptent = pte_mkwrite(ptent, vma); ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent); if (pte_needs_flush(oldpte, ptent)) tlb_flush_pte_range(tlb, addr, PAGE_SIZE); pages++; } else if (is_swap_pte(oldpte)) { swp_entry_t entry = pte_to_swp_entry(oldpte); pte_t newpte; if (is_writable_migration_entry(entry)) { struct folio *folio = pfn_swap_entry_folio(entry); /* * A protection check is difficult so * just be safe and disable write */ if (folio_test_anon(folio)) entry = make_readable_exclusive_migration_entry( swp_offset(entry)); else entry = make_readable_migration_entry(swp_offset(entry)); newpte = swp_entry_to_pte(entry); if (pte_swp_soft_dirty(oldpte)) newpte = pte_swp_mksoft_dirty(newpte); } else if (is_writable_device_private_entry(entry)) { /* * We do not preserve soft-dirtiness. See * copy_nonpresent_pte() for explanation. */ entry = make_readable_device_private_entry( swp_offset(entry)); newpte = swp_entry_to_pte(entry); if (pte_swp_uffd_wp(oldpte)) newpte = pte_swp_mkuffd_wp(newpte); } else if (is_writable_device_exclusive_entry(entry)) { entry = make_readable_device_exclusive_entry( swp_offset(entry)); newpte = swp_entry_to_pte(entry); if (pte_swp_soft_dirty(oldpte)) newpte = pte_swp_mksoft_dirty(newpte); if (pte_swp_uffd_wp(oldpte)) newpte = pte_swp_mkuffd_wp(newpte); } else if (is_pte_marker_entry(entry)) { /* * Ignore error swap entries unconditionally, * because any access should sigbus/sigsegv * anyway. */ if (is_poisoned_swp_entry(entry) || is_guard_swp_entry(entry)) continue; /* * If this is uffd-wp pte marker and we'd like * to unprotect it, drop it; the next page * fault will trigger without uffd trapping. */ if (uffd_wp_resolve) { pte_clear(vma->vm_mm, addr, pte); pages++; } continue; } else { newpte = oldpte; } if (uffd_wp) newpte = pte_swp_mkuffd_wp(newpte); else if (uffd_wp_resolve) newpte = pte_swp_clear_uffd_wp(newpte); if (!pte_same(oldpte, newpte)) { set_pte_at(vma->vm_mm, addr, pte, newpte); pages++; } } else { /* It must be an none page, or what else?.. */ WARN_ON_ONCE(!pte_none(oldpte)); /* * Nobody plays with any none ptes besides * userfaultfd when applying the protections. */ if (likely(!uffd_wp)) continue; if (userfaultfd_wp_use_markers(vma)) { /* * For file-backed mem, we need to be able to * wr-protect a none pte, because even if the * pte is none, the page/swap cache could * exist. Doing that by install a marker. */ set_pte_at(vma->vm_mm, addr, pte, make_pte_marker(PTE_MARKER_UFFD_WP)); pages++; } } } while (pte++, addr += PAGE_SIZE, addr != end); arch_leave_lazy_mmu_mode(); pte_unmap_unlock(pte - 1, ptl); return pages; } /* * Return true if we want to split THPs into PTE mappings in change * protection procedure, false otherwise. */ static inline bool pgtable_split_needed(struct vm_area_struct *vma, unsigned long cp_flags) { /* * pte markers only resides in pte level, if we need pte markers, * we need to split. For example, we cannot wr-protect a file thp * (e.g. 2M shmem) because file thp is handled differently when * split by erasing the pmd so far. */ return (cp_flags & MM_CP_UFFD_WP) && !vma_is_anonymous(vma); } /* * Return true if we want to populate pgtables in change protection * procedure, false otherwise */ static inline bool pgtable_populate_needed(struct vm_area_struct *vma, unsigned long cp_flags) { /* If not within ioctl(UFFDIO_WRITEPROTECT), then don't bother */ if (!(cp_flags & MM_CP_UFFD_WP)) return false; /* Populate if the userfaultfd mode requires pte markers */ return userfaultfd_wp_use_markers(vma); } /* * Populate the pgtable underneath for whatever reason if requested. * When {pte|pmd|...}_alloc() failed we treat it the same way as pgtable * allocation failures during page faults by kicking OOM and returning * error. */ #define change_pmd_prepare(vma, pmd, cp_flags) \ ({ \ long err = 0; \ if (unlikely(pgtable_populate_needed(vma, cp_flags))) { \ if (pte_alloc(vma->vm_mm, pmd)) \ err = -ENOMEM; \ } \ err; \ }) /* * This is the general pud/p4d/pgd version of change_pmd_prepare(). We need to * have separate change_pmd_prepare() because pte_alloc() returns 0 on success, * while {pmd|pud|p4d}_alloc() returns the valid pointer on success. */ #define change_prepare(vma, high, low, addr, cp_flags) \ ({ \ long err = 0; \ if (unlikely(pgtable_populate_needed(vma, cp_flags))) { \ low##_t *p = low##_alloc(vma->vm_mm, high, addr); \ if (p == NULL) \ err = -ENOMEM; \ } \ err; \ }) static inline long change_pmd_range(struct mmu_gather *tlb, struct vm_area_struct *vma, pud_t *pud, unsigned long addr, unsigned long end, pgprot_t newprot, unsigned long cp_flags) { pmd_t *pmd; unsigned long next; long pages = 0; unsigned long nr_huge_updates = 0; pmd = pmd_offset(pud, addr); do { long ret; pmd_t _pmd; again: next = pmd_addr_end(addr, end); ret = change_pmd_prepare(vma, pmd, cp_flags); if (ret) { pages = ret; break; } if (pmd_none(*pmd)) goto next; _pmd = pmdp_get_lockless(pmd); if (is_swap_pmd(_pmd) || pmd_trans_huge(_pmd) || pmd_devmap(_pmd)) { if ((next - addr != HPAGE_PMD_SIZE) || pgtable_split_needed(vma, cp_flags)) { __split_huge_pmd(vma, pmd, addr, false, NULL); /* * For file-backed, the pmd could have been * cleared; make sure pmd populated if * necessary, then fall-through to pte level. */ ret = change_pmd_prepare(vma, pmd, cp_flags); if (ret) { pages = ret; break; } } else { ret = change_huge_pmd(tlb, vma, pmd, addr, newprot, cp_flags); if (ret) { if (ret == HPAGE_PMD_NR) { pages += HPAGE_PMD_NR; nr_huge_updates++; } /* huge pmd was handled */ goto next; } } /* fall through, the trans huge pmd just split */ } ret = change_pte_range(tlb, vma, pmd, addr, next, newprot, cp_flags); if (ret < 0) goto again; pages += ret; next: cond_resched(); } while (pmd++, addr = next, addr != end); if (nr_huge_updates) count_vm_numa_events(NUMA_HUGE_PTE_UPDATES, nr_huge_updates); return pages; } static inline long change_pud_range(struct mmu_gather *tlb, struct vm_area_struct *vma, p4d_t *p4d, unsigned long addr, unsigned long end, pgprot_t newprot, unsigned long cp_flags) { struct mmu_notifier_range range; pud_t *pudp, pud; unsigned long next; long pages = 0, ret; range.start = 0; pudp = pud_offset(p4d, addr); do { again: next = pud_addr_end(addr, end); ret = change_prepare(vma, pudp, pmd, addr, cp_flags); if (ret) { pages = ret; break; } pud = READ_ONCE(*pudp); if (pud_none(pud)) continue; if (!range.start) { mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_VMA, 0, vma->vm_mm, addr, end); mmu_notifier_invalidate_range_start(&range); } if (pud_leaf(pud)) { if ((next - addr != PUD_SIZE) || pgtable_split_needed(vma, cp_flags)) { __split_huge_pud(vma, pudp, addr); goto again; } else { ret = change_huge_pud(tlb, vma, pudp, addr, newprot, cp_flags); if (ret == 0) goto again; /* huge pud was handled */ if (ret == HPAGE_PUD_NR) pages += HPAGE_PUD_NR; continue; } } pages += change_pmd_range(tlb, vma, pudp, addr, next, newprot, cp_flags); } while (pudp++, addr = next, addr != end); if (range.start) mmu_notifier_invalidate_range_end(&range); return pages; } static inline long change_p4d_range(struct mmu_gather *tlb, struct vm_area_struct *vma, pgd_t *pgd, unsigned long addr, unsigned long end, pgprot_t newprot, unsigned long cp_flags) { p4d_t *p4d; unsigned long next; long pages = 0, ret; p4d = p4d_offset(pgd, addr); do { next = p4d_addr_end(addr, end); ret = change_prepare(vma, p4d, pud, addr, cp_flags); if (ret) return ret; if (p4d_none_or_clear_bad(p4d)) continue; pages += change_pud_range(tlb, vma, p4d, addr, next, newprot, cp_flags); } while (p4d++, addr = next, addr != end); return pages; } static long change_protection_range(struct mmu_gather *tlb, struct vm_area_struct *vma, unsigned long addr, unsigned long end, pgprot_t newprot, unsigned long cp_flags) { struct mm_struct *mm = vma->vm_mm; pgd_t *pgd; unsigned long next; long pages = 0, ret; BUG_ON(addr >= end); pgd = pgd_offset(mm, addr); tlb_start_vma(tlb, vma); do { next = pgd_addr_end(addr, end); ret = change_prepare(vma, pgd, p4d, addr, cp_flags); if (ret) { pages = ret; break; } if (pgd_none_or_clear_bad(pgd)) continue; pages += change_p4d_range(tlb, vma, pgd, addr, next, newprot, cp_flags); } while (pgd++, addr = next, addr != end); tlb_end_vma(tlb, vma); return pages; } long change_protection(struct mmu_gather *tlb, struct vm_area_struct *vma, unsigned long start, unsigned long end, unsigned long cp_flags) { pgprot_t newprot = vma->vm_page_prot; long pages; BUG_ON((cp_flags & MM_CP_UFFD_WP_ALL) == MM_CP_UFFD_WP_ALL); #ifdef CONFIG_NUMA_BALANCING /* * Ordinary protection updates (mprotect, uffd-wp, softdirty tracking) * are expected to reflect their requirements via VMA flags such that * vma_set_page_prot() will adjust vma->vm_page_prot accordingly. */ if (cp_flags & MM_CP_PROT_NUMA) newprot = PAGE_NONE; #else WARN_ON_ONCE(cp_flags & MM_CP_PROT_NUMA); #endif if (is_vm_hugetlb_page(vma)) pages = hugetlb_change_protection(vma, start, end, newprot, cp_flags); else pages = change_protection_range(tlb, vma, start, end, newprot, cp_flags); return pages; } static int prot_none_pte_entry(pte_t *pte, unsigned long addr, unsigned long next, struct mm_walk *walk) { return pfn_modify_allowed(pte_pfn(ptep_get(pte)), *(pgprot_t *)(walk->private)) ? 0 : -EACCES; } static int prot_none_hugetlb_entry(pte_t *pte, unsigned long hmask, unsigned long addr, unsigned long next, struct mm_walk *walk) { return pfn_modify_allowed(pte_pfn(ptep_get(pte)), *(pgprot_t *)(walk->private)) ? 0 : -EACCES; } static int prot_none_test(unsigned long addr, unsigned long next, struct mm_walk *walk) { return 0; } static const struct mm_walk_ops prot_none_walk_ops = { .pte_entry = prot_none_pte_entry, .hugetlb_entry = prot_none_hugetlb_entry, .test_walk = prot_none_test, .walk_lock = PGWALK_WRLOCK, }; int mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb, struct vm_area_struct *vma, struct vm_area_struct **pprev, unsigned long start, unsigned long end, unsigned long newflags) { struct mm_struct *mm = vma->vm_mm; unsigned long oldflags = vma->vm_flags; long nrpages = (end - start) >> PAGE_SHIFT; unsigned int mm_cp_flags = 0; unsigned long charged = 0; int error; if (!can_modify_vma(vma)) return -EPERM; if (newflags == oldflags) { *pprev = vma; return 0; } /* * Do PROT_NONE PFN permission checks here when we can still * bail out without undoing a lot of state. This is a rather * uncommon case, so doesn't need to be very optimized. */ if (arch_has_pfn_modify_check() && (vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)) && (newflags & VM_ACCESS_FLAGS) == 0) { pgprot_t new_pgprot = vm_get_page_prot(newflags); error = walk_page_range(current->mm, start, end, &prot_none_walk_ops, &new_pgprot); if (error) return error; } /* * If we make a private mapping writable we increase our commit; * but (without finer accounting) cannot reduce our commit if we * make it unwritable again except in the anonymous case where no * anon_vma has yet to be assigned. * * hugetlb mapping were accounted for even if read-only so there is * no need to account for them here. */ if (newflags & VM_WRITE) { /* Check space limits when area turns into data. */ if (!may_expand_vm(mm, newflags, nrpages) && may_expand_vm(mm, oldflags, nrpages)) return -ENOMEM; if (!(oldflags & (VM_ACCOUNT|VM_WRITE|VM_HUGETLB| VM_SHARED|VM_NORESERVE))) { charged = nrpages; if (security_vm_enough_memory_mm(mm, charged)) return -ENOMEM; newflags |= VM_ACCOUNT; } } else if ((oldflags & VM_ACCOUNT) && vma_is_anonymous(vma) && !vma->anon_vma) { newflags &= ~VM_ACCOUNT; } vma = vma_modify_flags(vmi, *pprev, vma, start, end, newflags); if (IS_ERR(vma)) { error = PTR_ERR(vma); goto fail; } *pprev = vma; /* * vm_flags and vm_page_prot are protected by the mmap_lock * held in write mode. */ vma_start_write(vma); vm_flags_reset(vma, newflags); if (vma_wants_manual_pte_write_upgrade(vma)) mm_cp_flags |= MM_CP_TRY_CHANGE_WRITABLE; vma_set_page_prot(vma); change_protection(tlb, vma, start, end, mm_cp_flags); if ((oldflags & VM_ACCOUNT) && !(newflags & VM_ACCOUNT)) vm_unacct_memory(nrpages); /* * Private VM_LOCKED VMA becoming writable: trigger COW to avoid major * fault on access. */ if ((oldflags & (VM_WRITE | VM_SHARED | VM_LOCKED)) == VM_LOCKED && (newflags & VM_WRITE)) { populate_vma_page_range(vma, start, end, NULL); } vm_stat_account(mm, oldflags, -nrpages); vm_stat_account(mm, newflags, nrpages); perf_event_mmap(vma); return 0; fail: vm_unacct_memory(charged); return error; } /* * pkey==-1 when doing a legacy mprotect() */ static int do_mprotect_pkey(unsigned long start, size_t len, unsigned long prot, int pkey) { unsigned long nstart, end, tmp, reqprot; struct vm_area_struct *vma, *prev; int error; const int grows = prot & (PROT_GROWSDOWN|PROT_GROWSUP); const bool rier = (current->personality & READ_IMPLIES_EXEC) && (prot & PROT_READ); struct mmu_gather tlb; struct vma_iterator vmi; start = untagged_addr(start); prot &= ~(PROT_GROWSDOWN|PROT_GROWSUP); if (grows == (PROT_GROWSDOWN|PROT_GROWSUP)) /* can't be both */ return -EINVAL; if (start & ~PAGE_MASK) return -EINVAL; if (!len) return 0; len = PAGE_ALIGN(len); end = start + len; if (end <= start) return -ENOMEM; if (!arch_validate_prot(prot, start)) return -EINVAL; reqprot = prot; if (mmap_write_lock_killable(current->mm)) return -EINTR; /* * If userspace did not allocate the pkey, do not let * them use it here. */ error = -EINVAL; if ((pkey != -1) && !mm_pkey_is_allocated(current->mm, pkey)) goto out; vma_iter_init(&vmi, current->mm, start); vma = vma_find(&vmi, end); error = -ENOMEM; if (!vma) goto out; if (unlikely(grows & PROT_GROWSDOWN)) { if (vma->vm_start >= end) goto out; start = vma->vm_start; error = -EINVAL; if (!(vma->vm_flags & VM_GROWSDOWN)) goto out; } else { if (vma->vm_start > start) goto out; if (unlikely(grows & PROT_GROWSUP)) { end = vma->vm_end; error = -EINVAL; if (!(vma->vm_flags & VM_GROWSUP)) goto out; } } prev = vma_prev(&vmi); if (start > vma->vm_start) prev = vma; tlb_gather_mmu(&tlb, current->mm); nstart = start; tmp = vma->vm_start; for_each_vma_range(vmi, vma, end) { unsigned long mask_off_old_flags; unsigned long newflags; int new_vma_pkey; if (vma->vm_start != tmp) { error = -ENOMEM; break; } /* Does the application expect PROT_READ to imply PROT_EXEC */ if (rier && (vma->vm_flags & VM_MAYEXEC)) prot |= PROT_EXEC; /* * Each mprotect() call explicitly passes r/w/x permissions. * If a permission is not passed to mprotect(), it must be * cleared from the VMA. */ mask_off_old_flags = VM_ACCESS_FLAGS | VM_FLAGS_CLEAR; new_vma_pkey = arch_override_mprotect_pkey(vma, prot, pkey); newflags = calc_vm_prot_bits(prot, new_vma_pkey); newflags |= (vma->vm_flags & ~mask_off_old_flags); /* newflags >> 4 shift VM_MAY% in place of VM_% */ if ((newflags & ~(newflags >> 4)) & VM_ACCESS_FLAGS) { error = -EACCES; break; } if (map_deny_write_exec(vma->vm_flags, newflags)) { error = -EACCES; break; } /* Allow architectures to sanity-check the new flags */ if (!arch_validate_flags(newflags)) { error = -EINVAL; break; } error = security_file_mprotect(vma, reqprot, prot); if (error) break; tmp = vma->vm_end; if (tmp > end) tmp = end; if (vma->vm_ops && vma->vm_ops->mprotect) { error = vma->vm_ops->mprotect(vma, nstart, tmp, newflags); if (error) break; } error = mprotect_fixup(&vmi, &tlb, vma, &prev, nstart, tmp, newflags); if (error) break; tmp = vma_iter_end(&vmi); nstart = tmp; prot = reqprot; } tlb_finish_mmu(&tlb); if (!error && tmp < end) error = -ENOMEM; out: mmap_write_unlock(current->mm); return error; } SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len, unsigned long, prot) { return do_mprotect_pkey(start, len, prot, -1); } #ifdef CONFIG_ARCH_HAS_PKEYS SYSCALL_DEFINE4(pkey_mprotect, unsigned long, start, size_t, len, unsigned long, prot, int, pkey) { return do_mprotect_pkey(start, len, prot, pkey); } SYSCALL_DEFINE2(pkey_alloc, unsigned long, flags, unsigned long, init_val) { int pkey; int ret; /* No flags supported yet. */ if (flags) return -EINVAL; /* check for unsupported init values */ if (init_val & ~PKEY_ACCESS_MASK) return -EINVAL; mmap_write_lock(current->mm); pkey = mm_pkey_alloc(current->mm); ret = -ENOSPC; if (pkey == -1) goto out; ret = arch_set_user_pkey_access(current, pkey, init_val); if (ret) { mm_pkey_free(current->mm, pkey); goto out; } ret = pkey; out: mmap_write_unlock(current->mm); return ret; } SYSCALL_DEFINE1(pkey_free, int, pkey) { int ret; mmap_write_lock(current->mm); ret = mm_pkey_free(current->mm, pkey); mmap_write_unlock(current->mm); /* * We could provide warnings or errors if any VMA still * has the pkey set here. */ return ret; } #endif /* CONFIG_ARCH_HAS_PKEYS */ |
1327 14798 5 15564 154 50 1 27 128 14857 14959 14777 14870 182 14853 14931 26 14949 14943 15027 14943 14944 182 14897 431 170 172 171 170 14494 14486 827 14353 576 584 583 576 559 49 556 576 3353 3358 89 89 88 1559 1561 1566 5 1557 1309 1461 1457 135 133 1 135 8 9 8 9 7 2 2 7 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 | /* SPDX-License-Identifier: GPL-2.0 */ #include <linux/syscalls.h> #include <linux/export.h> #include <linux/uaccess.h> #include <linux/fs_struct.h> #include <linux/fs.h> #include <linux/slab.h> #include <linux/prefetch.h> #include "mount.h" #include "internal.h" struct prepend_buffer { char *buf; int len; }; #define DECLARE_BUFFER(__name, __buf, __len) \ struct prepend_buffer __name = {.buf = __buf + __len, .len = __len} static char *extract_string(struct prepend_buffer *p) { if (likely(p->len >= 0)) return p->buf; return ERR_PTR(-ENAMETOOLONG); } static bool prepend_char(struct prepend_buffer *p, unsigned char c) { if (likely(p->len > 0)) { p->len--; *--p->buf = c; return true; } p->len = -1; return false; } /* * The source of the prepend data can be an optimistic load * of a dentry name and length. And because we don't hold any * locks, the length and the pointer to the name may not be * in sync if a concurrent rename happens, and the kernel * copy might fault as a result. * * The end result will correct itself when we check the * rename sequence count, but we need to be able to handle * the fault gracefully. */ static bool prepend_copy(void *dst, const void *src, int len) { if (unlikely(copy_from_kernel_nofault(dst, src, len))) { memset(dst, 'x', len); return false; } return true; } static bool prepend(struct prepend_buffer *p, const char *str, int namelen) { // Already overflowed? if (p->len < 0) return false; // Will overflow? if (p->len < namelen) { // Fill as much as possible from the end of the name str += namelen - p->len; p->buf -= p->len; prepend_copy(p->buf, str, p->len); p->len = -1; return false; } // Fits fully p->len -= namelen; p->buf -= namelen; return prepend_copy(p->buf, str, namelen); } /** * prepend_name - prepend a pathname in front of current buffer pointer * @p: prepend buffer which contains buffer pointer and allocated length * @name: name string and length qstr structure * * With RCU path tracing, it may race with d_move(). Use READ_ONCE() to * make sure that either the old or the new name pointer and length are * fetched. However, there may be mismatch between length and pointer. * But since the length cannot be trusted, we need to copy the name very * carefully when doing the prepend_copy(). It also prepends "/" at * the beginning of the name. The sequence number check at the caller will * retry it again when a d_move() does happen. So any garbage in the buffer * due to mismatched pointer and length will be discarded. * * Load acquire is needed to make sure that we see the new name data even * if we might get the length wrong. */ static bool prepend_name(struct prepend_buffer *p, const struct qstr *name) { const char *dname = smp_load_acquire(&name->name); /* ^^^ */ u32 dlen = READ_ONCE(name->len); return prepend(p, dname, dlen) && prepend_char(p, '/'); } static int __prepend_path(const struct dentry *dentry, const struct mount *mnt, const struct path *root, struct prepend_buffer *p) { while (dentry != root->dentry || &mnt->mnt != root->mnt) { const struct dentry *parent = READ_ONCE(dentry->d_parent); if (dentry == mnt->mnt.mnt_root) { struct mount *m = READ_ONCE(mnt->mnt_parent); struct mnt_namespace *mnt_ns; if (likely(mnt != m)) { dentry = READ_ONCE(mnt->mnt_mountpoint); mnt = m; continue; } /* Global root */ mnt_ns = READ_ONCE(mnt->mnt_ns); /* open-coded is_mounted() to use local mnt_ns */ if (!IS_ERR_OR_NULL(mnt_ns) && !is_anon_ns(mnt_ns)) return 1; // absolute root else return 2; // detached or not attached yet } if (unlikely(dentry == parent)) /* Escaped? */ return 3; prefetch(parent); if (!prepend_name(p, &dentry->d_name)) break; dentry = parent; } return 0; } /** * prepend_path - Prepend path string to a buffer * @path: the dentry/vfsmount to report * @root: root vfsmnt/dentry * @p: prepend buffer which contains buffer pointer and allocated length * * The function will first try to write out the pathname without taking any * lock other than the RCU read lock to make sure that dentries won't go away. * It only checks the sequence number of the global rename_lock as any change * in the dentry's d_seq will be preceded by changes in the rename_lock * sequence number. If the sequence number had been changed, it will restart * the whole pathname back-tracing sequence again by taking the rename_lock. * In this case, there is no need to take the RCU read lock as the recursive * parent pointer references will keep the dentry chain alive as long as no * rename operation is performed. */ static int prepend_path(const struct path *path, const struct path *root, struct prepend_buffer *p) { unsigned seq, m_seq = 0; struct prepend_buffer b; int error; rcu_read_lock(); restart_mnt: read_seqbegin_or_lock(&mount_lock, &m_seq); seq = 0; rcu_read_lock(); restart: b = *p; read_seqbegin_or_lock(&rename_lock, &seq); error = __prepend_path(path->dentry, real_mount(path->mnt), root, &b); if (!(seq & 1)) rcu_read_unlock(); if (need_seqretry(&rename_lock, seq)) { seq = 1; goto restart; } done_seqretry(&rename_lock, seq); if (!(m_seq & 1)) rcu_read_unlock(); if (need_seqretry(&mount_lock, m_seq)) { m_seq = 1; goto restart_mnt; } done_seqretry(&mount_lock, m_seq); if (unlikely(error == 3)) b = *p; if (b.len == p->len) prepend_char(&b, '/'); *p = b; return error; } /** * __d_path - return the path of a dentry * @path: the dentry/vfsmount to report * @root: root vfsmnt/dentry * @buf: buffer to return value in * @buflen: buffer length * * Convert a dentry into an ASCII path name. * * Returns a pointer into the buffer or an error code if the * path was too long. * * "buflen" should be positive. * * If the path is not reachable from the supplied root, return %NULL. */ char *__d_path(const struct path *path, const struct path *root, char *buf, int buflen) { DECLARE_BUFFER(b, buf, buflen); prepend_char(&b, 0); if (unlikely(prepend_path(path, root, &b) > 0)) return NULL; return extract_string(&b); } char *d_absolute_path(const struct path *path, char *buf, int buflen) { struct path root = {}; DECLARE_BUFFER(b, buf, buflen); prepend_char(&b, 0); if (unlikely(prepend_path(path, &root, &b) > 1)) return ERR_PTR(-EINVAL); return extract_string(&b); } static void get_fs_root_rcu(struct fs_struct *fs, struct path *root) { unsigned seq; do { seq = read_seqcount_begin(&fs->seq); *root = fs->root; } while (read_seqcount_retry(&fs->seq, seq)); } /** * d_path - return the path of a dentry * @path: path to report * @buf: buffer to return value in * @buflen: buffer length * * Convert a dentry into an ASCII path name. If the entry has been deleted * the string " (deleted)" is appended. Note that this is ambiguous. * * Returns a pointer into the buffer or an error code if the path was * too long. Note: Callers should use the returned pointer, not the passed * in buffer, to use the name! The implementation often starts at an offset * into the buffer, and may leave 0 bytes at the start. * * "buflen" should be positive. */ char *d_path(const struct path *path, char *buf, int buflen) { DECLARE_BUFFER(b, buf, buflen); struct path root; /* * We have various synthetic filesystems that never get mounted. On * these filesystems dentries are never used for lookup purposes, and * thus don't need to be hashed. They also don't need a name until a * user wants to identify the object in /proc/pid/fd/. The little hack * below allows us to generate a name for these objects on demand: * * Some pseudo inodes are mountable. When they are mounted * path->dentry == path->mnt->mnt_root. In that case don't call d_dname * and instead have d_path return the mounted path. */ if (path->dentry->d_op && path->dentry->d_op->d_dname && (!IS_ROOT(path->dentry) || path->dentry != path->mnt->mnt_root)) return path->dentry->d_op->d_dname(path->dentry, buf, buflen); rcu_read_lock(); get_fs_root_rcu(current->fs, &root); if (unlikely(d_unlinked(path->dentry))) prepend(&b, " (deleted)", 11); else prepend_char(&b, 0); prepend_path(path, &root, &b); rcu_read_unlock(); return extract_string(&b); } EXPORT_SYMBOL(d_path); /* * Helper function for dentry_operations.d_dname() members */ char *dynamic_dname(char *buffer, int buflen, const char *fmt, ...) { va_list args; char temp[64]; int sz; va_start(args, fmt); sz = vsnprintf(temp, sizeof(temp), fmt, args) + 1; va_end(args); if (sz > sizeof(temp) || sz > buflen) return ERR_PTR(-ENAMETOOLONG); buffer += buflen - sz; return memcpy(buffer, temp, sz); } char *simple_dname(struct dentry *dentry, char *buffer, int buflen) { DECLARE_BUFFER(b, buffer, buflen); /* these dentries are never renamed, so d_lock is not needed */ prepend(&b, " (deleted)", 11); prepend(&b, dentry->d_name.name, dentry->d_name.len); prepend_char(&b, '/'); return extract_string(&b); } /* * Write full pathname from the root of the filesystem into the buffer. */ static char *__dentry_path(const struct dentry *d, struct prepend_buffer *p) { const struct dentry *dentry; struct prepend_buffer b; int seq = 0; rcu_read_lock(); restart: dentry = d; b = *p; read_seqbegin_or_lock(&rename_lock, &seq); while (!IS_ROOT(dentry)) { const struct dentry *parent = dentry->d_parent; prefetch(parent); if (!prepend_name(&b, &dentry->d_name)) break; dentry = parent; } if (!(seq & 1)) rcu_read_unlock(); if (need_seqretry(&rename_lock, seq)) { seq = 1; goto restart; } done_seqretry(&rename_lock, seq); if (b.len == p->len) prepend_char(&b, '/'); return extract_string(&b); } char *dentry_path_raw(const struct dentry *dentry, char *buf, int buflen) { DECLARE_BUFFER(b, buf, buflen); prepend_char(&b, 0); return __dentry_path(dentry, &b); } EXPORT_SYMBOL(dentry_path_raw); char *dentry_path(const struct dentry *dentry, char *buf, int buflen) { DECLARE_BUFFER(b, buf, buflen); if (unlikely(d_unlinked(dentry))) prepend(&b, "//deleted", 10); else prepend_char(&b, 0); return __dentry_path(dentry, &b); } static void get_fs_root_and_pwd_rcu(struct fs_struct *fs, struct path *root, struct path *pwd) { unsigned seq; do { seq = read_seqcount_begin(&fs->seq); *root = fs->root; *pwd = fs->pwd; } while (read_seqcount_retry(&fs->seq, seq)); } /* * NOTE! The user-level library version returns a * character pointer. The kernel system call just * returns the length of the buffer filled (which * includes the ending '\0' character), or a negative * error value. So libc would do something like * * char *getcwd(char * buf, size_t size) * { * int retval; * * retval = sys_getcwd(buf, size); * if (retval >= 0) * return buf; * errno = -retval; * return NULL; * } */ SYSCALL_DEFINE2(getcwd, char __user *, buf, unsigned long, size) { int error; struct path pwd, root; char *page = __getname(); if (!page) return -ENOMEM; rcu_read_lock(); get_fs_root_and_pwd_rcu(current->fs, &root, &pwd); if (unlikely(d_unlinked(pwd.dentry))) { rcu_read_unlock(); error = -ENOENT; } else { unsigned len; DECLARE_BUFFER(b, page, PATH_MAX); prepend_char(&b, 0); if (unlikely(prepend_path(&pwd, &root, &b) > 0)) prepend(&b, "(unreachable)", 13); rcu_read_unlock(); len = PATH_MAX - b.len; if (unlikely(len > PATH_MAX)) error = -ENAMETOOLONG; else if (unlikely(len > size)) error = -ERANGE; else if (copy_to_user(buf, b.buf, len)) error = -EFAULT; else error = len; } __putname(page); return error; } |
67 67 67 4 5 5 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 | /* SPDX-License-Identifier: GPL-2.0 */ /* * devres.c - managed gpio resources * This file is based on kernel/irq/devres.c * * Copyright (c) 2011 John Crispin <john@phrozen.org> */ #include <linux/device.h> #include <linux/err.h> #include <linux/export.h> #include <linux/gfp.h> #include <linux/types.h> #include <linux/gpio/consumer.h> #include "gpiolib.h" struct fwnode_handle; struct lock_class_key; static void devm_gpiod_release(struct device *dev, void *res) { struct gpio_desc **desc = res; gpiod_put(*desc); } static int devm_gpiod_match(struct device *dev, void *res, void *data) { struct gpio_desc **this = res, **gpio = data; return *this == *gpio; } static void devm_gpiod_release_array(struct device *dev, void *res) { struct gpio_descs **descs = res; gpiod_put_array(*descs); } static int devm_gpiod_match_array(struct device *dev, void *res, void *data) { struct gpio_descs **this = res, **gpios = data; return *this == *gpios; } /** * devm_gpiod_get - Resource-managed gpiod_get() * @dev: GPIO consumer * @con_id: function within the GPIO consumer * @flags: optional GPIO initialization flags * * Managed gpiod_get(). GPIO descriptors returned from this function are * automatically disposed on driver detach. See gpiod_get() for detailed * information about behavior and return values. * * Returns: * The GPIO descriptor corresponding to the function @con_id of device * dev, %-ENOENT if no GPIO has been assigned to the requested function, or * another IS_ERR() code if an error occurred while trying to acquire the GPIO. */ struct gpio_desc *__must_check devm_gpiod_get(struct device *dev, const char *con_id, enum gpiod_flags flags) { return devm_gpiod_get_index(dev, con_id, 0, flags); } EXPORT_SYMBOL_GPL(devm_gpiod_get); /** * devm_gpiod_get_optional - Resource-managed gpiod_get_optional() * @dev: GPIO consumer * @con_id: function within the GPIO consumer * @flags: optional GPIO initialization flags * * Managed gpiod_get_optional(). GPIO descriptors returned from this function * are automatically disposed on driver detach. See gpiod_get_optional() for * detailed information about behavior and return values. * * Returns: * The GPIO descriptor corresponding to the function @con_id of device * dev, NULL if no GPIO has been assigned to the requested function, or * another IS_ERR() code if an error occurred while trying to acquire the GPIO. */ struct gpio_desc *__must_check devm_gpiod_get_optional(struct device *dev, const char *con_id, enum gpiod_flags flags) { return devm_gpiod_get_index_optional(dev, con_id, 0, flags); } EXPORT_SYMBOL_GPL(devm_gpiod_get_optional); /** * devm_gpiod_get_index - Resource-managed gpiod_get_index() * @dev: GPIO consumer * @con_id: function within the GPIO consumer * @idx: index of the GPIO to obtain in the consumer * @flags: optional GPIO initialization flags * * Managed gpiod_get_index(). GPIO descriptors returned from this function are * automatically disposed on driver detach. See gpiod_get_index() for detailed * information about behavior and return values. * * Returns: * The GPIO descriptor corresponding to the function @con_id of device * dev, %-ENOENT if no GPIO has been assigned to the requested function, or * another IS_ERR() code if an error occurred while trying to acquire the GPIO. */ struct gpio_desc *__must_check devm_gpiod_get_index(struct device *dev, const char *con_id, unsigned int idx, enum gpiod_flags flags) { struct gpio_desc **dr; struct gpio_desc *desc; desc = gpiod_get_index(dev, con_id, idx, flags); if (IS_ERR(desc)) return desc; /* * For non-exclusive GPIO descriptors, check if this descriptor is * already under resource management by this device. */ if (flags & GPIOD_FLAGS_BIT_NONEXCLUSIVE) { struct devres *dres; dres = devres_find(dev, devm_gpiod_release, devm_gpiod_match, &desc); if (dres) return desc; } dr = devres_alloc(devm_gpiod_release, sizeof(struct gpio_desc *), GFP_KERNEL); if (!dr) { gpiod_put(desc); return ERR_PTR(-ENOMEM); } *dr = desc; devres_add(dev, dr); return desc; } EXPORT_SYMBOL_GPL(devm_gpiod_get_index); /** * devm_fwnode_gpiod_get_index - get a GPIO descriptor from a given node * @dev: GPIO consumer * @fwnode: firmware node containing GPIO reference * @con_id: function within the GPIO consumer * @index: index of the GPIO to obtain in the consumer * @flags: GPIO initialization flags * @label: label to attach to the requested GPIO * * GPIO descriptors returned from this function are automatically disposed on * driver detach. * * Returns: * The GPIO descriptor corresponding to the function @con_id of device * dev, %-ENOENT if no GPIO has been assigned to the requested function, or * another IS_ERR() code if an error occurred while trying to acquire the GPIO. */ struct gpio_desc *devm_fwnode_gpiod_get_index(struct device *dev, struct fwnode_handle *fwnode, const char *con_id, int index, enum gpiod_flags flags, const char *label) { struct gpio_desc **dr; struct gpio_desc *desc; dr = devres_alloc(devm_gpiod_release, sizeof(struct gpio_desc *), GFP_KERNEL); if (!dr) return ERR_PTR(-ENOMEM); desc = gpiod_find_and_request(dev, fwnode, con_id, index, flags, label, false); if (IS_ERR(desc)) { devres_free(dr); return desc; } *dr = desc; devres_add(dev, dr); return desc; } EXPORT_SYMBOL_GPL(devm_fwnode_gpiod_get_index); /** * devm_gpiod_get_index_optional - Resource-managed gpiod_get_index_optional() * @dev: GPIO consumer * @con_id: function within the GPIO consumer * @index: index of the GPIO to obtain in the consumer * @flags: optional GPIO initialization flags * * Managed gpiod_get_index_optional(). GPIO descriptors returned from this * function are automatically disposed on driver detach. See * gpiod_get_index_optional() for detailed information about behavior and * return values. * * Returns: * The GPIO descriptor corresponding to the function @con_id of device * dev, %NULL if no GPIO has been assigned to the requested function, or * another IS_ERR() code if an error occurred while trying to acquire the GPIO. */ struct gpio_desc *__must_check devm_gpiod_get_index_optional(struct device *dev, const char *con_id, unsigned int index, enum gpiod_flags flags) { struct gpio_desc *desc; desc = devm_gpiod_get_index(dev, con_id, index, flags); if (gpiod_not_found(desc)) return NULL; return desc; } EXPORT_SYMBOL_GPL(devm_gpiod_get_index_optional); /** * devm_gpiod_get_array - Resource-managed gpiod_get_array() * @dev: GPIO consumer * @con_id: function within the GPIO consumer * @flags: optional GPIO initialization flags * * Managed gpiod_get_array(). GPIO descriptors returned from this function are * automatically disposed on driver detach. See gpiod_get_array() for detailed * information about behavior and return values. * * Returns: * The GPIO descriptors corresponding to the function @con_id of device * dev, %-ENOENT if no GPIO has been assigned to the requested function, * or another IS_ERR() code if an error occurred while trying to acquire * the GPIOs. */ struct gpio_descs *__must_check devm_gpiod_get_array(struct device *dev, const char *con_id, enum gpiod_flags flags) { struct gpio_descs **dr; struct gpio_descs *descs; dr = devres_alloc(devm_gpiod_release_array, sizeof(struct gpio_descs *), GFP_KERNEL); if (!dr) return ERR_PTR(-ENOMEM); descs = gpiod_get_array(dev, con_id, flags); if (IS_ERR(descs)) { devres_free(dr); return descs; } *dr = descs; devres_add(dev, dr); return descs; } EXPORT_SYMBOL_GPL(devm_gpiod_get_array); /** * devm_gpiod_get_array_optional - Resource-managed gpiod_get_array_optional() * @dev: GPIO consumer * @con_id: function within the GPIO consumer * @flags: optional GPIO initialization flags * * Managed gpiod_get_array_optional(). GPIO descriptors returned from this * function are automatically disposed on driver detach. * See gpiod_get_array_optional() for detailed information about behavior and * return values. * * Returns: * The GPIO descriptors corresponding to the function @con_id of device * dev, %NULL if no GPIO has been assigned to the requested function, * or another IS_ERR() code if an error occurred while trying to acquire * the GPIOs. */ struct gpio_descs *__must_check devm_gpiod_get_array_optional(struct device *dev, const char *con_id, enum gpiod_flags flags) { struct gpio_descs *descs; descs = devm_gpiod_get_array(dev, con_id, flags); if (gpiod_not_found(descs)) return NULL; return descs; } EXPORT_SYMBOL_GPL(devm_gpiod_get_array_optional); /** * devm_gpiod_put - Resource-managed gpiod_put() * @dev: GPIO consumer * @desc: GPIO descriptor to dispose of * * Dispose of a GPIO descriptor obtained with devm_gpiod_get() or * devm_gpiod_get_index(). Normally this function will not be called as the GPIO * will be disposed of by the resource management code. */ void devm_gpiod_put(struct device *dev, struct gpio_desc *desc) { WARN_ON(devres_release(dev, devm_gpiod_release, devm_gpiod_match, &desc)); } EXPORT_SYMBOL_GPL(devm_gpiod_put); /** * devm_gpiod_unhinge - Remove resource management from a gpio descriptor * @dev: GPIO consumer * @desc: GPIO descriptor to remove resource management from * * Remove resource management from a GPIO descriptor. This is needed when * you want to hand over lifecycle management of a descriptor to another * mechanism. */ void devm_gpiod_unhinge(struct device *dev, struct gpio_desc *desc) { int ret; if (IS_ERR_OR_NULL(desc)) return; ret = devres_destroy(dev, devm_gpiod_release, devm_gpiod_match, &desc); /* * If the GPIO descriptor is requested as nonexclusive, we * may call this function several times on the same descriptor * so it is OK if devres_destroy() returns -ENOENT. */ if (ret == -ENOENT) return; /* Anything else we should warn about */ WARN_ON(ret); } EXPORT_SYMBOL_GPL(devm_gpiod_unhinge); /** * devm_gpiod_put_array - Resource-managed gpiod_put_array() * @dev: GPIO consumer * @descs: GPIO descriptor array to dispose of * * Dispose of an array of GPIO descriptors obtained with devm_gpiod_get_array(). * Normally this function will not be called as the GPIOs will be disposed of * by the resource management code. */ void devm_gpiod_put_array(struct device *dev, struct gpio_descs *descs) { WARN_ON(devres_release(dev, devm_gpiod_release_array, devm_gpiod_match_array, &descs)); } EXPORT_SYMBOL_GPL(devm_gpiod_put_array); static void devm_gpio_chip_release(void *data) { struct gpio_chip *gc = data; gpiochip_remove(gc); } /** * devm_gpiochip_add_data_with_key() - Resource managed gpiochip_add_data_with_key() * @dev: pointer to the device that gpio_chip belongs to. * @gc: the GPIO chip to register * @data: driver-private data associated with this chip * @lock_key: lockdep class for IRQ lock * @request_key: lockdep class for IRQ request * * Context: potentially before irqs will work * * The gpio chip automatically be released when the device is unbound. * * Returns: * A negative errno if the chip can't be registered, such as because the * gc->base is invalid or already associated with a different chip. * Otherwise it returns zero as a success code. */ int devm_gpiochip_add_data_with_key(struct device *dev, struct gpio_chip *gc, void *data, struct lock_class_key *lock_key, struct lock_class_key *request_key) { int ret; ret = gpiochip_add_data_with_key(gc, data, lock_key, request_key); if (ret < 0) return ret; return devm_add_action_or_reset(dev, devm_gpio_chip_release, gc); } EXPORT_SYMBOL_GPL(devm_gpiochip_add_data_with_key); |
2 2 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 | // SPDX-License-Identifier: GPL-2.0-or-later /* * IPVS: Least-Connection Scheduling module * * Authors: Wensong Zhang <wensong@linuxvirtualserver.org> * * Changes: * Wensong Zhang : added the ip_vs_lc_update_svc * Wensong Zhang : added any dest with weight=0 is quiesced */ #define KMSG_COMPONENT "IPVS" #define pr_fmt(fmt) KMSG_COMPONENT ": " fmt #include <linux/module.h> #include <linux/kernel.h> #include <net/ip_vs.h> /* * Least Connection scheduling */ static struct ip_vs_dest * ip_vs_lc_schedule(struct ip_vs_service *svc, const struct sk_buff *skb, struct ip_vs_iphdr *iph) { struct ip_vs_dest *dest, *least = NULL; unsigned int loh = 0, doh; IP_VS_DBG(6, "%s(): Scheduling...\n", __func__); /* * Simply select the server with the least number of * (activeconns<<5) + inactconns * Except whose weight is equal to zero. * If the weight is equal to zero, it means that the server is * quiesced, the existing connections to the server still get * served, but no new connection is assigned to the server. */ list_for_each_entry_rcu(dest, &svc->destinations, n_list) { if ((dest->flags & IP_VS_DEST_F_OVERLOAD) || atomic_read(&dest->weight) == 0) continue; doh = ip_vs_dest_conn_overhead(dest); if (!least || doh < loh) { least = dest; loh = doh; } } if (!least) ip_vs_scheduler_err(svc, "no destination available"); else IP_VS_DBG_BUF(6, "LC: server %s:%u activeconns %d " "inactconns %d\n", IP_VS_DBG_ADDR(least->af, &least->addr), ntohs(least->port), atomic_read(&least->activeconns), atomic_read(&least->inactconns)); return least; } static struct ip_vs_scheduler ip_vs_lc_scheduler = { .name = "lc", .refcnt = ATOMIC_INIT(0), .module = THIS_MODULE, .n_list = LIST_HEAD_INIT(ip_vs_lc_scheduler.n_list), .schedule = ip_vs_lc_schedule, }; static int __init ip_vs_lc_init(void) { return register_ip_vs_scheduler(&ip_vs_lc_scheduler) ; } static void __exit ip_vs_lc_cleanup(void) { unregister_ip_vs_scheduler(&ip_vs_lc_scheduler); synchronize_rcu(); } module_init(ip_vs_lc_init); module_exit(ip_vs_lc_cleanup); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("ipvs least connection scheduler"); |
1 1 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 | // SPDX-License-Identifier: GPL-2.0-only /* * linux/net/sunrpc/svc.c * * High-level RPC service routines * * Copyright (C) 1995, 1996 Olaf Kirch <okir@monad.swb.de> * * Multiple threads pools and NUMAisation * Copyright (c) 2006 Silicon Graphics, Inc. * by Greg Banks <gnb@melbourne.sgi.com> */ #include <linux/linkage.h> #include <linux/sched/signal.h> #include <linux/errno.h> #include <linux/net.h> #include <linux/in.h> #include <linux/mm.h> #include <linux/interrupt.h> #include <linux/module.h> #include <linux/kthread.h> #include <linux/slab.h> #include <linux/sunrpc/types.h> #include <linux/sunrpc/xdr.h> #include <linux/sunrpc/stats.h> #include <linux/sunrpc/svcsock.h> #include <linux/sunrpc/clnt.h> #include <linux/sunrpc/bc_xprt.h> #include <trace/events/sunrpc.h> #include "fail.h" #include "sunrpc.h" #define RPCDBG_FACILITY RPCDBG_SVCDSP static void svc_unregister(const struct svc_serv *serv, struct net *net); #define SVC_POOL_DEFAULT SVC_POOL_GLOBAL /* * Mode for mapping cpus to pools. */ enum { SVC_POOL_AUTO = -1, /* choose one of the others */ SVC_POOL_GLOBAL, /* no mapping, just a single global pool * (legacy & UP mode) */ SVC_POOL_PERCPU, /* one pool per cpu */ SVC_POOL_PERNODE /* one pool per numa node */ }; /* * Structure for mapping cpus to pools and vice versa. * Setup once during sunrpc initialisation. */ struct svc_pool_map { int count; /* How many svc_servs use us */ int mode; /* Note: int not enum to avoid * warnings about "enumeration value * not handled in switch" */ unsigned int npools; unsigned int *pool_to; /* maps pool id to cpu or node */ unsigned int *to_pool; /* maps cpu or node to pool id */ }; static struct svc_pool_map svc_pool_map = { .mode = SVC_POOL_DEFAULT }; static DEFINE_MUTEX(svc_pool_map_mutex);/* protects svc_pool_map.count only */ static int __param_set_pool_mode(const char *val, struct svc_pool_map *m) { int err, mode; mutex_lock(&svc_pool_map_mutex); err = 0; if (!strncmp(val, "auto", 4)) mode = SVC_POOL_AUTO; else if (!strncmp(val, "global", 6)) mode = SVC_POOL_GLOBAL; else if (!strncmp(val, "percpu", 6)) mode = SVC_POOL_PERCPU; else if (!strncmp(val, "pernode", 7)) mode = SVC_POOL_PERNODE; else err = -EINVAL; if (err) goto out; if (m->count == 0) m->mode = mode; else if (mode != m->mode) err = -EBUSY; out: mutex_unlock(&svc_pool_map_mutex); return err; } static int param_set_pool_mode(const char *val, const struct kernel_param *kp) { struct svc_pool_map *m = kp->arg; return __param_set_pool_mode(val, m); } int sunrpc_set_pool_mode(const char *val) { return __param_set_pool_mode(val, &svc_pool_map); } EXPORT_SYMBOL(sunrpc_set_pool_mode); /** * sunrpc_get_pool_mode - get the current pool_mode for the host * @buf: where to write the current pool_mode * @size: size of @buf * * Grab the current pool_mode from the svc_pool_map and write * the resulting string to @buf. Returns the number of characters * written to @buf (a'la snprintf()). */ int sunrpc_get_pool_mode(char *buf, size_t size) { struct svc_pool_map *m = &svc_pool_map; switch (m->mode) { case SVC_POOL_AUTO: return snprintf(buf, size, "auto"); case SVC_POOL_GLOBAL: return snprintf(buf, size, "global"); case SVC_POOL_PERCPU: return snprintf(buf, size, "percpu"); case SVC_POOL_PERNODE: return snprintf(buf, size, "pernode"); default: return snprintf(buf, size, "%d", m->mode); } } EXPORT_SYMBOL(sunrpc_get_pool_mode); static int param_get_pool_mode(char *buf, const struct kernel_param *kp) { char str[16]; int len; len = sunrpc_get_pool_mode(str, ARRAY_SIZE(str)); /* Ensure we have room for newline and NUL */ len = min_t(int, len, ARRAY_SIZE(str) - 2); /* tack on the newline */ str[len] = '\n'; str[len + 1] = '\0'; return sysfs_emit(buf, "%s", str); } module_param_call(pool_mode, param_set_pool_mode, param_get_pool_mode, &svc_pool_map, 0644); /* * Detect best pool mapping mode heuristically, * according to the machine's topology. */ static int svc_pool_map_choose_mode(void) { unsigned int node; if (nr_online_nodes > 1) { /* * Actually have multiple NUMA nodes, * so split pools on NUMA node boundaries */ return SVC_POOL_PERNODE; } node = first_online_node; if (nr_cpus_node(node) > 2) { /* * Non-trivial SMP, or CONFIG_NUMA on * non-NUMA hardware, e.g. with a generic * x86_64 kernel on Xeons. In this case we * want to divide the pools on cpu boundaries. */ return SVC_POOL_PERCPU; } /* default: one global pool */ return SVC_POOL_GLOBAL; } /* * Allocate the to_pool[] and pool_to[] arrays. * Returns 0 on success or an errno. */ static int svc_pool_map_alloc_arrays(struct svc_pool_map *m, unsigned int maxpools) { m->to_pool = kcalloc(maxpools, sizeof(unsigned int), GFP_KERNEL); if (!m->to_pool) goto fail; m->pool_to = kcalloc(maxpools, sizeof(unsigned int), GFP_KERNEL); if (!m->pool_to) goto fail_free; return 0; fail_free: kfree(m->to_pool); m->to_pool = NULL; fail: return -ENOMEM; } /* * Initialise the pool map for SVC_POOL_PERCPU mode. * Returns number of pools or <0 on error. */ static int svc_pool_map_init_percpu(struct svc_pool_map *m) { unsigned int maxpools = nr_cpu_ids; unsigned int pidx = 0; unsigned int cpu; int err; err = svc_pool_map_alloc_arrays(m, maxpools); if (err) return err; for_each_online_cpu(cpu) { BUG_ON(pidx >= maxpools); m->to_pool[cpu] = pidx; m->pool_to[pidx] = cpu; pidx++; } /* cpus brought online later all get mapped to pool0, sorry */ return pidx; }; /* * Initialise the pool map for SVC_POOL_PERNODE mode. * Returns number of pools or <0 on error. */ static int svc_pool_map_init_pernode(struct svc_pool_map *m) { unsigned int maxpools = nr_node_ids; unsigned int pidx = 0; unsigned int node; int err; err = svc_pool_map_alloc_arrays(m, maxpools); if (err) return err; for_each_node_with_cpus(node) { /* some architectures (e.g. SN2) have cpuless nodes */ BUG_ON(pidx > maxpools); m->to_pool[node] = pidx; m->pool_to[pidx] = node; pidx++; } /* nodes brought online later all get mapped to pool0, sorry */ return pidx; } /* * Add a reference to the global map of cpus to pools (and * vice versa) if pools are in use. * Initialise the map if we're the first user. * Returns the number of pools. If this is '1', no reference * was taken. */ static unsigned int svc_pool_map_get(void) { struct svc_pool_map *m = &svc_pool_map; int npools = -1; mutex_lock(&svc_pool_map_mutex); if (m->count++) { mutex_unlock(&svc_pool_map_mutex); return m->npools; } if (m->mode == SVC_POOL_AUTO) m->mode = svc_pool_map_choose_mode(); switch (m->mode) { case SVC_POOL_PERCPU: npools = svc_pool_map_init_percpu(m); break; case SVC_POOL_PERNODE: npools = svc_pool_map_init_pernode(m); break; } if (npools <= 0) { /* default, or memory allocation failure */ npools = 1; m->mode = SVC_POOL_GLOBAL; } m->npools = npools; mutex_unlock(&svc_pool_map_mutex); return npools; } /* * Drop a reference to the global map of cpus to pools. * When the last reference is dropped, the map data is * freed; this allows the sysadmin to change the pool. */ static void svc_pool_map_put(void) { struct svc_pool_map *m = &svc_pool_map; mutex_lock(&svc_pool_map_mutex); if (!--m->count) { kfree(m->to_pool); m->to_pool = NULL; kfree(m->pool_to); m->pool_to = NULL; m->npools = 0; } mutex_unlock(&svc_pool_map_mutex); } static int svc_pool_map_get_node(unsigned int pidx) { const struct svc_pool_map *m = &svc_pool_map; if (m->count) { if (m->mode == SVC_POOL_PERCPU) return cpu_to_node(m->pool_to[pidx]); if (m->mode == SVC_POOL_PERNODE) return m->pool_to[pidx]; } return NUMA_NO_NODE; } /* * Set the given thread's cpus_allowed mask so that it * will only run on cpus in the given pool. */ static inline void svc_pool_map_set_cpumask(struct task_struct *task, unsigned int pidx) { struct svc_pool_map *m = &svc_pool_map; unsigned int node = m->pool_to[pidx]; /* * The caller checks for sv_nrpools > 1, which * implies that we've been initialized. */ WARN_ON_ONCE(m->count == 0); if (m->count == 0) return; switch (m->mode) { case SVC_POOL_PERCPU: { set_cpus_allowed_ptr(task, cpumask_of(node)); break; } case SVC_POOL_PERNODE: { set_cpus_allowed_ptr(task, cpumask_of_node(node)); break; } } } /** * svc_pool_for_cpu - Select pool to run a thread on this cpu * @serv: An RPC service * * Use the active CPU and the svc_pool_map's mode setting to * select the svc thread pool to use. Once initialized, the * svc_pool_map does not change. * * Return value: * A pointer to an svc_pool */ struct svc_pool *svc_pool_for_cpu(struct svc_serv *serv) { struct svc_pool_map *m = &svc_pool_map; int cpu = raw_smp_processor_id(); unsigned int pidx = 0; if (serv->sv_nrpools <= 1) return serv->sv_pools; switch (m->mode) { case SVC_POOL_PERCPU: pidx = m->to_pool[cpu]; break; case SVC_POOL_PERNODE: pidx = m->to_pool[cpu_to_node(cpu)]; break; } return &serv->sv_pools[pidx % serv->sv_nrpools]; } static int svc_rpcb_setup(struct svc_serv *serv, struct net *net) { int err; err = rpcb_create_local(net); if (err) return err; /* Remove any stale portmap registrations */ svc_unregister(serv, net); return 0; } void svc_rpcb_cleanup(struct svc_serv *serv, struct net *net) { svc_unregister(serv, net); rpcb_put_local(net); } EXPORT_SYMBOL_GPL(svc_rpcb_cleanup); static int svc_uses_rpcbind(struct svc_serv *serv) { unsigned int p, i; for (p = 0; p < serv->sv_nprogs; p++) { struct svc_program *progp = &serv->sv_programs[p]; for (i = 0; i < progp->pg_nvers; i++) { if (progp->pg_vers[i] == NULL) continue; if (!progp->pg_vers[i]->vs_hidden) return 1; } } return 0; } int svc_bind(struct svc_serv *serv, struct net *net) { if (!svc_uses_rpcbind(serv)) return 0; return svc_rpcb_setup(serv, net); } EXPORT_SYMBOL_GPL(svc_bind); #if defined(CONFIG_SUNRPC_BACKCHANNEL) static void __svc_init_bc(struct svc_serv *serv) { lwq_init(&serv->sv_cb_list); } #else static void __svc_init_bc(struct svc_serv *serv) { } #endif /* * Create an RPC service */ static struct svc_serv * __svc_create(struct svc_program *prog, int nprogs, struct svc_stat *stats, unsigned int bufsize, int npools, int (*threadfn)(void *data)) { struct svc_serv *serv; unsigned int vers; unsigned int xdrsize; unsigned int i; if (!(serv = kzalloc(sizeof(*serv), GFP_KERNEL))) return NULL; serv->sv_name = prog->pg_name; serv->sv_programs = prog; serv->sv_nprogs = nprogs; serv->sv_stats = stats; if (bufsize > RPCSVC_MAXPAYLOAD) bufsize = RPCSVC_MAXPAYLOAD; serv->sv_max_payload = bufsize? bufsize : 4096; serv->sv_max_mesg = roundup(serv->sv_max_payload + PAGE_SIZE, PAGE_SIZE); serv->sv_threadfn = threadfn; xdrsize = 0; for (i = 0; i < nprogs; i++) { struct svc_program *progp = &prog[i]; progp->pg_lovers = progp->pg_nvers-1; for (vers = 0; vers < progp->pg_nvers ; vers++) if (progp->pg_vers[vers]) { progp->pg_hivers = vers; if (progp->pg_lovers > vers) progp->pg_lovers = vers; if (progp->pg_vers[vers]->vs_xdrsize > xdrsize) xdrsize = progp->pg_vers[vers]->vs_xdrsize; } } serv->sv_xdrsize = xdrsize; INIT_LIST_HEAD(&serv->sv_tempsocks); INIT_LIST_HEAD(&serv->sv_permsocks); timer_setup(&serv->sv_temptimer, NULL, 0); spin_lock_init(&serv->sv_lock); __svc_init_bc(serv); serv->sv_nrpools = npools; serv->sv_pools = kcalloc(serv->sv_nrpools, sizeof(struct svc_pool), GFP_KERNEL); if (!serv->sv_pools) { kfree(serv); return NULL; } for (i = 0; i < serv->sv_nrpools; i++) { struct svc_pool *pool = &serv->sv_pools[i]; dprintk("svc: initialising pool %u for %s\n", i, serv->sv_name); pool->sp_id = i; lwq_init(&pool->sp_xprts); INIT_LIST_HEAD(&pool->sp_all_threads); init_llist_head(&pool->sp_idle_threads); percpu_counter_init(&pool->sp_messages_arrived, 0, GFP_KERNEL); percpu_counter_init(&pool->sp_sockets_queued, 0, GFP_KERNEL); percpu_counter_init(&pool->sp_threads_woken, 0, GFP_KERNEL); } return serv; } /** * svc_create - Create an RPC service * @prog: the RPC program the new service will handle * @bufsize: maximum message size for @prog * @threadfn: a function to service RPC requests for @prog * * Returns an instantiated struct svc_serv object or NULL. */ struct svc_serv *svc_create(struct svc_program *prog, unsigned int bufsize, int (*threadfn)(void *data)) { return __svc_create(prog, 1, NULL, bufsize, 1, threadfn); } EXPORT_SYMBOL_GPL(svc_create); /** * svc_create_pooled - Create an RPC service with pooled threads * @prog: Array of RPC programs the new service will handle * @nprogs: Number of programs in the array * @stats: the stats struct if desired * @bufsize: maximum message size for @prog * @threadfn: a function to service RPC requests for @prog * * Returns an instantiated struct svc_serv object or NULL. */ struct svc_serv *svc_create_pooled(struct svc_program *prog, unsigned int nprogs, struct svc_stat *stats, unsigned int bufsize, int (*threadfn)(void *data)) { struct svc_serv *serv; unsigned int npools = svc_pool_map_get(); serv = __svc_create(prog, nprogs, stats, bufsize, npools, threadfn); if (!serv) goto out_err; serv->sv_is_pooled = true; return serv; out_err: svc_pool_map_put(); return NULL; } EXPORT_SYMBOL_GPL(svc_create_pooled); /* * Destroy an RPC service. Should be called with appropriate locking to * protect sv_permsocks and sv_tempsocks. */ void svc_destroy(struct svc_serv **servp) { struct svc_serv *serv = *servp; unsigned int i; *servp = NULL; dprintk("svc: svc_destroy(%s)\n", serv->sv_programs->pg_name); timer_shutdown_sync(&serv->sv_temptimer); /* * Remaining transports at this point are not expected. */ WARN_ONCE(!list_empty(&serv->sv_permsocks), "SVC: permsocks remain for %s\n", serv->sv_programs->pg_name); WARN_ONCE(!list_empty(&serv->sv_tempsocks), "SVC: tempsocks remain for %s\n", serv->sv_programs->pg_name); cache_clean_deferred(serv); if (serv->sv_is_pooled) svc_pool_map_put(); for (i = 0; i < serv->sv_nrpools; i++) { struct svc_pool *pool = &serv->sv_pools[i]; percpu_counter_destroy(&pool->sp_messages_arrived); percpu_counter_destroy(&pool->sp_sockets_queued); percpu_counter_destroy(&pool->sp_threads_woken); } kfree(serv->sv_pools); kfree(serv); } EXPORT_SYMBOL_GPL(svc_destroy); static bool svc_init_buffer(struct svc_rqst *rqstp, unsigned int size, int node) { unsigned long pages, ret; /* bc_xprt uses fore channel allocated buffers */ if (svc_is_backchannel(rqstp)) return true; pages = size / PAGE_SIZE + 1; /* extra page as we hold both request and reply. * We assume one is at most one page */ WARN_ON_ONCE(pages > RPCSVC_MAXPAGES); if (pages > RPCSVC_MAXPAGES) pages = RPCSVC_MAXPAGES; ret = alloc_pages_bulk_array_node(GFP_KERNEL, node, pages, rqstp->rq_pages); return ret == pages; } /* * Release an RPC server buffer */ static void svc_release_buffer(struct svc_rqst *rqstp) { unsigned int i; for (i = 0; i < ARRAY_SIZE(rqstp->rq_pages); i++) if (rqstp->rq_pages[i]) put_page(rqstp->rq_pages[i]); } static void svc_rqst_free(struct svc_rqst *rqstp) { folio_batch_release(&rqstp->rq_fbatch); svc_release_buffer(rqstp); if (rqstp->rq_scratch_page) put_page(rqstp->rq_scratch_page); kfree(rqstp->rq_resp); kfree(rqstp->rq_argp); kfree(rqstp->rq_auth_data); kfree_rcu(rqstp, rq_rcu_head); } static struct svc_rqst * svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node) { struct svc_rqst *rqstp; rqstp = kzalloc_node(sizeof(*rqstp), GFP_KERNEL, node); if (!rqstp) return rqstp; folio_batch_init(&rqstp->rq_fbatch); rqstp->rq_server = serv; rqstp->rq_pool = pool; rqstp->rq_scratch_page = alloc_pages_node(node, GFP_KERNEL, 0); if (!rqstp->rq_scratch_page) goto out_enomem; rqstp->rq_argp = kmalloc_node(serv->sv_xdrsize, GFP_KERNEL, node); if (!rqstp->rq_argp) goto out_enomem; rqstp->rq_resp = kmalloc_node(serv->sv_xdrsize, GFP_KERNEL, node); if (!rqstp->rq_resp) goto out_enomem; if (!svc_init_buffer(rqstp, serv->sv_max_mesg, node)) goto out_enomem; rqstp->rq_err = -EAGAIN; /* No error yet */ serv->sv_nrthreads += 1; pool->sp_nrthreads += 1; /* Protected by whatever lock the service uses when calling * svc_set_num_threads() */ list_add_rcu(&rqstp->rq_all, &pool->sp_all_threads); return rqstp; out_enomem: svc_rqst_free(rqstp); return NULL; } /** * svc_pool_wake_idle_thread - Awaken an idle thread in @pool * @pool: service thread pool * * Can be called from soft IRQ or process context. Finding an idle * service thread and marking it BUSY is atomic with respect to * other calls to svc_pool_wake_idle_thread(). * */ void svc_pool_wake_idle_thread(struct svc_pool *pool) { struct svc_rqst *rqstp; struct llist_node *ln; rcu_read_lock(); ln = READ_ONCE(pool->sp_idle_threads.first); if (ln) { rqstp = llist_entry(ln, struct svc_rqst, rq_idle); WRITE_ONCE(rqstp->rq_qtime, ktime_get()); if (!task_is_running(rqstp->rq_task)) { wake_up_process(rqstp->rq_task); trace_svc_wake_up(rqstp->rq_task->pid); percpu_counter_inc(&pool->sp_threads_woken); } rcu_read_unlock(); return; } rcu_read_unlock(); } EXPORT_SYMBOL_GPL(svc_pool_wake_idle_thread); static struct svc_pool * svc_pool_next(struct svc_serv *serv, struct svc_pool *pool, unsigned int *state) { return pool ? pool : &serv->sv_pools[(*state)++ % serv->sv_nrpools]; } static struct svc_pool * svc_pool_victim(struct svc_serv *serv, struct svc_pool *target_pool, unsigned int *state) { struct svc_pool *pool; unsigned int i; pool = target_pool; if (!pool) { for (i = 0; i < serv->sv_nrpools; i++) { pool = &serv->sv_pools[--(*state) % serv->sv_nrpools]; if (pool->sp_nrthreads) break; } } if (pool && pool->sp_nrthreads) { set_bit(SP_VICTIM_REMAINS, &pool->sp_flags); set_bit(SP_NEED_VICTIM, &pool->sp_flags); return pool; } return NULL; } static int svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) { struct svc_rqst *rqstp; struct task_struct *task; struct svc_pool *chosen_pool; unsigned int state = serv->sv_nrthreads-1; int node; int err; do { nrservs--; chosen_pool = svc_pool_next(serv, pool, &state); node = svc_pool_map_get_node(chosen_pool->sp_id); rqstp = svc_prepare_thread(serv, chosen_pool, node); if (!rqstp) return -ENOMEM; task = kthread_create_on_node(serv->sv_threadfn, rqstp, node, "%s", serv->sv_name); if (IS_ERR(task)) { svc_exit_thread(rqstp); return PTR_ERR(task); } rqstp->rq_task = task; if (serv->sv_nrpools > 1) svc_pool_map_set_cpumask(task, chosen_pool->sp_id); svc_sock_update_bufs(serv); wake_up_process(task); wait_var_event(&rqstp->rq_err, rqstp->rq_err != -EAGAIN); err = rqstp->rq_err; if (err) { svc_exit_thread(rqstp); return err; } } while (nrservs > 0); return 0; } static int svc_stop_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) { unsigned int state = serv->sv_nrthreads-1; struct svc_pool *victim; do { victim = svc_pool_victim(serv, pool, &state); if (!victim) break; svc_pool_wake_idle_thread(victim); wait_on_bit(&victim->sp_flags, SP_VICTIM_REMAINS, TASK_IDLE); nrservs++; } while (nrservs < 0); return 0; } /** * svc_set_num_threads - adjust number of threads per RPC service * @serv: RPC service to adjust * @pool: Specific pool from which to choose threads, or NULL * @nrservs: New number of threads for @serv (0 or less means kill all threads) * * Create or destroy threads to make the number of threads for @serv the * given number. If @pool is non-NULL, change only threads in that pool; * otherwise, round-robin between all pools for @serv. @serv's * sv_nrthreads is adjusted for each thread created or destroyed. * * Caller must ensure mutual exclusion between this and server startup or * shutdown. * * Returns zero on success or a negative errno if an error occurred while * starting a thread. */ int svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) { if (!pool) nrservs -= serv->sv_nrthreads; else nrservs -= pool->sp_nrthreads; if (nrservs > 0) return svc_start_kthreads(serv, pool, nrservs); if (nrservs < 0) return svc_stop_kthreads(serv, pool, nrservs); return 0; } EXPORT_SYMBOL_GPL(svc_set_num_threads); /** * svc_rqst_replace_page - Replace one page in rq_pages[] * @rqstp: svc_rqst with pages to replace * @page: replacement page * * When replacing a page in rq_pages, batch the release of the * replaced pages to avoid hammering the page allocator. * * Return values: * %true: page replaced * %false: array bounds checking failed */ bool svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page) { struct page **begin = rqstp->rq_pages; struct page **end = &rqstp->rq_pages[RPCSVC_MAXPAGES]; if (unlikely(rqstp->rq_next_page < begin || rqstp->rq_next_page > end)) { trace_svc_replace_page_err(rqstp); return false; } if (*rqstp->rq_next_page) { if (!folio_batch_add(&rqstp->rq_fbatch, page_folio(*rqstp->rq_next_page))) __folio_batch_release(&rqstp->rq_fbatch); } get_page(page); *(rqstp->rq_next_page++) = page; return true; } EXPORT_SYMBOL_GPL(svc_rqst_replace_page); /** * svc_rqst_release_pages - Release Reply buffer pages * @rqstp: RPC transaction context * * Release response pages that might still be in flight after * svc_send, and any spliced filesystem-owned pages. */ void svc_rqst_release_pages(struct svc_rqst *rqstp) { int i, count = rqstp->rq_next_page - rqstp->rq_respages; if (count) { release_pages(rqstp->rq_respages, count); for (i = 0; i < count; i++) rqstp->rq_respages[i] = NULL; } } /** * svc_exit_thread - finalise the termination of a sunrpc server thread * @rqstp: the svc_rqst which represents the thread. * * When a thread started with svc_new_thread() exits it must call * svc_exit_thread() as its last act. This must be done with the * service mutex held. Normally this is held by a DIFFERENT thread, the * one that is calling svc_set_num_threads() and which will wait for * SP_VICTIM_REMAINS to be cleared before dropping the mutex. If the * thread exits for any reason other than svc_thread_should_stop() * returning %true (which indicated that svc_set_num_threads() is * waiting for it to exit), then it must take the service mutex itself, * which can only safely be done using mutex_try_lock(). */ void svc_exit_thread(struct svc_rqst *rqstp) { struct svc_serv *serv = rqstp->rq_server; struct svc_pool *pool = rqstp->rq_pool; list_del_rcu(&rqstp->rq_all); pool->sp_nrthreads -= 1; serv->sv_nrthreads -= 1; svc_sock_update_bufs(serv); svc_rqst_free(rqstp); clear_and_wake_up_bit(SP_VICTIM_REMAINS, &pool->sp_flags); } EXPORT_SYMBOL_GPL(svc_exit_thread); /* * Register an "inet" protocol family netid with the local * rpcbind daemon via an rpcbind v4 SET request. * * No netconfig infrastructure is available in the kernel, so * we map IP_ protocol numbers to netids by hand. * * Returns zero on success; a negative errno value is returned * if any error occurs. */ static int __svc_rpcb_register4(struct net *net, const u32 program, const u32 version, const unsigned short protocol, const unsigned short port) { const struct sockaddr_in sin = { .sin_family = AF_INET, .sin_addr.s_addr = htonl(INADDR_ANY), .sin_port = htons(port), }; const char *netid; int error; switch (protocol) { case IPPROTO_UDP: netid = RPCBIND_NETID_UDP; break; case IPPROTO_TCP: netid = RPCBIND_NETID_TCP; break; default: return -ENOPROTOOPT; } error = rpcb_v4_register(net, program, version, (const struct sockaddr *)&sin, netid); /* * User space didn't support rpcbind v4, so retry this * registration request with the legacy rpcbind v2 protocol. */ if (error == -EPROTONOSUPPORT) error = rpcb_register(net, program, version, protocol, port); return error; } #if IS_ENABLED(CONFIG_IPV6) /* * Register an "inet6" protocol family netid with the local * rpcbind daemon via an rpcbind v4 SET request. * * No netconfig infrastructure is available in the kernel, so * we map IP_ protocol numbers to netids by hand. * * Returns zero on success; a negative errno value is returned * if any error occurs. */ static int __svc_rpcb_register6(struct net *net, const u32 program, const u32 version, const unsigned short protocol, const unsigned short port) { const struct sockaddr_in6 sin6 = { .sin6_family = AF_INET6, .sin6_addr = IN6ADDR_ANY_INIT, .sin6_port = htons(port), }; const char *netid; int error; switch (protocol) { case IPPROTO_UDP: netid = RPCBIND_NETID_UDP6; break; case IPPROTO_TCP: netid = RPCBIND_NETID_TCP6; break; default: return -ENOPROTOOPT; } error = rpcb_v4_register(net, program, version, (const struct sockaddr *)&sin6, netid); /* * User space didn't support rpcbind version 4, so we won't * use a PF_INET6 listener. */ if (error == -EPROTONOSUPPORT) error = -EAFNOSUPPORT; return error; } #endif /* IS_ENABLED(CONFIG_IPV6) */ /* * Register a kernel RPC service via rpcbind version 4. * * Returns zero on success; a negative errno value is returned * if any error occurs. */ static int __svc_register(struct net *net, const char *progname, const u32 program, const u32 version, const int family, const unsigned short protocol, const unsigned short port) { int error = -EAFNOSUPPORT; switch (family) { case PF_INET: error = __svc_rpcb_register4(net, program, version, protocol, port); break; #if IS_ENABLED(CONFIG_IPV6) case PF_INET6: error = __svc_rpcb_register6(net, program, version, protocol, port); #endif } trace_svc_register(progname, version, family, protocol, port, error); return error; } static int svc_rpcbind_set_version(struct net *net, const struct svc_program *progp, u32 version, int family, unsigned short proto, unsigned short port) { return __svc_register(net, progp->pg_name, progp->pg_prog, version, family, proto, port); } int svc_generic_rpcbind_set(struct net *net, const struct svc_program *progp, u32 version, int family, unsigned short proto, unsigned short port) { const struct svc_version *vers = progp->pg_vers[version]; int error; if (vers == NULL) return 0; if (vers->vs_hidden) { trace_svc_noregister(progp->pg_name, version, proto, port, family, 0); return 0; } /* * Don't register a UDP port if we need congestion * control. */ if (vers->vs_need_cong_ctrl && proto == IPPROTO_UDP) return 0; error = svc_rpcbind_set_version(net, progp, version, family, proto, port); return (vers->vs_rpcb_optnl) ? 0 : error; } EXPORT_SYMBOL_GPL(svc_generic_rpcbind_set); /** * svc_register - register an RPC service with the local portmapper * @serv: svc_serv struct for the service to register * @net: net namespace for the service to register * @family: protocol family of service's listener socket * @proto: transport protocol number to advertise * @port: port to advertise * * Service is registered for any address in the passed-in protocol family */ int svc_register(const struct svc_serv *serv, struct net *net, const int family, const unsigned short proto, const unsigned short port) { unsigned int p, i; int error = 0; WARN_ON_ONCE(proto == 0 && port == 0); if (proto == 0 && port == 0) return -EINVAL; for (p = 0; p < serv->sv_nprogs; p++) { struct svc_program *progp = &serv->sv_programs[p]; for (i = 0; i < progp->pg_nvers; i++) { error = progp->pg_rpcbind_set(net, progp, i, family, proto, port); if (error < 0) { printk(KERN_WARNING "svc: failed to register " "%sv%u RPC service (errno %d).\n", progp->pg_name, i, -error); break; } } } return error; } /* * If user space is running rpcbind, it should take the v4 UNSET * and clear everything for this [program, version]. If user space * is running portmap, it will reject the v4 UNSET, but won't have * any "inet6" entries anyway. So a PMAP_UNSET should be sufficient * in this case to clear all existing entries for [program, version]. */ static void __svc_unregister(struct net *net, const u32 program, const u32 version, const char *progname) { int error; error = rpcb_v4_register(net, program, version, NULL, ""); /* * User space didn't support rpcbind v4, so retry this * request with the legacy rpcbind v2 protocol. */ if (error == -EPROTONOSUPPORT) error = rpcb_register(net, program, version, 0, 0); trace_svc_unregister(progname, version, error); } /* * All netids, bind addresses and ports registered for [program, version] * are removed from the local rpcbind database (if the service is not * hidden) to make way for a new instance of the service. * * The result of unregistration is reported via dprintk for those who want * verification of the result, but is otherwise not important. */ static void svc_unregister(const struct svc_serv *serv, struct net *net) { struct sighand_struct *sighand; unsigned long flags; unsigned int p, i; clear_thread_flag(TIF_SIGPENDING); for (p = 0; p < serv->sv_nprogs; p++) { struct svc_program *progp = &serv->sv_programs[p]; for (i = 0; i < progp->pg_nvers; i++) { if (progp->pg_vers[i] == NULL) continue; if (progp->pg_vers[i]->vs_hidden) continue; __svc_unregister(net, progp->pg_prog, i, progp->pg_name); } } rcu_read_lock(); sighand = rcu_dereference(current->sighand); spin_lock_irqsave(&sighand->siglock, flags); recalc_sigpending(); spin_unlock_irqrestore(&sighand->siglock, flags); rcu_read_unlock(); } /* * dprintk the given error with the address of the client that caused it. */ #if IS_ENABLED(CONFIG_SUNRPC_DEBUG) static __printf(2, 3) void svc_printk(struct svc_rqst *rqstp, const char *fmt, ...) { struct va_format vaf; va_list args; char buf[RPC_MAX_ADDRBUFLEN]; va_start(args, fmt); vaf.fmt = fmt; vaf.va = &args; dprintk("svc: %s: %pV", svc_print_addr(rqstp, buf, sizeof(buf)), &vaf); va_end(args); } #else static __printf(2,3) void svc_printk(struct svc_rqst *rqstp, const char *fmt, ...) {} #endif __be32 svc_generic_init_request(struct svc_rqst *rqstp, const struct svc_program *progp, struct svc_process_info *ret) { const struct svc_version *versp = NULL; /* compiler food */ const struct svc_procedure *procp = NULL; if (rqstp->rq_vers >= progp->pg_nvers ) goto err_bad_vers; versp = progp->pg_vers[rqstp->rq_vers]; if (!versp) goto err_bad_vers; /* * Some protocol versions (namely NFSv4) require some form of * congestion control. (See RFC 7530 section 3.1 paragraph 2) * In other words, UDP is not allowed. We mark those when setting * up the svc_xprt, and verify that here. * * The spec is not very clear about what error should be returned * when someone tries to access a server that is listening on UDP * for lower versions. RPC_PROG_MISMATCH seems to be the closest * fit. */ if (versp->vs_need_cong_ctrl && rqstp->rq_xprt && !test_bit(XPT_CONG_CTRL, &rqstp->rq_xprt->xpt_flags)) goto err_bad_vers; if (rqstp->rq_proc >= versp->vs_nproc) goto err_bad_proc; rqstp->rq_procinfo = procp = &versp->vs_proc[rqstp->rq_proc]; /* Initialize storage for argp and resp */ memset(rqstp->rq_argp, 0, procp->pc_argzero); memset(rqstp->rq_resp, 0, procp->pc_ressize); /* Bump per-procedure stats counter */ this_cpu_inc(versp->vs_count[rqstp->rq_proc]); ret->dispatch = versp->vs_dispatch; return rpc_success; err_bad_vers: ret->mismatch.lovers = progp->pg_lovers; ret->mismatch.hivers = progp->pg_hivers; return rpc_prog_mismatch; err_bad_proc: return rpc_proc_unavail; } EXPORT_SYMBOL_GPL(svc_generic_init_request); /* * Common routine for processing the RPC request. */ static int svc_process_common(struct svc_rqst *rqstp) { struct xdr_stream *xdr = &rqstp->rq_res_stream; struct svc_program *progp = NULL; const struct svc_procedure *procp = NULL; struct svc_serv *serv = rqstp->rq_server; struct svc_process_info process; enum svc_auth_status auth_res; unsigned int aoffset; int pr, rc; __be32 *p; /* Will be turned off only when NFSv4 Sessions are used */ set_bit(RQ_USEDEFERRAL, &rqstp->rq_flags); clear_bit(RQ_DROPME, &rqstp->rq_flags); /* Construct the first words of the reply: */ svcxdr_init_encode(rqstp); xdr_stream_encode_be32(xdr, rqstp->rq_xid); xdr_stream_encode_be32(xdr, rpc_reply); p = xdr_inline_decode(&rqstp->rq_arg_stream, XDR_UNIT * 4); if (unlikely(!p)) goto err_short_len; if (*p++ != cpu_to_be32(RPC_VERSION)) goto err_bad_rpc; xdr_stream_encode_be32(xdr, rpc_msg_accepted); rqstp->rq_prog = be32_to_cpup(p++); rqstp->rq_vers = be32_to_cpup(p++); rqstp->rq_proc = be32_to_cpup(p); for (pr = 0; pr < serv->sv_nprogs; pr++) if (rqstp->rq_prog == serv->sv_programs[pr].pg_prog) progp = &serv->sv_programs[pr]; /* * Decode auth data, and add verifier to reply buffer. * We do this before anything else in order to get a decent * auth verifier. */ auth_res = svc_authenticate(rqstp); /* Also give the program a chance to reject this call: */ if (auth_res == SVC_OK && progp) auth_res = progp->pg_authenticate(rqstp); trace_svc_authenticate(rqstp, auth_res); switch (auth_res) { case SVC_OK: break; case SVC_GARBAGE: goto err_garbage_args; case SVC_SYSERR: goto err_system_err; case SVC_DENIED: goto err_bad_auth; case SVC_CLOSE: goto close; case SVC_DROP: goto dropit; case SVC_COMPLETE: goto sendit; default: pr_warn_once("Unexpected svc_auth_status (%d)\n", auth_res); goto err_system_err; } if (progp == NULL) goto err_bad_prog; switch (progp->pg_init_request(rqstp, progp, &process)) { case rpc_success: break; case rpc_prog_unavail: goto err_bad_prog; case rpc_prog_mismatch: goto err_bad_vers; case rpc_proc_unavail: goto err_bad_proc; } procp = rqstp->rq_procinfo; /* Should this check go into the dispatcher? */ if (!procp || !procp->pc_func) goto err_bad_proc; /* Syntactic check complete */ if (serv->sv_stats) serv->sv_stats->rpccnt++; trace_svc_process(rqstp, progp->pg_name); aoffset = xdr_stream_pos(xdr); /* un-reserve some of the out-queue now that we have a * better idea of reply size */ if (procp->pc_xdrressize) svc_reserve_auth(rqstp, procp->pc_xdrressize<<2); /* Call the function that processes the request. */ rc = process.dispatch(rqstp); if (procp->pc_release) procp->pc_release(rqstp); xdr_finish_decode(xdr); if (!rc) goto dropit; if (rqstp->rq_auth_stat != rpc_auth_ok) goto err_bad_auth; if (*rqstp->rq_accept_statp != rpc_success) xdr_truncate_encode(xdr, aoffset); if (procp->pc_encode == NULL) goto dropit; sendit: if (svc_authorise(rqstp)) goto close_xprt; return 1; /* Caller can now send it */ dropit: svc_authorise(rqstp); /* doesn't hurt to call this twice */ dprintk("svc: svc_process dropit\n"); return 0; close: svc_authorise(rqstp); close_xprt: if (rqstp->rq_xprt && test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags)) svc_xprt_close(rqstp->rq_xprt); dprintk("svc: svc_process close\n"); return 0; err_short_len: svc_printk(rqstp, "short len %u, dropping request\n", rqstp->rq_arg.len); goto close_xprt; err_bad_rpc: if (serv->sv_stats) serv->sv_stats->rpcbadfmt++; xdr_stream_encode_u32(xdr, RPC_MSG_DENIED); xdr_stream_encode_u32(xdr, RPC_MISMATCH); /* Only RPCv2 supported */ xdr_stream_encode_u32(xdr, RPC_VERSION); xdr_stream_encode_u32(xdr, RPC_VERSION); return 1; /* don't wrap */ err_bad_auth: dprintk("svc: authentication failed (%d)\n", be32_to_cpu(rqstp->rq_auth_stat)); if (serv->sv_stats) serv->sv_stats->rpcbadauth++; /* Restore write pointer to location of reply status: */ xdr_truncate_encode(xdr, XDR_UNIT * 2); xdr_stream_encode_u32(xdr, RPC_MSG_DENIED); xdr_stream_encode_u32(xdr, RPC_AUTH_ERROR); xdr_stream_encode_be32(xdr, rqstp->rq_auth_stat); goto sendit; err_bad_prog: dprintk("svc: unknown program %d\n", rqstp->rq_prog); if (serv->sv_stats) serv->sv_stats->rpcbadfmt++; *rqstp->rq_accept_statp = rpc_prog_unavail; goto sendit; err_bad_vers: svc_printk(rqstp, "unknown version (%d for prog %d, %s)\n", rqstp->rq_vers, rqstp->rq_prog, progp->pg_name); if (serv->sv_stats) serv->sv_stats->rpcbadfmt++; *rqstp->rq_accept_statp = rpc_prog_mismatch; /* * svc_authenticate() has already added the verifier and * advanced the stream just past rq_accept_statp. */ xdr_stream_encode_u32(xdr, process.mismatch.lovers); xdr_stream_encode_u32(xdr, process.mismatch.hivers); goto sendit; err_bad_proc: svc_printk(rqstp, "unknown procedure (%d)\n", rqstp->rq_proc); if (serv->sv_stats) serv->sv_stats->rpcbadfmt++; *rqstp->rq_accept_statp = rpc_proc_unavail; goto sendit; err_garbage_args: svc_printk(rqstp, "failed to decode RPC header\n"); if (serv->sv_stats) serv->sv_stats->rpcbadfmt++; *rqstp->rq_accept_statp = rpc_garbage_args; goto sendit; err_system_err: if (serv->sv_stats) serv->sv_stats->rpcbadfmt++; *rqstp->rq_accept_statp = rpc_system_err; goto sendit; } /* * Drop request */ static void svc_drop(struct svc_rqst *rqstp) { trace_svc_drop(rqstp); } /** * svc_process - Execute one RPC transaction * @rqstp: RPC transaction context * */ void svc_process(struct svc_rqst *rqstp) { struct kvec *resv = &rqstp->rq_res.head[0]; __be32 *p; #if IS_ENABLED(CONFIG_FAIL_SUNRPC) if (!fail_sunrpc.ignore_server_disconnect && should_fail(&fail_sunrpc.attr, 1)) svc_xprt_deferred_close(rqstp->rq_xprt); #endif /* * Setup response xdr_buf. * Initially it has just one page */ rqstp->rq_next_page = &rqstp->rq_respages[1]; resv->iov_base = page_address(rqstp->rq_respages[0]); resv->iov_len = 0; rqstp->rq_res.pages = rqstp->rq_next_page; rqstp->rq_res.len = 0; rqstp->rq_res.page_base = 0; rqstp->rq_res.page_len = 0; rqstp->rq_res.buflen = PAGE_SIZE; rqstp->rq_res.tail[0].iov_base = NULL; rqstp->rq_res.tail[0].iov_len = 0; svcxdr_init_decode(rqstp); p = xdr_inline_decode(&rqstp->rq_arg_stream, XDR_UNIT * 2); if (unlikely(!p)) goto out_drop; rqstp->rq_xid = *p++; if (unlikely(*p != rpc_call)) goto out_baddir; if (!svc_process_common(rqstp)) goto out_drop; svc_send(rqstp); return; out_baddir: svc_printk(rqstp, "bad direction 0x%08x, dropping request\n", be32_to_cpu(*p)); if (rqstp->rq_server->sv_stats) rqstp->rq_server->sv_stats->rpcbadfmt++; out_drop: svc_drop(rqstp); } #if defined(CONFIG_SUNRPC_BACKCHANNEL) /** * svc_process_bc - process a reverse-direction RPC request * @req: RPC request to be used for client-side processing * @rqstp: server-side execution context * */ void svc_process_bc(struct rpc_rqst *req, struct svc_rqst *rqstp) { struct rpc_timeout timeout = { .to_increment = 0, }; struct rpc_task *task; int proc_error; /* Build the svc_rqst used by the common processing routine */ rqstp->rq_xid = req->rq_xid; rqstp->rq_prot = req->rq_xprt->prot; rqstp->rq_bc_net = req->rq_xprt->xprt_net; rqstp->rq_addrlen = sizeof(req->rq_xprt->addr); memcpy(&rqstp->rq_addr, &req->rq_xprt->addr, rqstp->rq_addrlen); memcpy(&rqstp->rq_arg, &req->rq_rcv_buf, sizeof(rqstp->rq_arg)); memcpy(&rqstp->rq_res, &req->rq_snd_buf, sizeof(rqstp->rq_res)); /* Adjust the argument buffer length */ rqstp->rq_arg.len = req->rq_private_buf.len; if (rqstp->rq_arg.len <= rqstp->rq_arg.head[0].iov_len) { rqstp->rq_arg.head[0].iov_len = rqstp->rq_arg.len; rqstp->rq_arg.page_len = 0; } else if (rqstp->rq_arg.len <= rqstp->rq_arg.head[0].iov_len + rqstp->rq_arg.page_len) rqstp->rq_arg.page_len = rqstp->rq_arg.len - rqstp->rq_arg.head[0].iov_len; else rqstp->rq_arg.len = rqstp->rq_arg.head[0].iov_len + rqstp->rq_arg.page_len; /* Reset the response buffer */ rqstp->rq_res.head[0].iov_len = 0; /* * Skip the XID and calldir fields because they've already * been processed by the caller. */ svcxdr_init_decode(rqstp); if (!xdr_inline_decode(&rqstp->rq_arg_stream, XDR_UNIT * 2)) return; /* Parse and execute the bc call */ proc_error = svc_process_common(rqstp); atomic_dec(&req->rq_xprt->bc_slot_count); if (!proc_error) { /* Processing error: drop the request */ xprt_free_bc_request(req); return; } /* Finally, send the reply synchronously */ if (rqstp->bc_to_initval > 0) { timeout.to_initval = rqstp->bc_to_initval; timeout.to_retries = rqstp->bc_to_retries; } else { timeout.to_initval = req->rq_xprt->timeout->to_initval; timeout.to_retries = req->rq_xprt->timeout->to_retries; } timeout.to_maxval = timeout.to_initval; memcpy(&req->rq_snd_buf, &rqstp->rq_res, sizeof(req->rq_snd_buf)); task = rpc_run_bc_task(req, &timeout); if (IS_ERR(task)) return; WARN_ON_ONCE(atomic_read(&task->tk_count) != 1); rpc_put_task(task); } #endif /* CONFIG_SUNRPC_BACKCHANNEL */ /** * svc_max_payload - Return transport-specific limit on the RPC payload * @rqstp: RPC transaction context * * Returns the maximum number of payload bytes the current transport * allows. */ u32 svc_max_payload(const struct svc_rqst *rqstp) { u32 max = rqstp->rq_xprt->xpt_class->xcl_max_payload; if (rqstp->rq_server->sv_max_payload < max) max = rqstp->rq_server->sv_max_payload; return max; } EXPORT_SYMBOL_GPL(svc_max_payload); /** * svc_proc_name - Return RPC procedure name in string form * @rqstp: svc_rqst to operate on * * Return value: * Pointer to a NUL-terminated string */ const char *svc_proc_name(const struct svc_rqst *rqstp) { if (rqstp && rqstp->rq_procinfo) return rqstp->rq_procinfo->pc_name; return "unknown"; } /** * svc_encode_result_payload - mark a range of bytes as a result payload * @rqstp: svc_rqst to operate on * @offset: payload's byte offset in rqstp->rq_res * @length: size of payload, in bytes * * Returns zero on success, or a negative errno if a permanent * error occurred. */ int svc_encode_result_payload(struct svc_rqst *rqstp, unsigned int offset, unsigned int length) { return rqstp->rq_xprt->xpt_ops->xpo_result_payload(rqstp, offset, length); } EXPORT_SYMBOL_GPL(svc_encode_result_payload); /** * svc_fill_write_vector - Construct data argument for VFS write call * @rqstp: svc_rqst to operate on * @payload: xdr_buf containing only the write data payload * * Fills in rqstp::rq_vec, and returns the number of elements. */ unsigned int svc_fill_write_vector(struct svc_rqst *rqstp, struct xdr_buf *payload) { struct page **pages = payload->pages; struct kvec *first = payload->head; struct kvec *vec = rqstp->rq_vec; size_t total = payload->len; unsigned int i; /* Some types of transport can present the write payload * entirely in rq_arg.pages. In this case, @first is empty. */ i = 0; if (first->iov_len) { vec[i].iov_base = first->iov_base; vec[i].iov_len = min_t(size_t, total, first->iov_len); total -= vec[i].iov_len; ++i; } while (total) { vec[i].iov_base = page_address(*pages); vec[i].iov_len = min_t(size_t, total, PAGE_SIZE); total -= vec[i].iov_len; ++i; ++pages; } WARN_ON_ONCE(i > ARRAY_SIZE(rqstp->rq_vec)); return i; } EXPORT_SYMBOL_GPL(svc_fill_write_vector); /** * svc_fill_symlink_pathname - Construct pathname argument for VFS symlink call * @rqstp: svc_rqst to operate on * @first: buffer containing first section of pathname * @p: buffer containing remaining section of pathname * @total: total length of the pathname argument * * The VFS symlink API demands a NUL-terminated pathname in mapped memory. * Returns pointer to a NUL-terminated string, or an ERR_PTR. Caller must free * the returned string. */ char *svc_fill_symlink_pathname(struct svc_rqst *rqstp, struct kvec *first, void *p, size_t total) { size_t len, remaining; char *result, *dst; result = kmalloc(total + 1, GFP_KERNEL); if (!result) return ERR_PTR(-ESERVERFAULT); dst = result; remaining = total; len = min_t(size_t, total, first->iov_len); if (len) { memcpy(dst, first->iov_base, len); dst += len; remaining -= len; } if (remaining) { len = min_t(size_t, remaining, PAGE_SIZE); memcpy(dst, p, len); dst += len; } *dst = '\0'; /* Sanity check: Linux doesn't allow the pathname argument to * contain a NUL byte. */ if (strlen(result) != total) { kfree(result); return ERR_PTR(-EINVAL); } return result; } EXPORT_SYMBOL_GPL(svc_fill_symlink_pathname); |
2 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 | /* SPDX-License-Identifier: GPL-2.0 */ #ifndef LINUX_MLD_H #define LINUX_MLD_H #include <linux/in6.h> #include <linux/icmpv6.h> /* MLDv1 Query/Report/Done */ struct mld_msg { struct icmp6hdr mld_hdr; struct in6_addr mld_mca; }; #define mld_type mld_hdr.icmp6_type #define mld_code mld_hdr.icmp6_code #define mld_cksum mld_hdr.icmp6_cksum #define mld_maxdelay mld_hdr.icmp6_maxdelay #define mld_reserved mld_hdr.icmp6_dataun.un_data16[1] /* Multicast Listener Discovery version 2 headers */ /* MLDv2 Report */ struct mld2_grec { __u8 grec_type; __u8 grec_auxwords; __be16 grec_nsrcs; struct in6_addr grec_mca; struct in6_addr grec_src[]; }; struct mld2_report { struct icmp6hdr mld2r_hdr; struct mld2_grec mld2r_grec[]; }; #define mld2r_type mld2r_hdr.icmp6_type #define mld2r_resv1 mld2r_hdr.icmp6_code #define mld2r_cksum mld2r_hdr.icmp6_cksum #define mld2r_resv2 mld2r_hdr.icmp6_dataun.un_data16[0] #define mld2r_ngrec mld2r_hdr.icmp6_dataun.un_data16[1] /* MLDv2 Query */ struct mld2_query { struct icmp6hdr mld2q_hdr; struct in6_addr mld2q_mca; #if defined(__LITTLE_ENDIAN_BITFIELD) __u8 mld2q_qrv:3, mld2q_suppress:1, mld2q_resv2:4; #elif defined(__BIG_ENDIAN_BITFIELD) __u8 mld2q_resv2:4, mld2q_suppress:1, mld2q_qrv:3; #else #error "Please fix <asm/byteorder.h>" #endif __u8 mld2q_qqic; __be16 mld2q_nsrcs; struct in6_addr mld2q_srcs[]; }; #define mld2q_type mld2q_hdr.icmp6_type #define mld2q_code mld2q_hdr.icmp6_code #define mld2q_cksum mld2q_hdr.icmp6_cksum #define mld2q_mrc mld2q_hdr.icmp6_maxdelay #define mld2q_resv1 mld2q_hdr.icmp6_dataun.un_data16[1] /* RFC3810, 5.1.3. Maximum Response Code: * * If Maximum Response Code >= 32768, Maximum Response Code represents a * floating-point value as follows: * * 0 1 2 3 4 5 6 7 8 9 A B C D E F * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * |1| exp | mant | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ */ #define MLDV2_MRC_EXP(value) (((value) >> 12) & 0x0007) #define MLDV2_MRC_MAN(value) ((value) & 0x0fff) /* RFC3810, 5.1.9. QQIC (Querier's Query Interval Code): * * If QQIC >= 128, QQIC represents a floating-point value as follows: * * 0 1 2 3 4 5 6 7 * +-+-+-+-+-+-+-+-+ * |1| exp | mant | * +-+-+-+-+-+-+-+-+ */ #define MLDV2_QQIC_EXP(value) (((value) >> 4) & 0x07) #define MLDV2_QQIC_MAN(value) ((value) & 0x0f) #define MLD_EXP_MIN_LIMIT 32768UL #define MLDV1_MRD_MAX_COMPAT (MLD_EXP_MIN_LIMIT - 1) #define MLD_MAX_QUEUE 8 #define MLD_MAX_SKBS 32 static inline unsigned long mldv2_mrc(const struct mld2_query *mlh2) { /* RFC3810, 5.1.3. Maximum Response Code */ unsigned long ret, mc_mrc = ntohs(mlh2->mld2q_mrc); if (mc_mrc < MLD_EXP_MIN_LIMIT) { ret = mc_mrc; } else { unsigned long mc_man, mc_exp; mc_exp = MLDV2_MRC_EXP(mc_mrc); mc_man = MLDV2_MRC_MAN(mc_mrc); ret = (mc_man | 0x1000) << (mc_exp + 3); } return ret; } #endif |
55 2 55 9 25 42 127 55 31 33 31 1 41 6 25 397 362 362 267 361 281 283 67 265 36 76 317 236 75 141 55 66 136 48 48 50 25 50 50 50 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 | // SPDX-License-Identifier: GPL-2.0-only /* * linux/fs/fat/misc.c * * Written 1992,1993 by Werner Almesberger * 22/11/2000 - Fixed fat_date_unix2dos for dates earlier than 01/01/1980 * and date_dos2unix for date==0 by Igor Zhbanov(bsg@uniyar.ac.ru) */ #include "fat.h" #include <linux/iversion.h> /* * fat_fs_error reports a file system problem that might indicate fa data * corruption/inconsistency. Depending on 'errors' mount option the * panic() is called, or error message is printed FAT and nothing is done, * or filesystem is remounted read-only (default behavior). * In case the file system is remounted read-only, it can be made writable * again by remounting it. */ void __fat_fs_error(struct super_block *sb, int report, const char *fmt, ...) { struct fat_mount_options *opts = &MSDOS_SB(sb)->options; va_list args; struct va_format vaf; if (report) { va_start(args, fmt); vaf.fmt = fmt; vaf.va = &args; fat_msg(sb, KERN_ERR, "error, %pV", &vaf); va_end(args); } if (opts->errors == FAT_ERRORS_PANIC) panic("FAT-fs (%s): fs panic from previous error\n", sb->s_id); else if (opts->errors == FAT_ERRORS_RO && !sb_rdonly(sb)) { sb->s_flags |= SB_RDONLY; fat_msg(sb, KERN_ERR, "Filesystem has been set read-only"); } } EXPORT_SYMBOL_GPL(__fat_fs_error); /** * _fat_msg() - Print a preformatted FAT message based on a superblock. * @sb: A pointer to a &struct super_block * @level: A Kernel printk level constant * @fmt: The printf-style format string to print. * * Everything that is not fat_fs_error() should be fat_msg(). * * fat_msg() wraps _fat_msg() for printk indexing. */ void _fat_msg(struct super_block *sb, const char *level, const char *fmt, ...) { struct va_format vaf; va_list args; va_start(args, fmt); vaf.fmt = fmt; vaf.va = &args; _printk(FAT_PRINTK_PREFIX "%pV\n", level, sb->s_id, &vaf); va_end(args); } /* Flushes the number of free clusters on FAT32 */ /* XXX: Need to write one per FSINFO block. Currently only writes 1 */ int fat_clusters_flush(struct super_block *sb) { struct msdos_sb_info *sbi = MSDOS_SB(sb); struct buffer_head *bh; struct fat_boot_fsinfo *fsinfo; if (!is_fat32(sbi)) return 0; bh = sb_bread(sb, sbi->fsinfo_sector); if (bh == NULL) { fat_msg(sb, KERN_ERR, "bread failed in fat_clusters_flush"); return -EIO; } fsinfo = (struct fat_boot_fsinfo *)bh->b_data; /* Sanity check */ if (!IS_FSINFO(fsinfo)) { fat_msg(sb, KERN_ERR, "Invalid FSINFO signature: " "0x%08x, 0x%08x (sector = %lu)", le32_to_cpu(fsinfo->signature1), le32_to_cpu(fsinfo->signature2), sbi->fsinfo_sector); } else { if (sbi->free_clusters != -1) fsinfo->free_clusters = cpu_to_le32(sbi->free_clusters); if (sbi->prev_free != -1) fsinfo->next_cluster = cpu_to_le32(sbi->prev_free); mark_buffer_dirty(bh); } brelse(bh); return 0; } /* * fat_chain_add() adds a new cluster to the chain of clusters represented * by inode. */ int fat_chain_add(struct inode *inode, int new_dclus, int nr_cluster) { struct super_block *sb = inode->i_sb; struct msdos_sb_info *sbi = MSDOS_SB(sb); int ret, new_fclus, last; /* * We must locate the last cluster of the file to add this new * one (new_dclus) to the end of the link list (the FAT). */ last = new_fclus = 0; if (MSDOS_I(inode)->i_start) { int fclus, dclus; ret = fat_get_cluster(inode, FAT_ENT_EOF, &fclus, &dclus); if (ret < 0) return ret; new_fclus = fclus + 1; last = dclus; } /* add new one to the last of the cluster chain */ if (last) { struct fat_entry fatent; fatent_init(&fatent); ret = fat_ent_read(inode, &fatent, last); if (ret >= 0) { int wait = inode_needs_sync(inode); ret = fat_ent_write(inode, &fatent, new_dclus, wait); fatent_brelse(&fatent); } if (ret < 0) return ret; /* * FIXME:Although we can add this cache, fat_cache_add() is * assuming to be called after linear search with fat_cache_id. */ // fat_cache_add(inode, new_fclus, new_dclus); } else { MSDOS_I(inode)->i_start = new_dclus; MSDOS_I(inode)->i_logstart = new_dclus; /* * Since generic_write_sync() synchronizes regular files later, * we sync here only directories. */ if (S_ISDIR(inode->i_mode) && IS_DIRSYNC(inode)) { ret = fat_sync_inode(inode); if (ret) return ret; } else mark_inode_dirty(inode); } if (new_fclus != (inode->i_blocks >> (sbi->cluster_bits - 9))) { fat_fs_error(sb, "clusters badly computed (%d != %llu)", new_fclus, (llu)(inode->i_blocks >> (sbi->cluster_bits - 9))); fat_cache_inval_inode(inode); } inode->i_blocks += nr_cluster << (sbi->cluster_bits - 9); return 0; } /* * The epoch of FAT timestamp is 1980. * : bits : value * date: 0 - 4: day (1 - 31) * date: 5 - 8: month (1 - 12) * date: 9 - 15: year (0 - 127) from 1980 * time: 0 - 4: sec (0 - 29) 2sec counts * time: 5 - 10: min (0 - 59) * time: 11 - 15: hour (0 - 23) */ #define SECS_PER_MIN 60 #define SECS_PER_HOUR (60 * 60) #define SECS_PER_DAY (SECS_PER_HOUR * 24) /* days between 1.1.70 and 1.1.80 (2 leap days) */ #define DAYS_DELTA (365 * 10 + 2) /* 120 (2100 - 1980) isn't leap year */ #define YEAR_2100 120 #define IS_LEAP_YEAR(y) (!((y) & 3) && (y) != YEAR_2100) /* Linear day numbers of the respective 1sts in non-leap years. */ static long days_in_year[] = { /* Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec */ 0, 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 0, 0, 0, }; static inline int fat_tz_offset(const struct msdos_sb_info *sbi) { return (sbi->options.tz_set ? -sbi->options.time_offset : sys_tz.tz_minuteswest) * SECS_PER_MIN; } /* Convert a FAT time/date pair to a UNIX date (seconds since 1 1 70). */ void fat_time_fat2unix(struct msdos_sb_info *sbi, struct timespec64 *ts, __le16 __time, __le16 __date, u8 time_cs) { u16 time = le16_to_cpu(__time), date = le16_to_cpu(__date); time64_t second; long day, leap_day, month, year; year = date >> 9; month = max(1, (date >> 5) & 0xf); day = max(1, date & 0x1f) - 1; leap_day = (year + 3) / 4; if (year > YEAR_2100) /* 2100 isn't leap year */ leap_day--; if (IS_LEAP_YEAR(year) && month > 2) leap_day++; second = (time & 0x1f) << 1; second += ((time >> 5) & 0x3f) * SECS_PER_MIN; second += (time >> 11) * SECS_PER_HOUR; second += (time64_t)(year * 365 + leap_day + days_in_year[month] + day + DAYS_DELTA) * SECS_PER_DAY; second += fat_tz_offset(sbi); if (time_cs) { ts->tv_sec = second + (time_cs / 100); ts->tv_nsec = (time_cs % 100) * 10000000; } else { ts->tv_sec = second; ts->tv_nsec = 0; } } /* Export fat_time_fat2unix() for the fat_test KUnit tests. */ EXPORT_SYMBOL_GPL(fat_time_fat2unix); /* Convert linear UNIX date to a FAT time/date pair. */ void fat_time_unix2fat(struct msdos_sb_info *sbi, struct timespec64 *ts, __le16 *time, __le16 *date, u8 *time_cs) { struct tm tm; time64_to_tm(ts->tv_sec, -fat_tz_offset(sbi), &tm); /* FAT can only support year between 1980 to 2107 */ if (tm.tm_year < 1980 - 1900) { *time = 0; *date = cpu_to_le16((0 << 9) | (1 << 5) | 1); if (time_cs) *time_cs = 0; return; } if (tm.tm_year > 2107 - 1900) { *time = cpu_to_le16((23 << 11) | (59 << 5) | 29); *date = cpu_to_le16((127 << 9) | (12 << 5) | 31); if (time_cs) *time_cs = 199; return; } /* from 1900 -> from 1980 */ tm.tm_year -= 80; /* 0~11 -> 1~12 */ tm.tm_mon++; /* 0~59 -> 0~29(2sec counts) */ tm.tm_sec >>= 1; *time = cpu_to_le16(tm.tm_hour << 11 | tm.tm_min << 5 | tm.tm_sec); *date = cpu_to_le16(tm.tm_year << 9 | tm.tm_mon << 5 | tm.tm_mday); if (time_cs) *time_cs = (ts->tv_sec & 1) * 100 + ts->tv_nsec / 10000000; } EXPORT_SYMBOL_GPL(fat_time_unix2fat); static inline struct timespec64 fat_timespec64_trunc_2secs(struct timespec64 ts) { return (struct timespec64){ ts.tv_sec & ~1ULL, 0 }; } /* * truncate atime to 24 hour granularity (00:00:00 in local timezone) */ struct timespec64 fat_truncate_atime(const struct msdos_sb_info *sbi, const struct timespec64 *ts) { /* to localtime */ time64_t seconds = ts->tv_sec - fat_tz_offset(sbi); s32 remainder; div_s64_rem(seconds, SECS_PER_DAY, &remainder); /* to day boundary, and back to unix time */ seconds = seconds + fat_tz_offset(sbi) - remainder; return (struct timespec64){ seconds, 0 }; } /* * truncate mtime to 2 second granularity */ struct timespec64 fat_truncate_mtime(const struct msdos_sb_info *sbi, const struct timespec64 *ts) { return fat_timespec64_trunc_2secs(*ts); } /* * truncate the various times with appropriate granularity: * all times in root node are always 0 */ int fat_truncate_time(struct inode *inode, struct timespec64 *now, int flags) { struct msdos_sb_info *sbi = MSDOS_SB(inode->i_sb); struct timespec64 ts; if (inode->i_ino == MSDOS_ROOT_INO) return 0; if (now == NULL) { now = &ts; ts = current_time(inode); } if (flags & S_ATIME) inode_set_atime_to_ts(inode, fat_truncate_atime(sbi, now)); /* * ctime and mtime share the same on-disk field, and should be * identical in memory. all mtime updates will be applied to ctime, * but ctime updates are ignored. */ if (flags & S_MTIME) inode_set_mtime_to_ts(inode, inode_set_ctime_to_ts(inode, fat_truncate_mtime(sbi, now))); return 0; } EXPORT_SYMBOL_GPL(fat_truncate_time); int fat_update_time(struct inode *inode, int flags) { int dirty_flags = 0; if (inode->i_ino == MSDOS_ROOT_INO) return 0; if (flags & (S_ATIME | S_CTIME | S_MTIME)) { fat_truncate_time(inode, NULL, flags); if (inode->i_sb->s_flags & SB_LAZYTIME) dirty_flags |= I_DIRTY_TIME; else dirty_flags |= I_DIRTY_SYNC; } __mark_inode_dirty(inode, dirty_flags); return 0; } EXPORT_SYMBOL_GPL(fat_update_time); int fat_sync_bhs(struct buffer_head **bhs, int nr_bhs) { int i, err = 0; for (i = 0; i < nr_bhs; i++) write_dirty_buffer(bhs[i], 0); for (i = 0; i < nr_bhs; i++) { wait_on_buffer(bhs[i]); if (!err && !buffer_uptodate(bhs[i])) err = -EIO; } return err; } |
30 40 39 4 4 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 | // SPDX-License-Identifier: GPL-2.0-only /* * Scalar fixed time AES core transform * * Copyright (C) 2017 Linaro Ltd <ard.biesheuvel@linaro.org> */ #include <crypto/aes.h> #include <crypto/algapi.h> #include <linux/module.h> static int aesti_set_key(struct crypto_tfm *tfm, const u8 *in_key, unsigned int key_len) { struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); return aes_expandkey(ctx, in_key, key_len); } static void aesti_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) { const struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); unsigned long flags; /* * Temporarily disable interrupts to avoid races where cachelines are * evicted when the CPU is interrupted to do something else. */ local_irq_save(flags); aes_encrypt(ctx, out, in); local_irq_restore(flags); } static void aesti_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) { const struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); unsigned long flags; /* * Temporarily disable interrupts to avoid races where cachelines are * evicted when the CPU is interrupted to do something else. */ local_irq_save(flags); aes_decrypt(ctx, out, in); local_irq_restore(flags); } static struct crypto_alg aes_alg = { .cra_name = "aes", .cra_driver_name = "aes-fixed-time", .cra_priority = 100 + 1, .cra_flags = CRYPTO_ALG_TYPE_CIPHER, .cra_blocksize = AES_BLOCK_SIZE, .cra_ctxsize = sizeof(struct crypto_aes_ctx), .cra_module = THIS_MODULE, .cra_cipher.cia_min_keysize = AES_MIN_KEY_SIZE, .cra_cipher.cia_max_keysize = AES_MAX_KEY_SIZE, .cra_cipher.cia_setkey = aesti_set_key, .cra_cipher.cia_encrypt = aesti_encrypt, .cra_cipher.cia_decrypt = aesti_decrypt }; static int __init aes_init(void) { return crypto_register_alg(&aes_alg); } static void __exit aes_fini(void) { crypto_unregister_alg(&aes_alg); } module_init(aes_init); module_exit(aes_fini); MODULE_DESCRIPTION("Generic fixed time AES"); MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>"); MODULE_LICENSE("GPL v2"); |
8 8 8 8 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 | // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB /* * Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved. * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved. */ #include <linux/vmalloc.h> #include "rxe.h" #include "rxe_loc.h" #include "rxe_queue.h" int rxe_cq_chk_attr(struct rxe_dev *rxe, struct rxe_cq *cq, int cqe, int comp_vector) { int count; if (cqe <= 0) { rxe_dbg_dev(rxe, "cqe(%d) <= 0\n", cqe); goto err1; } if (cqe > rxe->attr.max_cqe) { rxe_dbg_dev(rxe, "cqe(%d) > max_cqe(%d)\n", cqe, rxe->attr.max_cqe); goto err1; } if (cq) { count = queue_count(cq->queue, QUEUE_TYPE_TO_CLIENT); if (cqe < count) { rxe_dbg_cq(cq, "cqe(%d) < current # elements in queue (%d)\n", cqe, count); goto err1; } } return 0; err1: return -EINVAL; } int rxe_cq_from_init(struct rxe_dev *rxe, struct rxe_cq *cq, int cqe, int comp_vector, struct ib_udata *udata, struct rxe_create_cq_resp __user *uresp) { int err; enum queue_type type; type = QUEUE_TYPE_TO_CLIENT; cq->queue = rxe_queue_init(rxe, &cqe, sizeof(struct rxe_cqe), type); if (!cq->queue) { rxe_dbg_dev(rxe, "unable to create cq\n"); return -ENOMEM; } err = do_mmap_info(rxe, uresp ? &uresp->mi : NULL, udata, cq->queue->buf, cq->queue->buf_size, &cq->queue->ip); if (err) { vfree(cq->queue->buf); kfree(cq->queue); return err; } cq->is_user = uresp; spin_lock_init(&cq->cq_lock); cq->ibcq.cqe = cqe; return 0; } int rxe_cq_resize_queue(struct rxe_cq *cq, int cqe, struct rxe_resize_cq_resp __user *uresp, struct ib_udata *udata) { int err; err = rxe_queue_resize(cq->queue, (unsigned int *)&cqe, sizeof(struct rxe_cqe), udata, uresp ? &uresp->mi : NULL, NULL, &cq->cq_lock); if (!err) cq->ibcq.cqe = cqe; return err; } /* caller holds reference to cq */ int rxe_cq_post(struct rxe_cq *cq, struct rxe_cqe *cqe, int solicited) { struct ib_event ev; int full; void *addr; unsigned long flags; spin_lock_irqsave(&cq->cq_lock, flags); full = queue_full(cq->queue, QUEUE_TYPE_TO_CLIENT); if (unlikely(full)) { rxe_err_cq(cq, "queue full\n"); spin_unlock_irqrestore(&cq->cq_lock, flags); if (cq->ibcq.event_handler) { ev.device = cq->ibcq.device; ev.element.cq = &cq->ibcq; ev.event = IB_EVENT_CQ_ERR; cq->ibcq.event_handler(&ev, cq->ibcq.cq_context); } return -EBUSY; } addr = queue_producer_addr(cq->queue, QUEUE_TYPE_TO_CLIENT); memcpy(addr, cqe, sizeof(*cqe)); queue_advance_producer(cq->queue, QUEUE_TYPE_TO_CLIENT); if ((cq->notify & IB_CQ_NEXT_COMP) || (cq->notify & IB_CQ_SOLICITED && solicited)) { cq->notify = 0; cq->ibcq.comp_handler(&cq->ibcq, cq->ibcq.cq_context); } spin_unlock_irqrestore(&cq->cq_lock, flags); return 0; } void rxe_cq_cleanup(struct rxe_pool_elem *elem) { struct rxe_cq *cq = container_of(elem, typeof(*cq), elem); if (cq->queue) rxe_queue_cleanup(cq->queue); } |
26 22 4 49 49 13 13 13 8 5 13 13 13 33 33 33 30 30 13 14 9 3 11 33 33 33 33 1 2 60 58 1 4 56 35 24 1 34 17 18 17 17 16 22 2 21 13 11 6 4 4 93 3 135 3 55 5 2 3 3 436 436 76 251 2 62 318 120 47 254 1 302 287 7 5 292 14 14 258 4 1 3 4 1 1 43 43 3 41 25 19 1 24 24 1 1 4 1 5 6 18 17 17 17 17 2 16 9 9 4 1 4 433 54 381 382 74 248 85 58 289 92 288 288 288 288 1 1 1 22 22 11 49 49 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 | // SPDX-License-Identifier: GPL-2.0-or-later /* * IPVS An implementation of the IP virtual server support for the * LINUX operating system. IPVS is now implemented as a module * over the Netfilter framework. IPVS can be used to build a * high-performance and highly available server based on a * cluster of servers. * * Authors: Wensong Zhang <wensong@linuxvirtualserver.org> * Peter Kese <peter.kese@ijs.si> * Julian Anastasov <ja@ssi.bg> * * The IPVS code for kernel 2.2 was done by Wensong Zhang and Peter Kese, * with changes/fixes from Julian Anastasov, Lars Marowsky-Bree, Horms * and others. * * Changes: * Paul `Rusty' Russell properly handle non-linear skbs * Harald Welte don't use nfcache */ #define KMSG_COMPONENT "IPVS" #define pr_fmt(fmt) KMSG_COMPONENT ": " fmt #include <linux/module.h> #include <linux/kernel.h> #include <linux/ip.h> #include <linux/tcp.h> #include <linux/sctp.h> #include <linux/icmp.h> #include <linux/slab.h> #include <net/ip.h> #include <net/tcp.h> #include <net/udp.h> #include <net/icmp.h> /* for icmp_send */ #include <net/gue.h> #include <net/gre.h> #include <net/route.h> #include <net/ip6_checksum.h> #include <net/netns/generic.h> /* net_generic() */ #include <linux/netfilter.h> #include <linux/netfilter_ipv4.h> #ifdef CONFIG_IP_VS_IPV6 #include <net/ipv6.h> #include <linux/netfilter_ipv6.h> #include <net/ip6_route.h> #endif #include <net/ip_vs.h> #include <linux/indirect_call_wrapper.h> EXPORT_SYMBOL(register_ip_vs_scheduler); EXPORT_SYMBOL(unregister_ip_vs_scheduler); EXPORT_SYMBOL(ip_vs_proto_name); EXPORT_SYMBOL(ip_vs_conn_new); EXPORT_SYMBOL(ip_vs_conn_in_get); EXPORT_SYMBOL(ip_vs_conn_out_get); #ifdef CONFIG_IP_VS_PROTO_TCP EXPORT_SYMBOL(ip_vs_tcp_conn_listen); #endif EXPORT_SYMBOL(ip_vs_conn_put); #ifdef CONFIG_IP_VS_DEBUG EXPORT_SYMBOL(ip_vs_get_debug_level); #endif EXPORT_SYMBOL(ip_vs_new_conn_out); #if defined(CONFIG_IP_VS_PROTO_TCP) && defined(CONFIG_IP_VS_PROTO_UDP) #define SNAT_CALL(f, ...) \ INDIRECT_CALL_2(f, tcp_snat_handler, udp_snat_handler, __VA_ARGS__) #elif defined(CONFIG_IP_VS_PROTO_TCP) #define SNAT_CALL(f, ...) INDIRECT_CALL_1(f, tcp_snat_handler, __VA_ARGS__) #elif defined(CONFIG_IP_VS_PROTO_UDP) #define SNAT_CALL(f, ...) INDIRECT_CALL_1(f, udp_snat_handler, __VA_ARGS__) #else #define SNAT_CALL(f, ...) f(__VA_ARGS__) #endif static unsigned int ip_vs_net_id __read_mostly; /* netns cnt used for uniqueness */ static atomic_t ipvs_netns_cnt = ATOMIC_INIT(0); /* ID used in ICMP lookups */ #define icmp_id(icmph) (((icmph)->un).echo.id) #define icmpv6_id(icmph) (icmph->icmp6_dataun.u_echo.identifier) const char *ip_vs_proto_name(unsigned int proto) { static char buf[20]; switch (proto) { case IPPROTO_IP: return "IP"; case IPPROTO_UDP: return "UDP"; case IPPROTO_TCP: return "TCP"; case IPPROTO_SCTP: return "SCTP"; case IPPROTO_ICMP: return "ICMP"; #ifdef CONFIG_IP_VS_IPV6 case IPPROTO_ICMPV6: return "ICMPv6"; #endif default: sprintf(buf, "IP_%u", proto); return buf; } } void ip_vs_init_hash_table(struct list_head *table, int rows) { while (--rows >= 0) INIT_LIST_HEAD(&table[rows]); } static inline void ip_vs_in_stats(struct ip_vs_conn *cp, struct sk_buff *skb) { struct ip_vs_dest *dest = cp->dest; struct netns_ipvs *ipvs = cp->ipvs; if (dest && (dest->flags & IP_VS_DEST_F_AVAILABLE)) { struct ip_vs_cpu_stats *s; struct ip_vs_service *svc; local_bh_disable(); s = this_cpu_ptr(dest->stats.cpustats); u64_stats_update_begin(&s->syncp); u64_stats_inc(&s->cnt.inpkts); u64_stats_add(&s->cnt.inbytes, skb->len); u64_stats_update_end(&s->syncp); svc = rcu_dereference(dest->svc); s = this_cpu_ptr(svc->stats.cpustats); u64_stats_update_begin(&s->syncp); u64_stats_inc(&s->cnt.inpkts); u64_stats_add(&s->cnt.inbytes, skb->len); u64_stats_update_end(&s->syncp); s = this_cpu_ptr(ipvs->tot_stats->s.cpustats); u64_stats_update_begin(&s->syncp); u64_stats_inc(&s->cnt.inpkts); u64_stats_add(&s->cnt.inbytes, skb->len); u64_stats_update_end(&s->syncp); local_bh_enable(); } } static inline void ip_vs_out_stats(struct ip_vs_conn *cp, struct sk_buff *skb) { struct ip_vs_dest *dest = cp->dest; struct netns_ipvs *ipvs = cp->ipvs; if (dest && (dest->flags & IP_VS_DEST_F_AVAILABLE)) { struct ip_vs_cpu_stats *s; struct ip_vs_service *svc; local_bh_disable(); s = this_cpu_ptr(dest->stats.cpustats); u64_stats_update_begin(&s->syncp); u64_stats_inc(&s->cnt.outpkts); u64_stats_add(&s->cnt.outbytes, skb->len); u64_stats_update_end(&s->syncp); svc = rcu_dereference(dest->svc); s = this_cpu_ptr(svc->stats.cpustats); u64_stats_update_begin(&s->syncp); u64_stats_inc(&s->cnt.outpkts); u64_stats_add(&s->cnt.outbytes, skb->len); u64_stats_update_end(&s->syncp); s = this_cpu_ptr(ipvs->tot_stats->s.cpustats); u64_stats_update_begin(&s->syncp); u64_stats_inc(&s->cnt.outpkts); u64_stats_add(&s->cnt.outbytes, skb->len); u64_stats_update_end(&s->syncp); local_bh_enable(); } } static inline void ip_vs_conn_stats(struct ip_vs_conn *cp, struct ip_vs_service *svc) { struct netns_ipvs *ipvs = svc->ipvs; struct ip_vs_cpu_stats *s; local_bh_disable(); s = this_cpu_ptr(cp->dest->stats.cpustats); u64_stats_update_begin(&s->syncp); u64_stats_inc(&s->cnt.conns); u64_stats_update_end(&s->syncp); s = this_cpu_ptr(svc->stats.cpustats); u64_stats_update_begin(&s->syncp); u64_stats_inc(&s->cnt.conns); u64_stats_update_end(&s->syncp); s = this_cpu_ptr(ipvs->tot_stats->s.cpustats); u64_stats_update_begin(&s->syncp); u64_stats_inc(&s->cnt.conns); u64_stats_update_end(&s->syncp); local_bh_enable(); } static inline void ip_vs_set_state(struct ip_vs_conn *cp, int direction, const struct sk_buff *skb, struct ip_vs_proto_data *pd) { if (likely(pd->pp->state_transition)) pd->pp->state_transition(cp, direction, skb, pd); } static inline int ip_vs_conn_fill_param_persist(const struct ip_vs_service *svc, struct sk_buff *skb, int protocol, const union nf_inet_addr *caddr, __be16 cport, const union nf_inet_addr *vaddr, __be16 vport, struct ip_vs_conn_param *p) { ip_vs_conn_fill_param(svc->ipvs, svc->af, protocol, caddr, cport, vaddr, vport, p); p->pe = rcu_dereference(svc->pe); if (p->pe && p->pe->fill_param) return p->pe->fill_param(p, skb); return 0; } /* * IPVS persistent scheduling function * It creates a connection entry according to its template if exists, * or selects a server and creates a connection entry plus a template. * Locking: we are svc user (svc->refcnt), so we hold all dests too * Protocols supported: TCP, UDP */ static struct ip_vs_conn * ip_vs_sched_persist(struct ip_vs_service *svc, struct sk_buff *skb, __be16 src_port, __be16 dst_port, int *ignored, struct ip_vs_iphdr *iph) { struct ip_vs_conn *cp = NULL; struct ip_vs_dest *dest; struct ip_vs_conn *ct; __be16 dport = 0; /* destination port to forward */ unsigned int flags; struct ip_vs_conn_param param; const union nf_inet_addr fwmark = { .ip = htonl(svc->fwmark) }; union nf_inet_addr snet; /* source network of the client, after masking */ const union nf_inet_addr *src_addr, *dst_addr; if (likely(!ip_vs_iph_inverse(iph))) { src_addr = &iph->saddr; dst_addr = &iph->daddr; } else { src_addr = &iph->daddr; dst_addr = &iph->saddr; } /* Mask saddr with the netmask to adjust template granularity */ #ifdef CONFIG_IP_VS_IPV6 if (svc->af == AF_INET6) ipv6_addr_prefix(&snet.in6, &src_addr->in6, (__force __u32) svc->netmask); else #endif snet.ip = src_addr->ip & svc->netmask; IP_VS_DBG_BUF(6, "p-schedule: src %s:%u dest %s:%u " "mnet %s\n", IP_VS_DBG_ADDR(svc->af, src_addr), ntohs(src_port), IP_VS_DBG_ADDR(svc->af, dst_addr), ntohs(dst_port), IP_VS_DBG_ADDR(svc->af, &snet)); /* * As far as we know, FTP is a very complicated network protocol, and * it uses control connection and data connections. For active FTP, * FTP server initialize data connection to the client, its source port * is often 20. For passive FTP, FTP server tells the clients the port * that it passively listens to, and the client issues the data * connection. In the tunneling or direct routing mode, the load * balancer is on the client-to-server half of connection, the port * number is unknown to the load balancer. So, a conn template like * <caddr, 0, vaddr, 0, daddr, 0> is created for persistent FTP * service, and a template like <caddr, 0, vaddr, vport, daddr, dport> * is created for other persistent services. */ { int protocol = iph->protocol; const union nf_inet_addr *vaddr = dst_addr; __be16 vport = 0; if (dst_port == svc->port) { /* non-FTP template: * <protocol, caddr, 0, vaddr, vport, daddr, dport> * FTP template: * <protocol, caddr, 0, vaddr, 0, daddr, 0> */ if (svc->port != FTPPORT) vport = dst_port; } else { /* Note: persistent fwmark-based services and * persistent port zero service are handled here. * fwmark template: * <IPPROTO_IP,caddr,0,fwmark,0,daddr,0> * port zero template: * <protocol,caddr,0,vaddr,0,daddr,0> */ if (svc->fwmark) { protocol = IPPROTO_IP; vaddr = &fwmark; } } /* return *ignored = -1 so NF_DROP can be used */ if (ip_vs_conn_fill_param_persist(svc, skb, protocol, &snet, 0, vaddr, vport, ¶m) < 0) { *ignored = -1; return NULL; } } /* Check if a template already exists */ ct = ip_vs_ct_in_get(¶m); if (!ct || !ip_vs_check_template(ct, NULL)) { struct ip_vs_scheduler *sched; /* * No template found or the dest of the connection * template is not available. * return *ignored=0 i.e. ICMP and NF_DROP */ sched = rcu_dereference(svc->scheduler); if (sched) { /* read svc->sched_data after svc->scheduler */ smp_rmb(); dest = sched->schedule(svc, skb, iph); } else { dest = NULL; } if (!dest) { IP_VS_DBG(1, "p-schedule: no dest found.\n"); kfree(param.pe_data); *ignored = 0; return NULL; } if (dst_port == svc->port && svc->port != FTPPORT) dport = dest->port; /* Create a template * This adds param.pe_data to the template, * and thus param.pe_data will be destroyed * when the template expires */ ct = ip_vs_conn_new(¶m, dest->af, &dest->addr, dport, IP_VS_CONN_F_TEMPLATE, dest, skb->mark); if (ct == NULL) { kfree(param.pe_data); *ignored = -1; return NULL; } ct->timeout = svc->timeout; } else { /* set destination with the found template */ dest = ct->dest; kfree(param.pe_data); } dport = dst_port; if (dport == svc->port && dest->port) dport = dest->port; flags = (svc->flags & IP_VS_SVC_F_ONEPACKET && iph->protocol == IPPROTO_UDP) ? IP_VS_CONN_F_ONE_PACKET : 0; /* * Create a new connection according to the template */ ip_vs_conn_fill_param(svc->ipvs, svc->af, iph->protocol, src_addr, src_port, dst_addr, dst_port, ¶m); cp = ip_vs_conn_new(¶m, dest->af, &dest->addr, dport, flags, dest, skb->mark); if (cp == NULL) { ip_vs_conn_put(ct); *ignored = -1; return NULL; } /* * Add its control */ ip_vs_control_add(cp, ct); ip_vs_conn_put(ct); ip_vs_conn_stats(cp, svc); return cp; } /* * IPVS main scheduling function * It selects a server according to the virtual service, and * creates a connection entry. * Protocols supported: TCP, UDP * * Usage of *ignored * * 1 : protocol tried to schedule (eg. on SYN), found svc but the * svc/scheduler decides that this packet should be accepted with * NF_ACCEPT because it must not be scheduled. * * 0 : scheduler can not find destination, so try bypass or * return ICMP and then NF_DROP (ip_vs_leave). * * -1 : scheduler tried to schedule but fatal error occurred, eg. * ip_vs_conn_new failure (ENOMEM) or ip_vs_sip_fill_param * failure such as missing Call-ID, ENOMEM on skb_linearize * or pe_data. In this case we should return NF_DROP without * any attempts to send ICMP with ip_vs_leave. */ struct ip_vs_conn * ip_vs_schedule(struct ip_vs_service *svc, struct sk_buff *skb, struct ip_vs_proto_data *pd, int *ignored, struct ip_vs_iphdr *iph) { struct ip_vs_protocol *pp = pd->pp; struct ip_vs_conn *cp = NULL; struct ip_vs_scheduler *sched; struct ip_vs_dest *dest; __be16 _ports[2], *pptr, cport, vport; const void *caddr, *vaddr; unsigned int flags; *ignored = 1; /* * IPv6 frags, only the first hit here. */ pptr = frag_safe_skb_hp(skb, iph->len, sizeof(_ports), _ports); if (pptr == NULL) return NULL; if (likely(!ip_vs_iph_inverse(iph))) { cport = pptr[0]; caddr = &iph->saddr; vport = pptr[1]; vaddr = &iph->daddr; } else { cport = pptr[1]; caddr = &iph->daddr; vport = pptr[0]; vaddr = &iph->saddr; } /* * FTPDATA needs this check when using local real server. * Never schedule Active FTPDATA connections from real server. * For LVS-NAT they must be already created. For other methods * with persistence the connection is created on SYN+ACK. */ if (cport == FTPDATA) { IP_VS_DBG_PKT(12, svc->af, pp, skb, iph->off, "Not scheduling FTPDATA"); return NULL; } /* * Do not schedule replies from local real server. */ if ((!skb->dev || skb->dev->flags & IFF_LOOPBACK)) { iph->hdr_flags ^= IP_VS_HDR_INVERSE; cp = INDIRECT_CALL_1(pp->conn_in_get, ip_vs_conn_in_get_proto, svc->ipvs, svc->af, skb, iph); iph->hdr_flags ^= IP_VS_HDR_INVERSE; if (cp) { IP_VS_DBG_PKT(12, svc->af, pp, skb, iph->off, "Not scheduling reply for existing" " connection"); __ip_vs_conn_put(cp); return NULL; } } /* * Persistent service */ if (svc->flags & IP_VS_SVC_F_PERSISTENT) return ip_vs_sched_persist(svc, skb, cport, vport, ignored, iph); *ignored = 0; /* * Non-persistent service */ if (!svc->fwmark && vport != svc->port) { if (!svc->port) pr_err("Schedule: port zero only supported " "in persistent services, " "check your ipvs configuration\n"); return NULL; } sched = rcu_dereference(svc->scheduler); if (sched) { /* read svc->sched_data after svc->scheduler */ smp_rmb(); dest = sched->schedule(svc, skb, iph); } else { dest = NULL; } if (dest == NULL) { IP_VS_DBG(1, "Schedule: no dest found.\n"); return NULL; } flags = (svc->flags & IP_VS_SVC_F_ONEPACKET && iph->protocol == IPPROTO_UDP) ? IP_VS_CONN_F_ONE_PACKET : 0; /* * Create a connection entry. */ { struct ip_vs_conn_param p; ip_vs_conn_fill_param(svc->ipvs, svc->af, iph->protocol, caddr, cport, vaddr, vport, &p); cp = ip_vs_conn_new(&p, dest->af, &dest->addr, dest->port ? dest->port : vport, flags, dest, skb->mark); if (!cp) { *ignored = -1; return NULL; } } IP_VS_DBG_BUF(6, "Schedule fwd:%c c:%s:%u v:%s:%u " "d:%s:%u conn->flags:%X conn->refcnt:%d\n", ip_vs_fwd_tag(cp), IP_VS_DBG_ADDR(cp->af, &cp->caddr), ntohs(cp->cport), IP_VS_DBG_ADDR(cp->af, &cp->vaddr), ntohs(cp->vport), IP_VS_DBG_ADDR(cp->daf, &cp->daddr), ntohs(cp->dport), cp->flags, refcount_read(&cp->refcnt)); ip_vs_conn_stats(cp, svc); return cp; } static inline int ip_vs_addr_is_unicast(struct net *net, int af, union nf_inet_addr *addr) { #ifdef CONFIG_IP_VS_IPV6 if (af == AF_INET6) return ipv6_addr_type(&addr->in6) & IPV6_ADDR_UNICAST; #endif return (inet_addr_type(net, addr->ip) == RTN_UNICAST); } /* * Pass or drop the packet. * Called by ip_vs_in, when the virtual service is available but * no destination is available for a new connection. */ int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff *skb, struct ip_vs_proto_data *pd, struct ip_vs_iphdr *iph) { __be16 _ports[2], *pptr, dport; struct netns_ipvs *ipvs = svc->ipvs; struct net *net = ipvs->net; pptr = frag_safe_skb_hp(skb, iph->len, sizeof(_ports), _ports); if (!pptr) return NF_DROP; dport = likely(!ip_vs_iph_inverse(iph)) ? pptr[1] : pptr[0]; /* if it is fwmark-based service, the cache_bypass sysctl is up and the destination is a non-local unicast, then create a cache_bypass connection entry */ if (sysctl_cache_bypass(ipvs) && svc->fwmark && !(iph->hdr_flags & (IP_VS_HDR_INVERSE | IP_VS_HDR_ICMP)) && ip_vs_addr_is_unicast(net, svc->af, &iph->daddr)) { int ret; struct ip_vs_conn *cp; unsigned int flags = (svc->flags & IP_VS_SVC_F_ONEPACKET && iph->protocol == IPPROTO_UDP) ? IP_VS_CONN_F_ONE_PACKET : 0; union nf_inet_addr daddr = { .all = { 0, 0, 0, 0 } }; /* create a new connection entry */ IP_VS_DBG(6, "%s(): create a cache_bypass entry\n", __func__); { struct ip_vs_conn_param p; ip_vs_conn_fill_param(svc->ipvs, svc->af, iph->protocol, &iph->saddr, pptr[0], &iph->daddr, pptr[1], &p); cp = ip_vs_conn_new(&p, svc->af, &daddr, 0, IP_VS_CONN_F_BYPASS | flags, NULL, skb->mark); if (!cp) return NF_DROP; } /* statistics */ ip_vs_in_stats(cp, skb); /* set state */ ip_vs_set_state(cp, IP_VS_DIR_INPUT, skb, pd); /* transmit the first SYN packet */ ret = cp->packet_xmit(skb, cp, pd->pp, iph); /* do not touch skb anymore */ if ((cp->flags & IP_VS_CONN_F_ONE_PACKET) && cp->control) atomic_inc(&cp->control->in_pkts); else atomic_inc(&cp->in_pkts); ip_vs_conn_put(cp); return ret; } /* * When the virtual ftp service is presented, packets destined * for other services on the VIP may get here (except services * listed in the ipvs table), pass the packets, because it is * not ipvs job to decide to drop the packets. */ if (svc->port == FTPPORT && dport != FTPPORT) return NF_ACCEPT; if (unlikely(ip_vs_iph_icmp(iph))) return NF_DROP; /* * Notify the client that the destination is unreachable, and * release the socket buffer. * Since it is in IP layer, the TCP socket is not actually * created, the TCP RST packet cannot be sent, instead that * ICMP_PORT_UNREACH is sent here no matter it is TCP/UDP. --WZ */ #ifdef CONFIG_IP_VS_IPV6 if (svc->af == AF_INET6) { if (!skb->dev) skb->dev = net->loopback_dev; icmpv6_send(skb, ICMPV6_DEST_UNREACH, ICMPV6_PORT_UNREACH, 0); } else #endif icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0); return NF_DROP; } #ifdef CONFIG_SYSCTL static int sysctl_snat_reroute(struct netns_ipvs *ipvs) { return ipvs->sysctl_snat_reroute; } static int sysctl_nat_icmp_send(struct netns_ipvs *ipvs) { return ipvs->sysctl_nat_icmp_send; } #else static int sysctl_snat_reroute(struct netns_ipvs *ipvs) { return 0; } static int sysctl_nat_icmp_send(struct netns_ipvs *ipvs) { return 0; } #endif __sum16 ip_vs_checksum_complete(struct sk_buff *skb, int offset) { return csum_fold(skb_checksum(skb, offset, skb->len - offset, 0)); } static inline enum ip_defrag_users ip_vs_defrag_user(unsigned int hooknum) { if (NF_INET_LOCAL_IN == hooknum) return IP_DEFRAG_VS_IN; if (NF_INET_FORWARD == hooknum) return IP_DEFRAG_VS_FWD; return IP_DEFRAG_VS_OUT; } static inline int ip_vs_gather_frags(struct netns_ipvs *ipvs, struct sk_buff *skb, u_int32_t user) { int err; local_bh_disable(); err = ip_defrag(ipvs->net, skb, user); local_bh_enable(); if (!err) ip_send_check(ip_hdr(skb)); return err; } static int ip_vs_route_me_harder(struct netns_ipvs *ipvs, int af, struct sk_buff *skb, unsigned int hooknum) { if (!sysctl_snat_reroute(ipvs)) return 0; /* Reroute replies only to remote clients (FORWARD and LOCAL_OUT) */ if (NF_INET_LOCAL_IN == hooknum) return 0; #ifdef CONFIG_IP_VS_IPV6 if (af == AF_INET6) { struct dst_entry *dst = skb_dst(skb); if (dst->dev && !(dst->dev->flags & IFF_LOOPBACK) && ip6_route_me_harder(ipvs->net, skb->sk, skb) != 0) return 1; } else #endif if (!(skb_rtable(skb)->rt_flags & RTCF_LOCAL) && ip_route_me_harder(ipvs->net, skb->sk, skb, RTN_LOCAL) != 0) return 1; return 0; } /* * Packet has been made sufficiently writable in caller * - inout: 1=in->out, 0=out->in */ void ip_vs_nat_icmp(struct sk_buff *skb, struct ip_vs_protocol *pp, struct ip_vs_conn *cp, int inout) { struct iphdr *iph = ip_hdr(skb); unsigned int icmp_offset = iph->ihl*4; struct icmphdr *icmph = (struct icmphdr *)(skb_network_header(skb) + icmp_offset); struct iphdr *ciph = (struct iphdr *)(icmph + 1); if (inout) { iph->saddr = cp->vaddr.ip; ip_send_check(iph); ciph->daddr = cp->vaddr.ip; ip_send_check(ciph); } else { iph->daddr = cp->daddr.ip; ip_send_check(iph); ciph->saddr = cp->daddr.ip; ip_send_check(ciph); } /* the TCP/UDP/SCTP port */ if (IPPROTO_TCP == ciph->protocol || IPPROTO_UDP == ciph->protocol || IPPROTO_SCTP == ciph->protocol) { __be16 *ports = (void *)ciph + ciph->ihl*4; if (inout) ports[1] = cp->vport; else ports[0] = cp->dport; } /* And finally the ICMP checksum */ icmph->checksum = 0; icmph->checksum = ip_vs_checksum_complete(skb, icmp_offset); skb->ip_summed = CHECKSUM_UNNECESSARY; if (inout) IP_VS_DBG_PKT(11, AF_INET, pp, skb, (void *)ciph - (void *)iph, "Forwarding altered outgoing ICMP"); else IP_VS_DBG_PKT(11, AF_INET, pp, skb, (void *)ciph - (void *)iph, "Forwarding altered incoming ICMP"); } #ifdef CONFIG_IP_VS_IPV6 void ip_vs_nat_icmp_v6(struct sk_buff *skb, struct ip_vs_protocol *pp, struct ip_vs_conn *cp, int inout) { struct ipv6hdr *iph = ipv6_hdr(skb); unsigned int icmp_offset = 0; unsigned int offs = 0; /* header offset*/ int protocol; struct icmp6hdr *icmph; struct ipv6hdr *ciph; unsigned short fragoffs; ipv6_find_hdr(skb, &icmp_offset, IPPROTO_ICMPV6, &fragoffs, NULL); icmph = (struct icmp6hdr *)(skb_network_header(skb) + icmp_offset); offs = icmp_offset + sizeof(struct icmp6hdr); ciph = (struct ipv6hdr *)(skb_network_header(skb) + offs); protocol = ipv6_find_hdr(skb, &offs, -1, &fragoffs, NULL); if (inout) { iph->saddr = cp->vaddr.in6; ciph->daddr = cp->vaddr.in6; } else { iph->daddr = cp->daddr.in6; ciph->saddr = cp->daddr.in6; } /* the TCP/UDP/SCTP port */ if (!fragoffs && (IPPROTO_TCP == protocol || IPPROTO_UDP == protocol || IPPROTO_SCTP == protocol)) { __be16 *ports = (void *)(skb_network_header(skb) + offs); IP_VS_DBG(11, "%s() changed port %d to %d\n", __func__, ntohs(inout ? ports[1] : ports[0]), ntohs(inout ? cp->vport : cp->dport)); if (inout) ports[1] = cp->vport; else ports[0] = cp->dport; } /* And finally the ICMP checksum */ icmph->icmp6_cksum = ~csum_ipv6_magic(&iph->saddr, &iph->daddr, skb->len - icmp_offset, IPPROTO_ICMPV6, 0); skb->csum_start = skb_network_header(skb) - skb->head + icmp_offset; skb->csum_offset = offsetof(struct icmp6hdr, icmp6_cksum); skb->ip_summed = CHECKSUM_PARTIAL; if (inout) IP_VS_DBG_PKT(11, AF_INET6, pp, skb, (void *)ciph - (void *)iph, "Forwarding altered outgoing ICMPv6"); else IP_VS_DBG_PKT(11, AF_INET6, pp, skb, (void *)ciph - (void *)iph, "Forwarding altered incoming ICMPv6"); } #endif /* Handle relevant response ICMP messages - forward to the right * destination host. */ static int handle_response_icmp(int af, struct sk_buff *skb, union nf_inet_addr *snet, __u8 protocol, struct ip_vs_conn *cp, struct ip_vs_protocol *pp, unsigned int offset, unsigned int ihl, unsigned int hooknum) { unsigned int verdict = NF_DROP; if (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ) goto after_nat; /* Ensure the checksum is correct */ if (!skb_csum_unnecessary(skb) && ip_vs_checksum_complete(skb, ihl)) { /* Failed checksum! */ IP_VS_DBG_BUF(1, "Forward ICMP: failed checksum from %s!\n", IP_VS_DBG_ADDR(af, snet)); goto out; } if (IPPROTO_TCP == protocol || IPPROTO_UDP == protocol || IPPROTO_SCTP == protocol) offset += 2 * sizeof(__u16); if (skb_ensure_writable(skb, offset)) goto out; #ifdef CONFIG_IP_VS_IPV6 if (af == AF_INET6) ip_vs_nat_icmp_v6(skb, pp, cp, 1); else #endif ip_vs_nat_icmp(skb, pp, cp, 1); if (ip_vs_route_me_harder(cp->ipvs, af, skb, hooknum)) goto out; after_nat: /* do the statistics and put it back */ ip_vs_out_stats(cp, skb); skb->ipvs_property = 1; if (!(cp->flags & IP_VS_CONN_F_NFCT)) ip_vs_notrack(skb); else ip_vs_update_conntrack(skb, cp, 0); verdict = NF_ACCEPT; out: __ip_vs_conn_put(cp); return verdict; } /* * Handle ICMP messages in the inside-to-outside direction (outgoing). * Find any that might be relevant, check against existing connections. * Currently handles error types - unreachable, quench, ttl exceeded. */ static int ip_vs_out_icmp(struct netns_ipvs *ipvs, struct sk_buff *skb, int *related, unsigned int hooknum) { struct iphdr *iph; struct icmphdr _icmph, *ic; struct iphdr _ciph, *cih; /* The ip header contained within the ICMP */ struct ip_vs_iphdr ciph; struct ip_vs_conn *cp; struct ip_vs_protocol *pp; unsigned int offset, ihl; union nf_inet_addr snet; *related = 1; /* reassemble IP fragments */ if (ip_is_fragment(ip_hdr(skb))) { if (ip_vs_gather_frags(ipvs, skb, ip_vs_defrag_user(hooknum))) return NF_STOLEN; } iph = ip_hdr(skb); offset = ihl = iph->ihl * 4; ic = skb_header_pointer(skb, offset, sizeof(_icmph), &_icmph); if (ic == NULL) return NF_DROP; IP_VS_DBG(12, "Outgoing ICMP (%d,%d) %pI4->%pI4\n", ic->type, ntohs(icmp_id(ic)), &iph->saddr, &iph->daddr); /* * Work through seeing if this is for us. * These checks are supposed to be in an order that means easy * things are checked first to speed up processing.... however * this means that some packets will manage to get a long way * down this stack and then be rejected, but that's life. */ if ((ic->type != ICMP_DEST_UNREACH) && (ic->type != ICMP_SOURCE_QUENCH) && (ic->type != ICMP_TIME_EXCEEDED)) { *related = 0; return NF_ACCEPT; } /* Now find the contained IP header */ offset += sizeof(_icmph); cih = skb_header_pointer(skb, offset, sizeof(_ciph), &_ciph); if (cih == NULL) return NF_ACCEPT; /* The packet looks wrong, ignore */ pp = ip_vs_proto_get(cih->protocol); if (!pp) return NF_ACCEPT; /* Is the embedded protocol header present? */ if (unlikely(cih->frag_off & htons(IP_OFFSET) && pp->dont_defrag)) return NF_ACCEPT; IP_VS_DBG_PKT(11, AF_INET, pp, skb, offset, "Checking outgoing ICMP for"); ip_vs_fill_iph_skb_icmp(AF_INET, skb, offset, true, &ciph); /* The embedded headers contain source and dest in reverse order */ cp = INDIRECT_CALL_1(pp->conn_out_get, ip_vs_conn_out_get_proto, ipvs, AF_INET, skb, &ciph); if (!cp) return NF_ACCEPT; snet.ip = iph->saddr; return handle_response_icmp(AF_INET, skb, &snet, cih->protocol, cp, pp, ciph.len, ihl, hooknum); } #ifdef CONFIG_IP_VS_IPV6 static int ip_vs_out_icmp_v6(struct netns_ipvs *ipvs, struct sk_buff *skb, int *related, unsigned int hooknum, struct ip_vs_iphdr *ipvsh) { struct icmp6hdr _icmph, *ic; struct ip_vs_iphdr ciph = {.flags = 0, .fragoffs = 0};/*Contained IP */ struct ip_vs_conn *cp; struct ip_vs_protocol *pp; union nf_inet_addr snet; unsigned int offset; *related = 1; ic = frag_safe_skb_hp(skb, ipvsh->len, sizeof(_icmph), &_icmph); if (ic == NULL) return NF_DROP; /* * Work through seeing if this is for us. * These checks are supposed to be in an order that means easy * things are checked first to speed up processing.... however * this means that some packets will manage to get a long way * down this stack and then be rejected, but that's life. */ if (ic->icmp6_type & ICMPV6_INFOMSG_MASK) { *related = 0; return NF_ACCEPT; } /* Fragment header that is before ICMP header tells us that: * it's not an error message since they can't be fragmented. */ if (ipvsh->flags & IP6_FH_F_FRAG) return NF_DROP; IP_VS_DBG(8, "Outgoing ICMPv6 (%d,%d) %pI6c->%pI6c\n", ic->icmp6_type, ntohs(icmpv6_id(ic)), &ipvsh->saddr, &ipvsh->daddr); if (!ip_vs_fill_iph_skb_icmp(AF_INET6, skb, ipvsh->len + sizeof(_icmph), true, &ciph)) return NF_ACCEPT; /* The packet looks wrong, ignore */ pp = ip_vs_proto_get(ciph.protocol); if (!pp) return NF_ACCEPT; /* The embedded headers contain source and dest in reverse order */ cp = INDIRECT_CALL_1(pp->conn_out_get, ip_vs_conn_out_get_proto, ipvs, AF_INET6, skb, &ciph); if (!cp) return NF_ACCEPT; snet.in6 = ciph.saddr.in6; offset = ciph.len; return handle_response_icmp(AF_INET6, skb, &snet, ciph.protocol, cp, pp, offset, sizeof(struct ipv6hdr), hooknum); } #endif /* * Check if sctp chunc is ABORT chunk */ static inline int is_sctp_abort(const struct sk_buff *skb, int nh_len) { struct sctp_chunkhdr *sch, schunk; sch = skb_header_pointer(skb, nh_len + sizeof(struct sctphdr), sizeof(schunk), &schunk); if (sch == NULL) return 0; if (sch->type == SCTP_CID_ABORT) return 1; return 0; } static inline int is_tcp_reset(const struct sk_buff *skb, int nh_len) { struct tcphdr _tcph, *th; th = skb_header_pointer(skb, nh_len, sizeof(_tcph), &_tcph); if (th == NULL) return 0; return th->rst; } static inline bool is_new_conn(const struct sk_buff *skb, struct ip_vs_iphdr *iph) { switch (iph->protocol) { case IPPROTO_TCP: { struct tcphdr _tcph, *th; th = skb_header_pointer(skb, iph->len, sizeof(_tcph), &_tcph); if (th == NULL) return false; return th->syn; } case IPPROTO_SCTP: { struct sctp_chunkhdr *sch, schunk; sch = skb_header_pointer(skb, iph->len + sizeof(struct sctphdr), sizeof(schunk), &schunk); if (sch == NULL) return false; return sch->type == SCTP_CID_INIT; } default: return false; } } static inline bool is_new_conn_expected(const struct ip_vs_conn *cp, int conn_reuse_mode) { /* Controlled (FTP DATA or persistence)? */ if (cp->control) return false; switch (cp->protocol) { case IPPROTO_TCP: return (cp->state == IP_VS_TCP_S_TIME_WAIT) || (cp->state == IP_VS_TCP_S_CLOSE) || ((conn_reuse_mode & 2) && (cp->state == IP_VS_TCP_S_FIN_WAIT) && (cp->flags & IP_VS_CONN_F_NOOUTPUT)); case IPPROTO_SCTP: return cp->state == IP_VS_SCTP_S_CLOSED; default: return false; } } /* Generic function to create new connections for outgoing RS packets * * Pre-requisites for successful connection creation: * 1) Virtual Service is NOT fwmark based: * In fwmark-VS actual vaddr and vport are unknown to IPVS * 2) Real Server and Virtual Service were NOT configured without port: * This is to allow match of different VS to the same RS ip-addr */ struct ip_vs_conn *ip_vs_new_conn_out(struct ip_vs_service *svc, struct ip_vs_dest *dest, struct sk_buff *skb, const struct ip_vs_iphdr *iph, __be16 dport, __be16 cport) { struct ip_vs_conn_param param; struct ip_vs_conn *ct = NULL, *cp = NULL; const union nf_inet_addr *vaddr, *daddr, *caddr; union nf_inet_addr snet; __be16 vport; unsigned int flags; vaddr = &svc->addr; vport = svc->port; daddr = &iph->saddr; caddr = &iph->daddr; /* check pre-requisites are satisfied */ if (svc->fwmark) return NULL; if (!vport || !dport) return NULL; /* for persistent service first create connection template */ if (svc->flags & IP_VS_SVC_F_PERSISTENT) { /* apply netmask the same way ingress-side does */ #ifdef CONFIG_IP_VS_IPV6 if (svc->af == AF_INET6) ipv6_addr_prefix(&snet.in6, &caddr->in6, (__force __u32)svc->netmask); else #endif snet.ip = caddr->ip & svc->netmask; /* fill params and create template if not existent */ if (ip_vs_conn_fill_param_persist(svc, skb, iph->protocol, &snet, 0, vaddr, vport, ¶m) < 0) return NULL; ct = ip_vs_ct_in_get(¶m); /* check if template exists and points to the same dest */ if (!ct || !ip_vs_check_template(ct, dest)) { ct = ip_vs_conn_new(¶m, dest->af, daddr, dport, IP_VS_CONN_F_TEMPLATE, dest, 0); if (!ct) { kfree(param.pe_data); return NULL; } ct->timeout = svc->timeout; } else { kfree(param.pe_data); } } /* connection flags */ flags = ((svc->flags & IP_VS_SVC_F_ONEPACKET) && iph->protocol == IPPROTO_UDP) ? IP_VS_CONN_F_ONE_PACKET : 0; /* create connection */ ip_vs_conn_fill_param(svc->ipvs, svc->af, iph->protocol, caddr, cport, vaddr, vport, ¶m); cp = ip_vs_conn_new(¶m, dest->af, daddr, dport, flags, dest, 0); if (!cp) { if (ct) ip_vs_conn_put(ct); return NULL; } if (ct) { ip_vs_control_add(cp, ct); ip_vs_conn_put(ct); } ip_vs_conn_stats(cp, svc); /* return connection (will be used to handle outgoing packet) */ IP_VS_DBG_BUF(6, "New connection RS-initiated:%c c:%s:%u v:%s:%u " "d:%s:%u conn->flags:%X conn->refcnt:%d\n", ip_vs_fwd_tag(cp), IP_VS_DBG_ADDR(cp->af, &cp->caddr), ntohs(cp->cport), IP_VS_DBG_ADDR(cp->af, &cp->vaddr), ntohs(cp->vport), IP_VS_DBG_ADDR(cp->af, &cp->daddr), ntohs(cp->dport), cp->flags, refcount_read(&cp->refcnt)); return cp; } /* Handle outgoing packets which are considered requests initiated by * real servers, so that subsequent responses from external client can be * routed to the right real server. * Used also for outgoing responses in OPS mode. * * Connection management is handled by persistent-engine specific callback. */ static struct ip_vs_conn *__ip_vs_rs_conn_out(unsigned int hooknum, struct netns_ipvs *ipvs, int af, struct sk_buff *skb, const struct ip_vs_iphdr *iph) { struct ip_vs_dest *dest; struct ip_vs_conn *cp = NULL; __be16 _ports[2], *pptr; if (hooknum == NF_INET_LOCAL_IN) return NULL; pptr = frag_safe_skb_hp(skb, iph->len, sizeof(_ports), _ports); if (!pptr) return NULL; dest = ip_vs_find_real_service(ipvs, af, iph->protocol, &iph->saddr, pptr[0]); if (dest) { struct ip_vs_service *svc; struct ip_vs_pe *pe; svc = rcu_dereference(dest->svc); if (svc) { pe = rcu_dereference(svc->pe); if (pe && pe->conn_out) cp = pe->conn_out(svc, dest, skb, iph, pptr[0], pptr[1]); } } return cp; } /* Handle response packets: rewrite addresses and send away... */ static unsigned int handle_response(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd, struct ip_vs_conn *cp, struct ip_vs_iphdr *iph, unsigned int hooknum) { struct ip_vs_protocol *pp = pd->pp; if (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ) goto after_nat; IP_VS_DBG_PKT(11, af, pp, skb, iph->off, "Outgoing packet"); if (skb_ensure_writable(skb, iph->len)) goto drop; /* mangle the packet */ if (pp->snat_handler && !SNAT_CALL(pp->snat_handler, skb, pp, cp, iph)) goto drop; #ifdef CONFIG_IP_VS_IPV6 if (af == AF_INET6) ipv6_hdr(skb)->saddr = cp->vaddr.in6; else #endif { ip_hdr(skb)->saddr = cp->vaddr.ip; ip_send_check(ip_hdr(skb)); } /* * nf_iterate does not expect change in the skb->dst->dev. * It looks like it is not fatal to enable this code for hooks * where our handlers are at the end of the chain list and * when all next handlers use skb->dst->dev and not outdev. * It will definitely route properly the inout NAT traffic * when multiple paths are used. */ /* For policy routing, packets originating from this * machine itself may be routed differently to packets * passing through. We want this packet to be routed as * if it came from this machine itself. So re-compute * the routing information. */ if (ip_vs_route_me_harder(cp->ipvs, af, skb, hooknum)) goto drop; IP_VS_DBG_PKT(10, af, pp, skb, iph->off, "After SNAT"); after_nat: ip_vs_out_stats(cp, skb); ip_vs_set_state(cp, IP_VS_DIR_OUTPUT, skb, pd); skb->ipvs_property = 1; if (!(cp->flags & IP_VS_CONN_F_NFCT)) ip_vs_notrack(skb); else ip_vs_update_conntrack(skb, cp, 0); ip_vs_conn_put(cp); return NF_ACCEPT; drop: ip_vs_conn_put(cp); kfree_skb(skb); return NF_STOLEN; } /* * Check if outgoing packet belongs to the established ip_vs_conn. */ static unsigned int ip_vs_out_hook(void *priv, struct sk_buff *skb, const struct nf_hook_state *state) { struct netns_ipvs *ipvs = net_ipvs(state->net); unsigned int hooknum = state->hook; struct ip_vs_iphdr iph; struct ip_vs_protocol *pp; struct ip_vs_proto_data *pd; struct ip_vs_conn *cp; int af = state->pf; struct sock *sk; /* Already marked as IPVS request or reply? */ if (skb->ipvs_property) return NF_ACCEPT; sk = skb_to_full_sk(skb); /* Bad... Do not break raw sockets */ if (unlikely(sk && hooknum == NF_INET_LOCAL_OUT && af == AF_INET)) { if (sk->sk_family == PF_INET && inet_test_bit(NODEFRAG, sk)) return NF_ACCEPT; } if (unlikely(!skb_dst(skb))) return NF_ACCEPT; if (!ipvs->enable) return NF_ACCEPT; ip_vs_fill_iph_skb(af, skb, false, &iph); #ifdef CONFIG_IP_VS_IPV6 if (af == AF_INET6) { if (unlikely(iph.protocol == IPPROTO_ICMPV6)) { int related; int verdict = ip_vs_out_icmp_v6(ipvs, skb, &related, hooknum, &iph); if (related) return verdict; } } else #endif if (unlikely(iph.protocol == IPPROTO_ICMP)) { int related; int verdict = ip_vs_out_icmp(ipvs, skb, &related, hooknum); if (related) return verdict; } pd = ip_vs_proto_data_get(ipvs, iph.protocol); if (unlikely(!pd)) return NF_ACCEPT; pp = pd->pp; /* reassemble IP fragments */ #ifdef CONFIG_IP_VS_IPV6 if (af == AF_INET) #endif if (unlikely(ip_is_fragment(ip_hdr(skb)) && !pp->dont_defrag)) { if (ip_vs_gather_frags(ipvs, skb, ip_vs_defrag_user(hooknum))) return NF_STOLEN; ip_vs_fill_iph_skb(AF_INET, skb, false, &iph); } /* * Check if the packet belongs to an existing entry */ cp = INDIRECT_CALL_1(pp->conn_out_get, ip_vs_conn_out_get_proto, ipvs, af, skb, &iph); if (likely(cp)) return handle_response(af, skb, pd, cp, &iph, hooknum); /* Check for real-server-started requests */ if (atomic_read(&ipvs->conn_out_counter)) { /* Currently only for UDP: * connection oriented protocols typically use * ephemeral ports for outgoing connections, so * related incoming responses would not match any VS */ if (pp->protocol == IPPROTO_UDP) { cp = __ip_vs_rs_conn_out(hooknum, ipvs, af, skb, &iph); if (likely(cp)) return handle_response(af, skb, pd, cp, &iph, hooknum); } } if (sysctl_nat_icmp_send(ipvs) && (pp->protocol == IPPROTO_TCP || pp->protocol == IPPROTO_UDP || pp->protocol == IPPROTO_SCTP)) { __be16 _ports[2], *pptr; pptr = frag_safe_skb_hp(skb, iph.len, sizeof(_ports), _ports); if (pptr == NULL) return NF_ACCEPT; /* Not for me */ if (ip_vs_has_real_service(ipvs, af, iph.protocol, &iph.saddr, pptr[0])) { /* * Notify the real server: there is no * existing entry if it is not RST * packet or not TCP packet. */ if ((iph.protocol != IPPROTO_TCP && iph.protocol != IPPROTO_SCTP) || ((iph.protocol == IPPROTO_TCP && !is_tcp_reset(skb, iph.len)) || (iph.protocol == IPPROTO_SCTP && !is_sctp_abort(skb, iph.len)))) { #ifdef CONFIG_IP_VS_IPV6 if (af == AF_INET6) { if (!skb->dev) skb->dev = ipvs->net->loopback_dev; icmpv6_send(skb, ICMPV6_DEST_UNREACH, ICMPV6_PORT_UNREACH, 0); } else #endif icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0); return NF_DROP; } } } IP_VS_DBG_PKT(12, af, pp, skb, iph.off, "ip_vs_out: packet continues traversal as normal"); return NF_ACCEPT; } static unsigned int ip_vs_try_to_schedule(struct netns_ipvs *ipvs, int af, struct sk_buff *skb, struct ip_vs_proto_data *pd, int *verdict, struct ip_vs_conn **cpp, struct ip_vs_iphdr *iph) { struct ip_vs_protocol *pp = pd->pp; if (!iph->fragoffs) { /* No (second) fragments need to enter here, as nf_defrag_ipv6 * replayed fragment zero will already have created the cp */ /* Schedule and create new connection entry into cpp */ if (!pp->conn_schedule(ipvs, af, skb, pd, verdict, cpp, iph)) return 0; } if (unlikely(!*cpp)) { /* sorry, all this trouble for a no-hit :) */ IP_VS_DBG_PKT(12, af, pp, skb, iph->off, "ip_vs_in: packet continues traversal as normal"); /* Fragment couldn't be mapped to a conn entry */ if (iph->fragoffs) IP_VS_DBG_PKT(7, af, pp, skb, iph->off, "unhandled fragment"); *verdict = NF_ACCEPT; return 0; } return 1; } /* Check the UDP tunnel and return its header length */ static int ipvs_udp_decap(struct netns_ipvs *ipvs, struct sk_buff *skb, unsigned int offset, __u16 af, const union nf_inet_addr *daddr, __u8 *proto) { struct udphdr _udph, *udph; struct ip_vs_dest *dest; udph = skb_header_pointer(skb, offset, sizeof(_udph), &_udph); if (!udph) goto unk; offset += sizeof(struct udphdr); dest = ip_vs_find_tunnel(ipvs, af, daddr, udph->dest); if (!dest) goto unk; if (dest->tun_type == IP_VS_CONN_F_TUNNEL_TYPE_GUE) { struct guehdr _gueh, *gueh; gueh = skb_header_pointer(skb, offset, sizeof(_gueh), &_gueh); if (!gueh) goto unk; if (gueh->control != 0 || gueh->version != 0) goto unk; /* Later we can support also IPPROTO_IPV6 */ if (gueh->proto_ctype != IPPROTO_IPIP) goto unk; *proto = gueh->proto_ctype; return sizeof(struct udphdr) + sizeof(struct guehdr) + (gueh->hlen << 2); } unk: return 0; } /* Check the GRE tunnel and return its header length */ static int ipvs_gre_decap(struct netns_ipvs *ipvs, struct sk_buff *skb, unsigned int offset, __u16 af, const union nf_inet_addr *daddr, __u8 *proto) { struct gre_base_hdr _greh, *greh; struct ip_vs_dest *dest; greh = skb_header_pointer(skb, offset, sizeof(_greh), &_greh); if (!greh) goto unk; dest = ip_vs_find_tunnel(ipvs, af, daddr, 0); if (!dest) goto unk; if (dest->tun_type == IP_VS_CONN_F_TUNNEL_TYPE_GRE) { IP_TUNNEL_DECLARE_FLAGS(flags); __be16 type; /* Only support version 0 and C (csum) */ if ((greh->flags & ~GRE_CSUM) != 0) goto unk; type = greh->protocol; /* Later we can support also IPPROTO_IPV6 */ if (type != htons(ETH_P_IP)) goto unk; *proto = IPPROTO_IPIP; gre_flags_to_tnl_flags(flags, greh->flags); return gre_calc_hlen(flags); } unk: return 0; } /* * Handle ICMP messages in the outside-to-inside direction (incoming). * Find any that might be relevant, check against existing connections, * forward to the right destination host if relevant. * Currently handles error types - unreachable, quench, ttl exceeded. */ static int ip_vs_in_icmp(struct netns_ipvs *ipvs, struct sk_buff *skb, int *related, unsigned int hooknum) { struct iphdr *iph; struct icmphdr _icmph, *ic; struct iphdr _ciph, *cih; /* The ip header contained within the ICMP */ struct ip_vs_iphdr ciph; struct ip_vs_conn *cp; struct ip_vs_protocol *pp; struct ip_vs_proto_data *pd; unsigned int offset, offset2, ihl, verdict; bool tunnel, new_cp = false; union nf_inet_addr *raddr; char *outer_proto = "IPIP"; *related = 1; /* reassemble IP fragments */ if (ip_is_fragment(ip_hdr(skb))) { if (ip_vs_gather_frags(ipvs, skb, ip_vs_defrag_user(hooknum))) return NF_STOLEN; } iph = ip_hdr(skb); offset = ihl = iph->ihl * 4; ic = skb_header_pointer(skb, offset, sizeof(_icmph), &_icmph); if (ic == NULL) return NF_DROP; IP_VS_DBG(12, "Incoming ICMP (%d,%d) %pI4->%pI4\n", ic->type, ntohs(icmp_id(ic)), &iph->saddr, &iph->daddr); /* * Work through seeing if this is for us. * These checks are supposed to be in an order that means easy * things are checked first to speed up processing.... however * this means that some packets will manage to get a long way * down this stack and then be rejected, but that's life. */ if ((ic->type != ICMP_DEST_UNREACH) && (ic->type != ICMP_SOURCE_QUENCH) && (ic->type != ICMP_TIME_EXCEEDED)) { *related = 0; return NF_ACCEPT; } /* Now find the contained IP header */ offset += sizeof(_icmph); cih = skb_header_pointer(skb, offset, sizeof(_ciph), &_ciph); if (cih == NULL) return NF_ACCEPT; /* The packet looks wrong, ignore */ raddr = (union nf_inet_addr *)&cih->daddr; /* Special case for errors for IPIP/UDP/GRE tunnel packets */ tunnel = false; if (cih->protocol == IPPROTO_IPIP) { struct ip_vs_dest *dest; if (unlikely(cih->frag_off & htons(IP_OFFSET))) return NF_ACCEPT; /* Error for our IPIP must arrive at LOCAL_IN */ if (!(skb_rtable(skb)->rt_flags & RTCF_LOCAL)) return NF_ACCEPT; dest = ip_vs_find_tunnel(ipvs, AF_INET, raddr, 0); /* Only for known tunnel */ if (!dest || dest->tun_type != IP_VS_CONN_F_TUNNEL_TYPE_IPIP) return NF_ACCEPT; offset += cih->ihl * 4; cih = skb_header_pointer(skb, offset, sizeof(_ciph), &_ciph); if (cih == NULL) return NF_ACCEPT; /* The packet looks wrong, ignore */ tunnel = true; } else if ((cih->protocol == IPPROTO_UDP || /* Can be UDP encap */ cih->protocol == IPPROTO_GRE) && /* Can be GRE encap */ /* Error for our tunnel must arrive at LOCAL_IN */ (skb_rtable(skb)->rt_flags & RTCF_LOCAL)) { __u8 iproto; int ulen; /* Non-first fragment has no UDP/GRE header */ if (unlikely(cih->frag_off & htons(IP_OFFSET))) return NF_ACCEPT; offset2 = offset + cih->ihl * 4; if (cih->protocol == IPPROTO_UDP) { ulen = ipvs_udp_decap(ipvs, skb, offset2, AF_INET, raddr, &iproto); outer_proto = "UDP"; } else { ulen = ipvs_gre_decap(ipvs, skb, offset2, AF_INET, raddr, &iproto); outer_proto = "GRE"; } if (ulen > 0) { /* Skip IP and UDP/GRE tunnel headers */ offset = offset2 + ulen; /* Now we should be at the original IP header */ cih = skb_header_pointer(skb, offset, sizeof(_ciph), &_ciph); if (cih && cih->version == 4 && cih->ihl >= 5 && iproto == IPPROTO_IPIP) tunnel = true; else return NF_ACCEPT; } } pd = ip_vs_proto_data_get(ipvs, cih->protocol); if (!pd) return NF_ACCEPT; pp = pd->pp; /* Is the embedded protocol header present? */ if (unlikely(cih->frag_off & htons(IP_OFFSET) && pp->dont_defrag)) return NF_ACCEPT; IP_VS_DBG_PKT(11, AF_INET, pp, skb, offset, "Checking incoming ICMP for"); offset2 = offset; ip_vs_fill_iph_skb_icmp(AF_INET, skb, offset, !tunnel, &ciph); offset = ciph.len; /* The embedded headers contain source and dest in reverse order. * For IPIP/UDP/GRE tunnel this is error for request, not for reply. */ cp = INDIRECT_CALL_1(pp->conn_in_get, ip_vs_conn_in_get_proto, ipvs, AF_INET, skb, &ciph); if (!cp) { int v; if (tunnel || !sysctl_schedule_icmp(ipvs)) return NF_ACCEPT; if (!ip_vs_try_to_schedule(ipvs, AF_INET, skb, pd, &v, &cp, &ciph)) return v; new_cp = true; } verdict = NF_DROP; /* Ensure the checksum is correct */ if (!skb_csum_unnecessary(skb) && ip_vs_checksum_complete(skb, ihl)) { /* Failed checksum! */ IP_VS_DBG(1, "Incoming ICMP: failed checksum from %pI4!\n", &iph->saddr); goto out; } if (tunnel) { __be32 info = ic->un.gateway; __u8 type = ic->type; __u8 code = ic->code; /* Update the MTU */ if (ic->type == ICMP_DEST_UNREACH && ic->code == ICMP_FRAG_NEEDED) { struct ip_vs_dest *dest = cp->dest; u32 mtu = ntohs(ic->un.frag.mtu); __be16 frag_off = cih->frag_off; /* Strip outer IP and ICMP, go to IPIP/UDP/GRE header */ if (pskb_pull(skb, ihl + sizeof(_icmph)) == NULL) goto ignore_tunnel; offset2 -= ihl + sizeof(_icmph); skb_reset_network_header(skb); IP_VS_DBG(12, "ICMP for %s %pI4->%pI4: mtu=%u\n", outer_proto, &ip_hdr(skb)->saddr, &ip_hdr(skb)->daddr, mtu); ipv4_update_pmtu(skb, ipvs->net, mtu, 0, 0); /* Client uses PMTUD? */ if (!(frag_off & htons(IP_DF))) goto ignore_tunnel; /* Prefer the resulting PMTU */ if (dest) { struct ip_vs_dest_dst *dest_dst; dest_dst = rcu_dereference(dest->dest_dst); if (dest_dst) mtu = dst_mtu(dest_dst->dst_cache); } if (mtu > 68 + sizeof(struct iphdr)) mtu -= sizeof(struct iphdr); info = htonl(mtu); } /* Strip outer IP, ICMP and IPIP/UDP/GRE, go to IP header of * original request. */ if (pskb_pull(skb, offset2) == NULL) goto ignore_tunnel; skb_reset_network_header(skb); IP_VS_DBG(12, "Sending ICMP for %pI4->%pI4: t=%u, c=%u, i=%u\n", &ip_hdr(skb)->saddr, &ip_hdr(skb)->daddr, type, code, ntohl(info)); icmp_send(skb, type, code, info); /* ICMP can be shorter but anyways, account it */ ip_vs_out_stats(cp, skb); ignore_tunnel: consume_skb(skb); verdict = NF_STOLEN; goto out; } /* do the statistics and put it back */ ip_vs_in_stats(cp, skb); if (IPPROTO_TCP == cih->protocol || IPPROTO_UDP == cih->protocol || IPPROTO_SCTP == cih->protocol) offset += 2 * sizeof(__u16); verdict = ip_vs_icmp_xmit(skb, cp, pp, offset, hooknum, &ciph); out: if (likely(!new_cp)) __ip_vs_conn_put(cp); else ip_vs_conn_put(cp); return verdict; } #ifdef CONFIG_IP_VS_IPV6 static int ip_vs_in_icmp_v6(struct netns_ipvs *ipvs, struct sk_buff *skb, int *related, unsigned int hooknum, struct ip_vs_iphdr *iph) { struct icmp6hdr _icmph, *ic; struct ip_vs_iphdr ciph = {.flags = 0, .fragoffs = 0};/*Contained IP */ struct ip_vs_conn *cp; struct ip_vs_protocol *pp; struct ip_vs_proto_data *pd; unsigned int offset, verdict; bool new_cp = false; *related = 1; ic = frag_safe_skb_hp(skb, iph->len, sizeof(_icmph), &_icmph); if (ic == NULL) return NF_DROP; /* * Work through seeing if this is for us. * These checks are supposed to be in an order that means easy * things are checked first to speed up processing.... however * this means that some packets will manage to get a long way * down this stack and then be rejected, but that's life. */ if (ic->icmp6_type & ICMPV6_INFOMSG_MASK) { *related = 0; return NF_ACCEPT; } /* Fragment header that is before ICMP header tells us that: * it's not an error message since they can't be fragmented. */ if (iph->flags & IP6_FH_F_FRAG) return NF_DROP; IP_VS_DBG(8, "Incoming ICMPv6 (%d,%d) %pI6c->%pI6c\n", ic->icmp6_type, ntohs(icmpv6_id(ic)), &iph->saddr, &iph->daddr); offset = iph->len + sizeof(_icmph); if (!ip_vs_fill_iph_skb_icmp(AF_INET6, skb, offset, true, &ciph)) return NF_ACCEPT; pd = ip_vs_proto_data_get(ipvs, ciph.protocol); if (!pd) return NF_ACCEPT; pp = pd->pp; /* Cannot handle fragmented embedded protocol */ if (ciph.fragoffs) return NF_ACCEPT; IP_VS_DBG_PKT(11, AF_INET6, pp, skb, offset, "Checking incoming ICMPv6 for"); /* The embedded headers contain source and dest in reverse order * if not from localhost */ cp = INDIRECT_CALL_1(pp->conn_in_get, ip_vs_conn_in_get_proto, ipvs, AF_INET6, skb, &ciph); if (!cp) { int v; if (!sysctl_schedule_icmp(ipvs)) return NF_ACCEPT; if (!ip_vs_try_to_schedule(ipvs, AF_INET6, skb, pd, &v, &cp, &ciph)) return v; new_cp = true; } /* VS/TUN, VS/DR and LOCALNODE just let it go */ if ((hooknum == NF_INET_LOCAL_OUT) && (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ)) { verdict = NF_ACCEPT; goto out; } /* do the statistics and put it back */ ip_vs_in_stats(cp, skb); /* Need to mangle contained IPv6 header in ICMPv6 packet */ offset = ciph.len; if (IPPROTO_TCP == ciph.protocol || IPPROTO_UDP == ciph.protocol || IPPROTO_SCTP == ciph.protocol) offset += 2 * sizeof(__u16); /* Also mangle ports */ verdict = ip_vs_icmp_xmit_v6(skb, cp, pp, offset, hooknum, &ciph); out: if (likely(!new_cp)) __ip_vs_conn_put(cp); else ip_vs_conn_put(cp); return verdict; } #endif /* * Check if it's for virtual services, look it up, * and send it on its way... */ static unsigned int ip_vs_in_hook(void *priv, struct sk_buff *skb, const struct nf_hook_state *state) { struct netns_ipvs *ipvs = net_ipvs(state->net); unsigned int hooknum = state->hook; struct ip_vs_iphdr iph; struct ip_vs_protocol *pp; struct ip_vs_proto_data *pd; struct ip_vs_conn *cp; int ret, pkts; struct sock *sk; int af = state->pf; /* Already marked as IPVS request or reply? */ if (skb->ipvs_property) return NF_ACCEPT; /* * Big tappo: * - remote client: only PACKET_HOST * - route: used for struct net when skb->dev is unset */ if (unlikely((skb->pkt_type != PACKET_HOST && hooknum != NF_INET_LOCAL_OUT) || !skb_dst(skb))) { ip_vs_fill_iph_skb(af, skb, false, &iph); IP_VS_DBG_BUF(12, "packet type=%d proto=%d daddr=%s" " ignored in hook %u\n", skb->pkt_type, iph.protocol, IP_VS_DBG_ADDR(af, &iph.daddr), hooknum); return NF_ACCEPT; } /* ipvs enabled in this netns ? */ if (unlikely(sysctl_backup_only(ipvs) || !ipvs->enable)) return NF_ACCEPT; ip_vs_fill_iph_skb(af, skb, false, &iph); /* Bad... Do not break raw sockets */ sk = skb_to_full_sk(skb); if (unlikely(sk && hooknum == NF_INET_LOCAL_OUT && af == AF_INET)) { if (sk->sk_family == PF_INET && inet_test_bit(NODEFRAG, sk)) return NF_ACCEPT; } #ifdef CONFIG_IP_VS_IPV6 if (af == AF_INET6) { if (unlikely(iph.protocol == IPPROTO_ICMPV6)) { int related; int verdict = ip_vs_in_icmp_v6(ipvs, skb, &related, hooknum, &iph); if (related) return verdict; } } else #endif if (unlikely(iph.protocol == IPPROTO_ICMP)) { int related; int verdict = ip_vs_in_icmp(ipvs, skb, &related, hooknum); if (related) return verdict; } /* Protocol supported? */ pd = ip_vs_proto_data_get(ipvs, iph.protocol); if (unlikely(!pd)) { /* The only way we'll see this packet again is if it's * encapsulated, so mark it with ipvs_property=1 so we * skip it if we're ignoring tunneled packets */ if (sysctl_ignore_tunneled(ipvs)) skb->ipvs_property = 1; return NF_ACCEPT; } pp = pd->pp; /* * Check if the packet belongs to an existing connection entry */ cp = INDIRECT_CALL_1(pp->conn_in_get, ip_vs_conn_in_get_proto, ipvs, af, skb, &iph); if (!iph.fragoffs && is_new_conn(skb, &iph) && cp) { int conn_reuse_mode = sysctl_conn_reuse_mode(ipvs); bool old_ct = false, resched = false; if (unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest && unlikely(!atomic_read(&cp->dest->weight))) { resched = true; old_ct = ip_vs_conn_uses_old_conntrack(cp, skb); } else if (conn_reuse_mode && is_new_conn_expected(cp, conn_reuse_mode)) { old_ct = ip_vs_conn_uses_old_conntrack(cp, skb); if (!atomic_read(&cp->n_control)) { resched = true; } else { /* Do not reschedule controlling connection * that uses conntrack while it is still * referenced by controlled connection(s). */ resched = !old_ct; } } if (resched) { if (!old_ct) cp->flags &= ~IP_VS_CONN_F_NFCT; if (!atomic_read(&cp->n_control)) ip_vs_conn_expire_now(cp); __ip_vs_conn_put(cp); if (old_ct) return NF_DROP; cp = NULL; } } /* Check the server status */ if (cp && cp->dest && !(cp->dest->flags & IP_VS_DEST_F_AVAILABLE)) { /* the destination server is not available */ if (sysctl_expire_nodest_conn(ipvs)) { bool old_ct = ip_vs_conn_uses_old_conntrack(cp, skb); if (!old_ct) cp->flags &= ~IP_VS_CONN_F_NFCT; ip_vs_conn_expire_now(cp); __ip_vs_conn_put(cp); if (old_ct) return NF_DROP; cp = NULL; } else { __ip_vs_conn_put(cp); return NF_DROP; } } if (unlikely(!cp)) { int v; if (!ip_vs_try_to_schedule(ipvs, af, skb, pd, &v, &cp, &iph)) return v; } IP_VS_DBG_PKT(11, af, pp, skb, iph.off, "Incoming packet"); ip_vs_in_stats(cp, skb); ip_vs_set_state(cp, IP_VS_DIR_INPUT, skb, pd); if (cp->packet_xmit) ret = cp->packet_xmit(skb, cp, pp, &iph); /* do not touch skb anymore */ else { IP_VS_DBG_RL("warning: packet_xmit is null"); ret = NF_ACCEPT; } /* Increase its packet counter and check if it is needed * to be synchronized * * Sync connection if it is about to close to * encorage the standby servers to update the connections timeout * * For ONE_PKT let ip_vs_sync_conn() do the filter work. */ if (cp->flags & IP_VS_CONN_F_ONE_PACKET) pkts = sysctl_sync_threshold(ipvs); else pkts = atomic_inc_return(&cp->in_pkts); if (ipvs->sync_state & IP_VS_STATE_MASTER) ip_vs_sync_conn(ipvs, cp, pkts); else if ((cp->flags & IP_VS_CONN_F_ONE_PACKET) && cp->control) /* increment is done inside ip_vs_sync_conn too */ atomic_inc(&cp->control->in_pkts); ip_vs_conn_put(cp); return ret; } /* * It is hooked at the NF_INET_FORWARD chain, in order to catch ICMP * related packets destined for 0.0.0.0/0. * When fwmark-based virtual service is used, such as transparent * cache cluster, TCP packets can be marked and routed to ip_vs_in, * but ICMP destined for 0.0.0.0/0 cannot not be easily marked and * sent to ip_vs_in_icmp. So, catch them at the NF_INET_FORWARD chain * and send them to ip_vs_in_icmp. */ static unsigned int ip_vs_forward_icmp(void *priv, struct sk_buff *skb, const struct nf_hook_state *state) { struct netns_ipvs *ipvs = net_ipvs(state->net); int r; /* ipvs enabled in this netns ? */ if (unlikely(sysctl_backup_only(ipvs) || !ipvs->enable)) return NF_ACCEPT; if (state->pf == NFPROTO_IPV4) { if (ip_hdr(skb)->protocol != IPPROTO_ICMP) return NF_ACCEPT; #ifdef CONFIG_IP_VS_IPV6 } else { struct ip_vs_iphdr iphdr; ip_vs_fill_iph_skb(AF_INET6, skb, false, &iphdr); if (iphdr.protocol != IPPROTO_ICMPV6) return NF_ACCEPT; return ip_vs_in_icmp_v6(ipvs, skb, &r, state->hook, &iphdr); #endif } return ip_vs_in_icmp(ipvs, skb, &r, state->hook); } static const struct nf_hook_ops ip_vs_ops4[] = { /* After packet filtering, change source only for VS/NAT */ { .hook = ip_vs_out_hook, .pf = NFPROTO_IPV4, .hooknum = NF_INET_LOCAL_IN, .priority = NF_IP_PRI_NAT_SRC - 2, }, /* After packet filtering, forward packet through VS/DR, VS/TUN, * or VS/NAT(change destination), so that filtering rules can be * applied to IPVS. */ { .hook = ip_vs_in_hook, .pf = NFPROTO_IPV4, .hooknum = NF_INET_LOCAL_IN, .priority = NF_IP_PRI_NAT_SRC - 1, }, /* Before ip_vs_in, change source only for VS/NAT */ { .hook = ip_vs_out_hook, .pf = NFPROTO_IPV4, .hooknum = NF_INET_LOCAL_OUT, .priority = NF_IP_PRI_NAT_DST + 1, }, /* After mangle, schedule and forward local requests */ { .hook = ip_vs_in_hook, .pf = NFPROTO_IPV4, .hooknum = NF_INET_LOCAL_OUT, .priority = NF_IP_PRI_NAT_DST + 2, }, /* After packet filtering (but before ip_vs_out_icmp), catch icmp * destined for 0.0.0.0/0, which is for incoming IPVS connections */ { .hook = ip_vs_forward_icmp, .pf = NFPROTO_IPV4, .hooknum = NF_INET_FORWARD, .priority = 99, }, /* After packet filtering, change source only for VS/NAT */ { .hook = ip_vs_out_hook, .pf = NFPROTO_IPV4, .hooknum = NF_INET_FORWARD, .priority = 100, }, }; #ifdef CONFIG_IP_VS_IPV6 static const struct nf_hook_ops ip_vs_ops6[] = { /* After packet filtering, change source only for VS/NAT */ { .hook = ip_vs_out_hook, .pf = NFPROTO_IPV6, .hooknum = NF_INET_LOCAL_IN, .priority = NF_IP6_PRI_NAT_SRC - 2, }, /* After packet filtering, forward packet through VS/DR, VS/TUN, * or VS/NAT(change destination), so that filtering rules can be * applied to IPVS. */ { .hook = ip_vs_in_hook, .pf = NFPROTO_IPV6, .hooknum = NF_INET_LOCAL_IN, .priority = NF_IP6_PRI_NAT_SRC - 1, }, /* Before ip_vs_in, change source only for VS/NAT */ { .hook = ip_vs_out_hook, .pf = NFPROTO_IPV6, .hooknum = NF_INET_LOCAL_OUT, .priority = NF_IP6_PRI_NAT_DST + 1, }, /* After mangle, schedule and forward local requests */ { .hook = ip_vs_in_hook, .pf = NFPROTO_IPV6, .hooknum = NF_INET_LOCAL_OUT, .priority = NF_IP6_PRI_NAT_DST + 2, }, /* After packet filtering (but before ip_vs_out_icmp), catch icmp * destined for 0.0.0.0/0, which is for incoming IPVS connections */ { .hook = ip_vs_forward_icmp, .pf = NFPROTO_IPV6, .hooknum = NF_INET_FORWARD, .priority = 99, }, /* After packet filtering, change source only for VS/NAT */ { .hook = ip_vs_out_hook, .pf = NFPROTO_IPV6, .hooknum = NF_INET_FORWARD, .priority = 100, }, }; #endif int ip_vs_register_hooks(struct netns_ipvs *ipvs, unsigned int af) { const struct nf_hook_ops *ops; unsigned int count; unsigned int afmask; int ret = 0; if (af == AF_INET6) { #ifdef CONFIG_IP_VS_IPV6 ops = ip_vs_ops6; count = ARRAY_SIZE(ip_vs_ops6); afmask = 2; #else return -EINVAL; #endif } else { ops = ip_vs_ops4; count = ARRAY_SIZE(ip_vs_ops4); afmask = 1; } if (!(ipvs->hooks_afmask & afmask)) { ret = nf_register_net_hooks(ipvs->net, ops, count); if (ret >= 0) ipvs->hooks_afmask |= afmask; } return ret; } void ip_vs_unregister_hooks(struct netns_ipvs *ipvs, unsigned int af) { const struct nf_hook_ops *ops; unsigned int count; unsigned int afmask; if (af == AF_INET6) { #ifdef CONFIG_IP_VS_IPV6 ops = ip_vs_ops6; count = ARRAY_SIZE(ip_vs_ops6); afmask = 2; #else return; #endif } else { ops = ip_vs_ops4; count = ARRAY_SIZE(ip_vs_ops4); afmask = 1; } if (ipvs->hooks_afmask & afmask) { nf_unregister_net_hooks(ipvs->net, ops, count); ipvs->hooks_afmask &= ~afmask; } } /* * Initialize IP Virtual Server netns mem. */ static int __net_init __ip_vs_init(struct net *net) { struct netns_ipvs *ipvs; ipvs = net_generic(net, ip_vs_net_id); if (ipvs == NULL) return -ENOMEM; /* Hold the beast until a service is registered */ ipvs->enable = 0; ipvs->net = net; /* Counters used for creating unique names */ ipvs->gen = atomic_read(&ipvs_netns_cnt); atomic_inc(&ipvs_netns_cnt); net->ipvs = ipvs; if (ip_vs_estimator_net_init(ipvs) < 0) goto estimator_fail; if (ip_vs_control_net_init(ipvs) < 0) goto control_fail; if (ip_vs_protocol_net_init(ipvs) < 0) goto protocol_fail; if (ip_vs_app_net_init(ipvs) < 0) goto app_fail; if (ip_vs_conn_net_init(ipvs) < 0) goto conn_fail; if (ip_vs_sync_net_init(ipvs) < 0) goto sync_fail; return 0; /* * Error handling */ sync_fail: ip_vs_conn_net_cleanup(ipvs); conn_fail: ip_vs_app_net_cleanup(ipvs); app_fail: ip_vs_protocol_net_cleanup(ipvs); protocol_fail: ip_vs_control_net_cleanup(ipvs); control_fail: ip_vs_estimator_net_cleanup(ipvs); estimator_fail: net->ipvs = NULL; return -ENOMEM; } static void __net_exit __ip_vs_cleanup_batch(struct list_head *net_list) { struct netns_ipvs *ipvs; struct net *net; ip_vs_service_nets_cleanup(net_list); /* ip_vs_flush() with locks */ list_for_each_entry(net, net_list, exit_list) { ipvs = net_ipvs(net); ip_vs_conn_net_cleanup(ipvs); ip_vs_app_net_cleanup(ipvs); ip_vs_protocol_net_cleanup(ipvs); ip_vs_control_net_cleanup(ipvs); ip_vs_estimator_net_cleanup(ipvs); IP_VS_DBG(2, "ipvs netns %d released\n", ipvs->gen); net->ipvs = NULL; } } static void __net_exit __ip_vs_dev_cleanup_batch(struct list_head *net_list) { struct netns_ipvs *ipvs; struct net *net; list_for_each_entry(net, net_list, exit_list) { ipvs = net_ipvs(net); ip_vs_unregister_hooks(ipvs, AF_INET); ip_vs_unregister_hooks(ipvs, AF_INET6); ipvs->enable = 0; /* Disable packet reception */ smp_wmb(); ip_vs_sync_net_cleanup(ipvs); } } static struct pernet_operations ipvs_core_ops = { .init = __ip_vs_init, .exit_batch = __ip_vs_cleanup_batch, .id = &ip_vs_net_id, .size = sizeof(struct netns_ipvs), }; static struct pernet_operations ipvs_core_dev_ops = { .exit_batch = __ip_vs_dev_cleanup_batch, }; /* * Initialize IP Virtual Server */ static int __init ip_vs_init(void) { int ret; ret = ip_vs_control_init(); if (ret < 0) { pr_err("can't setup control.\n"); goto exit; } ip_vs_protocol_init(); ret = ip_vs_conn_init(); if (ret < 0) { pr_err("can't setup connection table.\n"); goto cleanup_protocol; } ret = register_pernet_subsys(&ipvs_core_ops); /* Alloc ip_vs struct */ if (ret < 0) goto cleanup_conn; ret = register_pernet_device(&ipvs_core_dev_ops); if (ret < 0) goto cleanup_sub; ret = ip_vs_register_nl_ioctl(); if (ret < 0) { pr_err("can't register netlink/ioctl.\n"); goto cleanup_dev; } pr_info("ipvs loaded.\n"); return ret; cleanup_dev: unregister_pernet_device(&ipvs_core_dev_ops); cleanup_sub: unregister_pernet_subsys(&ipvs_core_ops); cleanup_conn: ip_vs_conn_cleanup(); cleanup_protocol: ip_vs_protocol_cleanup(); ip_vs_control_cleanup(); exit: return ret; } static void __exit ip_vs_cleanup(void) { ip_vs_unregister_nl_ioctl(); unregister_pernet_device(&ipvs_core_dev_ops); unregister_pernet_subsys(&ipvs_core_ops); /* free ip_vs struct */ ip_vs_conn_cleanup(); ip_vs_protocol_cleanup(); ip_vs_control_cleanup(); /* common rcu_barrier() used by: * - ip_vs_control_cleanup() */ rcu_barrier(); pr_info("ipvs unloaded.\n"); } module_init(ip_vs_init); module_exit(ip_vs_cleanup); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("IP Virtual Server"); |
13 1 1 1 1 1 1 1 1 1 1 1 1 5 1100 1101 286 286 285 14 9 5 5 5 5 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 | // SPDX-License-Identifier: GPL-2.0 /* * Shared Memory Communications over RDMA (SMC-R) and RoCE * * IB infrastructure: * Establish SMC-R as an Infiniband Client to be notified about added and * removed IB devices of type RDMA. * Determine device and port characteristics for these IB devices. * * Copyright IBM Corp. 2016 * * Author(s): Ursula Braun <ubraun@linux.vnet.ibm.com> */ #include <linux/etherdevice.h> #include <linux/if_vlan.h> #include <linux/random.h> #include <linux/workqueue.h> #include <linux/scatterlist.h> #include <linux/wait.h> #include <linux/mutex.h> #include <linux/inetdevice.h> #include <rdma/ib_verbs.h> #include <rdma/ib_cache.h> #include "smc_pnet.h" #include "smc_ib.h" #include "smc_core.h" #include "smc_wr.h" #include "smc.h" #include "smc_netlink.h" #define SMC_MAX_CQE 32766 /* max. # of completion queue elements */ #define SMC_QP_MIN_RNR_TIMER 5 #define SMC_QP_TIMEOUT 15 /* 4096 * 2 ** timeout usec */ #define SMC_QP_RETRY_CNT 7 /* 7: infinite */ #define SMC_QP_RNR_RETRY 7 /* 7: infinite */ struct smc_ib_devices smc_ib_devices = { /* smc-registered ib devices */ .mutex = __MUTEX_INITIALIZER(smc_ib_devices.mutex), .list = LIST_HEAD_INIT(smc_ib_devices.list), }; u8 local_systemid[SMC_SYSTEMID_LEN]; /* unique system identifier */ static int smc_ib_modify_qp_init(struct smc_link *lnk) { struct ib_qp_attr qp_attr; memset(&qp_attr, 0, sizeof(qp_attr)); qp_attr.qp_state = IB_QPS_INIT; qp_attr.pkey_index = 0; qp_attr.port_num = lnk->ibport; qp_attr.qp_access_flags = IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE; return ib_modify_qp(lnk->roce_qp, &qp_attr, IB_QP_STATE | IB_QP_PKEY_INDEX | IB_QP_ACCESS_FLAGS | IB_QP_PORT); } static int smc_ib_modify_qp_rtr(struct smc_link *lnk) { enum ib_qp_attr_mask qp_attr_mask = IB_QP_STATE | IB_QP_AV | IB_QP_PATH_MTU | IB_QP_DEST_QPN | IB_QP_RQ_PSN | IB_QP_MAX_DEST_RD_ATOMIC | IB_QP_MIN_RNR_TIMER; struct ib_qp_attr qp_attr; u8 hop_lim = 1; memset(&qp_attr, 0, sizeof(qp_attr)); qp_attr.qp_state = IB_QPS_RTR; qp_attr.path_mtu = min(lnk->path_mtu, lnk->peer_mtu); qp_attr.ah_attr.type = RDMA_AH_ATTR_TYPE_ROCE; rdma_ah_set_port_num(&qp_attr.ah_attr, lnk->ibport); if (lnk->lgr->smc_version == SMC_V2 && lnk->lgr->uses_gateway) hop_lim = IPV6_DEFAULT_HOPLIMIT; rdma_ah_set_grh(&qp_attr.ah_attr, NULL, 0, lnk->sgid_index, hop_lim, 0); rdma_ah_set_dgid_raw(&qp_attr.ah_attr, lnk->peer_gid); if (lnk->lgr->smc_version == SMC_V2 && lnk->lgr->uses_gateway) memcpy(&qp_attr.ah_attr.roce.dmac, lnk->lgr->nexthop_mac, sizeof(lnk->lgr->nexthop_mac)); else memcpy(&qp_attr.ah_attr.roce.dmac, lnk->peer_mac, sizeof(lnk->peer_mac)); qp_attr.dest_qp_num = lnk->peer_qpn; qp_attr.rq_psn = lnk->peer_psn; /* starting receive packet seq # */ qp_attr.max_dest_rd_atomic = 1; /* max # of resources for incoming * requests */ qp_attr.min_rnr_timer = SMC_QP_MIN_RNR_TIMER; return ib_modify_qp(lnk->roce_qp, &qp_attr, qp_attr_mask); } int smc_ib_modify_qp_rts(struct smc_link *lnk) { struct ib_qp_attr qp_attr; memset(&qp_attr, 0, sizeof(qp_attr)); qp_attr.qp_state = IB_QPS_RTS; qp_attr.timeout = SMC_QP_TIMEOUT; /* local ack timeout */ qp_attr.retry_cnt = SMC_QP_RETRY_CNT; /* retry count */ qp_attr.rnr_retry = SMC_QP_RNR_RETRY; /* RNR retries, 7=infinite */ qp_attr.sq_psn = lnk->psn_initial; /* starting send packet seq # */ qp_attr.max_rd_atomic = 1; /* # of outstanding RDMA reads and * atomic ops allowed */ return ib_modify_qp(lnk->roce_qp, &qp_attr, IB_QP_STATE | IB_QP_TIMEOUT | IB_QP_RETRY_CNT | IB_QP_SQ_PSN | IB_QP_RNR_RETRY | IB_QP_MAX_QP_RD_ATOMIC); } int smc_ib_modify_qp_error(struct smc_link *lnk) { struct ib_qp_attr qp_attr; memset(&qp_attr, 0, sizeof(qp_attr)); qp_attr.qp_state = IB_QPS_ERR; return ib_modify_qp(lnk->roce_qp, &qp_attr, IB_QP_STATE); } int smc_ib_ready_link(struct smc_link *lnk) { struct smc_link_group *lgr = smc_get_lgr(lnk); int rc = 0; rc = smc_ib_modify_qp_init(lnk); if (rc) goto out; rc = smc_ib_modify_qp_rtr(lnk); if (rc) goto out; smc_wr_remember_qp_attr(lnk); rc = ib_req_notify_cq(lnk->smcibdev->roce_cq_recv, IB_CQ_SOLICITED_MASK); if (rc) goto out; rc = smc_wr_rx_post_init(lnk); if (rc) goto out; smc_wr_remember_qp_attr(lnk); if (lgr->role == SMC_SERV) { rc = smc_ib_modify_qp_rts(lnk); if (rc) goto out; smc_wr_remember_qp_attr(lnk); } out: return rc; } static int smc_ib_fill_mac(struct smc_ib_device *smcibdev, u8 ibport) { const struct ib_gid_attr *attr; int rc; attr = rdma_get_gid_attr(smcibdev->ibdev, ibport, 0); if (IS_ERR(attr)) return -ENODEV; rc = rdma_read_gid_l2_fields(attr, NULL, smcibdev->mac[ibport - 1]); rdma_put_gid_attr(attr); return rc; } /* Create an identifier unique for this instance of SMC-R. * The MAC-address of the first active registered IB device * plus a random 2-byte number is used to create this identifier. * This name is delivered to the peer during connection initialization. */ static inline void smc_ib_define_local_systemid(struct smc_ib_device *smcibdev, u8 ibport) { memcpy(&local_systemid[2], &smcibdev->mac[ibport - 1], sizeof(smcibdev->mac[ibport - 1])); } bool smc_ib_is_valid_local_systemid(void) { return !is_zero_ether_addr(&local_systemid[2]); } static void smc_ib_init_local_systemid(void) { get_random_bytes(&local_systemid[0], 2); } bool smc_ib_port_active(struct smc_ib_device *smcibdev, u8 ibport) { return smcibdev->pattr[ibport - 1].state == IB_PORT_ACTIVE; } int smc_ib_find_route(struct net *net, __be32 saddr, __be32 daddr, u8 nexthop_mac[], u8 *uses_gateway) { struct neighbour *neigh = NULL; struct rtable *rt = NULL; struct flowi4 fl4 = { .saddr = saddr, .daddr = daddr }; if (daddr == cpu_to_be32(INADDR_NONE)) goto out; rt = ip_route_output_flow(net, &fl4, NULL); if (IS_ERR(rt)) goto out; if (rt->rt_uses_gateway && rt->rt_gw_family != AF_INET) goto out_rt; neigh = dst_neigh_lookup(&rt->dst, &fl4.daddr); if (!neigh) goto out_rt; memcpy(nexthop_mac, neigh->ha, ETH_ALEN); *uses_gateway = rt->rt_uses_gateway; neigh_release(neigh); ip_rt_put(rt); return 0; out_rt: ip_rt_put(rt); out: return -ENOENT; } static int smc_ib_determine_gid_rcu(const struct net_device *ndev, const struct ib_gid_attr *attr, u8 gid[], u8 *sgid_index, struct smc_init_info_smcrv2 *smcrv2) { if (!smcrv2 && attr->gid_type == IB_GID_TYPE_ROCE) { if (gid) memcpy(gid, &attr->gid, SMC_GID_SIZE); if (sgid_index) *sgid_index = attr->index; return 0; } if (smcrv2 && attr->gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP && smc_ib_gid_to_ipv4((u8 *)&attr->gid) != cpu_to_be32(INADDR_NONE)) { struct in_device *in_dev = __in_dev_get_rcu(ndev); struct net *net = dev_net(ndev); const struct in_ifaddr *ifa; bool subnet_match = false; if (!in_dev) goto out; in_dev_for_each_ifa_rcu(ifa, in_dev) { if (!inet_ifa_match(smcrv2->saddr, ifa)) continue; subnet_match = true; break; } if (!subnet_match) goto out; if (smcrv2->daddr && smc_ib_find_route(net, smcrv2->saddr, smcrv2->daddr, smcrv2->nexthop_mac, &smcrv2->uses_gateway)) goto out; if (gid) memcpy(gid, &attr->gid, SMC_GID_SIZE); if (sgid_index) *sgid_index = attr->index; return 0; } out: return -ENODEV; } /* determine the gid for an ib-device port and vlan id */ int smc_ib_determine_gid(struct smc_ib_device *smcibdev, u8 ibport, unsigned short vlan_id, u8 gid[], u8 *sgid_index, struct smc_init_info_smcrv2 *smcrv2) { const struct ib_gid_attr *attr; const struct net_device *ndev; int i; for (i = 0; i < smcibdev->pattr[ibport - 1].gid_tbl_len; i++) { attr = rdma_get_gid_attr(smcibdev->ibdev, ibport, i); if (IS_ERR(attr)) continue; rcu_read_lock(); ndev = rdma_read_gid_attr_ndev_rcu(attr); if (!IS_ERR(ndev) && ((!vlan_id && !is_vlan_dev(ndev)) || (vlan_id && is_vlan_dev(ndev) && vlan_dev_vlan_id(ndev) == vlan_id))) { if (!smc_ib_determine_gid_rcu(ndev, attr, gid, sgid_index, smcrv2)) { rcu_read_unlock(); rdma_put_gid_attr(attr); return 0; } } rcu_read_unlock(); rdma_put_gid_attr(attr); } return -ENODEV; } /* check if gid is still defined on smcibdev */ static bool smc_ib_check_link_gid(u8 gid[SMC_GID_SIZE], bool smcrv2, struct smc_ib_device *smcibdev, u8 ibport) { const struct ib_gid_attr *attr; bool rc = false; int i; for (i = 0; !rc && i < smcibdev->pattr[ibport - 1].gid_tbl_len; i++) { attr = rdma_get_gid_attr(smcibdev->ibdev, ibport, i); if (IS_ERR(attr)) continue; rcu_read_lock(); if ((!smcrv2 && attr->gid_type == IB_GID_TYPE_ROCE) || (smcrv2 && attr->gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP && !(ipv6_addr_type((const struct in6_addr *)&attr->gid) & IPV6_ADDR_LINKLOCAL))) if (!memcmp(gid, &attr->gid, SMC_GID_SIZE)) rc = true; rcu_read_unlock(); rdma_put_gid_attr(attr); } return rc; } /* check all links if the gid is still defined on smcibdev */ static void smc_ib_gid_check(struct smc_ib_device *smcibdev, u8 ibport) { struct smc_link_group *lgr; int i; spin_lock_bh(&smc_lgr_list.lock); list_for_each_entry(lgr, &smc_lgr_list.list, list) { if (strncmp(smcibdev->pnetid[ibport - 1], lgr->pnet_id, SMC_MAX_PNETID_LEN)) continue; /* lgr is not affected */ if (list_empty(&lgr->list)) continue; for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) { if (lgr->lnk[i].state == SMC_LNK_UNUSED || lgr->lnk[i].smcibdev != smcibdev) continue; if (!smc_ib_check_link_gid(lgr->lnk[i].gid, lgr->smc_version == SMC_V2, smcibdev, ibport)) smcr_port_err(smcibdev, ibport); } } spin_unlock_bh(&smc_lgr_list.lock); } static int smc_ib_remember_port_attr(struct smc_ib_device *smcibdev, u8 ibport) { int rc; memset(&smcibdev->pattr[ibport - 1], 0, sizeof(smcibdev->pattr[ibport - 1])); rc = ib_query_port(smcibdev->ibdev, ibport, &smcibdev->pattr[ibport - 1]); if (rc) goto out; /* the SMC protocol requires specification of the RoCE MAC address */ rc = smc_ib_fill_mac(smcibdev, ibport); if (rc) goto out; if (!smc_ib_is_valid_local_systemid() && smc_ib_port_active(smcibdev, ibport)) /* create unique system identifier */ smc_ib_define_local_systemid(smcibdev, ibport); out: return rc; } /* process context wrapper for might_sleep smc_ib_remember_port_attr */ static void smc_ib_port_event_work(struct work_struct *work) { struct smc_ib_device *smcibdev = container_of( work, struct smc_ib_device, port_event_work); u8 port_idx; for_each_set_bit(port_idx, &smcibdev->port_event_mask, SMC_MAX_PORTS) { smc_ib_remember_port_attr(smcibdev, port_idx + 1); clear_bit(port_idx, &smcibdev->port_event_mask); if (!smc_ib_port_active(smcibdev, port_idx + 1)) { set_bit(port_idx, smcibdev->ports_going_away); smcr_port_err(smcibdev, port_idx + 1); } else { clear_bit(port_idx, smcibdev->ports_going_away); smcr_port_add(smcibdev, port_idx + 1); smc_ib_gid_check(smcibdev, port_idx + 1); } } } /* can be called in IRQ context */ static void smc_ib_global_event_handler(struct ib_event_handler *handler, struct ib_event *ibevent) { struct smc_ib_device *smcibdev; bool schedule = false; u8 port_idx; smcibdev = container_of(handler, struct smc_ib_device, event_handler); switch (ibevent->event) { case IB_EVENT_DEVICE_FATAL: /* terminate all ports on device */ for (port_idx = 0; port_idx < SMC_MAX_PORTS; port_idx++) { set_bit(port_idx, &smcibdev->port_event_mask); if (!test_and_set_bit(port_idx, smcibdev->ports_going_away)) schedule = true; } if (schedule) schedule_work(&smcibdev->port_event_work); break; case IB_EVENT_PORT_ACTIVE: port_idx = ibevent->element.port_num - 1; if (port_idx >= SMC_MAX_PORTS) break; set_bit(port_idx, &smcibdev->port_event_mask); if (test_and_clear_bit(port_idx, smcibdev->ports_going_away)) schedule_work(&smcibdev->port_event_work); break; case IB_EVENT_PORT_ERR: port_idx = ibevent->element.port_num - 1; if (port_idx >= SMC_MAX_PORTS) break; set_bit(port_idx, &smcibdev->port_event_mask); if (!test_and_set_bit(port_idx, smcibdev->ports_going_away)) schedule_work(&smcibdev->port_event_work); break; case IB_EVENT_GID_CHANGE: port_idx = ibevent->element.port_num - 1; if (port_idx >= SMC_MAX_PORTS) break; set_bit(port_idx, &smcibdev->port_event_mask); schedule_work(&smcibdev->port_event_work); break; default: break; } } void smc_ib_dealloc_protection_domain(struct smc_link *lnk) { if (lnk->roce_pd) ib_dealloc_pd(lnk->roce_pd); lnk->roce_pd = NULL; } int smc_ib_create_protection_domain(struct smc_link *lnk) { int rc; lnk->roce_pd = ib_alloc_pd(lnk->smcibdev->ibdev, 0); rc = PTR_ERR_OR_ZERO(lnk->roce_pd); if (IS_ERR(lnk->roce_pd)) lnk->roce_pd = NULL; return rc; } static bool smcr_diag_is_dev_critical(struct smc_lgr_list *smc_lgr, struct smc_ib_device *smcibdev) { struct smc_link_group *lgr; bool rc = false; int i; spin_lock_bh(&smc_lgr->lock); list_for_each_entry(lgr, &smc_lgr->list, list) { if (lgr->is_smcd) continue; for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) { if (lgr->lnk[i].state == SMC_LNK_UNUSED || lgr->lnk[i].smcibdev != smcibdev) continue; if (lgr->type == SMC_LGR_SINGLE || lgr->type == SMC_LGR_ASYMMETRIC_LOCAL) { rc = true; goto out; } } } out: spin_unlock_bh(&smc_lgr->lock); return rc; } static int smc_nl_handle_dev_port(struct sk_buff *skb, struct ib_device *ibdev, struct smc_ib_device *smcibdev, int port) { char smc_pnet[SMC_MAX_PNETID_LEN + 1]; struct nlattr *port_attrs; unsigned char port_state; int lnk_count = 0; port_attrs = nla_nest_start(skb, SMC_NLA_DEV_PORT + port); if (!port_attrs) goto errout; if (nla_put_u8(skb, SMC_NLA_DEV_PORT_PNET_USR, smcibdev->pnetid_by_user[port])) goto errattr; memcpy(smc_pnet, &smcibdev->pnetid[port], SMC_MAX_PNETID_LEN); smc_pnet[SMC_MAX_PNETID_LEN] = 0; if (nla_put_string(skb, SMC_NLA_DEV_PORT_PNETID, smc_pnet)) goto errattr; if (nla_put_u32(skb, SMC_NLA_DEV_PORT_NETDEV, smcibdev->ndev_ifidx[port])) goto errattr; if (nla_put_u8(skb, SMC_NLA_DEV_PORT_VALID, 1)) goto errattr; port_state = smc_ib_port_active(smcibdev, port + 1); if (nla_put_u8(skb, SMC_NLA_DEV_PORT_STATE, port_state)) goto errattr; lnk_count = atomic_read(&smcibdev->lnk_cnt_by_port[port]); if (nla_put_u32(skb, SMC_NLA_DEV_PORT_LNK_CNT, lnk_count)) goto errattr; nla_nest_end(skb, port_attrs); return 0; errattr: nla_nest_cancel(skb, port_attrs); errout: return -EMSGSIZE; } static bool smc_nl_handle_pci_values(const struct smc_pci_dev *smc_pci_dev, struct sk_buff *skb) { if (nla_put_u32(skb, SMC_NLA_DEV_PCI_FID, smc_pci_dev->pci_fid)) return false; if (nla_put_u16(skb, SMC_NLA_DEV_PCI_CHID, smc_pci_dev->pci_pchid)) return false; if (nla_put_u16(skb, SMC_NLA_DEV_PCI_VENDOR, smc_pci_dev->pci_vendor)) return false; if (nla_put_u16(skb, SMC_NLA_DEV_PCI_DEVICE, smc_pci_dev->pci_device)) return false; if (nla_put_string(skb, SMC_NLA_DEV_PCI_ID, smc_pci_dev->pci_id)) return false; return true; } static int smc_nl_handle_smcr_dev(struct smc_ib_device *smcibdev, struct sk_buff *skb, struct netlink_callback *cb) { char smc_ibname[IB_DEVICE_NAME_MAX]; struct smc_pci_dev smc_pci_dev; struct pci_dev *pci_dev; unsigned char is_crit; struct nlattr *attrs; void *nlh; int i; nlh = genlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, &smc_gen_nl_family, NLM_F_MULTI, SMC_NETLINK_GET_DEV_SMCR); if (!nlh) goto errmsg; attrs = nla_nest_start(skb, SMC_GEN_DEV_SMCR); if (!attrs) goto errout; is_crit = smcr_diag_is_dev_critical(&smc_lgr_list, smcibdev); if (nla_put_u8(skb, SMC_NLA_DEV_IS_CRIT, is_crit)) goto errattr; if (smcibdev->ibdev->dev.parent) { memset(&smc_pci_dev, 0, sizeof(smc_pci_dev)); pci_dev = to_pci_dev(smcibdev->ibdev->dev.parent); smc_set_pci_values(pci_dev, &smc_pci_dev); if (!smc_nl_handle_pci_values(&smc_pci_dev, skb)) goto errattr; } snprintf(smc_ibname, sizeof(smc_ibname), "%s", smcibdev->ibdev->name); if (nla_put_string(skb, SMC_NLA_DEV_IB_NAME, smc_ibname)) goto errattr; for (i = 1; i <= SMC_MAX_PORTS; i++) { if (!rdma_is_port_valid(smcibdev->ibdev, i)) continue; if (smc_nl_handle_dev_port(skb, smcibdev->ibdev, smcibdev, i - 1)) goto errattr; } nla_nest_end(skb, attrs); genlmsg_end(skb, nlh); return 0; errattr: nla_nest_cancel(skb, attrs); errout: genlmsg_cancel(skb, nlh); errmsg: return -EMSGSIZE; } static void smc_nl_prep_smcr_dev(struct smc_ib_devices *dev_list, struct sk_buff *skb, struct netlink_callback *cb) { struct smc_nl_dmp_ctx *cb_ctx = smc_nl_dmp_ctx(cb); struct smc_ib_device *smcibdev; int snum = cb_ctx->pos[0]; int num = 0; mutex_lock(&dev_list->mutex); list_for_each_entry(smcibdev, &dev_list->list, list) { if (num < snum) goto next; if (smc_nl_handle_smcr_dev(smcibdev, skb, cb)) goto errout; next: num++; } errout: mutex_unlock(&dev_list->mutex); cb_ctx->pos[0] = num; } int smcr_nl_get_device(struct sk_buff *skb, struct netlink_callback *cb) { smc_nl_prep_smcr_dev(&smc_ib_devices, skb, cb); return skb->len; } static void smc_ib_qp_event_handler(struct ib_event *ibevent, void *priv) { struct smc_link *lnk = (struct smc_link *)priv; struct smc_ib_device *smcibdev = lnk->smcibdev; u8 port_idx; switch (ibevent->event) { case IB_EVENT_QP_FATAL: case IB_EVENT_QP_ACCESS_ERR: port_idx = ibevent->element.qp->port - 1; if (port_idx >= SMC_MAX_PORTS) break; set_bit(port_idx, &smcibdev->port_event_mask); if (!test_and_set_bit(port_idx, smcibdev->ports_going_away)) schedule_work(&smcibdev->port_event_work); break; default: break; } } void smc_ib_destroy_queue_pair(struct smc_link *lnk) { if (lnk->roce_qp) ib_destroy_qp(lnk->roce_qp); lnk->roce_qp = NULL; } /* create a queue pair within the protection domain for a link */ int smc_ib_create_queue_pair(struct smc_link *lnk) { struct ib_qp_init_attr qp_attr = { .event_handler = smc_ib_qp_event_handler, .qp_context = lnk, .send_cq = lnk->smcibdev->roce_cq_send, .recv_cq = lnk->smcibdev->roce_cq_recv, .srq = NULL, .cap = { /* include unsolicited rdma_writes as well, * there are max. 2 RDMA_WRITE per 1 WR_SEND */ .max_send_wr = SMC_WR_BUF_CNT * 3, .max_recv_wr = SMC_WR_BUF_CNT * 3, .max_send_sge = SMC_IB_MAX_SEND_SGE, .max_recv_sge = lnk->wr_rx_sge_cnt, .max_inline_data = 0, }, .sq_sig_type = IB_SIGNAL_REQ_WR, .qp_type = IB_QPT_RC, }; int rc; lnk->roce_qp = ib_create_qp(lnk->roce_pd, &qp_attr); rc = PTR_ERR_OR_ZERO(lnk->roce_qp); if (IS_ERR(lnk->roce_qp)) lnk->roce_qp = NULL; else smc_wr_remember_qp_attr(lnk); return rc; } void smc_ib_put_memory_region(struct ib_mr *mr) { ib_dereg_mr(mr); } static int smc_ib_map_mr_sg(struct smc_buf_desc *buf_slot, u8 link_idx) { unsigned int offset = 0; int sg_num; /* map the largest prefix of a dma mapped SG list */ sg_num = ib_map_mr_sg(buf_slot->mr[link_idx], buf_slot->sgt[link_idx].sgl, buf_slot->sgt[link_idx].orig_nents, &offset, PAGE_SIZE); return sg_num; } /* Allocate a memory region and map the dma mapped SG list of buf_slot */ int smc_ib_get_memory_region(struct ib_pd *pd, int access_flags, struct smc_buf_desc *buf_slot, u8 link_idx) { if (buf_slot->mr[link_idx]) return 0; /* already done */ buf_slot->mr[link_idx] = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG, 1 << buf_slot->order); if (IS_ERR(buf_slot->mr[link_idx])) { int rc; rc = PTR_ERR(buf_slot->mr[link_idx]); buf_slot->mr[link_idx] = NULL; return rc; } if (smc_ib_map_mr_sg(buf_slot, link_idx) != buf_slot->sgt[link_idx].orig_nents) return -EINVAL; return 0; } bool smc_ib_is_sg_need_sync(struct smc_link *lnk, struct smc_buf_desc *buf_slot) { struct scatterlist *sg; unsigned int i; bool ret = false; /* for now there is just one DMA address */ for_each_sg(buf_slot->sgt[lnk->link_idx].sgl, sg, buf_slot->sgt[lnk->link_idx].nents, i) { if (!sg_dma_len(sg)) break; if (dma_need_sync(lnk->smcibdev->ibdev->dma_device, sg_dma_address(sg))) { ret = true; goto out; } } out: return ret; } /* synchronize buffer usage for cpu access */ void smc_ib_sync_sg_for_cpu(struct smc_link *lnk, struct smc_buf_desc *buf_slot, enum dma_data_direction data_direction) { struct scatterlist *sg; unsigned int i; if (!(buf_slot->is_dma_need_sync & (1U << lnk->link_idx))) return; /* for now there is just one DMA address */ for_each_sg(buf_slot->sgt[lnk->link_idx].sgl, sg, buf_slot->sgt[lnk->link_idx].nents, i) { if (!sg_dma_len(sg)) break; ib_dma_sync_single_for_cpu(lnk->smcibdev->ibdev, sg_dma_address(sg), sg_dma_len(sg), data_direction); } } /* synchronize buffer usage for device access */ void smc_ib_sync_sg_for_device(struct smc_link *lnk, struct smc_buf_desc *buf_slot, enum dma_data_direction data_direction) { struct scatterlist *sg; unsigned int i; if (!(buf_slot->is_dma_need_sync & (1U << lnk->link_idx))) return; /* for now there is just one DMA address */ for_each_sg(buf_slot->sgt[lnk->link_idx].sgl, sg, buf_slot->sgt[lnk->link_idx].nents, i) { if (!sg_dma_len(sg)) break; ib_dma_sync_single_for_device(lnk->smcibdev->ibdev, sg_dma_address(sg), sg_dma_len(sg), data_direction); } } /* Map a new TX or RX buffer SG-table to DMA */ int smc_ib_buf_map_sg(struct smc_link *lnk, struct smc_buf_desc *buf_slot, enum dma_data_direction data_direction) { int mapped_nents; mapped_nents = ib_dma_map_sg(lnk->smcibdev->ibdev, buf_slot->sgt[lnk->link_idx].sgl, buf_slot->sgt[lnk->link_idx].orig_nents, data_direction); if (!mapped_nents) return -ENOMEM; return mapped_nents; } void smc_ib_buf_unmap_sg(struct smc_link *lnk, struct smc_buf_desc *buf_slot, enum dma_data_direction data_direction) { if (!buf_slot->sgt[lnk->link_idx].sgl->dma_address) return; /* already unmapped */ ib_dma_unmap_sg(lnk->smcibdev->ibdev, buf_slot->sgt[lnk->link_idx].sgl, buf_slot->sgt[lnk->link_idx].orig_nents, data_direction); buf_slot->sgt[lnk->link_idx].sgl->dma_address = 0; } long smc_ib_setup_per_ibdev(struct smc_ib_device *smcibdev) { struct ib_cq_init_attr cqattr = { .cqe = SMC_MAX_CQE, .comp_vector = 0 }; int cqe_size_order, smc_order; long rc; mutex_lock(&smcibdev->mutex); rc = 0; if (smcibdev->initialized) goto out; /* the calculated number of cq entries fits to mlx5 cq allocation */ cqe_size_order = cache_line_size() == 128 ? 7 : 6; smc_order = MAX_PAGE_ORDER - cqe_size_order; if (SMC_MAX_CQE + 2 > (0x00000001 << smc_order) * PAGE_SIZE) cqattr.cqe = (0x00000001 << smc_order) * PAGE_SIZE - 2; smcibdev->roce_cq_send = ib_create_cq(smcibdev->ibdev, smc_wr_tx_cq_handler, NULL, smcibdev, &cqattr); rc = PTR_ERR_OR_ZERO(smcibdev->roce_cq_send); if (IS_ERR(smcibdev->roce_cq_send)) { smcibdev->roce_cq_send = NULL; goto out; } smcibdev->roce_cq_recv = ib_create_cq(smcibdev->ibdev, smc_wr_rx_cq_handler, NULL, smcibdev, &cqattr); rc = PTR_ERR_OR_ZERO(smcibdev->roce_cq_recv); if (IS_ERR(smcibdev->roce_cq_recv)) { smcibdev->roce_cq_recv = NULL; goto err; } smc_wr_add_dev(smcibdev); smcibdev->initialized = 1; goto out; err: ib_destroy_cq(smcibdev->roce_cq_send); out: mutex_unlock(&smcibdev->mutex); return rc; } static void smc_ib_cleanup_per_ibdev(struct smc_ib_device *smcibdev) { mutex_lock(&smcibdev->mutex); if (!smcibdev->initialized) goto out; smcibdev->initialized = 0; ib_destroy_cq(smcibdev->roce_cq_recv); ib_destroy_cq(smcibdev->roce_cq_send); smc_wr_remove_dev(smcibdev); out: mutex_unlock(&smcibdev->mutex); } static struct ib_client smc_ib_client; static void smc_copy_netdev_ifindex(struct smc_ib_device *smcibdev, int port) { struct ib_device *ibdev = smcibdev->ibdev; struct net_device *ndev; ndev = ib_device_get_netdev(ibdev, port + 1); if (ndev) { smcibdev->ndev_ifidx[port] = ndev->ifindex; dev_put(ndev); } } void smc_ib_ndev_change(struct net_device *ndev, unsigned long event) { struct smc_ib_device *smcibdev; struct ib_device *libdev; struct net_device *lndev; u8 port_cnt; int i; mutex_lock(&smc_ib_devices.mutex); list_for_each_entry(smcibdev, &smc_ib_devices.list, list) { port_cnt = smcibdev->ibdev->phys_port_cnt; for (i = 0; i < min_t(size_t, port_cnt, SMC_MAX_PORTS); i++) { libdev = smcibdev->ibdev; lndev = ib_device_get_netdev(libdev, i + 1); dev_put(lndev); if (lndev != ndev) continue; if (event == NETDEV_REGISTER) smcibdev->ndev_ifidx[i] = ndev->ifindex; if (event == NETDEV_UNREGISTER) smcibdev->ndev_ifidx[i] = 0; } } mutex_unlock(&smc_ib_devices.mutex); } /* callback function for ib_register_client() */ static int smc_ib_add_dev(struct ib_device *ibdev) { struct smc_ib_device *smcibdev; u8 port_cnt; int i; if (ibdev->node_type != RDMA_NODE_IB_CA) return -EOPNOTSUPP; smcibdev = kzalloc(sizeof(*smcibdev), GFP_KERNEL); if (!smcibdev) return -ENOMEM; smcibdev->ibdev = ibdev; INIT_WORK(&smcibdev->port_event_work, smc_ib_port_event_work); atomic_set(&smcibdev->lnk_cnt, 0); init_waitqueue_head(&smcibdev->lnks_deleted); mutex_init(&smcibdev->mutex); mutex_lock(&smc_ib_devices.mutex); list_add_tail(&smcibdev->list, &smc_ib_devices.list); mutex_unlock(&smc_ib_devices.mutex); ib_set_client_data(ibdev, &smc_ib_client, smcibdev); INIT_IB_EVENT_HANDLER(&smcibdev->event_handler, smcibdev->ibdev, smc_ib_global_event_handler); ib_register_event_handler(&smcibdev->event_handler); /* trigger reading of the port attributes */ port_cnt = smcibdev->ibdev->phys_port_cnt; pr_warn_ratelimited("smc: adding ib device %s with port count %d\n", smcibdev->ibdev->name, port_cnt); for (i = 0; i < min_t(size_t, port_cnt, SMC_MAX_PORTS); i++) { set_bit(i, &smcibdev->port_event_mask); /* determine pnetids of the port */ if (smc_pnetid_by_dev_port(ibdev->dev.parent, i, smcibdev->pnetid[i])) smc_pnetid_by_table_ib(smcibdev, i + 1); smc_copy_netdev_ifindex(smcibdev, i); pr_warn_ratelimited("smc: ib device %s port %d has pnetid " "%.16s%s\n", smcibdev->ibdev->name, i + 1, smcibdev->pnetid[i], smcibdev->pnetid_by_user[i] ? " (user defined)" : ""); } schedule_work(&smcibdev->port_event_work); return 0; } /* callback function for ib_unregister_client() */ static void smc_ib_remove_dev(struct ib_device *ibdev, void *client_data) { struct smc_ib_device *smcibdev = client_data; mutex_lock(&smc_ib_devices.mutex); list_del_init(&smcibdev->list); /* remove from smc_ib_devices */ mutex_unlock(&smc_ib_devices.mutex); pr_warn_ratelimited("smc: removing ib device %s\n", smcibdev->ibdev->name); smc_smcr_terminate_all(smcibdev); smc_ib_cleanup_per_ibdev(smcibdev); ib_unregister_event_handler(&smcibdev->event_handler); cancel_work_sync(&smcibdev->port_event_work); kfree(smcibdev); } static struct ib_client smc_ib_client = { .name = "smc_ib", .add = smc_ib_add_dev, .remove = smc_ib_remove_dev, }; int __init smc_ib_register_client(void) { smc_ib_init_local_systemid(); return ib_register_client(&smc_ib_client); } void smc_ib_unregister_client(void) { ib_unregister_client(&smc_ib_client); } |
222 2 2 34 36 30 2 32 32 32 32 11 1 31 2 4 28 28 4 24 28 28 28 19 9 14 6 2 3 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 | // SPDX-License-Identifier: GPL-2.0 /* * * Copyright (C) 2019-2021 Paragon Software GmbH, All rights reserved. * */ #include <linux/fs.h> #include "debug.h" #include "ntfs.h" #include "ntfs_fs.h" /* * al_is_valid_le * * Return: True if @le is valid. */ static inline bool al_is_valid_le(const struct ntfs_inode *ni, struct ATTR_LIST_ENTRY *le) { if (!le || !ni->attr_list.le || !ni->attr_list.size) return false; return PtrOffset(ni->attr_list.le, le) + le16_to_cpu(le->size) <= ni->attr_list.size; } void al_destroy(struct ntfs_inode *ni) { run_close(&ni->attr_list.run); kvfree(ni->attr_list.le); ni->attr_list.le = NULL; ni->attr_list.size = 0; ni->attr_list.dirty = false; } /* * ntfs_load_attr_list * * This method makes sure that the ATTRIB list, if present, * has been properly set up. */ int ntfs_load_attr_list(struct ntfs_inode *ni, struct ATTRIB *attr) { int err; size_t lsize; void *le = NULL; if (ni->attr_list.size) return 0; if (!attr->non_res) { lsize = le32_to_cpu(attr->res.data_size); /* attr is resident: lsize < record_size (1K or 4K) */ le = kvmalloc(al_aligned(lsize), GFP_KERNEL); if (!le) { err = -ENOMEM; goto out; } memcpy(le, resident_data(attr), lsize); } else if (attr->nres.svcn) { err = -EINVAL; goto out; } else { u16 run_off = le16_to_cpu(attr->nres.run_off); lsize = le64_to_cpu(attr->nres.data_size); run_init(&ni->attr_list.run); if (run_off > le32_to_cpu(attr->size)) { err = -EINVAL; goto out; } err = run_unpack_ex(&ni->attr_list.run, ni->mi.sbi, ni->mi.rno, 0, le64_to_cpu(attr->nres.evcn), 0, Add2Ptr(attr, run_off), le32_to_cpu(attr->size) - run_off); if (err < 0) goto out; /* attr is nonresident. * The worst case: * 1T (2^40) extremely fragmented file. * cluster = 4K (2^12) => 2^28 fragments * 2^9 fragments per one record => 2^19 records * 2^5 bytes of ATTR_LIST_ENTRY per one record => 2^24 bytes. * * the result is 16M bytes per attribute list. * Use kvmalloc to allocate in range [several Kbytes - dozen Mbytes] */ le = kvmalloc(al_aligned(lsize), GFP_KERNEL); if (!le) { err = -ENOMEM; goto out; } err = ntfs_read_run_nb(ni->mi.sbi, &ni->attr_list.run, 0, le, lsize, NULL); if (err) goto out; } ni->attr_list.size = lsize; ni->attr_list.le = le; return 0; out: ni->attr_list.le = le; al_destroy(ni); return err; } /* * al_enumerate * * Return: * * The next list le. * * If @le is NULL then return the first le. */ struct ATTR_LIST_ENTRY *al_enumerate(struct ntfs_inode *ni, struct ATTR_LIST_ENTRY *le) { size_t off; u16 sz; const unsigned le_min_size = le_size(0); if (!le) { le = ni->attr_list.le; } else { sz = le16_to_cpu(le->size); if (sz < le_min_size) { /* Impossible 'cause we should not return such le. */ return NULL; } le = Add2Ptr(le, sz); } /* Check boundary. */ off = PtrOffset(ni->attr_list.le, le); if (off + le_min_size > ni->attr_list.size) { /* The regular end of list. */ return NULL; } sz = le16_to_cpu(le->size); /* Check le for errors. */ if (sz < le_min_size || off + sz > ni->attr_list.size || sz < le->name_off + le->name_len * sizeof(short)) { return NULL; } return le; } /* * al_find_le * * Find the first le in the list which matches type, name and VCN. * * Return: NULL if not found. */ struct ATTR_LIST_ENTRY *al_find_le(struct ntfs_inode *ni, struct ATTR_LIST_ENTRY *le, const struct ATTRIB *attr) { CLST svcn = attr_svcn(attr); return al_find_ex(ni, le, attr->type, attr_name(attr), attr->name_len, &svcn); } /* * al_find_ex * * Find the first le in the list which matches type, name and VCN. * * Return: NULL if not found. */ struct ATTR_LIST_ENTRY *al_find_ex(struct ntfs_inode *ni, struct ATTR_LIST_ENTRY *le, enum ATTR_TYPE type, const __le16 *name, u8 name_len, const CLST *vcn) { struct ATTR_LIST_ENTRY *ret = NULL; u32 type_in = le32_to_cpu(type); while ((le = al_enumerate(ni, le))) { u64 le_vcn; int diff = le32_to_cpu(le->type) - type_in; /* List entries are sorted by type, name and VCN. */ if (diff < 0) continue; if (diff > 0) return ret; if (le->name_len != name_len) continue; le_vcn = le64_to_cpu(le->vcn); if (!le_vcn) { /* * Compare entry names only for entry with vcn == 0. */ diff = ntfs_cmp_names(le_name(le), name_len, name, name_len, ni->mi.sbi->upcase, true); if (diff < 0) continue; if (diff > 0) return ret; } if (!vcn) return le; if (*vcn == le_vcn) return le; if (*vcn < le_vcn) return ret; ret = le; } return ret; } /* * al_find_le_to_insert * * Find the first list entry which matches type, name and VCN. */ static struct ATTR_LIST_ENTRY *al_find_le_to_insert(struct ntfs_inode *ni, enum ATTR_TYPE type, const __le16 *name, u8 name_len, CLST vcn) { struct ATTR_LIST_ENTRY *le = NULL, *prev; u32 type_in = le32_to_cpu(type); /* List entries are sorted by type, name and VCN. */ while ((le = al_enumerate(ni, prev = le))) { int diff = le32_to_cpu(le->type) - type_in; if (diff < 0) continue; if (diff > 0) return le; if (!le->vcn) { /* * Compare entry names only for entry with vcn == 0. */ diff = ntfs_cmp_names(le_name(le), le->name_len, name, name_len, ni->mi.sbi->upcase, true); if (diff < 0) continue; if (diff > 0) return le; } if (le64_to_cpu(le->vcn) >= vcn) return le; } return prev ? Add2Ptr(prev, le16_to_cpu(prev->size)) : ni->attr_list.le; } /* * al_add_le * * Add an "attribute list entry" to the list. */ int al_add_le(struct ntfs_inode *ni, enum ATTR_TYPE type, const __le16 *name, u8 name_len, CLST svcn, __le16 id, const struct MFT_REF *ref, struct ATTR_LIST_ENTRY **new_le) { int err; struct ATTRIB *attr; struct ATTR_LIST_ENTRY *le; size_t off; u16 sz; size_t asize, new_asize, old_size; u64 new_size; typeof(ni->attr_list) *al = &ni->attr_list; /* * Compute the size of the new 'le' */ sz = le_size(name_len); old_size = al->size; new_size = old_size + sz; asize = al_aligned(old_size); new_asize = al_aligned(new_size); /* Scan forward to the point at which the new 'le' should be inserted. */ le = al_find_le_to_insert(ni, type, name, name_len, svcn); off = PtrOffset(al->le, le); if (new_size > asize) { void *ptr = kmalloc(new_asize, GFP_NOFS); if (!ptr) return -ENOMEM; memcpy(ptr, al->le, off); memcpy(Add2Ptr(ptr, off + sz), le, old_size - off); le = Add2Ptr(ptr, off); kvfree(al->le); al->le = ptr; } else { memmove(Add2Ptr(le, sz), le, old_size - off); } *new_le = le; al->size = new_size; le->type = type; le->size = cpu_to_le16(sz); le->name_len = name_len; le->name_off = offsetof(struct ATTR_LIST_ENTRY, name); le->vcn = cpu_to_le64(svcn); le->ref = *ref; le->id = id; memcpy(le->name, name, sizeof(short) * name_len); err = attr_set_size(ni, ATTR_LIST, NULL, 0, &al->run, new_size, &new_size, true, &attr); if (err) { /* Undo memmove above. */ memmove(le, Add2Ptr(le, sz), old_size - off); al->size = old_size; return err; } al->dirty = true; if (attr && attr->non_res) { err = ntfs_sb_write_run(ni->mi.sbi, &al->run, 0, al->le, al->size, 0); if (err) return err; al->dirty = false; } return 0; } /* * al_remove_le - Remove @le from attribute list. */ bool al_remove_le(struct ntfs_inode *ni, struct ATTR_LIST_ENTRY *le) { u16 size; size_t off; typeof(ni->attr_list) *al = &ni->attr_list; if (!al_is_valid_le(ni, le)) return false; /* Save on stack the size of 'le' */ size = le16_to_cpu(le->size); off = PtrOffset(al->le, le); memmove(le, Add2Ptr(le, size), al->size - (off + size)); al->size -= size; al->dirty = true; return true; } int al_update(struct ntfs_inode *ni, int sync) { int err; struct ATTRIB *attr; typeof(ni->attr_list) *al = &ni->attr_list; if (!al->dirty || !al->size) return 0; /* * Attribute list increased on demand in al_add_le. * Attribute list decreased here. */ err = attr_set_size(ni, ATTR_LIST, NULL, 0, &al->run, al->size, NULL, false, &attr); if (err) goto out; if (!attr->non_res) { memcpy(resident_data(attr), al->le, al->size); } else { err = ntfs_sb_write_run(ni->mi.sbi, &al->run, 0, al->le, al->size, sync); if (err) goto out; attr->nres.valid_size = attr->nres.data_size; } ni->mi.dirty = true; al->dirty = false; out: return err; } |
15 15 15 15 15 6 2 2 24 24 24 7 6 7 3 1 2 2 2 7 7 3 3 3 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 | // SPDX-License-Identifier: GPL-2.0 #include "bcachefs.h" #include "acl.h" #include "bkey_methods.h" #include "btree_update.h" #include "extents.h" #include "fs.h" #include "rebalance.h" #include "str_hash.h" #include "xattr.h" #include <linux/dcache.h> #include <linux/posix_acl_xattr.h> #include <linux/xattr.h> static const struct xattr_handler *bch2_xattr_type_to_handler(unsigned); static u64 bch2_xattr_hash(const struct bch_hash_info *info, const struct xattr_search_key *key) { struct bch_str_hash_ctx ctx; bch2_str_hash_init(&ctx, info); bch2_str_hash_update(&ctx, info, &key->type, sizeof(key->type)); bch2_str_hash_update(&ctx, info, key->name.name, key->name.len); return bch2_str_hash_end(&ctx, info); } static u64 xattr_hash_key(const struct bch_hash_info *info, const void *key) { return bch2_xattr_hash(info, key); } static u64 xattr_hash_bkey(const struct bch_hash_info *info, struct bkey_s_c k) { struct bkey_s_c_xattr x = bkey_s_c_to_xattr(k); return bch2_xattr_hash(info, &X_SEARCH(x.v->x_type, x.v->x_name, x.v->x_name_len)); } static bool xattr_cmp_key(struct bkey_s_c _l, const void *_r) { struct bkey_s_c_xattr l = bkey_s_c_to_xattr(_l); const struct xattr_search_key *r = _r; return l.v->x_type != r->type || l.v->x_name_len != r->name.len || memcmp(l.v->x_name, r->name.name, r->name.len); } static bool xattr_cmp_bkey(struct bkey_s_c _l, struct bkey_s_c _r) { struct bkey_s_c_xattr l = bkey_s_c_to_xattr(_l); struct bkey_s_c_xattr r = bkey_s_c_to_xattr(_r); return l.v->x_type != r.v->x_type || l.v->x_name_len != r.v->x_name_len || memcmp(l.v->x_name, r.v->x_name, r.v->x_name_len); } const struct bch_hash_desc bch2_xattr_hash_desc = { .btree_id = BTREE_ID_xattrs, .key_type = KEY_TYPE_xattr, .hash_key = xattr_hash_key, .hash_bkey = xattr_hash_bkey, .cmp_key = xattr_cmp_key, .cmp_bkey = xattr_cmp_bkey, }; int bch2_xattr_validate(struct bch_fs *c, struct bkey_s_c k, struct bkey_validate_context from) { struct bkey_s_c_xattr xattr = bkey_s_c_to_xattr(k); unsigned val_u64s = xattr_val_u64s(xattr.v->x_name_len, le16_to_cpu(xattr.v->x_val_len)); int ret = 0; bkey_fsck_err_on(bkey_val_u64s(k.k) < val_u64s, c, xattr_val_size_too_small, "value too small (%zu < %u)", bkey_val_u64s(k.k), val_u64s); /* XXX why +4 ? */ val_u64s = xattr_val_u64s(xattr.v->x_name_len, le16_to_cpu(xattr.v->x_val_len) + 4); bkey_fsck_err_on(bkey_val_u64s(k.k) > val_u64s, c, xattr_val_size_too_big, "value too big (%zu > %u)", bkey_val_u64s(k.k), val_u64s); bkey_fsck_err_on(!bch2_xattr_type_to_handler(xattr.v->x_type), c, xattr_invalid_type, "invalid type (%u)", xattr.v->x_type); bkey_fsck_err_on(memchr(xattr.v->x_name, '\0', xattr.v->x_name_len), c, xattr_name_invalid_chars, "xattr name has invalid characters"); fsck_err: return ret; } void bch2_xattr_to_text(struct printbuf *out, struct bch_fs *c, struct bkey_s_c k) { const struct xattr_handler *handler; struct bkey_s_c_xattr xattr = bkey_s_c_to_xattr(k); handler = bch2_xattr_type_to_handler(xattr.v->x_type); if (handler && handler->prefix) prt_printf(out, "%s", handler->prefix); else if (handler) prt_printf(out, "(type %u)", xattr.v->x_type); else prt_printf(out, "(unknown type %u)", xattr.v->x_type); unsigned name_len = xattr.v->x_name_len; unsigned val_len = le16_to_cpu(xattr.v->x_val_len); unsigned max_name_val_bytes = bkey_val_bytes(xattr.k) - offsetof(struct bch_xattr, x_name); val_len = min_t(int, val_len, max_name_val_bytes - name_len); name_len = min(name_len, max_name_val_bytes); prt_printf(out, "%.*s:%.*s", name_len, xattr.v->x_name, val_len, (char *) xattr_val(xattr.v)); if (xattr.v->x_type == KEY_TYPE_XATTR_INDEX_POSIX_ACL_ACCESS || xattr.v->x_type == KEY_TYPE_XATTR_INDEX_POSIX_ACL_DEFAULT) { prt_char(out, ' '); bch2_acl_to_text(out, xattr_val(xattr.v), le16_to_cpu(xattr.v->x_val_len)); } } static int bch2_xattr_get_trans(struct btree_trans *trans, struct bch_inode_info *inode, const char *name, void *buffer, size_t size, int type) { struct bch_hash_info hash = bch2_hash_info_init(trans->c, &inode->ei_inode); struct xattr_search_key search = X_SEARCH(type, name, strlen(name)); struct btree_iter iter; struct bkey_s_c k = bch2_hash_lookup(trans, &iter, bch2_xattr_hash_desc, &hash, inode_inum(inode), &search, 0); int ret = bkey_err(k); if (ret) return ret; struct bkey_s_c_xattr xattr = bkey_s_c_to_xattr(k); ret = le16_to_cpu(xattr.v->x_val_len); if (buffer) { if (ret > size) ret = -ERANGE; else memcpy(buffer, xattr_val(xattr.v), ret); } bch2_trans_iter_exit(trans, &iter); return ret; } int bch2_xattr_set(struct btree_trans *trans, subvol_inum inum, struct bch_inode_unpacked *inode_u, const struct bch_hash_info *hash_info, const char *name, const void *value, size_t size, int type, int flags) { struct bch_fs *c = trans->c; struct btree_iter inode_iter = { NULL }; int ret; ret = bch2_subvol_is_ro_trans(trans, inum.subvol) ?: bch2_inode_peek(trans, &inode_iter, inode_u, inum, BTREE_ITER_intent); if (ret) return ret; inode_u->bi_ctime = bch2_current_time(c); ret = bch2_inode_write(trans, &inode_iter, inode_u); bch2_trans_iter_exit(trans, &inode_iter); if (ret) return ret; if (value) { struct bkey_i_xattr *xattr; unsigned namelen = strlen(name); unsigned u64s = BKEY_U64s + xattr_val_u64s(namelen, size); if (u64s > U8_MAX) return -ERANGE; xattr = bch2_trans_kmalloc(trans, u64s * sizeof(u64)); if (IS_ERR(xattr)) return PTR_ERR(xattr); bkey_xattr_init(&xattr->k_i); xattr->k.u64s = u64s; xattr->v.x_type = type; xattr->v.x_name_len = namelen; xattr->v.x_val_len = cpu_to_le16(size); memcpy(xattr->v.x_name, name, namelen); memcpy(xattr_val(&xattr->v), value, size); ret = bch2_hash_set(trans, bch2_xattr_hash_desc, hash_info, inum, &xattr->k_i, (flags & XATTR_CREATE ? STR_HASH_must_create : 0)| (flags & XATTR_REPLACE ? STR_HASH_must_replace : 0)); } else { struct xattr_search_key search = X_SEARCH(type, name, strlen(name)); ret = bch2_hash_delete(trans, bch2_xattr_hash_desc, hash_info, inum, &search); } if (bch2_err_matches(ret, ENOENT)) ret = flags & XATTR_REPLACE ? -ENODATA : 0; return ret; } struct xattr_buf { char *buf; size_t len; size_t used; }; static int __bch2_xattr_emit(const char *prefix, const char *name, size_t name_len, struct xattr_buf *buf) { const size_t prefix_len = strlen(prefix); const size_t total_len = prefix_len + name_len + 1; if (buf->buf) { if (buf->used + total_len > buf->len) return -ERANGE; memcpy(buf->buf + buf->used, prefix, prefix_len); memcpy(buf->buf + buf->used + prefix_len, name, name_len); buf->buf[buf->used + prefix_len + name_len] = '\0'; } buf->used += total_len; return 0; } static inline const char *bch2_xattr_prefix(unsigned type, struct dentry *dentry) { const struct xattr_handler *handler = bch2_xattr_type_to_handler(type); if (!xattr_handler_can_list(handler, dentry)) return NULL; return xattr_prefix(handler); } static int bch2_xattr_emit(struct dentry *dentry, const struct bch_xattr *xattr, struct xattr_buf *buf) { const char *prefix; prefix = bch2_xattr_prefix(xattr->x_type, dentry); if (!prefix) return 0; return __bch2_xattr_emit(prefix, xattr->x_name, xattr->x_name_len, buf); } static int bch2_xattr_list_bcachefs(struct bch_fs *c, struct bch_inode_unpacked *inode, struct xattr_buf *buf, bool all) { const char *prefix = all ? "bcachefs_effective." : "bcachefs."; unsigned id; int ret = 0; u64 v; for (id = 0; id < Inode_opt_nr; id++) { v = bch2_inode_opt_get(inode, id); if (!v) continue; if (!all && !(inode->bi_fields_set & (1 << id))) continue; ret = __bch2_xattr_emit(prefix, bch2_inode_opts[id], strlen(bch2_inode_opts[id]), buf); if (ret) break; } return ret; } ssize_t bch2_xattr_list(struct dentry *dentry, char *buffer, size_t buffer_size) { struct bch_fs *c = dentry->d_sb->s_fs_info; struct bch_inode_info *inode = to_bch_ei(dentry->d_inode); struct xattr_buf buf = { .buf = buffer, .len = buffer_size }; u64 offset = 0, inum = inode->ei_inode.bi_inum; int ret = bch2_trans_run(c, for_each_btree_key_in_subvolume_max(trans, iter, BTREE_ID_xattrs, POS(inum, offset), POS(inum, U64_MAX), inode->ei_inum.subvol, 0, k, ({ if (k.k->type != KEY_TYPE_xattr) continue; bch2_xattr_emit(dentry, bkey_s_c_to_xattr(k).v, &buf); }))) ?: bch2_xattr_list_bcachefs(c, &inode->ei_inode, &buf, false) ?: bch2_xattr_list_bcachefs(c, &inode->ei_inode, &buf, true); return ret ? bch2_err_class(ret) : buf.used; } static int bch2_xattr_get_handler(const struct xattr_handler *handler, struct dentry *dentry, struct inode *vinode, const char *name, void *buffer, size_t size) { struct bch_inode_info *inode = to_bch_ei(vinode); struct bch_fs *c = inode->v.i_sb->s_fs_info; int ret = bch2_trans_do(c, bch2_xattr_get_trans(trans, inode, name, buffer, size, handler->flags)); if (ret < 0 && bch2_err_matches(ret, ENOENT)) ret = -ENODATA; return bch2_err_class(ret); } static int bch2_xattr_set_handler(const struct xattr_handler *handler, struct mnt_idmap *idmap, struct dentry *dentry, struct inode *vinode, const char *name, const void *value, size_t size, int flags) { struct bch_inode_info *inode = to_bch_ei(vinode); struct bch_fs *c = inode->v.i_sb->s_fs_info; struct bch_hash_info hash = bch2_hash_info_init(c, &inode->ei_inode); struct bch_inode_unpacked inode_u; int ret; ret = bch2_trans_run(c, commit_do(trans, NULL, NULL, 0, bch2_xattr_set(trans, inode_inum(inode), &inode_u, &hash, name, value, size, handler->flags, flags)) ?: (bch2_inode_update_after_write(trans, inode, &inode_u, ATTR_CTIME), 0)); return bch2_err_class(ret); } static const struct xattr_handler bch_xattr_user_handler = { .prefix = XATTR_USER_PREFIX, .get = bch2_xattr_get_handler, .set = bch2_xattr_set_handler, .flags = KEY_TYPE_XATTR_INDEX_USER, }; static bool bch2_xattr_trusted_list(struct dentry *dentry) { return capable(CAP_SYS_ADMIN); } static const struct xattr_handler bch_xattr_trusted_handler = { .prefix = XATTR_TRUSTED_PREFIX, .list = bch2_xattr_trusted_list, .get = bch2_xattr_get_handler, .set = bch2_xattr_set_handler, .flags = KEY_TYPE_XATTR_INDEX_TRUSTED, }; static const struct xattr_handler bch_xattr_security_handler = { .prefix = XATTR_SECURITY_PREFIX, .get = bch2_xattr_get_handler, .set = bch2_xattr_set_handler, .flags = KEY_TYPE_XATTR_INDEX_SECURITY, }; #ifndef NO_BCACHEFS_FS static int opt_to_inode_opt(int id) { switch (id) { #define x(name, ...) \ case Opt_##name: return Inode_opt_##name; BCH_INODE_OPTS() #undef x default: return -1; } } static int __bch2_xattr_bcachefs_get(const struct xattr_handler *handler, struct dentry *dentry, struct inode *vinode, const char *name, void *buffer, size_t size, bool all) { struct bch_inode_info *inode = to_bch_ei(vinode); struct bch_fs *c = inode->v.i_sb->s_fs_info; struct bch_opts opts = bch2_inode_opts_to_opts(&inode->ei_inode); const struct bch_option *opt; int id, inode_opt_id; struct printbuf out = PRINTBUF; int ret; u64 v; id = bch2_opt_lookup(name); if (id < 0 || !bch2_opt_is_inode_opt(id)) return -EINVAL; inode_opt_id = opt_to_inode_opt(id); if (inode_opt_id < 0) return -EINVAL; opt = bch2_opt_table + id; if (!bch2_opt_defined_by_id(&opts, id)) return -ENODATA; if (!all && !(inode->ei_inode.bi_fields_set & (1 << inode_opt_id))) return -ENODATA; v = bch2_opt_get_by_id(&opts, id); bch2_opt_to_text(&out, c, c->disk_sb.sb, opt, v, 0); ret = out.pos; if (out.allocation_failure) { ret = -ENOMEM; } else if (buffer) { if (out.pos > size) ret = -ERANGE; else memcpy(buffer, out.buf, out.pos); } printbuf_exit(&out); return ret; } static int bch2_xattr_bcachefs_get(const struct xattr_handler *handler, struct dentry *dentry, struct inode *vinode, const char *name, void *buffer, size_t size) { return __bch2_xattr_bcachefs_get(handler, dentry, vinode, name, buffer, size, false); } struct inode_opt_set { int id; u64 v; bool defined; }; static int inode_opt_set_fn(struct btree_trans *trans, struct bch_inode_info *inode, struct bch_inode_unpacked *bi, void *p) { struct inode_opt_set *s = p; if (s->defined) bi->bi_fields_set |= 1U << s->id; else bi->bi_fields_set &= ~(1U << s->id); bch2_inode_opt_set(bi, s->id, s->v); return 0; } static int bch2_xattr_bcachefs_set(const struct xattr_handler *handler, struct mnt_idmap *idmap, struct dentry *dentry, struct inode *vinode, const char *name, const void *value, size_t size, int flags) { struct bch_inode_info *inode = to_bch_ei(vinode); struct bch_fs *c = inode->v.i_sb->s_fs_info; const struct bch_option *opt; char *buf; struct inode_opt_set s; int opt_id, inode_opt_id, ret; opt_id = bch2_opt_lookup(name); if (opt_id < 0) return -EINVAL; opt = bch2_opt_table + opt_id; inode_opt_id = opt_to_inode_opt(opt_id); if (inode_opt_id < 0) return -EINVAL; s.id = inode_opt_id; if (value) { u64 v = 0; buf = kmalloc(size + 1, GFP_KERNEL); if (!buf) return -ENOMEM; memcpy(buf, value, size); buf[size] = '\0'; ret = bch2_opt_parse(c, opt, buf, &v, NULL); kfree(buf); if (ret < 0) goto err_class_exit; ret = bch2_opt_check_may_set(c, opt_id, v); if (ret < 0) goto err_class_exit; s.v = v + 1; s.defined = true; } else { /* * Check if this option was set on the parent - if so, switched * back to inheriting from the parent: * * rename() also has to deal with keeping inherited options up * to date - see bch2_reinherit_attrs() */ spin_lock(&dentry->d_lock); if (!IS_ROOT(dentry)) { struct bch_inode_info *dir = to_bch_ei(d_inode(dentry->d_parent)); s.v = bch2_inode_opt_get(&dir->ei_inode, inode_opt_id); } else { s.v = 0; } spin_unlock(&dentry->d_lock); s.defined = false; } mutex_lock(&inode->ei_update_lock); if (inode_opt_id == Inode_opt_project) { /* * inode fields accessible via the xattr interface are stored * with a +1 bias, so that 0 means unset: */ ret = bch2_set_projid(c, inode, s.v ? s.v - 1 : 0); if (ret) goto err; } ret = bch2_write_inode(c, inode, inode_opt_set_fn, &s, 0); err: mutex_unlock(&inode->ei_update_lock); err_class_exit: return bch2_err_class(ret); } static const struct xattr_handler bch_xattr_bcachefs_handler = { .prefix = "bcachefs.", .get = bch2_xattr_bcachefs_get, .set = bch2_xattr_bcachefs_set, }; static int bch2_xattr_bcachefs_get_effective( const struct xattr_handler *handler, struct dentry *dentry, struct inode *vinode, const char *name, void *buffer, size_t size) { return __bch2_xattr_bcachefs_get(handler, dentry, vinode, name, buffer, size, true); } /* Noop - xattrs in the bcachefs_effective namespace are inherited */ static int bch2_xattr_bcachefs_set_effective(const struct xattr_handler *handler, struct mnt_idmap *idmap, struct dentry *dentry, struct inode *vinode, const char *name, const void *value, size_t size, int flags) { return 0; } static const struct xattr_handler bch_xattr_bcachefs_effective_handler = { .prefix = "bcachefs_effective.", .get = bch2_xattr_bcachefs_get_effective, .set = bch2_xattr_bcachefs_set_effective, }; #endif /* NO_BCACHEFS_FS */ const struct xattr_handler * const bch2_xattr_handlers[] = { &bch_xattr_user_handler, &bch_xattr_trusted_handler, &bch_xattr_security_handler, #ifndef NO_BCACHEFS_FS &bch_xattr_bcachefs_handler, &bch_xattr_bcachefs_effective_handler, #endif NULL }; static const struct xattr_handler *bch_xattr_handler_map[] = { [KEY_TYPE_XATTR_INDEX_USER] = &bch_xattr_user_handler, [KEY_TYPE_XATTR_INDEX_POSIX_ACL_ACCESS] = &nop_posix_acl_access, [KEY_TYPE_XATTR_INDEX_POSIX_ACL_DEFAULT] = &nop_posix_acl_default, [KEY_TYPE_XATTR_INDEX_TRUSTED] = &bch_xattr_trusted_handler, [KEY_TYPE_XATTR_INDEX_SECURITY] = &bch_xattr_security_handler, }; static const struct xattr_handler *bch2_xattr_type_to_handler(unsigned type) { return type < ARRAY_SIZE(bch_xattr_handler_map) ? bch_xattr_handler_map[type] : NULL; } |
20 20 6 11 10 10 10 10 10 10 10 10 10 10 8 2 8 8 8 8 10 10 8 8 8 8 8 8 10 4 7 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 | // SPDX-License-Identifier: GPL-2.0-or-later /* * RTC class driver for "CMOS RTC": PCs, ACPI, etc * * Copyright (C) 1996 Paul Gortmaker (drivers/char/rtc.c) * Copyright (C) 2006 David Brownell (convert to new framework) */ /* * The original "cmos clock" chip was an MC146818 chip, now obsolete. * That defined the register interface now provided by all PCs, some * non-PC systems, and incorporated into ACPI. Modern PC chipsets * integrate an MC146818 clone in their southbridge, and boards use * that instead of discrete clones like the DS12887 or M48T86. There * are also clones that connect using the LPC bus. * * That register API is also used directly by various other drivers * (notably for integrated NVRAM), infrastructure (x86 has code to * bypass the RTC framework, directly reading the RTC during boot * and updating minutes/seconds for systems using NTP synch) and * utilities (like userspace 'hwclock', if no /dev node exists). * * So **ALL** calls to CMOS_READ and CMOS_WRITE must be done with * interrupts disabled, holding the global rtc_lock, to exclude those * other drivers and utilities on correctly configured systems. */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <linux/kernel.h> #include <linux/module.h> #include <linux/init.h> #include <linux/interrupt.h> #include <linux/spinlock.h> #include <linux/platform_device.h> #include <linux/log2.h> #include <linux/pm.h> #include <linux/of.h> #include <linux/of_platform.h> #ifdef CONFIG_X86 #include <asm/i8259.h> #include <asm/processor.h> #include <linux/dmi.h> #endif /* this is for "generic access to PC-style RTC" using CMOS_READ/CMOS_WRITE */ #include <linux/mc146818rtc.h> #ifdef CONFIG_ACPI /* * Use ACPI SCI to replace HPET interrupt for RTC Alarm event * * If cleared, ACPI SCI is only used to wake up the system from suspend * * If set, ACPI SCI is used to handle UIE/AIE and system wakeup */ static bool use_acpi_alarm; module_param(use_acpi_alarm, bool, 0444); static inline int cmos_use_acpi_alarm(void) { return use_acpi_alarm; } #else /* !CONFIG_ACPI */ static inline int cmos_use_acpi_alarm(void) { return 0; } #endif struct cmos_rtc { struct rtc_device *rtc; struct device *dev; int irq; struct resource *iomem; time64_t alarm_expires; void (*wake_on)(struct device *); void (*wake_off)(struct device *); u8 enabled_wake; u8 suspend_ctrl; /* newer hardware extends the original register set */ u8 day_alrm; u8 mon_alrm; u8 century; struct rtc_wkalrm saved_wkalrm; }; /* both platform and pnp busses use negative numbers for invalid irqs */ #define is_valid_irq(n) ((n) > 0) static const char driver_name[] = "rtc_cmos"; /* The RTC_INTR register may have e.g. RTC_PF set even if RTC_PIE is clear; * always mask it against the irq enable bits in RTC_CONTROL. Bit values * are the same: PF==PIE, AF=AIE, UF=UIE; so RTC_IRQMASK works with both. */ #define RTC_IRQMASK (RTC_PF | RTC_AF | RTC_UF) static inline int is_intr(u8 rtc_intr) { if (!(rtc_intr & RTC_IRQF)) return 0; return rtc_intr & RTC_IRQMASK; } /*----------------------------------------------------------------*/ /* Much modern x86 hardware has HPETs (10+ MHz timers) which, because * many BIOS programmers don't set up "sane mode" IRQ routing, are mostly * used in a broken "legacy replacement" mode. The breakage includes * HPET #1 hijacking the IRQ for this RTC, and being unavailable for * other (better) use. * * When that broken mode is in use, platform glue provides a partial * emulation of hardware RTC IRQ facilities using HPET #1. We don't * want to use HPET for anything except those IRQs though... */ #ifdef CONFIG_HPET_EMULATE_RTC #include <asm/hpet.h> #else static inline int is_hpet_enabled(void) { return 0; } static inline int hpet_mask_rtc_irq_bit(unsigned long mask) { return 0; } static inline int hpet_set_rtc_irq_bit(unsigned long mask) { return 0; } static inline int hpet_set_alarm_time(unsigned char hrs, unsigned char min, unsigned char sec) { return 0; } static inline int hpet_set_periodic_freq(unsigned long freq) { return 0; } static inline int hpet_rtc_dropped_irq(void) { return 0; } static inline int hpet_rtc_timer_init(void) { return 0; } extern irq_handler_t hpet_rtc_interrupt; static inline int hpet_register_irq_handler(irq_handler_t handler) { return 0; } static inline int hpet_unregister_irq_handler(irq_handler_t handler) { return 0; } #endif /* Don't use HPET for RTC Alarm event if ACPI Fixed event is used */ static inline int use_hpet_alarm(void) { return is_hpet_enabled() && !cmos_use_acpi_alarm(); } /*----------------------------------------------------------------*/ #ifdef RTC_PORT /* Most newer x86 systems have two register banks, the first used * for RTC and NVRAM and the second only for NVRAM. Caller must * own rtc_lock ... and we won't worry about access during NMI. */ #define can_bank2 true static inline unsigned char cmos_read_bank2(unsigned char addr) { outb(addr, RTC_PORT(2)); return inb(RTC_PORT(3)); } static inline void cmos_write_bank2(unsigned char val, unsigned char addr) { outb(addr, RTC_PORT(2)); outb(val, RTC_PORT(3)); } #else #define can_bank2 false static inline unsigned char cmos_read_bank2(unsigned char addr) { return 0; } static inline void cmos_write_bank2(unsigned char val, unsigned char addr) { } #endif /*----------------------------------------------------------------*/ static int cmos_read_time(struct device *dev, struct rtc_time *t) { int ret; /* * If pm_trace abused the RTC for storage, set the timespec to 0, * which tells the caller that this RTC value is unusable. */ if (!pm_trace_rtc_valid()) return -EIO; ret = mc146818_get_time(t, 1000); if (ret < 0) { dev_err_ratelimited(dev, "unable to read current time\n"); return ret; } return 0; } static int cmos_set_time(struct device *dev, struct rtc_time *t) { /* NOTE: this ignores the issue whereby updating the seconds * takes effect exactly 500ms after we write the register. * (Also queueing and other delays before we get this far.) */ return mc146818_set_time(t); } struct cmos_read_alarm_callback_param { struct cmos_rtc *cmos; struct rtc_time *time; unsigned char rtc_control; }; static void cmos_read_alarm_callback(unsigned char __always_unused seconds, void *param_in) { struct cmos_read_alarm_callback_param *p = (struct cmos_read_alarm_callback_param *)param_in; struct rtc_time *time = p->time; time->tm_sec = CMOS_READ(RTC_SECONDS_ALARM); time->tm_min = CMOS_READ(RTC_MINUTES_ALARM); time->tm_hour = CMOS_READ(RTC_HOURS_ALARM); if (p->cmos->day_alrm) { /* ignore upper bits on readback per ACPI spec */ time->tm_mday = CMOS_READ(p->cmos->day_alrm) & 0x3f; if (!time->tm_mday) time->tm_mday = -1; if (p->cmos->mon_alrm) { time->tm_mon = CMOS_READ(p->cmos->mon_alrm); if (!time->tm_mon) time->tm_mon = -1; } } p->rtc_control = CMOS_READ(RTC_CONTROL); } static int cmos_read_alarm(struct device *dev, struct rtc_wkalrm *t) { struct cmos_rtc *cmos = dev_get_drvdata(dev); struct cmos_read_alarm_callback_param p = { .cmos = cmos, .time = &t->time, }; /* This not only a rtc_op, but also called directly */ if (!is_valid_irq(cmos->irq)) return -ETIMEDOUT; /* Basic alarms only support hour, minute, and seconds fields. * Some also support day and month, for alarms up to a year in * the future. */ /* Some Intel chipsets disconnect the alarm registers when the clock * update is in progress - during this time reads return bogus values * and writes may fail silently. See for example "7th Generation Intel® * Processor Family I/O for U/Y Platforms [...] Datasheet", section * 27.7.1 * * Use the mc146818_avoid_UIP() function to avoid this. */ if (!mc146818_avoid_UIP(cmos_read_alarm_callback, 10, &p)) return -EIO; if (!(p.rtc_control & RTC_DM_BINARY) || RTC_ALWAYS_BCD) { if (((unsigned)t->time.tm_sec) < 0x60) t->time.tm_sec = bcd2bin(t->time.tm_sec); else t->time.tm_sec = -1; if (((unsigned)t->time.tm_min) < 0x60) t->time.tm_min = bcd2bin(t->time.tm_min); else t->time.tm_min = -1; if (((unsigned)t->time.tm_hour) < 0x24) t->time.tm_hour = bcd2bin(t->time.tm_hour); else t->time.tm_hour = -1; if (cmos->day_alrm) { if (((unsigned)t->time.tm_mday) <= 0x31) t->time.tm_mday = bcd2bin(t->time.tm_mday); else t->time.tm_mday = -1; if (cmos->mon_alrm) { if (((unsigned)t->time.tm_mon) <= 0x12) t->time.tm_mon = bcd2bin(t->time.tm_mon)-1; else t->time.tm_mon = -1; } } } t->enabled = !!(p.rtc_control & RTC_AIE); t->pending = 0; return 0; } static void cmos_checkintr(struct cmos_rtc *cmos, unsigned char rtc_control) { unsigned char rtc_intr; /* NOTE after changing RTC_xIE bits we always read INTR_FLAGS; * allegedly some older rtcs need that to handle irqs properly */ rtc_intr = CMOS_READ(RTC_INTR_FLAGS); if (use_hpet_alarm()) return; rtc_intr &= (rtc_control & RTC_IRQMASK) | RTC_IRQF; if (is_intr(rtc_intr)) rtc_update_irq(cmos->rtc, 1, rtc_intr); } static void cmos_irq_enable(struct cmos_rtc *cmos, unsigned char mask) { unsigned char rtc_control; /* flush any pending IRQ status, notably for update irqs, * before we enable new IRQs */ rtc_control = CMOS_READ(RTC_CONTROL); cmos_checkintr(cmos, rtc_control); rtc_control |= mask; CMOS_WRITE(rtc_control, RTC_CONTROL); if (use_hpet_alarm()) hpet_set_rtc_irq_bit(mask); if ((mask & RTC_AIE) && cmos_use_acpi_alarm()) { if (cmos->wake_on) cmos->wake_on(cmos->dev); } cmos_checkintr(cmos, rtc_control); } static void cmos_irq_disable(struct cmos_rtc *cmos, unsigned char mask) { unsigned char rtc_control; rtc_control = CMOS_READ(RTC_CONTROL); rtc_control &= ~mask; CMOS_WRITE(rtc_control, RTC_CONTROL); if (use_hpet_alarm()) hpet_mask_rtc_irq_bit(mask); if ((mask & RTC_AIE) && cmos_use_acpi_alarm()) { if (cmos->wake_off) cmos->wake_off(cmos->dev); } cmos_checkintr(cmos, rtc_control); } static int cmos_validate_alarm(struct device *dev, struct rtc_wkalrm *t) { struct cmos_rtc *cmos = dev_get_drvdata(dev); struct rtc_time now; cmos_read_time(dev, &now); if (!cmos->day_alrm) { time64_t t_max_date; time64_t t_alrm; t_max_date = rtc_tm_to_time64(&now); t_max_date += 24 * 60 * 60 - 1; t_alrm = rtc_tm_to_time64(&t->time); if (t_alrm > t_max_date) { dev_err(dev, "Alarms can be up to one day in the future\n"); return -EINVAL; } } else if (!cmos->mon_alrm) { struct rtc_time max_date = now; time64_t t_max_date; time64_t t_alrm; int max_mday; if (max_date.tm_mon == 11) { max_date.tm_mon = 0; max_date.tm_year += 1; } else { max_date.tm_mon += 1; } max_mday = rtc_month_days(max_date.tm_mon, max_date.tm_year); if (max_date.tm_mday > max_mday) max_date.tm_mday = max_mday; t_max_date = rtc_tm_to_time64(&max_date); t_max_date -= 1; t_alrm = rtc_tm_to_time64(&t->time); if (t_alrm > t_max_date) { dev_err(dev, "Alarms can be up to one month in the future\n"); return -EINVAL; } } else { struct rtc_time max_date = now; time64_t t_max_date; time64_t t_alrm; int max_mday; max_date.tm_year += 1; max_mday = rtc_month_days(max_date.tm_mon, max_date.tm_year); if (max_date.tm_mday > max_mday) max_date.tm_mday = max_mday; t_max_date = rtc_tm_to_time64(&max_date); t_max_date -= 1; t_alrm = rtc_tm_to_time64(&t->time); if (t_alrm > t_max_date) { dev_err(dev, "Alarms can be up to one year in the future\n"); return -EINVAL; } } return 0; } struct cmos_set_alarm_callback_param { struct cmos_rtc *cmos; unsigned char mon, mday, hrs, min, sec; struct rtc_wkalrm *t; }; /* Note: this function may be executed by mc146818_avoid_UIP() more then * once */ static void cmos_set_alarm_callback(unsigned char __always_unused seconds, void *param_in) { struct cmos_set_alarm_callback_param *p = (struct cmos_set_alarm_callback_param *)param_in; /* next rtc irq must not be from previous alarm setting */ cmos_irq_disable(p->cmos, RTC_AIE); /* update alarm */ CMOS_WRITE(p->hrs, RTC_HOURS_ALARM); CMOS_WRITE(p->min, RTC_MINUTES_ALARM); CMOS_WRITE(p->sec, RTC_SECONDS_ALARM); /* the system may support an "enhanced" alarm */ if (p->cmos->day_alrm) { CMOS_WRITE(p->mday, p->cmos->day_alrm); if (p->cmos->mon_alrm) CMOS_WRITE(p->mon, p->cmos->mon_alrm); } if (use_hpet_alarm()) { /* * FIXME the HPET alarm glue currently ignores day_alrm * and mon_alrm ... */ hpet_set_alarm_time(p->t->time.tm_hour, p->t->time.tm_min, p->t->time.tm_sec); } if (p->t->enabled) cmos_irq_enable(p->cmos, RTC_AIE); } static int cmos_set_alarm(struct device *dev, struct rtc_wkalrm *t) { struct cmos_rtc *cmos = dev_get_drvdata(dev); struct cmos_set_alarm_callback_param p = { .cmos = cmos, .t = t }; unsigned char rtc_control; int ret; /* This not only a rtc_op, but also called directly */ if (!is_valid_irq(cmos->irq)) return -EIO; ret = cmos_validate_alarm(dev, t); if (ret < 0) return ret; p.mon = t->time.tm_mon + 1; p.mday = t->time.tm_mday; p.hrs = t->time.tm_hour; p.min = t->time.tm_min; p.sec = t->time.tm_sec; spin_lock_irq(&rtc_lock); rtc_control = CMOS_READ(RTC_CONTROL); spin_unlock_irq(&rtc_lock); if (!(rtc_control & RTC_DM_BINARY) || RTC_ALWAYS_BCD) { /* Writing 0xff means "don't care" or "match all". */ p.mon = (p.mon <= 12) ? bin2bcd(p.mon) : 0xff; p.mday = (p.mday >= 1 && p.mday <= 31) ? bin2bcd(p.mday) : 0xff; p.hrs = (p.hrs < 24) ? bin2bcd(p.hrs) : 0xff; p.min = (p.min < 60) ? bin2bcd(p.min) : 0xff; p.sec = (p.sec < 60) ? bin2bcd(p.sec) : 0xff; } /* * Some Intel chipsets disconnect the alarm registers when the clock * update is in progress - during this time writes fail silently. * * Use mc146818_avoid_UIP() to avoid this. */ if (!mc146818_avoid_UIP(cmos_set_alarm_callback, 10, &p)) return -ETIMEDOUT; cmos->alarm_expires = rtc_tm_to_time64(&t->time); return 0; } static int cmos_alarm_irq_enable(struct device *dev, unsigned int enabled) { struct cmos_rtc *cmos = dev_get_drvdata(dev); unsigned long flags; spin_lock_irqsave(&rtc_lock, flags); if (enabled) cmos_irq_enable(cmos, RTC_AIE); else cmos_irq_disable(cmos, RTC_AIE); spin_unlock_irqrestore(&rtc_lock, flags); return 0; } #if IS_ENABLED(CONFIG_RTC_INTF_PROC) static int cmos_procfs(struct device *dev, struct seq_file *seq) { struct cmos_rtc *cmos = dev_get_drvdata(dev); unsigned char rtc_control, valid; spin_lock_irq(&rtc_lock); rtc_control = CMOS_READ(RTC_CONTROL); valid = CMOS_READ(RTC_VALID); spin_unlock_irq(&rtc_lock); /* NOTE: at least ICH6 reports battery status using a different * (non-RTC) bit; and SQWE is ignored on many current systems. */ seq_printf(seq, "periodic_IRQ\t: %s\n" "update_IRQ\t: %s\n" "HPET_emulated\t: %s\n" // "square_wave\t: %s\n" "BCD\t\t: %s\n" "DST_enable\t: %s\n" "periodic_freq\t: %d\n" "batt_status\t: %s\n", (rtc_control & RTC_PIE) ? "yes" : "no", (rtc_control & RTC_UIE) ? "yes" : "no", use_hpet_alarm() ? "yes" : "no", // (rtc_control & RTC_SQWE) ? "yes" : "no", (rtc_control & RTC_DM_BINARY) ? "no" : "yes", (rtc_control & RTC_DST_EN) ? "yes" : "no", cmos->rtc->irq_freq, (valid & RTC_VRT) ? "okay" : "dead"); return 0; } #else #define cmos_procfs NULL #endif static const struct rtc_class_ops cmos_rtc_ops = { .read_time = cmos_read_time, .set_time = cmos_set_time, .read_alarm = cmos_read_alarm, .set_alarm = cmos_set_alarm, .proc = cmos_procfs, .alarm_irq_enable = cmos_alarm_irq_enable, }; /*----------------------------------------------------------------*/ /* * All these chips have at least 64 bytes of address space, shared by * RTC registers and NVRAM. Most of those bytes of NVRAM are used * by boot firmware. Modern chips have 128 or 256 bytes. */ #define NVRAM_OFFSET (RTC_REG_D + 1) static int cmos_nvram_read(void *priv, unsigned int off, void *val, size_t count) { unsigned char *buf = val; off += NVRAM_OFFSET; for (; count; count--, off++, buf++) { guard(spinlock_irq)(&rtc_lock); if (off < 128) *buf = CMOS_READ(off); else if (can_bank2) *buf = cmos_read_bank2(off); else return -EIO; } return 0; } static int cmos_nvram_write(void *priv, unsigned int off, void *val, size_t count) { struct cmos_rtc *cmos = priv; unsigned char *buf = val; /* NOTE: on at least PCs and Ataris, the boot firmware uses a * checksum on part of the NVRAM data. That's currently ignored * here. If userspace is smart enough to know what fields of * NVRAM to update, updating checksums is also part of its job. */ off += NVRAM_OFFSET; for (; count; count--, off++, buf++) { /* don't trash RTC registers */ if (off == cmos->day_alrm || off == cmos->mon_alrm || off == cmos->century) continue; guard(spinlock_irq)(&rtc_lock); if (off < 128) CMOS_WRITE(*buf, off); else if (can_bank2) cmos_write_bank2(*buf, off); else return -EIO; } return 0; } /*----------------------------------------------------------------*/ static struct cmos_rtc cmos_rtc; static irqreturn_t cmos_interrupt(int irq, void *p) { u8 irqstat; u8 rtc_control; spin_lock(&rtc_lock); /* When the HPET interrupt handler calls us, the interrupt * status is passed as arg1 instead of the irq number. But * always clear irq status, even when HPET is in the way. * * Note that HPET and RTC are almost certainly out of phase, * giving different IRQ status ... */ irqstat = CMOS_READ(RTC_INTR_FLAGS); rtc_control = CMOS_READ(RTC_CONTROL); if (use_hpet_alarm()) irqstat = (unsigned long)irq & 0xF0; /* If we were suspended, RTC_CONTROL may not be accurate since the * bios may have cleared it. */ if (!cmos_rtc.suspend_ctrl) irqstat &= (rtc_control & RTC_IRQMASK) | RTC_IRQF; else irqstat &= (cmos_rtc.suspend_ctrl & RTC_IRQMASK) | RTC_IRQF; /* All Linux RTC alarms should be treated as if they were oneshot. * Similar code may be needed in system wakeup paths, in case the * alarm woke the system. */ if (irqstat & RTC_AIE) { cmos_rtc.suspend_ctrl &= ~RTC_AIE; rtc_control &= ~RTC_AIE; CMOS_WRITE(rtc_control, RTC_CONTROL); if (use_hpet_alarm()) hpet_mask_rtc_irq_bit(RTC_AIE); CMOS_READ(RTC_INTR_FLAGS); } spin_unlock(&rtc_lock); if (is_intr(irqstat)) { rtc_update_irq(p, 1, irqstat); return IRQ_HANDLED; } else return IRQ_NONE; } #ifdef CONFIG_ACPI #include <linux/acpi.h> static u32 rtc_handler(void *context) { struct device *dev = context; struct cmos_rtc *cmos = dev_get_drvdata(dev); unsigned char rtc_control = 0; unsigned char rtc_intr; unsigned long flags; /* * Always update rtc irq when ACPI is used as RTC Alarm. * Or else, ACPI SCI is enabled during suspend/resume only, * update rtc irq in that case. */ if (cmos_use_acpi_alarm()) cmos_interrupt(0, (void *)cmos->rtc); else { /* Fix me: can we use cmos_interrupt() here as well? */ spin_lock_irqsave(&rtc_lock, flags); if (cmos_rtc.suspend_ctrl) rtc_control = CMOS_READ(RTC_CONTROL); if (rtc_control & RTC_AIE) { cmos_rtc.suspend_ctrl &= ~RTC_AIE; CMOS_WRITE(rtc_control, RTC_CONTROL); rtc_intr = CMOS_READ(RTC_INTR_FLAGS); rtc_update_irq(cmos->rtc, 1, rtc_intr); } spin_unlock_irqrestore(&rtc_lock, flags); } pm_wakeup_hard_event(dev); acpi_clear_event(ACPI_EVENT_RTC); acpi_disable_event(ACPI_EVENT_RTC, 0); return ACPI_INTERRUPT_HANDLED; } static void acpi_rtc_event_setup(struct device *dev) { if (acpi_disabled) return; acpi_install_fixed_event_handler(ACPI_EVENT_RTC, rtc_handler, dev); /* * After the RTC handler is installed, the Fixed_RTC event should * be disabled. Only when the RTC alarm is set will it be enabled. */ acpi_clear_event(ACPI_EVENT_RTC); acpi_disable_event(ACPI_EVENT_RTC, 0); } static void acpi_rtc_event_cleanup(void) { if (acpi_disabled) return; acpi_remove_fixed_event_handler(ACPI_EVENT_RTC, rtc_handler); } static void rtc_wake_on(struct device *dev) { acpi_clear_event(ACPI_EVENT_RTC); acpi_enable_event(ACPI_EVENT_RTC, 0); } static void rtc_wake_off(struct device *dev) { acpi_disable_event(ACPI_EVENT_RTC, 0); } #ifdef CONFIG_X86 static void use_acpi_alarm_quirks(void) { switch (boot_cpu_data.x86_vendor) { case X86_VENDOR_INTEL: if (dmi_get_bios_year() < 2015) return; break; case X86_VENDOR_AMD: case X86_VENDOR_HYGON: if (dmi_get_bios_year() < 2021) return; break; default: return; } if (!is_hpet_enabled()) return; use_acpi_alarm = true; } #else static inline void use_acpi_alarm_quirks(void) { } #endif static void acpi_cmos_wake_setup(struct device *dev) { if (acpi_disabled) return; use_acpi_alarm_quirks(); cmos_rtc.wake_on = rtc_wake_on; cmos_rtc.wake_off = rtc_wake_off; /* ACPI tables bug workaround. */ if (acpi_gbl_FADT.month_alarm && !acpi_gbl_FADT.day_alarm) { dev_dbg(dev, "bogus FADT month_alarm (%d)\n", acpi_gbl_FADT.month_alarm); acpi_gbl_FADT.month_alarm = 0; } cmos_rtc.day_alrm = acpi_gbl_FADT.day_alarm; cmos_rtc.mon_alrm = acpi_gbl_FADT.month_alarm; cmos_rtc.century = acpi_gbl_FADT.century; if (acpi_gbl_FADT.flags & ACPI_FADT_S4_RTC_WAKE) dev_info(dev, "RTC can wake from S4\n"); /* RTC always wakes from S1/S2/S3, and often S4/STD */ device_init_wakeup(dev, 1); } static void cmos_check_acpi_rtc_status(struct device *dev, unsigned char *rtc_control) { struct cmos_rtc *cmos = dev_get_drvdata(dev); acpi_event_status rtc_status; acpi_status status; if (acpi_gbl_FADT.flags & ACPI_FADT_FIXED_RTC) return; status = acpi_get_event_status(ACPI_EVENT_RTC, &rtc_status); if (ACPI_FAILURE(status)) { dev_err(dev, "Could not get RTC status\n"); } else if (rtc_status & ACPI_EVENT_FLAG_SET) { unsigned char mask; *rtc_control &= ~RTC_AIE; CMOS_WRITE(*rtc_control, RTC_CONTROL); mask = CMOS_READ(RTC_INTR_FLAGS); rtc_update_irq(cmos->rtc, 1, mask); } } #else /* !CONFIG_ACPI */ static inline void acpi_rtc_event_setup(struct device *dev) { } static inline void acpi_rtc_event_cleanup(void) { } static inline void acpi_cmos_wake_setup(struct device *dev) { } static inline void cmos_check_acpi_rtc_status(struct device *dev, unsigned char *rtc_control) { } #endif /* CONFIG_ACPI */ #ifdef CONFIG_PNP #define INITSECTION #else #define INITSECTION __init #endif #define SECS_PER_DAY (24 * 60 * 60) #define SECS_PER_MONTH (28 * SECS_PER_DAY) #define SECS_PER_YEAR (365 * SECS_PER_DAY) static int INITSECTION cmos_do_probe(struct device *dev, struct resource *ports, int rtc_irq) { struct cmos_rtc_board_info *info = dev_get_platdata(dev); int retval = 0; unsigned char rtc_control; unsigned address_space; u32 flags = 0; struct nvmem_config nvmem_cfg = { .name = "cmos_nvram", .word_size = 1, .stride = 1, .reg_read = cmos_nvram_read, .reg_write = cmos_nvram_write, .priv = &cmos_rtc, }; /* there can be only one ... */ if (cmos_rtc.dev) return -EBUSY; if (!ports) return -ENODEV; /* Claim I/O ports ASAP, minimizing conflict with legacy driver. * * REVISIT non-x86 systems may instead use memory space resources * (needing ioremap etc), not i/o space resources like this ... */ if (RTC_IOMAPPED) ports = request_region(ports->start, resource_size(ports), driver_name); else ports = request_mem_region(ports->start, resource_size(ports), driver_name); if (!ports) { dev_dbg(dev, "i/o registers already in use\n"); return -EBUSY; } cmos_rtc.irq = rtc_irq; cmos_rtc.iomem = ports; /* Heuristic to deduce NVRAM size ... do what the legacy NVRAM * driver did, but don't reject unknown configs. Old hardware * won't address 128 bytes. Newer chips have multiple banks, * though they may not be listed in one I/O resource. */ #if defined(CONFIG_ATARI) address_space = 64; #elif defined(__i386__) || defined(__x86_64__) || defined(__arm__) \ || defined(__sparc__) || defined(__mips__) \ || defined(__powerpc__) address_space = 128; #else #warning Assuming 128 bytes of RTC+NVRAM address space, not 64 bytes. address_space = 128; #endif if (can_bank2 && ports->end > (ports->start + 1)) address_space = 256; /* For ACPI systems extension info comes from the FADT. On others, * board specific setup provides it as appropriate. Systems where * the alarm IRQ isn't automatically a wakeup IRQ (like ACPI, and * some almost-clones) can provide hooks to make that behave. * * Note that ACPI doesn't preclude putting these registers into * "extended" areas of the chip, including some that we won't yet * expect CMOS_READ and friends to handle. */ if (info) { if (info->flags) flags = info->flags; if (info->address_space) address_space = info->address_space; cmos_rtc.day_alrm = info->rtc_day_alarm; cmos_rtc.mon_alrm = info->rtc_mon_alarm; cmos_rtc.century = info->rtc_century; if (info->wake_on && info->wake_off) { cmos_rtc.wake_on = info->wake_on; cmos_rtc.wake_off = info->wake_off; } } else { acpi_cmos_wake_setup(dev); } if (cmos_rtc.day_alrm >= 128) cmos_rtc.day_alrm = 0; if (cmos_rtc.mon_alrm >= 128) cmos_rtc.mon_alrm = 0; if (cmos_rtc.century >= 128) cmos_rtc.century = 0; cmos_rtc.dev = dev; dev_set_drvdata(dev, &cmos_rtc); cmos_rtc.rtc = devm_rtc_allocate_device(dev); if (IS_ERR(cmos_rtc.rtc)) { retval = PTR_ERR(cmos_rtc.rtc); goto cleanup0; } if (cmos_rtc.mon_alrm) cmos_rtc.rtc->alarm_offset_max = SECS_PER_YEAR - 1; else if (cmos_rtc.day_alrm) cmos_rtc.rtc->alarm_offset_max = SECS_PER_MONTH - 1; else cmos_rtc.rtc->alarm_offset_max = SECS_PER_DAY - 1; rename_region(ports, dev_name(&cmos_rtc.rtc->dev)); if (!mc146818_does_rtc_work()) { dev_warn(dev, "broken or not accessible\n"); retval = -ENXIO; goto cleanup1; } spin_lock_irq(&rtc_lock); if (!(flags & CMOS_RTC_FLAGS_NOFREQ)) { /* force periodic irq to CMOS reset default of 1024Hz; * * REVISIT it's been reported that at least one x86_64 ALI * mobo doesn't use 32KHz here ... for portability we might * need to do something about other clock frequencies. */ cmos_rtc.rtc->irq_freq = 1024; if (use_hpet_alarm()) hpet_set_periodic_freq(cmos_rtc.rtc->irq_freq); CMOS_WRITE(RTC_REF_CLCK_32KHZ | 0x06, RTC_FREQ_SELECT); } /* disable irqs */ if (is_valid_irq(rtc_irq)) cmos_irq_disable(&cmos_rtc, RTC_PIE | RTC_AIE | RTC_UIE); rtc_control = CMOS_READ(RTC_CONTROL); spin_unlock_irq(&rtc_lock); if (is_valid_irq(rtc_irq) && !(rtc_control & RTC_24H)) { dev_warn(dev, "only 24-hr supported\n"); retval = -ENXIO; goto cleanup1; } if (use_hpet_alarm()) hpet_rtc_timer_init(); if (is_valid_irq(rtc_irq)) { irq_handler_t rtc_cmos_int_handler; if (use_hpet_alarm()) { rtc_cmos_int_handler = hpet_rtc_interrupt; retval = hpet_register_irq_handler(cmos_interrupt); if (retval) { hpet_mask_rtc_irq_bit(RTC_IRQMASK); dev_warn(dev, "hpet_register_irq_handler " " failed in rtc_init()."); goto cleanup1; } } else rtc_cmos_int_handler = cmos_interrupt; retval = request_irq(rtc_irq, rtc_cmos_int_handler, 0, dev_name(&cmos_rtc.rtc->dev), cmos_rtc.rtc); if (retval < 0) { dev_dbg(dev, "IRQ %d is already in use\n", rtc_irq); goto cleanup1; } } else { clear_bit(RTC_FEATURE_ALARM, cmos_rtc.rtc->features); } cmos_rtc.rtc->ops = &cmos_rtc_ops; retval = devm_rtc_register_device(cmos_rtc.rtc); if (retval) goto cleanup2; /* Set the sync offset for the periodic 11min update correct */ cmos_rtc.rtc->set_offset_nsec = NSEC_PER_SEC / 2; /* export at least the first block of NVRAM */ nvmem_cfg.size = address_space - NVRAM_OFFSET; devm_rtc_nvmem_register(cmos_rtc.rtc, &nvmem_cfg); /* * Everything has gone well so far, so by default register a handler for * the ACPI RTC fixed event. */ if (!info) acpi_rtc_event_setup(dev); dev_info(dev, "%s%s, %d bytes nvram%s\n", !is_valid_irq(rtc_irq) ? "no alarms" : cmos_rtc.mon_alrm ? "alarms up to one year" : cmos_rtc.day_alrm ? "alarms up to one month" : "alarms up to one day", cmos_rtc.century ? ", y3k" : "", nvmem_cfg.size, use_hpet_alarm() ? ", hpet irqs" : ""); return 0; cleanup2: if (is_valid_irq(rtc_irq)) free_irq(rtc_irq, cmos_rtc.rtc); cleanup1: cmos_rtc.dev = NULL; cleanup0: if (RTC_IOMAPPED) release_region(ports->start, resource_size(ports)); else release_mem_region(ports->start, resource_size(ports)); return retval; } static void cmos_do_shutdown(int rtc_irq) { spin_lock_irq(&rtc_lock); if (is_valid_irq(rtc_irq)) cmos_irq_disable(&cmos_rtc, RTC_IRQMASK); spin_unlock_irq(&rtc_lock); } static void cmos_do_remove(struct device *dev) { struct cmos_rtc *cmos = dev_get_drvdata(dev); struct resource *ports; cmos_do_shutdown(cmos->irq); if (is_valid_irq(cmos->irq)) { free_irq(cmos->irq, cmos->rtc); if (use_hpet_alarm()) hpet_unregister_irq_handler(cmos_interrupt); } if (!dev_get_platdata(dev)) acpi_rtc_event_cleanup(); cmos->rtc = NULL; ports = cmos->iomem; if (RTC_IOMAPPED) release_region(ports->start, resource_size(ports)); else release_mem_region(ports->start, resource_size(ports)); cmos->iomem = NULL; cmos->dev = NULL; } static int cmos_aie_poweroff(struct device *dev) { struct cmos_rtc *cmos = dev_get_drvdata(dev); struct rtc_time now; time64_t t_now; int retval = 0; unsigned char rtc_control; if (!cmos->alarm_expires) return -EINVAL; spin_lock_irq(&rtc_lock); rtc_control = CMOS_READ(RTC_CONTROL); spin_unlock_irq(&rtc_lock); /* We only care about the situation where AIE is disabled. */ if (rtc_control & RTC_AIE) return -EBUSY; cmos_read_time(dev, &now); t_now = rtc_tm_to_time64(&now); /* * When enabling "RTC wake-up" in BIOS setup, the machine reboots * automatically right after shutdown on some buggy boxes. * This automatic rebooting issue won't happen when the alarm * time is larger than now+1 seconds. * * If the alarm time is equal to now+1 seconds, the issue can be * prevented by cancelling the alarm. */ if (cmos->alarm_expires == t_now + 1) { struct rtc_wkalrm alarm; /* Cancel the AIE timer by configuring the past time. */ rtc_time64_to_tm(t_now - 1, &alarm.time); alarm.enabled = 0; retval = cmos_set_alarm(dev, &alarm); } else if (cmos->alarm_expires > t_now + 1) { retval = -EBUSY; } return retval; } static int cmos_suspend(struct device *dev) { struct cmos_rtc *cmos = dev_get_drvdata(dev); unsigned char tmp; /* only the alarm might be a wakeup event source */ spin_lock_irq(&rtc_lock); cmos->suspend_ctrl = tmp = CMOS_READ(RTC_CONTROL); if (tmp & (RTC_PIE|RTC_AIE|RTC_UIE)) { unsigned char mask; if (device_may_wakeup(dev)) mask = RTC_IRQMASK & ~RTC_AIE; else mask = RTC_IRQMASK; tmp &= ~mask; CMOS_WRITE(tmp, RTC_CONTROL); if (use_hpet_alarm()) hpet_mask_rtc_irq_bit(mask); cmos_checkintr(cmos, tmp); } spin_unlock_irq(&rtc_lock); if ((tmp & RTC_AIE) && !cmos_use_acpi_alarm()) { cmos->enabled_wake = 1; if (cmos->wake_on) cmos->wake_on(dev); else enable_irq_wake(cmos->irq); } memset(&cmos->saved_wkalrm, 0, sizeof(struct rtc_wkalrm)); cmos_read_alarm(dev, &cmos->saved_wkalrm); dev_dbg(dev, "suspend%s, ctrl %02x\n", (tmp & RTC_AIE) ? ", alarm may wake" : "", tmp); return 0; } /* We want RTC alarms to wake us from e.g. ACPI G2/S5 "soft off", even * after a detour through G3 "mechanical off", although the ACPI spec * says wakeup should only work from G1/S4 "hibernate". To most users, * distinctions between S4 and S5 are pointless. So when the hardware * allows, don't draw that distinction. */ static inline int cmos_poweroff(struct device *dev) { if (!IS_ENABLED(CONFIG_PM)) return -ENOSYS; return cmos_suspend(dev); } static void cmos_check_wkalrm(struct device *dev) { struct cmos_rtc *cmos = dev_get_drvdata(dev); struct rtc_wkalrm current_alarm; time64_t t_now; time64_t t_current_expires; time64_t t_saved_expires; struct rtc_time now; /* Check if we have RTC Alarm armed */ if (!(cmos->suspend_ctrl & RTC_AIE)) return; cmos_read_time(dev, &now); t_now = rtc_tm_to_time64(&now); /* * ACPI RTC wake event is cleared after resume from STR, * ACK the rtc irq here */ if (t_now >= cmos->alarm_expires && cmos_use_acpi_alarm()) { local_irq_disable(); cmos_interrupt(0, (void *)cmos->rtc); local_irq_enable(); return; } memset(¤t_alarm, 0, sizeof(struct rtc_wkalrm)); cmos_read_alarm(dev, ¤t_alarm); t_current_expires = rtc_tm_to_time64(¤t_alarm.time); t_saved_expires = rtc_tm_to_time64(&cmos->saved_wkalrm.time); if (t_current_expires != t_saved_expires || cmos->saved_wkalrm.enabled != current_alarm.enabled) { cmos_set_alarm(dev, &cmos->saved_wkalrm); } } static int __maybe_unused cmos_resume(struct device *dev) { struct cmos_rtc *cmos = dev_get_drvdata(dev); unsigned char tmp; if (cmos->enabled_wake && !cmos_use_acpi_alarm()) { if (cmos->wake_off) cmos->wake_off(dev); else disable_irq_wake(cmos->irq); cmos->enabled_wake = 0; } /* The BIOS might have changed the alarm, restore it */ cmos_check_wkalrm(dev); spin_lock_irq(&rtc_lock); tmp = cmos->suspend_ctrl; cmos->suspend_ctrl = 0; /* re-enable any irqs previously active */ if (tmp & RTC_IRQMASK) { unsigned char mask; if (device_may_wakeup(dev) && use_hpet_alarm()) hpet_rtc_timer_init(); do { CMOS_WRITE(tmp, RTC_CONTROL); if (use_hpet_alarm()) hpet_set_rtc_irq_bit(tmp & RTC_IRQMASK); mask = CMOS_READ(RTC_INTR_FLAGS); mask &= (tmp & RTC_IRQMASK) | RTC_IRQF; if (!use_hpet_alarm() || !is_intr(mask)) break; /* force one-shot behavior if HPET blocked * the wake alarm's irq */ rtc_update_irq(cmos->rtc, 1, mask); tmp &= ~RTC_AIE; hpet_mask_rtc_irq_bit(RTC_AIE); } while (mask & RTC_AIE); if (tmp & RTC_AIE) cmos_check_acpi_rtc_status(dev, &tmp); } spin_unlock_irq(&rtc_lock); dev_dbg(dev, "resume, ctrl %02x\n", tmp); return 0; } static SIMPLE_DEV_PM_OPS(cmos_pm_ops, cmos_suspend, cmos_resume); /*----------------------------------------------------------------*/ /* On non-x86 systems, a "CMOS" RTC lives most naturally on platform_bus. * ACPI systems always list these as PNPACPI devices, and pre-ACPI PCs * probably list them in similar PNPBIOS tables; so PNP is more common. * * We don't use legacy "poke at the hardware" probing. Ancient PCs that * predate even PNPBIOS should set up platform_bus devices. */ #ifdef CONFIG_PNP #include <linux/pnp.h> static int cmos_pnp_probe(struct pnp_dev *pnp, const struct pnp_device_id *id) { int irq; if (pnp_port_start(pnp, 0) == 0x70 && !pnp_irq_valid(pnp, 0)) { irq = 0; #ifdef CONFIG_X86 /* Some machines contain a PNP entry for the RTC, but * don't define the IRQ. It should always be safe to * hardcode it on systems with a legacy PIC. */ if (nr_legacy_irqs()) irq = RTC_IRQ; #endif } else { irq = pnp_irq(pnp, 0); } return cmos_do_probe(&pnp->dev, pnp_get_resource(pnp, IORESOURCE_IO, 0), irq); } static void cmos_pnp_remove(struct pnp_dev *pnp) { cmos_do_remove(&pnp->dev); } static void cmos_pnp_shutdown(struct pnp_dev *pnp) { struct device *dev = &pnp->dev; struct cmos_rtc *cmos = dev_get_drvdata(dev); if (system_state == SYSTEM_POWER_OFF) { int retval = cmos_poweroff(dev); if (cmos_aie_poweroff(dev) < 0 && !retval) return; } cmos_do_shutdown(cmos->irq); } static const struct pnp_device_id rtc_ids[] = { { .id = "PNP0b00", }, { .id = "PNP0b01", }, { .id = "PNP0b02", }, { }, }; MODULE_DEVICE_TABLE(pnp, rtc_ids); static struct pnp_driver cmos_pnp_driver = { .name = driver_name, .id_table = rtc_ids, .probe = cmos_pnp_probe, .remove = cmos_pnp_remove, .shutdown = cmos_pnp_shutdown, /* flag ensures resume() gets called, and stops syslog spam */ .flags = PNP_DRIVER_RES_DO_NOT_CHANGE, .driver = { .pm = &cmos_pm_ops, }, }; #endif /* CONFIG_PNP */ #ifdef CONFIG_OF static const struct of_device_id of_cmos_match[] = { { .compatible = "motorola,mc146818", }, { }, }; MODULE_DEVICE_TABLE(of, of_cmos_match); static __init void cmos_of_init(struct platform_device *pdev) { struct device_node *node = pdev->dev.of_node; const __be32 *val; if (!node) return; val = of_get_property(node, "ctrl-reg", NULL); if (val) CMOS_WRITE(be32_to_cpup(val), RTC_CONTROL); val = of_get_property(node, "freq-reg", NULL); if (val) CMOS_WRITE(be32_to_cpup(val), RTC_FREQ_SELECT); } #else static inline void cmos_of_init(struct platform_device *pdev) {} #endif /*----------------------------------------------------------------*/ /* Platform setup should have set up an RTC device, when PNP is * unavailable ... this could happen even on (older) PCs. */ static int __init cmos_platform_probe(struct platform_device *pdev) { struct resource *resource; int irq; cmos_of_init(pdev); if (RTC_IOMAPPED) resource = platform_get_resource(pdev, IORESOURCE_IO, 0); else resource = platform_get_resource(pdev, IORESOURCE_MEM, 0); irq = platform_get_irq(pdev, 0); if (irq < 0) irq = -1; return cmos_do_probe(&pdev->dev, resource, irq); } static void cmos_platform_remove(struct platform_device *pdev) { cmos_do_remove(&pdev->dev); } static void cmos_platform_shutdown(struct platform_device *pdev) { struct device *dev = &pdev->dev; struct cmos_rtc *cmos = dev_get_drvdata(dev); if (system_state == SYSTEM_POWER_OFF) { int retval = cmos_poweroff(dev); if (cmos_aie_poweroff(dev) < 0 && !retval) return; } cmos_do_shutdown(cmos->irq); } /* work with hotplug and coldplug */ MODULE_ALIAS("platform:rtc_cmos"); static struct platform_driver cmos_platform_driver = { .remove = cmos_platform_remove, .shutdown = cmos_platform_shutdown, .driver = { .name = driver_name, .pm = &cmos_pm_ops, .of_match_table = of_match_ptr(of_cmos_match), } }; #ifdef CONFIG_PNP static bool pnp_driver_registered; #endif static bool platform_driver_registered; static int __init cmos_init(void) { int retval = 0; #ifdef CONFIG_PNP retval = pnp_register_driver(&cmos_pnp_driver); if (retval == 0) pnp_driver_registered = true; #endif if (!cmos_rtc.dev) { retval = platform_driver_probe(&cmos_platform_driver, cmos_platform_probe); if (retval == 0) platform_driver_registered = true; } if (retval == 0) return 0; #ifdef CONFIG_PNP if (pnp_driver_registered) pnp_unregister_driver(&cmos_pnp_driver); #endif return retval; } module_init(cmos_init); static void __exit cmos_exit(void) { #ifdef CONFIG_PNP if (pnp_driver_registered) pnp_unregister_driver(&cmos_pnp_driver); #endif if (platform_driver_registered) platform_driver_unregister(&cmos_platform_driver); } module_exit(cmos_exit); MODULE_AUTHOR("David Brownell"); MODULE_DESCRIPTION("Driver for PC-style 'CMOS' RTCs"); MODULE_LICENSE("GPL"); |
22 54 89 1 22 6 3 22 22 4 14 6 6 22 22 5 2 1 1 1 2 2 2 99 9 90 51 51 39 5 11 11 23 21 22 22 17 17 17 17 21 23 22 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 | // SPDX-License-Identifier: GPL-2.0 /* * fs/partitions/msdos.c * * Code extracted from drivers/block/genhd.c * Copyright (C) 1991-1998 Linus Torvalds * * Thanks to Branko Lankester, lankeste@fwi.uva.nl, who found a bug * in the early extended-partition checks and added DM partitions * * Support for DiskManager v6.0x added by Mark Lord, * with information provided by OnTrack. This now works for linux fdisk * and LILO, as well as loadlin and bootln. Note that disks other than * /dev/hda *must* have a "DOS" type 0x51 partition in the first slot (hda1). * * More flexible handling of extended partitions - aeb, 950831 * * Check partition table on IDE disks for common CHS translations * * Re-organised Feb 1998 Russell King * * BSD disklabel support by Yossi Gottlieb <yogo@math.tau.ac.il> * updated by Marc Espie <Marc.Espie@openbsd.org> * * Unixware slices support by Andrzej Krzysztofowicz <ankry@mif.pg.gda.pl> * and Krzysztof G. Baranowski <kgb@knm.org.pl> */ #include <linux/msdos_fs.h> #include <linux/msdos_partition.h> #include "check.h" #include "efi.h" /* * Many architectures don't like unaligned accesses, while * the nr_sects and start_sect partition table entries are * at a 2 (mod 4) address. */ #include <linux/unaligned.h> static inline sector_t nr_sects(struct msdos_partition *p) { return (sector_t)get_unaligned_le32(&p->nr_sects); } static inline sector_t start_sect(struct msdos_partition *p) { return (sector_t)get_unaligned_le32(&p->start_sect); } static inline int is_extended_partition(struct msdos_partition *p) { return (p->sys_ind == DOS_EXTENDED_PARTITION || p->sys_ind == WIN98_EXTENDED_PARTITION || p->sys_ind == LINUX_EXTENDED_PARTITION); } #define MSDOS_LABEL_MAGIC1 0x55 #define MSDOS_LABEL_MAGIC2 0xAA static inline int msdos_magic_present(unsigned char *p) { return (p[0] == MSDOS_LABEL_MAGIC1 && p[1] == MSDOS_LABEL_MAGIC2); } /* Value is EBCDIC 'IBMA' */ #define AIX_LABEL_MAGIC1 0xC9 #define AIX_LABEL_MAGIC2 0xC2 #define AIX_LABEL_MAGIC3 0xD4 #define AIX_LABEL_MAGIC4 0xC1 static int aix_magic_present(struct parsed_partitions *state, unsigned char *p) { struct msdos_partition *pt = (struct msdos_partition *) (p + 0x1be); Sector sect; unsigned char *d; int slot, ret = 0; if (!(p[0] == AIX_LABEL_MAGIC1 && p[1] == AIX_LABEL_MAGIC2 && p[2] == AIX_LABEL_MAGIC3 && p[3] == AIX_LABEL_MAGIC4)) return 0; /* * Assume the partition table is valid if Linux partitions exists. * Note that old Solaris/x86 partitions use the same indicator as * Linux swap partitions, so we consider that a Linux partition as * well. */ for (slot = 1; slot <= 4; slot++, pt++) { if (pt->sys_ind == SOLARIS_X86_PARTITION || pt->sys_ind == LINUX_RAID_PARTITION || pt->sys_ind == LINUX_DATA_PARTITION || pt->sys_ind == LINUX_LVM_PARTITION || is_extended_partition(pt)) return 0; } d = read_part_sector(state, 7, §); if (d) { if (d[0] == '_' && d[1] == 'L' && d[2] == 'V' && d[3] == 'M') ret = 1; put_dev_sector(sect); } return ret; } static void set_info(struct parsed_partitions *state, int slot, u32 disksig) { struct partition_meta_info *info = &state->parts[slot].info; snprintf(info->uuid, sizeof(info->uuid), "%08x-%02x", disksig, slot); info->volname[0] = 0; state->parts[slot].has_info = true; } /* * Create devices for each logical partition in an extended partition. * The logical partitions form a linked list, with each entry being * a partition table with two entries. The first entry * is the real data partition (with a start relative to the partition * table start). The second is a pointer to the next logical partition * (with a start relative to the entire extended partition). * We do not create a Linux partition for the partition tables, but * only for the actual data partitions. */ static void parse_extended(struct parsed_partitions *state, sector_t first_sector, sector_t first_size, u32 disksig) { struct msdos_partition *p; Sector sect; unsigned char *data; sector_t this_sector, this_size; sector_t sector_size; int loopct = 0; /* number of links followed without finding a data partition */ int i; sector_size = queue_logical_block_size(state->disk->queue) / 512; this_sector = first_sector; this_size = first_size; while (1) { if (++loopct > 100) return; if (state->next == state->limit) return; data = read_part_sector(state, this_sector, §); if (!data) return; if (!msdos_magic_present(data + 510)) goto done; p = (struct msdos_partition *) (data + 0x1be); /* * Usually, the first entry is the real data partition, * the 2nd entry is the next extended partition, or empty, * and the 3rd and 4th entries are unused. * However, DRDOS sometimes has the extended partition as * the first entry (when the data partition is empty), * and OS/2 seems to use all four entries. */ /* * First process the data partition(s) */ for (i = 0; i < 4; i++, p++) { sector_t offs, size, next; if (!nr_sects(p) || is_extended_partition(p)) continue; /* Check the 3rd and 4th entries - these sometimes contain random garbage */ offs = start_sect(p)*sector_size; size = nr_sects(p)*sector_size; next = this_sector + offs; if (i >= 2) { if (offs + size > this_size) continue; if (next < first_sector) continue; if (next + size > first_sector + first_size) continue; } put_partition(state, state->next, next, size); set_info(state, state->next, disksig); if (p->sys_ind == LINUX_RAID_PARTITION) state->parts[state->next].flags = ADDPART_FLAG_RAID; loopct = 0; if (++state->next == state->limit) goto done; } /* * Next, process the (first) extended partition, if present. * (So far, there seems to be no reason to make * parse_extended() recursive and allow a tree * of extended partitions.) * It should be a link to the next logical partition. */ p -= 4; for (i = 0; i < 4; i++, p++) if (nr_sects(p) && is_extended_partition(p)) break; if (i == 4) goto done; /* nothing left to do */ this_sector = first_sector + start_sect(p) * sector_size; this_size = nr_sects(p) * sector_size; put_dev_sector(sect); } done: put_dev_sector(sect); } #define SOLARIS_X86_NUMSLICE 16 #define SOLARIS_X86_VTOC_SANE (0x600DDEEEUL) struct solaris_x86_slice { __le16 s_tag; /* ID tag of partition */ __le16 s_flag; /* permission flags */ __le32 s_start; /* start sector no of partition */ __le32 s_size; /* # of blocks in partition */ }; struct solaris_x86_vtoc { unsigned int v_bootinfo[3]; /* info needed by mboot */ __le32 v_sanity; /* to verify vtoc sanity */ __le32 v_version; /* layout version */ char v_volume[8]; /* volume name */ __le16 v_sectorsz; /* sector size in bytes */ __le16 v_nparts; /* number of partitions */ unsigned int v_reserved[10]; /* free space */ struct solaris_x86_slice v_slice[SOLARIS_X86_NUMSLICE]; /* slice headers */ unsigned int timestamp[SOLARIS_X86_NUMSLICE]; /* timestamp */ char v_asciilabel[128]; /* for compatibility */ }; /* james@bpgc.com: Solaris has a nasty indicator: 0x82 which also indicates linux swap. Be careful before believing this is Solaris. */ static void parse_solaris_x86(struct parsed_partitions *state, sector_t offset, sector_t size, int origin) { #ifdef CONFIG_SOLARIS_X86_PARTITION Sector sect; struct solaris_x86_vtoc *v; int i; short max_nparts; v = read_part_sector(state, offset + 1, §); if (!v) return; if (le32_to_cpu(v->v_sanity) != SOLARIS_X86_VTOC_SANE) { put_dev_sector(sect); return; } { char tmp[1 + BDEVNAME_SIZE + 10 + 11 + 1]; snprintf(tmp, sizeof(tmp), " %s%d: <solaris:", state->name, origin); strlcat(state->pp_buf, tmp, PAGE_SIZE); } if (le32_to_cpu(v->v_version) != 1) { char tmp[64]; snprintf(tmp, sizeof(tmp), " cannot handle version %d vtoc>\n", le32_to_cpu(v->v_version)); strlcat(state->pp_buf, tmp, PAGE_SIZE); put_dev_sector(sect); return; } /* Ensure we can handle previous case of VTOC with 8 entries gracefully */ max_nparts = le16_to_cpu(v->v_nparts) > 8 ? SOLARIS_X86_NUMSLICE : 8; for (i = 0; i < max_nparts && state->next < state->limit; i++) { struct solaris_x86_slice *s = &v->v_slice[i]; char tmp[3 + 10 + 1 + 1]; if (s->s_size == 0) continue; snprintf(tmp, sizeof(tmp), " [s%d]", i); strlcat(state->pp_buf, tmp, PAGE_SIZE); /* solaris partitions are relative to current MS-DOS * one; must add the offset of the current partition */ put_partition(state, state->next++, le32_to_cpu(s->s_start)+offset, le32_to_cpu(s->s_size)); } put_dev_sector(sect); strlcat(state->pp_buf, " >\n", PAGE_SIZE); #endif } /* check against BSD src/sys/sys/disklabel.h for consistency */ #define BSD_DISKMAGIC (0x82564557UL) /* The disk magic number */ #define BSD_MAXPARTITIONS 16 #define OPENBSD_MAXPARTITIONS 16 #define BSD_FS_UNUSED 0 /* disklabel unused partition entry ID */ struct bsd_disklabel { __le32 d_magic; /* the magic number */ __s16 d_type; /* drive type */ __s16 d_subtype; /* controller/d_type specific */ char d_typename[16]; /* type name, e.g. "eagle" */ char d_packname[16]; /* pack identifier */ __u32 d_secsize; /* # of bytes per sector */ __u32 d_nsectors; /* # of data sectors per track */ __u32 d_ntracks; /* # of tracks per cylinder */ __u32 d_ncylinders; /* # of data cylinders per unit */ __u32 d_secpercyl; /* # of data sectors per cylinder */ __u32 d_secperunit; /* # of data sectors per unit */ __u16 d_sparespertrack; /* # of spare sectors per track */ __u16 d_sparespercyl; /* # of spare sectors per cylinder */ __u32 d_acylinders; /* # of alt. cylinders per unit */ __u16 d_rpm; /* rotational speed */ __u16 d_interleave; /* hardware sector interleave */ __u16 d_trackskew; /* sector 0 skew, per track */ __u16 d_cylskew; /* sector 0 skew, per cylinder */ __u32 d_headswitch; /* head switch time, usec */ __u32 d_trkseek; /* track-to-track seek, usec */ __u32 d_flags; /* generic flags */ #define NDDATA 5 __u32 d_drivedata[NDDATA]; /* drive-type specific information */ #define NSPARE 5 __u32 d_spare[NSPARE]; /* reserved for future use */ __le32 d_magic2; /* the magic number (again) */ __le16 d_checksum; /* xor of data incl. partitions */ /* filesystem and partition information: */ __le16 d_npartitions; /* number of partitions in following */ __le32 d_bbsize; /* size of boot area at sn0, bytes */ __le32 d_sbsize; /* max size of fs superblock, bytes */ struct bsd_partition { /* the partition table */ __le32 p_size; /* number of sectors in partition */ __le32 p_offset; /* starting sector */ __le32 p_fsize; /* filesystem basic fragment size */ __u8 p_fstype; /* filesystem type, see below */ __u8 p_frag; /* filesystem fragments per block */ __le16 p_cpg; /* filesystem cylinders per group */ } d_partitions[BSD_MAXPARTITIONS]; /* actually may be more */ }; #if defined(CONFIG_BSD_DISKLABEL) /* * Create devices for BSD partitions listed in a disklabel, under a * dos-like partition. See parse_extended() for more information. */ static void parse_bsd(struct parsed_partitions *state, sector_t offset, sector_t size, int origin, char *flavour, int max_partitions) { Sector sect; struct bsd_disklabel *l; struct bsd_partition *p; char tmp[64]; l = read_part_sector(state, offset + 1, §); if (!l) return; if (le32_to_cpu(l->d_magic) != BSD_DISKMAGIC) { put_dev_sector(sect); return; } snprintf(tmp, sizeof(tmp), " %s%d: <%s:", state->name, origin, flavour); strlcat(state->pp_buf, tmp, PAGE_SIZE); if (le16_to_cpu(l->d_npartitions) < max_partitions) max_partitions = le16_to_cpu(l->d_npartitions); for (p = l->d_partitions; p - l->d_partitions < max_partitions; p++) { sector_t bsd_start, bsd_size; if (state->next == state->limit) break; if (p->p_fstype == BSD_FS_UNUSED) continue; bsd_start = le32_to_cpu(p->p_offset); bsd_size = le32_to_cpu(p->p_size); /* FreeBSD has relative offset if C partition offset is zero */ if (memcmp(flavour, "bsd\0", 4) == 0 && le32_to_cpu(l->d_partitions[2].p_offset) == 0) bsd_start += offset; if (offset == bsd_start && size == bsd_size) /* full parent partition, we have it already */ continue; if (offset > bsd_start || offset+size < bsd_start+bsd_size) { strlcat(state->pp_buf, "bad subpartition - ignored\n", PAGE_SIZE); continue; } put_partition(state, state->next++, bsd_start, bsd_size); } put_dev_sector(sect); if (le16_to_cpu(l->d_npartitions) > max_partitions) { snprintf(tmp, sizeof(tmp), " (ignored %d more)", le16_to_cpu(l->d_npartitions) - max_partitions); strlcat(state->pp_buf, tmp, PAGE_SIZE); } strlcat(state->pp_buf, " >\n", PAGE_SIZE); } #endif static void parse_freebsd(struct parsed_partitions *state, sector_t offset, sector_t size, int origin) { #ifdef CONFIG_BSD_DISKLABEL parse_bsd(state, offset, size, origin, "bsd", BSD_MAXPARTITIONS); #endif } static void parse_netbsd(struct parsed_partitions *state, sector_t offset, sector_t size, int origin) { #ifdef CONFIG_BSD_DISKLABEL parse_bsd(state, offset, size, origin, "netbsd", BSD_MAXPARTITIONS); #endif } static void parse_openbsd(struct parsed_partitions *state, sector_t offset, sector_t size, int origin) { #ifdef CONFIG_BSD_DISKLABEL parse_bsd(state, offset, size, origin, "openbsd", OPENBSD_MAXPARTITIONS); #endif } #define UNIXWARE_DISKMAGIC (0xCA5E600DUL) /* The disk magic number */ #define UNIXWARE_DISKMAGIC2 (0x600DDEEEUL) /* The slice table magic nr */ #define UNIXWARE_NUMSLICE 16 #define UNIXWARE_FS_UNUSED 0 /* Unused slice entry ID */ struct unixware_slice { __le16 s_label; /* label */ __le16 s_flags; /* permission flags */ __le32 start_sect; /* starting sector */ __le32 nr_sects; /* number of sectors in slice */ }; struct unixware_disklabel { __le32 d_type; /* drive type */ __le32 d_magic; /* the magic number */ __le32 d_version; /* version number */ char d_serial[12]; /* serial number of the device */ __le32 d_ncylinders; /* # of data cylinders per device */ __le32 d_ntracks; /* # of tracks per cylinder */ __le32 d_nsectors; /* # of data sectors per track */ __le32 d_secsize; /* # of bytes per sector */ __le32 d_part_start; /* # of first sector of this partition*/ __le32 d_unknown1[12]; /* ? */ __le32 d_alt_tbl; /* byte offset of alternate table */ __le32 d_alt_len; /* byte length of alternate table */ __le32 d_phys_cyl; /* # of physical cylinders per device */ __le32 d_phys_trk; /* # of physical tracks per cylinder */ __le32 d_phys_sec; /* # of physical sectors per track */ __le32 d_phys_bytes; /* # of physical bytes per sector */ __le32 d_unknown2; /* ? */ __le32 d_unknown3; /* ? */ __le32 d_pad[8]; /* pad */ struct unixware_vtoc { __le32 v_magic; /* the magic number */ __le32 v_version; /* version number */ char v_name[8]; /* volume name */ __le16 v_nslices; /* # of slices */ __le16 v_unknown1; /* ? */ __le32 v_reserved[10]; /* reserved */ struct unixware_slice v_slice[UNIXWARE_NUMSLICE]; /* slice headers */ } vtoc; }; /* 408 */ /* * Create devices for Unixware partitions listed in a disklabel, under a * dos-like partition. See parse_extended() for more information. */ static void parse_unixware(struct parsed_partitions *state, sector_t offset, sector_t size, int origin) { #ifdef CONFIG_UNIXWARE_DISKLABEL Sector sect; struct unixware_disklabel *l; struct unixware_slice *p; l = read_part_sector(state, offset + 29, §); if (!l) return; if (le32_to_cpu(l->d_magic) != UNIXWARE_DISKMAGIC || le32_to_cpu(l->vtoc.v_magic) != UNIXWARE_DISKMAGIC2) { put_dev_sector(sect); return; } { char tmp[1 + BDEVNAME_SIZE + 10 + 12 + 1]; snprintf(tmp, sizeof(tmp), " %s%d: <unixware:", state->name, origin); strlcat(state->pp_buf, tmp, PAGE_SIZE); } p = &l->vtoc.v_slice[1]; /* I omit the 0th slice as it is the same as whole disk. */ while (p - &l->vtoc.v_slice[0] < UNIXWARE_NUMSLICE) { if (state->next == state->limit) break; if (p->s_label != UNIXWARE_FS_UNUSED) put_partition(state, state->next++, le32_to_cpu(p->start_sect), le32_to_cpu(p->nr_sects)); p++; } put_dev_sector(sect); strlcat(state->pp_buf, " >\n", PAGE_SIZE); #endif } #define MINIX_NR_SUBPARTITIONS 4 /* * Minix 2.0.0/2.0.2 subpartition support. * Anand Krishnamurthy <anandk@wiproge.med.ge.com> * Rajeev V. Pillai <rajeevvp@yahoo.com> */ static void parse_minix(struct parsed_partitions *state, sector_t offset, sector_t size, int origin) { #ifdef CONFIG_MINIX_SUBPARTITION Sector sect; unsigned char *data; struct msdos_partition *p; int i; data = read_part_sector(state, offset, §); if (!data) return; p = (struct msdos_partition *)(data + 0x1be); /* The first sector of a Minix partition can have either * a secondary MBR describing its subpartitions, or * the normal boot sector. */ if (msdos_magic_present(data + 510) && p->sys_ind == MINIX_PARTITION) { /* subpartition table present */ char tmp[1 + BDEVNAME_SIZE + 10 + 9 + 1]; snprintf(tmp, sizeof(tmp), " %s%d: <minix:", state->name, origin); strlcat(state->pp_buf, tmp, PAGE_SIZE); for (i = 0; i < MINIX_NR_SUBPARTITIONS; i++, p++) { if (state->next == state->limit) break; /* add each partition in use */ if (p->sys_ind == MINIX_PARTITION) put_partition(state, state->next++, start_sect(p), nr_sects(p)); } strlcat(state->pp_buf, " >\n", PAGE_SIZE); } put_dev_sector(sect); #endif /* CONFIG_MINIX_SUBPARTITION */ } static struct { unsigned char id; void (*parse)(struct parsed_partitions *, sector_t, sector_t, int); } subtypes[] = { {FREEBSD_PARTITION, parse_freebsd}, {NETBSD_PARTITION, parse_netbsd}, {OPENBSD_PARTITION, parse_openbsd}, {MINIX_PARTITION, parse_minix}, {UNIXWARE_PARTITION, parse_unixware}, {SOLARIS_X86_PARTITION, parse_solaris_x86}, {NEW_SOLARIS_X86_PARTITION, parse_solaris_x86}, {0, NULL}, }; int msdos_partition(struct parsed_partitions *state) { sector_t sector_size; Sector sect; unsigned char *data; struct msdos_partition *p; struct fat_boot_sector *fb; int slot; u32 disksig; sector_size = queue_logical_block_size(state->disk->queue) / 512; data = read_part_sector(state, 0, §); if (!data) return -1; /* * Note order! (some AIX disks, e.g. unbootable kind, * have no MSDOS 55aa) */ if (aix_magic_present(state, data)) { put_dev_sector(sect); #ifdef CONFIG_AIX_PARTITION return aix_partition(state); #else strlcat(state->pp_buf, " [AIX]", PAGE_SIZE); return 0; #endif } if (!msdos_magic_present(data + 510)) { put_dev_sector(sect); return 0; } /* * Now that the 55aa signature is present, this is probably * either the boot sector of a FAT filesystem or a DOS-type * partition table. Reject this in case the boot indicator * is not 0 or 0x80. */ p = (struct msdos_partition *) (data + 0x1be); for (slot = 1; slot <= 4; slot++, p++) { if (p->boot_ind != 0 && p->boot_ind != 0x80) { /* * Even without a valid boot indicator value * its still possible this is valid FAT filesystem * without a partition table. */ fb = (struct fat_boot_sector *) data; if (slot == 1 && fb->reserved && fb->fats && fat_valid_media(fb->media)) { strlcat(state->pp_buf, "\n", PAGE_SIZE); put_dev_sector(sect); return 1; } else { put_dev_sector(sect); return 0; } } } #ifdef CONFIG_EFI_PARTITION p = (struct msdos_partition *) (data + 0x1be); for (slot = 1 ; slot <= 4 ; slot++, p++) { /* If this is an EFI GPT disk, msdos should ignore it. */ if (p->sys_ind == EFI_PMBR_OSTYPE_EFI_GPT) { put_dev_sector(sect); return 0; } } #endif p = (struct msdos_partition *) (data + 0x1be); disksig = le32_to_cpup((__le32 *)(data + 0x1b8)); /* * Look for partitions in two passes: * First find the primary and DOS-type extended partitions. * On the second pass look inside *BSD, Unixware and Solaris partitions. */ state->next = 5; for (slot = 1 ; slot <= 4 ; slot++, p++) { sector_t start = start_sect(p)*sector_size; sector_t size = nr_sects(p)*sector_size; if (!size) continue; if (is_extended_partition(p)) { /* * prevent someone doing mkfs or mkswap on an * extended partition, but leave room for LILO * FIXME: this uses one logical sector for > 512b * sector, although it may not be enough/proper. */ sector_t n = 2; n = min(size, max(sector_size, n)); put_partition(state, slot, start, n); strlcat(state->pp_buf, " <", PAGE_SIZE); parse_extended(state, start, size, disksig); strlcat(state->pp_buf, " >", PAGE_SIZE); continue; } put_partition(state, slot, start, size); set_info(state, slot, disksig); if (p->sys_ind == LINUX_RAID_PARTITION) state->parts[slot].flags = ADDPART_FLAG_RAID; if (p->sys_ind == DM6_PARTITION) strlcat(state->pp_buf, "[DM]", PAGE_SIZE); if (p->sys_ind == EZD_PARTITION) strlcat(state->pp_buf, "[EZD]", PAGE_SIZE); } strlcat(state->pp_buf, "\n", PAGE_SIZE); /* second pass - output for each on a separate line */ p = (struct msdos_partition *) (0x1be + data); for (slot = 1 ; slot <= 4 ; slot++, p++) { unsigned char id = p->sys_ind; int n; if (!nr_sects(p)) continue; for (n = 0; subtypes[n].parse && id != subtypes[n].id; n++) ; if (!subtypes[n].parse) continue; subtypes[n].parse(state, start_sect(p) * sector_size, nr_sects(p) * sector_size, slot); } put_dev_sector(sect); return 1; } |
59 1 1 6 50 1 1 1 1 2 20 9 19 2 41 3 34 7 1 1 1 14 9 16 20 13 26 13 39 37 2 25 13 39 39 9 1 22 12 1 20 5 15 10 10 19 1 20 20 20 20 30 21 4 3 2 2 31 6 21 3 49 49 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 | // SPDX-License-Identifier: GPL-2.0-or-later /* * net/sched/act_police.c Input police filter * * Authors: Alexey Kuznetsov, <kuznet@ms2.inr.ac.ru> * J Hadi Salim (action changes) */ #include <linux/module.h> #include <linux/types.h> #include <linux/kernel.h> #include <linux/string.h> #include <linux/errno.h> #include <linux/skbuff.h> #include <linux/rtnetlink.h> #include <linux/init.h> #include <linux/slab.h> #include <net/act_api.h> #include <net/gso.h> #include <net/netlink.h> #include <net/pkt_cls.h> #include <net/tc_act/tc_police.h> #include <net/tc_wrapper.h> /* Each policer is serialized by its individual spinlock */ static struct tc_action_ops act_police_ops; static const struct nla_policy police_policy[TCA_POLICE_MAX + 1] = { [TCA_POLICE_RATE] = { .len = TC_RTAB_SIZE }, [TCA_POLICE_PEAKRATE] = { .len = TC_RTAB_SIZE }, [TCA_POLICE_AVRATE] = { .type = NLA_U32 }, [TCA_POLICE_RESULT] = { .type = NLA_U32 }, [TCA_POLICE_RATE64] = { .type = NLA_U64 }, [TCA_POLICE_PEAKRATE64] = { .type = NLA_U64 }, [TCA_POLICE_PKTRATE64] = { .type = NLA_U64, .min = 1 }, [TCA_POLICE_PKTBURST64] = { .type = NLA_U64, .min = 1 }, }; static int tcf_police_init(struct net *net, struct nlattr *nla, struct nlattr *est, struct tc_action **a, struct tcf_proto *tp, u32 flags, struct netlink_ext_ack *extack) { int ret = 0, tcfp_result = TC_ACT_OK, err, size; bool bind = flags & TCA_ACT_FLAGS_BIND; struct nlattr *tb[TCA_POLICE_MAX + 1]; struct tcf_chain *goto_ch = NULL; struct tc_police *parm; struct tcf_police *police; struct qdisc_rate_table *R_tab = NULL, *P_tab = NULL; struct tc_action_net *tn = net_generic(net, act_police_ops.net_id); struct tcf_police_params *new; bool exists = false; u32 index; u64 rate64, prate64; u64 pps, ppsburst; if (nla == NULL) return -EINVAL; err = nla_parse_nested_deprecated(tb, TCA_POLICE_MAX, nla, police_policy, NULL); if (err < 0) return err; if (tb[TCA_POLICE_TBF] == NULL) return -EINVAL; size = nla_len(tb[TCA_POLICE_TBF]); if (size != sizeof(*parm) && size != sizeof(struct tc_police_compat)) return -EINVAL; parm = nla_data(tb[TCA_POLICE_TBF]); index = parm->index; err = tcf_idr_check_alloc(tn, &index, a, bind); if (err < 0) return err; exists = err; if (exists && bind) return ACT_P_BOUND; if (!exists) { ret = tcf_idr_create(tn, index, NULL, a, &act_police_ops, bind, true, flags); if (ret) { tcf_idr_cleanup(tn, index); return ret; } ret = ACT_P_CREATED; spin_lock_init(&(to_police(*a)->tcfp_lock)); } else if (!(flags & TCA_ACT_FLAGS_REPLACE)) { tcf_idr_release(*a, bind); return -EEXIST; } err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack); if (err < 0) goto release_idr; police = to_police(*a); if (parm->rate.rate) { err = -ENOMEM; R_tab = qdisc_get_rtab(&parm->rate, tb[TCA_POLICE_RATE], NULL); if (R_tab == NULL) goto failure; if (parm->peakrate.rate) { P_tab = qdisc_get_rtab(&parm->peakrate, tb[TCA_POLICE_PEAKRATE], NULL); if (P_tab == NULL) goto failure; } } if (est) { err = gen_replace_estimator(&police->tcf_bstats, police->common.cpu_bstats, &police->tcf_rate_est, &police->tcf_lock, false, est); if (err) goto failure; } else if (tb[TCA_POLICE_AVRATE] && (ret == ACT_P_CREATED || !gen_estimator_active(&police->tcf_rate_est))) { err = -EINVAL; goto failure; } if (tb[TCA_POLICE_RESULT]) { tcfp_result = nla_get_u32(tb[TCA_POLICE_RESULT]); if (TC_ACT_EXT_CMP(tcfp_result, TC_ACT_GOTO_CHAIN)) { NL_SET_ERR_MSG(extack, "goto chain not allowed on fallback"); err = -EINVAL; goto failure; } } if ((tb[TCA_POLICE_PKTRATE64] && !tb[TCA_POLICE_PKTBURST64]) || (!tb[TCA_POLICE_PKTRATE64] && tb[TCA_POLICE_PKTBURST64])) { NL_SET_ERR_MSG(extack, "Both or neither packet-per-second burst and rate must be provided"); err = -EINVAL; goto failure; } if (tb[TCA_POLICE_PKTRATE64] && R_tab) { NL_SET_ERR_MSG(extack, "packet-per-second and byte-per-second rate limits not allowed in same action"); err = -EINVAL; goto failure; } new = kzalloc(sizeof(*new), GFP_KERNEL); if (unlikely(!new)) { err = -ENOMEM; goto failure; } /* No failure allowed after this point */ new->tcfp_result = tcfp_result; new->tcfp_mtu = parm->mtu; if (!new->tcfp_mtu) { new->tcfp_mtu = ~0; if (R_tab) new->tcfp_mtu = 255 << R_tab->rate.cell_log; } if (R_tab) { new->rate_present = true; rate64 = nla_get_u64_default(tb[TCA_POLICE_RATE64], 0); psched_ratecfg_precompute(&new->rate, &R_tab->rate, rate64); qdisc_put_rtab(R_tab); } else { new->rate_present = false; } if (P_tab) { new->peak_present = true; prate64 = nla_get_u64_default(tb[TCA_POLICE_PEAKRATE64], 0); psched_ratecfg_precompute(&new->peak, &P_tab->rate, prate64); qdisc_put_rtab(P_tab); } else { new->peak_present = false; } new->tcfp_burst = PSCHED_TICKS2NS(parm->burst); if (new->peak_present) new->tcfp_mtu_ptoks = (s64)psched_l2t_ns(&new->peak, new->tcfp_mtu); if (tb[TCA_POLICE_AVRATE]) new->tcfp_ewma_rate = nla_get_u32(tb[TCA_POLICE_AVRATE]); if (tb[TCA_POLICE_PKTRATE64]) { pps = nla_get_u64(tb[TCA_POLICE_PKTRATE64]); ppsburst = nla_get_u64(tb[TCA_POLICE_PKTBURST64]); new->pps_present = true; new->tcfp_pkt_burst = PSCHED_TICKS2NS(ppsburst); psched_ppscfg_precompute(&new->ppsrate, pps); } spin_lock_bh(&police->tcf_lock); spin_lock_bh(&police->tcfp_lock); police->tcfp_t_c = ktime_get_ns(); police->tcfp_toks = new->tcfp_burst; if (new->peak_present) police->tcfp_ptoks = new->tcfp_mtu_ptoks; spin_unlock_bh(&police->tcfp_lock); goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch); new = rcu_replace_pointer(police->params, new, lockdep_is_held(&police->tcf_lock)); spin_unlock_bh(&police->tcf_lock); if (goto_ch) tcf_chain_put_by_act(goto_ch); if (new) kfree_rcu(new, rcu); return ret; failure: qdisc_put_rtab(P_tab); qdisc_put_rtab(R_tab); if (goto_ch) tcf_chain_put_by_act(goto_ch); release_idr: tcf_idr_release(*a, bind); return err; } static bool tcf_police_mtu_check(struct sk_buff *skb, u32 limit) { u32 len; if (skb_is_gso(skb)) return skb_gso_validate_mac_len(skb, limit); len = qdisc_pkt_len(skb); if (skb_at_tc_ingress(skb)) len += skb->mac_len; return len <= limit; } TC_INDIRECT_SCOPE int tcf_police_act(struct sk_buff *skb, const struct tc_action *a, struct tcf_result *res) { struct tcf_police *police = to_police(a); s64 now, toks, ppstoks = 0, ptoks = 0; struct tcf_police_params *p; int ret; tcf_lastuse_update(&police->tcf_tm); bstats_update(this_cpu_ptr(police->common.cpu_bstats), skb); ret = READ_ONCE(police->tcf_action); p = rcu_dereference_bh(police->params); if (p->tcfp_ewma_rate) { struct gnet_stats_rate_est64 sample; if (!gen_estimator_read(&police->tcf_rate_est, &sample) || sample.bps >= p->tcfp_ewma_rate) goto inc_overlimits; } if (tcf_police_mtu_check(skb, p->tcfp_mtu)) { if (!p->rate_present && !p->pps_present) { ret = p->tcfp_result; goto end; } now = ktime_get_ns(); spin_lock_bh(&police->tcfp_lock); toks = min_t(s64, now - police->tcfp_t_c, p->tcfp_burst); if (p->peak_present) { ptoks = toks + police->tcfp_ptoks; if (ptoks > p->tcfp_mtu_ptoks) ptoks = p->tcfp_mtu_ptoks; ptoks -= (s64)psched_l2t_ns(&p->peak, qdisc_pkt_len(skb)); } if (p->rate_present) { toks += police->tcfp_toks; if (toks > p->tcfp_burst) toks = p->tcfp_burst; toks -= (s64)psched_l2t_ns(&p->rate, qdisc_pkt_len(skb)); } else if (p->pps_present) { ppstoks = min_t(s64, now - police->tcfp_t_c, p->tcfp_pkt_burst); ppstoks += police->tcfp_pkttoks; if (ppstoks > p->tcfp_pkt_burst) ppstoks = p->tcfp_pkt_burst; ppstoks -= (s64)psched_pkt2t_ns(&p->ppsrate, 1); } if ((toks | ptoks | ppstoks) >= 0) { police->tcfp_t_c = now; police->tcfp_toks = toks; police->tcfp_ptoks = ptoks; police->tcfp_pkttoks = ppstoks; spin_unlock_bh(&police->tcfp_lock); ret = p->tcfp_result; goto inc_drops; } spin_unlock_bh(&police->tcfp_lock); } inc_overlimits: qstats_overlimit_inc(this_cpu_ptr(police->common.cpu_qstats)); inc_drops: if (ret == TC_ACT_SHOT) qstats_drop_inc(this_cpu_ptr(police->common.cpu_qstats)); end: return ret; } static void tcf_police_cleanup(struct tc_action *a) { struct tcf_police *police = to_police(a); struct tcf_police_params *p; p = rcu_dereference_protected(police->params, 1); if (p) kfree_rcu(p, rcu); } static void tcf_police_stats_update(struct tc_action *a, u64 bytes, u64 packets, u64 drops, u64 lastuse, bool hw) { struct tcf_police *police = to_police(a); struct tcf_t *tm = &police->tcf_tm; tcf_action_update_stats(a, bytes, packets, drops, hw); tm->lastuse = max_t(u64, tm->lastuse, lastuse); } static int tcf_police_dump(struct sk_buff *skb, struct tc_action *a, int bind, int ref) { unsigned char *b = skb_tail_pointer(skb); struct tcf_police *police = to_police(a); struct tcf_police_params *p; struct tc_police opt = { .index = police->tcf_index, .refcnt = refcount_read(&police->tcf_refcnt) - ref, .bindcnt = atomic_read(&police->tcf_bindcnt) - bind, }; struct tcf_t t; spin_lock_bh(&police->tcf_lock); opt.action = police->tcf_action; p = rcu_dereference_protected(police->params, lockdep_is_held(&police->tcf_lock)); opt.mtu = p->tcfp_mtu; opt.burst = PSCHED_NS2TICKS(p->tcfp_burst); if (p->rate_present) { psched_ratecfg_getrate(&opt.rate, &p->rate); if ((p->rate.rate_bytes_ps >= (1ULL << 32)) && nla_put_u64_64bit(skb, TCA_POLICE_RATE64, p->rate.rate_bytes_ps, TCA_POLICE_PAD)) goto nla_put_failure; } if (p->peak_present) { psched_ratecfg_getrate(&opt.peakrate, &p->peak); if ((p->peak.rate_bytes_ps >= (1ULL << 32)) && nla_put_u64_64bit(skb, TCA_POLICE_PEAKRATE64, p->peak.rate_bytes_ps, TCA_POLICE_PAD)) goto nla_put_failure; } if (p->pps_present) { if (nla_put_u64_64bit(skb, TCA_POLICE_PKTRATE64, p->ppsrate.rate_pkts_ps, TCA_POLICE_PAD)) goto nla_put_failure; if (nla_put_u64_64bit(skb, TCA_POLICE_PKTBURST64, PSCHED_NS2TICKS(p->tcfp_pkt_burst), TCA_POLICE_PAD)) goto nla_put_failure; } if (nla_put(skb, TCA_POLICE_TBF, sizeof(opt), &opt)) goto nla_put_failure; if (p->tcfp_result && nla_put_u32(skb, TCA_POLICE_RESULT, p->tcfp_result)) goto nla_put_failure; if (p->tcfp_ewma_rate && nla_put_u32(skb, TCA_POLICE_AVRATE, p->tcfp_ewma_rate)) goto nla_put_failure; tcf_tm_dump(&t, &police->tcf_tm); if (nla_put_64bit(skb, TCA_POLICE_TM, sizeof(t), &t, TCA_POLICE_PAD)) goto nla_put_failure; spin_unlock_bh(&police->tcf_lock); return skb->len; nla_put_failure: spin_unlock_bh(&police->tcf_lock); nlmsg_trim(skb, b); return -1; } static int tcf_police_act_to_flow_act(int tc_act, u32 *extval, struct netlink_ext_ack *extack) { int act_id = -EOPNOTSUPP; if (!TC_ACT_EXT_OPCODE(tc_act)) { if (tc_act == TC_ACT_OK) act_id = FLOW_ACTION_ACCEPT; else if (tc_act == TC_ACT_SHOT) act_id = FLOW_ACTION_DROP; else if (tc_act == TC_ACT_PIPE) act_id = FLOW_ACTION_PIPE; else if (tc_act == TC_ACT_RECLASSIFY) NL_SET_ERR_MSG_MOD(extack, "Offload not supported when conform/exceed action is \"reclassify\""); else NL_SET_ERR_MSG_MOD(extack, "Unsupported conform/exceed action offload"); } else if (TC_ACT_EXT_CMP(tc_act, TC_ACT_GOTO_CHAIN)) { act_id = FLOW_ACTION_GOTO; *extval = tc_act & TC_ACT_EXT_VAL_MASK; } else if (TC_ACT_EXT_CMP(tc_act, TC_ACT_JUMP)) { act_id = FLOW_ACTION_JUMP; *extval = tc_act & TC_ACT_EXT_VAL_MASK; } else if (tc_act == TC_ACT_UNSPEC) { act_id = FLOW_ACTION_CONTINUE; } else { NL_SET_ERR_MSG_MOD(extack, "Unsupported conform/exceed action offload"); } return act_id; } static int tcf_police_offload_act_setup(struct tc_action *act, void *entry_data, u32 *index_inc, bool bind, struct netlink_ext_ack *extack) { if (bind) { struct flow_action_entry *entry = entry_data; struct tcf_police *police = to_police(act); struct tcf_police_params *p; int act_id; p = rcu_dereference_protected(police->params, lockdep_is_held(&police->tcf_lock)); entry->id = FLOW_ACTION_POLICE; entry->police.burst = tcf_police_burst(act); entry->police.rate_bytes_ps = tcf_police_rate_bytes_ps(act); entry->police.peakrate_bytes_ps = tcf_police_peakrate_bytes_ps(act); entry->police.avrate = tcf_police_tcfp_ewma_rate(act); entry->police.overhead = tcf_police_rate_overhead(act); entry->police.burst_pkt = tcf_police_burst_pkt(act); entry->police.rate_pkt_ps = tcf_police_rate_pkt_ps(act); entry->police.mtu = tcf_police_tcfp_mtu(act); act_id = tcf_police_act_to_flow_act(police->tcf_action, &entry->police.exceed.extval, extack); if (act_id < 0) return act_id; entry->police.exceed.act_id = act_id; act_id = tcf_police_act_to_flow_act(p->tcfp_result, &entry->police.notexceed.extval, extack); if (act_id < 0) return act_id; entry->police.notexceed.act_id = act_id; *index_inc = 1; } else { struct flow_offload_action *fl_action = entry_data; fl_action->id = FLOW_ACTION_POLICE; } return 0; } MODULE_AUTHOR("Alexey Kuznetsov"); MODULE_DESCRIPTION("Policing actions"); MODULE_LICENSE("GPL"); static struct tc_action_ops act_police_ops = { .kind = "police", .id = TCA_ID_POLICE, .owner = THIS_MODULE, .stats_update = tcf_police_stats_update, .act = tcf_police_act, .dump = tcf_police_dump, .init = tcf_police_init, .cleanup = tcf_police_cleanup, .offload_act_setup = tcf_police_offload_act_setup, .size = sizeof(struct tcf_police), }; MODULE_ALIAS_NET_ACT("police"); static __net_init int police_init_net(struct net *net) { struct tc_action_net *tn = net_generic(net, act_police_ops.net_id); return tc_action_net_init(net, tn, &act_police_ops); } static void __net_exit police_exit_net(struct list_head *net_list) { tc_action_net_exit(net_list, act_police_ops.net_id); } static struct pernet_operations police_net_ops = { .init = police_init_net, .exit_batch = police_exit_net, .id = &act_police_ops.net_id, .size = sizeof(struct tc_action_net), }; static int __init police_init_module(void) { return tcf_register_action(&act_police_ops, &police_net_ops); } static void __exit police_cleanup_module(void) { tcf_unregister_action(&act_police_ops, &police_net_ops); } module_init(police_init_module); module_exit(police_cleanup_module); |
456 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 | // SPDX-License-Identifier: GPL-2.0-or-later /* * tsacct.c - System accounting over taskstats interface * * Copyright (C) Jay Lan, <jlan@sgi.com> */ #include <linux/kernel.h> #include <linux/sched/signal.h> #include <linux/sched/mm.h> #include <linux/sched/cputime.h> #include <linux/tsacct_kern.h> #include <linux/acct.h> #include <linux/jiffies.h> #include <linux/mm.h> /* * fill in basic accounting fields */ void bacct_add_tsk(struct user_namespace *user_ns, struct pid_namespace *pid_ns, struct taskstats *stats, struct task_struct *tsk) { const struct cred *tcred; u64 utime, stime, utimescaled, stimescaled; u64 now_ns, delta; time64_t btime; BUILD_BUG_ON(TS_COMM_LEN < TASK_COMM_LEN); /* calculate task elapsed time in nsec */ now_ns = ktime_get_ns(); /* store whole group time first */ delta = now_ns - tsk->group_leader->start_time; /* Convert to micro seconds */ do_div(delta, NSEC_PER_USEC); stats->ac_tgetime = delta; delta = now_ns - tsk->start_time; do_div(delta, NSEC_PER_USEC); stats->ac_etime = delta; /* Convert to seconds for btime (note y2106 limit) */ btime = ktime_get_real_seconds() - div_u64(delta, USEC_PER_SEC); stats->ac_btime = clamp_t(time64_t, btime, 0, U32_MAX); stats->ac_btime64 = btime; if (tsk->flags & PF_EXITING) stats->ac_exitcode = tsk->exit_code; if (thread_group_leader(tsk) && (tsk->flags & PF_FORKNOEXEC)) stats->ac_flag |= AFORK; if (tsk->flags & PF_SUPERPRIV) stats->ac_flag |= ASU; if (tsk->flags & PF_DUMPCORE) stats->ac_flag |= ACORE; if (tsk->flags & PF_SIGNALED) stats->ac_flag |= AXSIG; stats->ac_nice = task_nice(tsk); stats->ac_sched = tsk->policy; stats->ac_pid = task_pid_nr_ns(tsk, pid_ns); stats->ac_tgid = task_tgid_nr_ns(tsk, pid_ns); rcu_read_lock(); tcred = __task_cred(tsk); stats->ac_uid = from_kuid_munged(user_ns, tcred->uid); stats->ac_gid = from_kgid_munged(user_ns, tcred->gid); stats->ac_ppid = pid_alive(tsk) ? task_tgid_nr_ns(rcu_dereference(tsk->real_parent), pid_ns) : 0; rcu_read_unlock(); task_cputime(tsk, &utime, &stime); stats->ac_utime = div_u64(utime, NSEC_PER_USEC); stats->ac_stime = div_u64(stime, NSEC_PER_USEC); task_cputime_scaled(tsk, &utimescaled, &stimescaled); stats->ac_utimescaled = div_u64(utimescaled, NSEC_PER_USEC); stats->ac_stimescaled = div_u64(stimescaled, NSEC_PER_USEC); stats->ac_minflt = tsk->min_flt; stats->ac_majflt = tsk->maj_flt; strscpy_pad(stats->ac_comm, tsk->comm); } #ifdef CONFIG_TASK_XACCT #define KB 1024 #define MB (1024*KB) #define KB_MASK (~(KB-1)) /* * fill in extended accounting fields */ void xacct_add_tsk(struct taskstats *stats, struct task_struct *p) { struct mm_struct *mm; /* convert pages-nsec/1024 to Mbyte-usec, see __acct_update_integrals */ stats->coremem = p->acct_rss_mem1 * PAGE_SIZE; do_div(stats->coremem, 1000 * KB); stats->virtmem = p->acct_vm_mem1 * PAGE_SIZE; do_div(stats->virtmem, 1000 * KB); mm = get_task_mm(p); if (mm) { /* adjust to KB unit */ stats->hiwater_rss = get_mm_hiwater_rss(mm) * PAGE_SIZE / KB; stats->hiwater_vm = get_mm_hiwater_vm(mm) * PAGE_SIZE / KB; mmput(mm); } stats->read_char = p->ioac.rchar & KB_MASK; stats->write_char = p->ioac.wchar & KB_MASK; stats->read_syscalls = p->ioac.syscr & KB_MASK; stats->write_syscalls = p->ioac.syscw & KB_MASK; #ifdef CONFIG_TASK_IO_ACCOUNTING stats->read_bytes = p->ioac.read_bytes & KB_MASK; stats->write_bytes = p->ioac.write_bytes & KB_MASK; stats->cancelled_write_bytes = p->ioac.cancelled_write_bytes & KB_MASK; #else stats->read_bytes = 0; stats->write_bytes = 0; stats->cancelled_write_bytes = 0; #endif } #undef KB #undef MB static void __acct_update_integrals(struct task_struct *tsk, u64 utime, u64 stime) { u64 time, delta; if (!likely(tsk->mm)) return; time = stime + utime; delta = time - tsk->acct_timexpd; if (delta < TICK_NSEC) return; tsk->acct_timexpd = time; /* * Divide by 1024 to avoid overflow, and to avoid division. * The final unit reported to userspace is Mbyte-usecs, * the rest of the math is done in xacct_add_tsk. */ tsk->acct_rss_mem1 += delta * get_mm_rss(tsk->mm) >> 10; tsk->acct_vm_mem1 += delta * READ_ONCE(tsk->mm->total_vm) >> 10; } /** * acct_update_integrals - update mm integral fields in task_struct * @tsk: task_struct for accounting */ void acct_update_integrals(struct task_struct *tsk) { u64 utime, stime; unsigned long flags; local_irq_save(flags); task_cputime(tsk, &utime, &stime); __acct_update_integrals(tsk, utime, stime); local_irq_restore(flags); } /** * acct_account_cputime - update mm integral after cputime update * @tsk: task_struct for accounting */ void acct_account_cputime(struct task_struct *tsk) { __acct_update_integrals(tsk, tsk->utime, tsk->stime); } /** * acct_clear_integrals - clear the mm integral fields in task_struct * @tsk: task_struct whose accounting fields are cleared */ void acct_clear_integrals(struct task_struct *tsk) { tsk->acct_timexpd = 0; tsk->acct_rss_mem1 = 0; tsk->acct_vm_mem1 = 0; } #endif |
3 32 3 3 28 35 35 28 5 28 1 1 28 4 6 2 7 2 5 10 10 5 5 4 2 2 2 2 3 3 49 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 | // SPDX-License-Identifier: GPL-2.0-only /* (C) 1999-2001 Paul `Rusty' Russell * (C) 2002-2004 Netfilter Core Team <coreteam@netfilter.org> * (C) 2006-2012 Patrick McHardy <kaber@trash.net> */ #include <linux/types.h> #include <linux/timer.h> #include <linux/module.h> #include <linux/udp.h> #include <linux/seq_file.h> #include <linux/skbuff.h> #include <linux/ipv6.h> #include <net/ip6_checksum.h> #include <net/checksum.h> #include <linux/netfilter.h> #include <linux/netfilter_ipv4.h> #include <linux/netfilter_ipv6.h> #include <net/netfilter/nf_conntrack_l4proto.h> #include <net/netfilter/nf_conntrack_ecache.h> #include <net/netfilter/nf_conntrack_timeout.h> #include <net/netfilter/nf_log.h> #include <net/netfilter/ipv4/nf_conntrack_ipv4.h> #include <net/netfilter/ipv6/nf_conntrack_ipv6.h> static const unsigned int udp_timeouts[UDP_CT_MAX] = { [UDP_CT_UNREPLIED] = 30*HZ, [UDP_CT_REPLIED] = 120*HZ, }; static unsigned int *udp_get_timeouts(struct net *net) { return nf_udp_pernet(net)->timeouts; } static void udp_error_log(const struct sk_buff *skb, const struct nf_hook_state *state, const char *msg) { nf_l4proto_log_invalid(skb, state, IPPROTO_UDP, "%s", msg); } static bool udp_error(struct sk_buff *skb, unsigned int dataoff, const struct nf_hook_state *state) { unsigned int udplen = skb->len - dataoff; const struct udphdr *hdr; struct udphdr _hdr; /* Header is too small? */ hdr = skb_header_pointer(skb, dataoff, sizeof(_hdr), &_hdr); if (!hdr) { udp_error_log(skb, state, "short packet"); return true; } /* Truncated/malformed packets */ if (ntohs(hdr->len) > udplen || ntohs(hdr->len) < sizeof(*hdr)) { udp_error_log(skb, state, "truncated/malformed packet"); return true; } /* Packet with no checksum */ if (!hdr->check) return false; /* Checksum invalid? Ignore. * We skip checking packets on the outgoing path * because the checksum is assumed to be correct. * FIXME: Source route IP option packets --RR */ if (state->hook == NF_INET_PRE_ROUTING && state->net->ct.sysctl_checksum && nf_checksum(skb, state->hook, dataoff, IPPROTO_UDP, state->pf)) { udp_error_log(skb, state, "bad checksum"); return true; } return false; } /* Returns verdict for packet, and may modify conntracktype */ int nf_conntrack_udp_packet(struct nf_conn *ct, struct sk_buff *skb, unsigned int dataoff, enum ip_conntrack_info ctinfo, const struct nf_hook_state *state) { unsigned int *timeouts; unsigned long status; if (udp_error(skb, dataoff, state)) return -NF_ACCEPT; timeouts = nf_ct_timeout_lookup(ct); if (!timeouts) timeouts = udp_get_timeouts(nf_ct_net(ct)); status = READ_ONCE(ct->status); if ((status & IPS_CONFIRMED) == 0) ct->proto.udp.stream_ts = 2 * HZ + jiffies; /* If we've seen traffic both ways, this is some kind of UDP * stream. Set Assured. */ if (status & IPS_SEEN_REPLY) { unsigned long extra = timeouts[UDP_CT_UNREPLIED]; bool stream = false; /* Still active after two seconds? Extend timeout. */ if (time_after(jiffies, ct->proto.udp.stream_ts)) { extra = timeouts[UDP_CT_REPLIED]; stream = (status & IPS_ASSURED) == 0; } nf_ct_refresh_acct(ct, ctinfo, skb, extra); /* never set ASSURED for IPS_NAT_CLASH, they time out soon */ if (unlikely((status & IPS_NAT_CLASH))) return NF_ACCEPT; /* Also, more likely to be important, and not a probe */ if (stream && !test_and_set_bit(IPS_ASSURED_BIT, &ct->status)) nf_conntrack_event_cache(IPCT_ASSURED, ct); } else { nf_ct_refresh_acct(ct, ctinfo, skb, timeouts[UDP_CT_UNREPLIED]); } return NF_ACCEPT; } #ifdef CONFIG_NF_CT_PROTO_UDPLITE static void udplite_error_log(const struct sk_buff *skb, const struct nf_hook_state *state, const char *msg) { nf_l4proto_log_invalid(skb, state, IPPROTO_UDPLITE, "%s", msg); } static bool udplite_error(struct sk_buff *skb, unsigned int dataoff, const struct nf_hook_state *state) { unsigned int udplen = skb->len - dataoff; const struct udphdr *hdr; struct udphdr _hdr; unsigned int cscov; /* Header is too small? */ hdr = skb_header_pointer(skb, dataoff, sizeof(_hdr), &_hdr); if (!hdr) { udplite_error_log(skb, state, "short packet"); return true; } cscov = ntohs(hdr->len); if (cscov == 0) { cscov = udplen; } else if (cscov < sizeof(*hdr) || cscov > udplen) { udplite_error_log(skb, state, "invalid checksum coverage"); return true; } /* UDPLITE mandates checksums */ if (!hdr->check) { udplite_error_log(skb, state, "checksum missing"); return true; } /* Checksum invalid? Ignore. */ if (state->hook == NF_INET_PRE_ROUTING && state->net->ct.sysctl_checksum && nf_checksum_partial(skb, state->hook, dataoff, cscov, IPPROTO_UDP, state->pf)) { udplite_error_log(skb, state, "bad checksum"); return true; } return false; } /* Returns verdict for packet, and may modify conntracktype */ int nf_conntrack_udplite_packet(struct nf_conn *ct, struct sk_buff *skb, unsigned int dataoff, enum ip_conntrack_info ctinfo, const struct nf_hook_state *state) { unsigned int *timeouts; if (udplite_error(skb, dataoff, state)) return -NF_ACCEPT; timeouts = nf_ct_timeout_lookup(ct); if (!timeouts) timeouts = udp_get_timeouts(nf_ct_net(ct)); /* If we've seen traffic both ways, this is some kind of UDP stream. Extend timeout. */ if (test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) { nf_ct_refresh_acct(ct, ctinfo, skb, timeouts[UDP_CT_REPLIED]); if (unlikely((ct->status & IPS_NAT_CLASH))) return NF_ACCEPT; /* Also, more likely to be important, and not a probe */ if (!test_and_set_bit(IPS_ASSURED_BIT, &ct->status)) nf_conntrack_event_cache(IPCT_ASSURED, ct); } else { nf_ct_refresh_acct(ct, ctinfo, skb, timeouts[UDP_CT_UNREPLIED]); } return NF_ACCEPT; } #endif #ifdef CONFIG_NF_CONNTRACK_TIMEOUT #include <linux/netfilter/nfnetlink.h> #include <linux/netfilter/nfnetlink_cttimeout.h> static int udp_timeout_nlattr_to_obj(struct nlattr *tb[], struct net *net, void *data) { unsigned int *timeouts = data; struct nf_udp_net *un = nf_udp_pernet(net); if (!timeouts) timeouts = un->timeouts; /* set default timeouts for UDP. */ timeouts[UDP_CT_UNREPLIED] = un->timeouts[UDP_CT_UNREPLIED]; timeouts[UDP_CT_REPLIED] = un->timeouts[UDP_CT_REPLIED]; if (tb[CTA_TIMEOUT_UDP_UNREPLIED]) { timeouts[UDP_CT_UNREPLIED] = ntohl(nla_get_be32(tb[CTA_TIMEOUT_UDP_UNREPLIED])) * HZ; } if (tb[CTA_TIMEOUT_UDP_REPLIED]) { timeouts[UDP_CT_REPLIED] = ntohl(nla_get_be32(tb[CTA_TIMEOUT_UDP_REPLIED])) * HZ; } return 0; } static int udp_timeout_obj_to_nlattr(struct sk_buff *skb, const void *data) { const unsigned int *timeouts = data; if (nla_put_be32(skb, CTA_TIMEOUT_UDP_UNREPLIED, htonl(timeouts[UDP_CT_UNREPLIED] / HZ)) || nla_put_be32(skb, CTA_TIMEOUT_UDP_REPLIED, htonl(timeouts[UDP_CT_REPLIED] / HZ))) goto nla_put_failure; return 0; nla_put_failure: return -ENOSPC; } static const struct nla_policy udp_timeout_nla_policy[CTA_TIMEOUT_UDP_MAX+1] = { [CTA_TIMEOUT_UDP_UNREPLIED] = { .type = NLA_U32 }, [CTA_TIMEOUT_UDP_REPLIED] = { .type = NLA_U32 }, }; #endif /* CONFIG_NF_CONNTRACK_TIMEOUT */ void nf_conntrack_udp_init_net(struct net *net) { struct nf_udp_net *un = nf_udp_pernet(net); int i; for (i = 0; i < UDP_CT_MAX; i++) un->timeouts[i] = udp_timeouts[i]; #if IS_ENABLED(CONFIG_NF_FLOW_TABLE) un->offload_timeout = 30 * HZ; #endif } const struct nf_conntrack_l4proto nf_conntrack_l4proto_udp = { .l4proto = IPPROTO_UDP, .allow_clash = true, #if IS_ENABLED(CONFIG_NF_CT_NETLINK) .tuple_to_nlattr = nf_ct_port_tuple_to_nlattr, .nlattr_to_tuple = nf_ct_port_nlattr_to_tuple, .nlattr_tuple_size = nf_ct_port_nlattr_tuple_size, .nla_policy = nf_ct_port_nla_policy, #endif #ifdef CONFIG_NF_CONNTRACK_TIMEOUT .ctnl_timeout = { .nlattr_to_obj = udp_timeout_nlattr_to_obj, .obj_to_nlattr = udp_timeout_obj_to_nlattr, .nlattr_max = CTA_TIMEOUT_UDP_MAX, .obj_size = sizeof(unsigned int) * CTA_TIMEOUT_UDP_MAX, .nla_policy = udp_timeout_nla_policy, }, #endif /* CONFIG_NF_CONNTRACK_TIMEOUT */ }; #ifdef CONFIG_NF_CT_PROTO_UDPLITE const struct nf_conntrack_l4proto nf_conntrack_l4proto_udplite = { .l4proto = IPPROTO_UDPLITE, .allow_clash = true, #if IS_ENABLED(CONFIG_NF_CT_NETLINK) .tuple_to_nlattr = nf_ct_port_tuple_to_nlattr, .nlattr_to_tuple = nf_ct_port_nlattr_to_tuple, .nlattr_tuple_size = nf_ct_port_nlattr_tuple_size, .nla_policy = nf_ct_port_nla_policy, #endif #ifdef CONFIG_NF_CONNTRACK_TIMEOUT .ctnl_timeout = { .nlattr_to_obj = udp_timeout_nlattr_to_obj, .obj_to_nlattr = udp_timeout_obj_to_nlattr, .nlattr_max = CTA_TIMEOUT_UDP_MAX, .obj_size = sizeof(unsigned int) * CTA_TIMEOUT_UDP_MAX, .nla_policy = udp_timeout_nla_policy, }, #endif /* CONFIG_NF_CONNTRACK_TIMEOUT */ }; #endif |
295 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 | /* SPDX-License-Identifier: GPL-2.0 */ #undef TRACE_SYSTEM #define TRACE_SYSTEM bpf_trace #if !defined(_TRACE_BPF_TRACE_H) || defined(TRACE_HEADER_MULTI_READ) #define _TRACE_BPF_TRACE_H #include <linux/tracepoint.h> TRACE_EVENT(bpf_trace_printk, TP_PROTO(const char *bpf_string), TP_ARGS(bpf_string), TP_STRUCT__entry( __string(bpf_string, bpf_string) ), TP_fast_assign( __assign_str(bpf_string); ), TP_printk("%s", __get_str(bpf_string)) ); #endif /* _TRACE_BPF_TRACE_H */ #undef TRACE_INCLUDE_PATH #define TRACE_INCLUDE_PATH . #define TRACE_INCLUDE_FILE bpf_trace #include <trace/define_trace.h> |
1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 | // SPDX-License-Identifier: GPL-2.0-only /* * Copyright (C) 2010 IBM Corporation * Copyright (C) 2010 Politecnico di Torino, Italy * TORSEC group -- https://security.polito.it * * Authors: * Mimi Zohar <zohar@us.ibm.com> * Roberto Sassu <roberto.sassu@polito.it> * * See Documentation/security/keys/trusted-encrypted.rst */ #include <linux/uaccess.h> #include <linux/err.h> #include <keys/trusted-type.h> #include <keys/encrypted-type.h> #include "encrypted.h" /* * request_trusted_key - request the trusted key * * Trusted keys are sealed to PCRs and other metadata. Although userspace * manages both trusted/encrypted key-types, like the encrypted key type * data, trusted key type data is not visible decrypted from userspace. */ struct key *request_trusted_key(const char *trusted_desc, const u8 **master_key, size_t *master_keylen) { struct trusted_key_payload *tpayload; struct key *tkey; tkey = request_key(&key_type_trusted, trusted_desc, NULL); if (IS_ERR(tkey)) goto error; down_read(&tkey->sem); tpayload = tkey->payload.data[0]; *master_key = tpayload->key; *master_keylen = tpayload->key_len; error: return tkey; } |
20 3 15 15 10 10 4 15 15 9 5 1 1 6 6 1 6 9 5 4 5 1 1 2 1 1 1 1 7 13 1 1 1 1 32 1 2 29 27 2 11 24 3 23 1 12 10 6 4 1 2 9 6 15 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 | /* net/tipc/udp_media.c: IP bearer support for TIPC * * Copyright (c) 2015, Ericsson AB * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the names of the copyright holders nor the names of its * contributors may be used to endorse or promote products derived from * this software without specific prior written permission. * * Alternatively, this software may be distributed under the terms of the * GNU General Public License ("GPL") version 2 as published by the Free * Software Foundation. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. */ #include <linux/socket.h> #include <linux/ip.h> #include <linux/udp.h> #include <linux/inet.h> #include <linux/inetdevice.h> #include <linux/igmp.h> #include <linux/kernel.h> #include <linux/workqueue.h> #include <linux/list.h> #include <net/sock.h> #include <net/ip.h> #include <net/udp_tunnel.h> #include <net/ipv6_stubs.h> #include <linux/tipc_netlink.h> #include "core.h" #include "addr.h" #include "net.h" #include "bearer.h" #include "netlink.h" #include "msg.h" #include "udp_media.h" /* IANA assigned UDP port */ #define UDP_PORT_DEFAULT 6118 #define UDP_MIN_HEADROOM 48 /** * struct udp_media_addr - IP/UDP addressing information * * This is the bearer level originating address used in neighbor discovery * messages, and all fields should be in network byte order * * @proto: Ethernet protocol in use * @port: port being used * @ipv4: IPv4 address of neighbor * @ipv6: IPv6 address of neighbor */ struct udp_media_addr { __be16 proto; __be16 port; union { struct in_addr ipv4; struct in6_addr ipv6; }; }; /* struct udp_replicast - container for UDP remote addresses */ struct udp_replicast { struct udp_media_addr addr; struct dst_cache dst_cache; struct rcu_head rcu; struct list_head list; }; /** * struct udp_bearer - ip/udp bearer data structure * @bearer: associated generic tipc bearer * @ubsock: bearer associated socket * @ifindex: local address scope * @work: used to schedule deferred work on a bearer * @rcast: associated udp_replicast container */ struct udp_bearer { struct tipc_bearer __rcu *bearer; struct socket *ubsock; u32 ifindex; struct work_struct work; struct udp_replicast rcast; }; static int tipc_udp_is_mcast_addr(struct udp_media_addr *addr) { if (ntohs(addr->proto) == ETH_P_IP) return ipv4_is_multicast(addr->ipv4.s_addr); #if IS_ENABLED(CONFIG_IPV6) else return ipv6_addr_is_multicast(&addr->ipv6); #endif return 0; } /* udp_media_addr_set - convert a ip/udp address to a TIPC media address */ static void tipc_udp_media_addr_set(struct tipc_media_addr *addr, struct udp_media_addr *ua) { memset(addr, 0, sizeof(struct tipc_media_addr)); addr->media_id = TIPC_MEDIA_TYPE_UDP; memcpy(addr->value, ua, sizeof(struct udp_media_addr)); if (tipc_udp_is_mcast_addr(ua)) addr->broadcast = TIPC_BROADCAST_SUPPORT; } /* tipc_udp_addr2str - convert ip/udp address to string */ static int tipc_udp_addr2str(struct tipc_media_addr *a, char *buf, int size) { struct udp_media_addr *ua = (struct udp_media_addr *)&a->value; if (ntohs(ua->proto) == ETH_P_IP) snprintf(buf, size, "%pI4:%u", &ua->ipv4, ntohs(ua->port)); else if (ntohs(ua->proto) == ETH_P_IPV6) snprintf(buf, size, "%pI6:%u", &ua->ipv6, ntohs(ua->port)); else { pr_err("Invalid UDP media address\n"); return 1; } return 0; } /* tipc_udp_msg2addr - extract an ip/udp address from a TIPC ndisc message */ static int tipc_udp_msg2addr(struct tipc_bearer *b, struct tipc_media_addr *a, char *msg) { struct udp_media_addr *ua; ua = (struct udp_media_addr *) (msg + TIPC_MEDIA_ADDR_OFFSET); if (msg[TIPC_MEDIA_TYPE_OFFSET] != TIPC_MEDIA_TYPE_UDP) return -EINVAL; tipc_udp_media_addr_set(a, ua); return 0; } /* tipc_udp_addr2msg - write an ip/udp address to a TIPC ndisc message */ static int tipc_udp_addr2msg(char *msg, struct tipc_media_addr *a) { memset(msg, 0, TIPC_MEDIA_INFO_SIZE); msg[TIPC_MEDIA_TYPE_OFFSET] = TIPC_MEDIA_TYPE_UDP; memcpy(msg + TIPC_MEDIA_ADDR_OFFSET, a->value, sizeof(struct udp_media_addr)); return 0; } /* tipc_send_msg - enqueue a send request */ static int tipc_udp_xmit(struct net *net, struct sk_buff *skb, struct udp_bearer *ub, struct udp_media_addr *src, struct udp_media_addr *dst, struct dst_cache *cache) { struct dst_entry *ndst; int ttl, err = 0; local_bh_disable(); ndst = dst_cache_get(cache); if (dst->proto == htons(ETH_P_IP)) { struct rtable *rt = dst_rtable(ndst); if (!rt) { struct flowi4 fl = { .daddr = dst->ipv4.s_addr, .saddr = src->ipv4.s_addr, .flowi4_mark = skb->mark, .flowi4_proto = IPPROTO_UDP }; rt = ip_route_output_key(net, &fl); if (IS_ERR(rt)) { err = PTR_ERR(rt); goto tx_error; } dst_cache_set_ip4(cache, &rt->dst, fl.saddr); } ttl = ip4_dst_hoplimit(&rt->dst); udp_tunnel_xmit_skb(rt, ub->ubsock->sk, skb, src->ipv4.s_addr, dst->ipv4.s_addr, 0, ttl, 0, src->port, dst->port, false, true); #if IS_ENABLED(CONFIG_IPV6) } else { if (!ndst) { struct flowi6 fl6 = { .flowi6_oif = ub->ifindex, .daddr = dst->ipv6, .saddr = src->ipv6, .flowi6_proto = IPPROTO_UDP }; ndst = ipv6_stub->ipv6_dst_lookup_flow(net, ub->ubsock->sk, &fl6, NULL); if (IS_ERR(ndst)) { err = PTR_ERR(ndst); goto tx_error; } dst_cache_set_ip6(cache, ndst, &fl6.saddr); } ttl = ip6_dst_hoplimit(ndst); err = udp_tunnel6_xmit_skb(ndst, ub->ubsock->sk, skb, NULL, &src->ipv6, &dst->ipv6, 0, ttl, 0, src->port, dst->port, false); #endif } local_bh_enable(); return err; tx_error: local_bh_enable(); kfree_skb(skb); return err; } static int tipc_udp_send_msg(struct net *net, struct sk_buff *skb, struct tipc_bearer *b, struct tipc_media_addr *addr) { struct udp_media_addr *src = (struct udp_media_addr *)&b->addr.value; struct udp_media_addr *dst = (struct udp_media_addr *)&addr->value; struct udp_replicast *rcast; struct udp_bearer *ub; int err = 0; if (skb_headroom(skb) < UDP_MIN_HEADROOM) { err = pskb_expand_head(skb, UDP_MIN_HEADROOM, 0, GFP_ATOMIC); if (err) goto out; } skb_set_inner_protocol(skb, htons(ETH_P_TIPC)); ub = rcu_dereference(b->media_ptr); if (!ub) { err = -ENODEV; goto out; } if (addr->broadcast != TIPC_REPLICAST_SUPPORT) return tipc_udp_xmit(net, skb, ub, src, dst, &ub->rcast.dst_cache); /* Replicast, send an skb to each configured IP address */ list_for_each_entry_rcu(rcast, &ub->rcast.list, list) { struct sk_buff *_skb; _skb = pskb_copy(skb, GFP_ATOMIC); if (!_skb) { err = -ENOMEM; goto out; } err = tipc_udp_xmit(net, _skb, ub, src, &rcast->addr, &rcast->dst_cache); if (err) goto out; } err = 0; out: kfree_skb(skb); return err; } static bool tipc_udp_is_known_peer(struct tipc_bearer *b, struct udp_media_addr *addr) { struct udp_replicast *rcast, *tmp; struct udp_bearer *ub; ub = rcu_dereference_rtnl(b->media_ptr); if (!ub) { pr_err_ratelimited("UDP bearer instance not found\n"); return false; } list_for_each_entry_safe(rcast, tmp, &ub->rcast.list, list) { if (!memcmp(&rcast->addr, addr, sizeof(struct udp_media_addr))) return true; } return false; } static int tipc_udp_rcast_add(struct tipc_bearer *b, struct udp_media_addr *addr) { struct udp_replicast *rcast; struct udp_bearer *ub; ub = rcu_dereference_rtnl(b->media_ptr); if (!ub) return -ENODEV; rcast = kmalloc(sizeof(*rcast), GFP_ATOMIC); if (!rcast) return -ENOMEM; if (dst_cache_init(&rcast->dst_cache, GFP_ATOMIC)) { kfree(rcast); return -ENOMEM; } memcpy(&rcast->addr, addr, sizeof(struct udp_media_addr)); if (ntohs(addr->proto) == ETH_P_IP) pr_info("New replicast peer: %pI4\n", &rcast->addr.ipv4); #if IS_ENABLED(CONFIG_IPV6) else if (ntohs(addr->proto) == ETH_P_IPV6) pr_info("New replicast peer: %pI6\n", &rcast->addr.ipv6); #endif b->bcast_addr.broadcast = TIPC_REPLICAST_SUPPORT; list_add_rcu(&rcast->list, &ub->rcast.list); return 0; } static int tipc_udp_rcast_disc(struct tipc_bearer *b, struct sk_buff *skb) { struct udp_media_addr src = {0}; struct udp_media_addr *dst; dst = (struct udp_media_addr *)&b->bcast_addr.value; if (tipc_udp_is_mcast_addr(dst)) return 0; src.port = udp_hdr(skb)->source; if (ip_hdr(skb)->version == 4) { struct iphdr *iphdr = ip_hdr(skb); src.proto = htons(ETH_P_IP); src.ipv4.s_addr = iphdr->saddr; if (ipv4_is_multicast(iphdr->daddr)) return 0; #if IS_ENABLED(CONFIG_IPV6) } else if (ip_hdr(skb)->version == 6) { struct ipv6hdr *iphdr = ipv6_hdr(skb); src.proto = htons(ETH_P_IPV6); src.ipv6 = iphdr->saddr; if (ipv6_addr_is_multicast(&iphdr->daddr)) return 0; #endif } else { return 0; } if (likely(tipc_udp_is_known_peer(b, &src))) return 0; return tipc_udp_rcast_add(b, &src); } /* tipc_udp_recv - read data from bearer socket */ static int tipc_udp_recv(struct sock *sk, struct sk_buff *skb) { struct udp_bearer *ub; struct tipc_bearer *b; struct tipc_msg *hdr; int err; ub = rcu_dereference_sk_user_data(sk); if (!ub) { pr_err_ratelimited("Failed to get UDP bearer reference"); goto out; } skb_pull(skb, sizeof(struct udphdr)); hdr = buf_msg(skb); b = rcu_dereference(ub->bearer); if (!b) goto out; if (b && test_bit(0, &b->up)) { TIPC_SKB_CB(skb)->flags = 0; tipc_rcv(sock_net(sk), skb, b); return 0; } if (unlikely(msg_user(hdr) == LINK_CONFIG)) { err = tipc_udp_rcast_disc(b, skb); if (err) goto out; } out: kfree_skb(skb); return 0; } static int enable_mcast(struct udp_bearer *ub, struct udp_media_addr *remote) { int err = 0; struct ip_mreqn mreqn; struct sock *sk = ub->ubsock->sk; if (ntohs(remote->proto) == ETH_P_IP) { mreqn.imr_multiaddr = remote->ipv4; mreqn.imr_ifindex = ub->ifindex; err = ip_mc_join_group(sk, &mreqn); #if IS_ENABLED(CONFIG_IPV6) } else { lock_sock(sk); err = ipv6_stub->ipv6_sock_mc_join(sk, ub->ifindex, &remote->ipv6); release_sock(sk); #endif } return err; } static int __tipc_nl_add_udp_addr(struct sk_buff *skb, struct udp_media_addr *addr, int nla_t) { if (ntohs(addr->proto) == ETH_P_IP) { struct sockaddr_in ip4; memset(&ip4, 0, sizeof(ip4)); ip4.sin_family = AF_INET; ip4.sin_port = addr->port; ip4.sin_addr.s_addr = addr->ipv4.s_addr; if (nla_put(skb, nla_t, sizeof(ip4), &ip4)) return -EMSGSIZE; #if IS_ENABLED(CONFIG_IPV6) } else if (ntohs(addr->proto) == ETH_P_IPV6) { struct sockaddr_in6 ip6; memset(&ip6, 0, sizeof(ip6)); ip6.sin6_family = AF_INET6; ip6.sin6_port = addr->port; memcpy(&ip6.sin6_addr, &addr->ipv6, sizeof(struct in6_addr)); if (nla_put(skb, nla_t, sizeof(ip6), &ip6)) return -EMSGSIZE; #endif } return 0; } int tipc_udp_nl_dump_remoteip(struct sk_buff *skb, struct netlink_callback *cb) { u32 bid = cb->args[0]; u32 skip_cnt = cb->args[1]; u32 portid = NETLINK_CB(cb->skb).portid; struct udp_replicast *rcast, *tmp; struct tipc_bearer *b; struct udp_bearer *ub; void *hdr; int err; int i; if (!bid && !skip_cnt) { struct nlattr **attrs = genl_dumpit_info(cb)->info.attrs; struct net *net = sock_net(skb->sk); struct nlattr *battrs[TIPC_NLA_BEARER_MAX + 1]; char *bname; if (!attrs[TIPC_NLA_BEARER]) return -EINVAL; err = nla_parse_nested_deprecated(battrs, TIPC_NLA_BEARER_MAX, attrs[TIPC_NLA_BEARER], tipc_nl_bearer_policy, NULL); if (err) return err; if (!battrs[TIPC_NLA_BEARER_NAME]) return -EINVAL; bname = nla_data(battrs[TIPC_NLA_BEARER_NAME]); rtnl_lock(); b = tipc_bearer_find(net, bname); if (!b) { rtnl_unlock(); return -EINVAL; } bid = b->identity; } else { struct net *net = sock_net(skb->sk); struct tipc_net *tn = net_generic(net, tipc_net_id); rtnl_lock(); b = rtnl_dereference(tn->bearer_list[bid]); if (!b) { rtnl_unlock(); return -EINVAL; } } ub = rtnl_dereference(b->media_ptr); if (!ub) { rtnl_unlock(); return -EINVAL; } i = 0; list_for_each_entry_safe(rcast, tmp, &ub->rcast.list, list) { if (i < skip_cnt) goto count; hdr = genlmsg_put(skb, portid, cb->nlh->nlmsg_seq, &tipc_genl_family, NLM_F_MULTI, TIPC_NL_BEARER_GET); if (!hdr) goto done; err = __tipc_nl_add_udp_addr(skb, &rcast->addr, TIPC_NLA_UDP_REMOTE); if (err) { genlmsg_cancel(skb, hdr); goto done; } genlmsg_end(skb, hdr); count: i++; } done: rtnl_unlock(); cb->args[0] = bid; cb->args[1] = i; return skb->len; } int tipc_udp_nl_add_bearer_data(struct tipc_nl_msg *msg, struct tipc_bearer *b) { struct udp_media_addr *src = (struct udp_media_addr *)&b->addr.value; struct udp_media_addr *dst; struct udp_bearer *ub; struct nlattr *nest; ub = rtnl_dereference(b->media_ptr); if (!ub) return -ENODEV; nest = nla_nest_start_noflag(msg->skb, TIPC_NLA_BEARER_UDP_OPTS); if (!nest) goto msg_full; if (__tipc_nl_add_udp_addr(msg->skb, src, TIPC_NLA_UDP_LOCAL)) goto msg_full; dst = (struct udp_media_addr *)&b->bcast_addr.value; if (__tipc_nl_add_udp_addr(msg->skb, dst, TIPC_NLA_UDP_REMOTE)) goto msg_full; if (!list_empty(&ub->rcast.list)) { if (nla_put_flag(msg->skb, TIPC_NLA_UDP_MULTI_REMOTEIP)) goto msg_full; } nla_nest_end(msg->skb, nest); return 0; msg_full: nla_nest_cancel(msg->skb, nest); return -EMSGSIZE; } /** * tipc_parse_udp_addr - build udp media address from netlink data * @nla: netlink attribute containing sockaddr storage aligned address * @addr: tipc media address to fill with address, port and protocol type * @scope_id: IPv6 scope id pointer, not NULL indicates it's required */ static int tipc_parse_udp_addr(struct nlattr *nla, struct udp_media_addr *addr, u32 *scope_id) { struct sockaddr_storage sa; nla_memcpy(&sa, nla, sizeof(sa)); if (sa.ss_family == AF_INET) { struct sockaddr_in *ip4 = (struct sockaddr_in *)&sa; addr->proto = htons(ETH_P_IP); addr->port = ip4->sin_port; addr->ipv4.s_addr = ip4->sin_addr.s_addr; return 0; #if IS_ENABLED(CONFIG_IPV6) } else if (sa.ss_family == AF_INET6) { struct sockaddr_in6 *ip6 = (struct sockaddr_in6 *)&sa; addr->proto = htons(ETH_P_IPV6); addr->port = ip6->sin6_port; memcpy(&addr->ipv6, &ip6->sin6_addr, sizeof(struct in6_addr)); /* Scope ID is only interesting for local addresses */ if (scope_id) { int atype; atype = ipv6_addr_type(&ip6->sin6_addr); if (__ipv6_addr_needs_scope_id(atype) && !ip6->sin6_scope_id) { return -EINVAL; } *scope_id = ip6->sin6_scope_id ? : 0; } return 0; #endif } return -EADDRNOTAVAIL; } int tipc_udp_nl_bearer_add(struct tipc_bearer *b, struct nlattr *attr) { int err; struct udp_media_addr addr = {0}; struct nlattr *opts[TIPC_NLA_UDP_MAX + 1]; struct udp_media_addr *dst; if (nla_parse_nested_deprecated(opts, TIPC_NLA_UDP_MAX, attr, tipc_nl_udp_policy, NULL)) return -EINVAL; if (!opts[TIPC_NLA_UDP_REMOTE]) return -EINVAL; err = tipc_parse_udp_addr(opts[TIPC_NLA_UDP_REMOTE], &addr, NULL); if (err) return err; dst = (struct udp_media_addr *)&b->bcast_addr.value; if (tipc_udp_is_mcast_addr(dst)) { pr_err("Can't add remote ip to TIPC UDP multicast bearer\n"); return -EINVAL; } if (tipc_udp_is_known_peer(b, &addr)) return 0; return tipc_udp_rcast_add(b, &addr); } /** * tipc_udp_enable - callback to create a new udp bearer instance * @net: network namespace * @b: pointer to generic tipc_bearer * @attrs: netlink bearer configuration * * validate the bearer parameters and initialize the udp bearer * rtnl_lock should be held */ static int tipc_udp_enable(struct net *net, struct tipc_bearer *b, struct nlattr *attrs[]) { int err = -EINVAL; struct udp_bearer *ub; struct udp_media_addr remote = {0}; struct udp_media_addr local = {0}; struct udp_port_cfg udp_conf = {0}; struct udp_tunnel_sock_cfg tuncfg = {NULL}; struct nlattr *opts[TIPC_NLA_UDP_MAX + 1]; u8 node_id[NODE_ID_LEN] = {0,}; struct net_device *dev; int rmcast = 0; ub = kzalloc(sizeof(*ub), GFP_ATOMIC); if (!ub) return -ENOMEM; INIT_LIST_HEAD(&ub->rcast.list); if (!attrs[TIPC_NLA_BEARER_UDP_OPTS]) goto err; if (nla_parse_nested_deprecated(opts, TIPC_NLA_UDP_MAX, attrs[TIPC_NLA_BEARER_UDP_OPTS], tipc_nl_udp_policy, NULL)) goto err; if (!opts[TIPC_NLA_UDP_LOCAL] || !opts[TIPC_NLA_UDP_REMOTE]) { pr_err("Invalid UDP bearer configuration"); err = -EINVAL; goto err; } err = tipc_parse_udp_addr(opts[TIPC_NLA_UDP_LOCAL], &local, &ub->ifindex); if (err) goto err; err = tipc_parse_udp_addr(opts[TIPC_NLA_UDP_REMOTE], &remote, NULL); if (err) goto err; if (remote.proto != local.proto) { err = -EINVAL; goto err; } /* Checking remote ip address */ rmcast = tipc_udp_is_mcast_addr(&remote); /* Autoconfigure own node identity if needed */ if (!tipc_own_id(net)) { memcpy(node_id, local.ipv6.in6_u.u6_addr8, 16); tipc_net_init(net, node_id, 0); } if (!tipc_own_id(net)) { pr_warn("Failed to set node id, please configure manually\n"); err = -EINVAL; goto err; } b->bcast_addr.media_id = TIPC_MEDIA_TYPE_UDP; b->bcast_addr.broadcast = TIPC_BROADCAST_SUPPORT; rcu_assign_pointer(b->media_ptr, ub); rcu_assign_pointer(ub->bearer, b); tipc_udp_media_addr_set(&b->addr, &local); if (local.proto == htons(ETH_P_IP)) { dev = __ip_dev_find(net, local.ipv4.s_addr, false); if (!dev) { err = -ENODEV; goto err; } udp_conf.family = AF_INET; /* Switch to use ANY to receive packets from group */ if (rmcast) udp_conf.local_ip.s_addr = htonl(INADDR_ANY); else udp_conf.local_ip.s_addr = local.ipv4.s_addr; udp_conf.use_udp_checksums = false; ub->ifindex = dev->ifindex; b->encap_hlen = sizeof(struct iphdr) + sizeof(struct udphdr); b->mtu = b->media->mtu; #if IS_ENABLED(CONFIG_IPV6) } else if (local.proto == htons(ETH_P_IPV6)) { dev = ub->ifindex ? __dev_get_by_index(net, ub->ifindex) : NULL; dev = ipv6_dev_find(net, &local.ipv6, dev); if (!dev) { err = -ENODEV; goto err; } udp_conf.family = AF_INET6; udp_conf.use_udp6_tx_checksums = true; udp_conf.use_udp6_rx_checksums = true; if (rmcast) udp_conf.local_ip6 = in6addr_any; else udp_conf.local_ip6 = local.ipv6; ub->ifindex = dev->ifindex; b->encap_hlen = sizeof(struct ipv6hdr) + sizeof(struct udphdr); b->mtu = 1280; #endif } else { err = -EAFNOSUPPORT; goto err; } udp_conf.local_udp_port = local.port; err = udp_sock_create(net, &udp_conf, &ub->ubsock); if (err) goto err; tuncfg.sk_user_data = ub; tuncfg.encap_type = 1; tuncfg.encap_rcv = tipc_udp_recv; tuncfg.encap_destroy = NULL; setup_udp_tunnel_sock(net, ub->ubsock, &tuncfg); err = dst_cache_init(&ub->rcast.dst_cache, GFP_ATOMIC); if (err) goto free; /* * The bcast media address port is used for all peers and the ip * is used if it's a multicast address. */ memcpy(&b->bcast_addr.value, &remote, sizeof(remote)); if (rmcast) err = enable_mcast(ub, &remote); else err = tipc_udp_rcast_add(b, &remote); if (err) goto free; return 0; free: dst_cache_destroy(&ub->rcast.dst_cache); udp_tunnel_sock_release(ub->ubsock); err: kfree(ub); return err; } /* cleanup_bearer - break the socket/bearer association */ static void cleanup_bearer(struct work_struct *work) { struct udp_bearer *ub = container_of(work, struct udp_bearer, work); struct udp_replicast *rcast, *tmp; struct tipc_net *tn; list_for_each_entry_safe(rcast, tmp, &ub->rcast.list, list) { dst_cache_destroy(&rcast->dst_cache); list_del_rcu(&rcast->list); kfree_rcu(rcast, rcu); } tn = tipc_net(sock_net(ub->ubsock->sk)); dst_cache_destroy(&ub->rcast.dst_cache); udp_tunnel_sock_release(ub->ubsock); /* Note: could use a call_rcu() to avoid another synchronize_net() */ synchronize_net(); atomic_dec(&tn->wq_count); kfree(ub); } /* tipc_udp_disable - detach bearer from socket */ static void tipc_udp_disable(struct tipc_bearer *b) { struct udp_bearer *ub; ub = rtnl_dereference(b->media_ptr); if (!ub) { pr_err("UDP bearer instance not found\n"); return; } sock_set_flag(ub->ubsock->sk, SOCK_DEAD); RCU_INIT_POINTER(ub->bearer, NULL); /* sock_release need to be done outside of rtnl lock */ atomic_inc(&tipc_net(sock_net(ub->ubsock->sk))->wq_count); INIT_WORK(&ub->work, cleanup_bearer); schedule_work(&ub->work); } struct tipc_media udp_media_info = { .send_msg = tipc_udp_send_msg, .enable_media = tipc_udp_enable, .disable_media = tipc_udp_disable, .addr2str = tipc_udp_addr2str, .addr2msg = tipc_udp_addr2msg, .msg2addr = tipc_udp_msg2addr, .priority = TIPC_DEF_LINK_PRI, .tolerance = TIPC_DEF_LINK_TOL, .min_win = TIPC_DEF_LINK_WIN, .max_win = TIPC_DEF_LINK_WIN, .mtu = TIPC_DEF_LINK_UDP_MTU, .type_id = TIPC_MEDIA_TYPE_UDP, .hwaddr_len = 0, .name = "udp" }; |
2 158 158 4864 4864 4864 5 5 4862 4873 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 | /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_MMZONE_H #define _LINUX_MMZONE_H #ifndef __ASSEMBLY__ #ifndef __GENERATING_BOUNDS_H #include <linux/spinlock.h> #include <linux/list.h> #include <linux/list_nulls.h> #include <linux/wait.h> #include <linux/bitops.h> #include <linux/cache.h> #include <linux/threads.h> #include <linux/numa.h> #include <linux/init.h> #include <linux/seqlock.h> #include <linux/nodemask.h> #include <linux/pageblock-flags.h> #include <linux/page-flags-layout.h> #include <linux/atomic.h> #include <linux/mm_types.h> #include <linux/page-flags.h> #include <linux/local_lock.h> #include <linux/zswap.h> #include <asm/page.h> /* Free memory management - zoned buddy allocator. */ #ifndef CONFIG_ARCH_FORCE_MAX_ORDER #define MAX_PAGE_ORDER 10 #else #define MAX_PAGE_ORDER CONFIG_ARCH_FORCE_MAX_ORDER #endif #define MAX_ORDER_NR_PAGES (1 << MAX_PAGE_ORDER) #define IS_MAX_ORDER_ALIGNED(pfn) IS_ALIGNED(pfn, MAX_ORDER_NR_PAGES) #define NR_PAGE_ORDERS (MAX_PAGE_ORDER + 1) /* * PAGE_ALLOC_COSTLY_ORDER is the order at which allocations are deemed * costly to service. That is between allocation orders which should * coalesce naturally under reasonable reclaim pressure and those which * will not. */ #define PAGE_ALLOC_COSTLY_ORDER 3 enum migratetype { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RECLAIMABLE, MIGRATE_PCPTYPES, /* the number of types on the pcp lists */ MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES, #ifdef CONFIG_CMA /* * MIGRATE_CMA migration type is designed to mimic the way * ZONE_MOVABLE works. Only movable pages can be allocated * from MIGRATE_CMA pageblocks and page allocator never * implicitly change migration type of MIGRATE_CMA pageblock. * * The way to use it is to change migratetype of a range of * pageblocks to MIGRATE_CMA which can be done by * __free_pageblock_cma() function. */ MIGRATE_CMA, #endif #ifdef CONFIG_MEMORY_ISOLATION MIGRATE_ISOLATE, /* can't allocate from here */ #endif MIGRATE_TYPES }; /* In mm/page_alloc.c; keep in sync also with show_migration_types() there */ extern const char * const migratetype_names[MIGRATE_TYPES]; #ifdef CONFIG_CMA # define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA) # define is_migrate_cma_page(_page) (get_pageblock_migratetype(_page) == MIGRATE_CMA) # define is_migrate_cma_folio(folio, pfn) (MIGRATE_CMA == \ get_pfnblock_flags_mask(&folio->page, pfn, MIGRATETYPE_MASK)) #else # define is_migrate_cma(migratetype) false # define is_migrate_cma_page(_page) false # define is_migrate_cma_folio(folio, pfn) false #endif static inline bool is_migrate_movable(int mt) { return is_migrate_cma(mt) || mt == MIGRATE_MOVABLE; } /* * Check whether a migratetype can be merged with another migratetype. * * It is only mergeable when it can fall back to other migratetypes for * allocation. See fallbacks[MIGRATE_TYPES][3] in page_alloc.c. */ static inline bool migratetype_is_mergeable(int mt) { return mt < MIGRATE_PCPTYPES; } #define for_each_migratetype_order(order, type) \ for (order = 0; order < NR_PAGE_ORDERS; order++) \ for (type = 0; type < MIGRATE_TYPES; type++) extern int page_group_by_mobility_disabled; #define MIGRATETYPE_MASK ((1UL << PB_migratetype_bits) - 1) #define get_pageblock_migratetype(page) \ get_pfnblock_flags_mask(page, page_to_pfn(page), MIGRATETYPE_MASK) #define folio_migratetype(folio) \ get_pfnblock_flags_mask(&folio->page, folio_pfn(folio), \ MIGRATETYPE_MASK) struct free_area { struct list_head free_list[MIGRATE_TYPES]; unsigned long nr_free; }; struct pglist_data; #ifdef CONFIG_NUMA enum numa_stat_item { NUMA_HIT, /* allocated in intended node */ NUMA_MISS, /* allocated in non intended node */ NUMA_FOREIGN, /* was intended here, hit elsewhere */ NUMA_INTERLEAVE_HIT, /* interleaver preferred this zone */ NUMA_LOCAL, /* allocation from local node */ NUMA_OTHER, /* allocation from other node */ NR_VM_NUMA_EVENT_ITEMS }; #else #define NR_VM_NUMA_EVENT_ITEMS 0 #endif enum zone_stat_item { /* First 128 byte cacheline (assuming 64 bit words) */ NR_FREE_PAGES, NR_ZONE_LRU_BASE, /* Used only for compaction and reclaim retry */ NR_ZONE_INACTIVE_ANON = NR_ZONE_LRU_BASE, NR_ZONE_ACTIVE_ANON, NR_ZONE_INACTIVE_FILE, NR_ZONE_ACTIVE_FILE, NR_ZONE_UNEVICTABLE, NR_ZONE_WRITE_PENDING, /* Count of dirty, writeback and unstable pages */ NR_MLOCK, /* mlock()ed pages found and moved off LRU */ /* Second 128 byte cacheline */ NR_BOUNCE, #if IS_ENABLED(CONFIG_ZSMALLOC) NR_ZSPAGES, /* allocated in zsmalloc */ #endif NR_FREE_CMA_PAGES, #ifdef CONFIG_UNACCEPTED_MEMORY NR_UNACCEPTED, #endif NR_VM_ZONE_STAT_ITEMS }; enum node_stat_item { NR_LRU_BASE, NR_INACTIVE_ANON = NR_LRU_BASE, /* must match order of LRU_[IN]ACTIVE */ NR_ACTIVE_ANON, /* " " " " " */ NR_INACTIVE_FILE, /* " " " " " */ NR_ACTIVE_FILE, /* " " " " " */ NR_UNEVICTABLE, /* " " " " " */ NR_SLAB_RECLAIMABLE_B, NR_SLAB_UNRECLAIMABLE_B, NR_ISOLATED_ANON, /* Temporary isolated pages from anon lru */ NR_ISOLATED_FILE, /* Temporary isolated pages from file lru */ WORKINGSET_NODES, WORKINGSET_REFAULT_BASE, WORKINGSET_REFAULT_ANON = WORKINGSET_REFAULT_BASE, WORKINGSET_REFAULT_FILE, WORKINGSET_ACTIVATE_BASE, WORKINGSET_ACTIVATE_ANON = WORKINGSET_ACTIVATE_BASE, WORKINGSET_ACTIVATE_FILE, WORKINGSET_RESTORE_BASE, WORKINGSET_RESTORE_ANON = WORKINGSET_RESTORE_BASE, WORKINGSET_RESTORE_FILE, WORKINGSET_NODERECLAIM, NR_ANON_MAPPED, /* Mapped anonymous pages */ NR_FILE_MAPPED, /* pagecache pages mapped into pagetables. only modified from process context */ NR_FILE_PAGES, NR_FILE_DIRTY, NR_WRITEBACK, NR_WRITEBACK_TEMP, /* Writeback using temporary buffers */ NR_SHMEM, /* shmem pages (included tmpfs/GEM pages) */ NR_SHMEM_THPS, NR_SHMEM_PMDMAPPED, NR_FILE_THPS, NR_FILE_PMDMAPPED, NR_ANON_THPS, NR_VMSCAN_WRITE, NR_VMSCAN_IMMEDIATE, /* Prioritise for reclaim when writeback ends */ NR_DIRTIED, /* page dirtyings since bootup */ NR_WRITTEN, /* page writings since bootup */ NR_THROTTLED_WRITTEN, /* NR_WRITTEN while reclaim throttled */ NR_KERNEL_MISC_RECLAIMABLE, /* reclaimable non-slab kernel pages */ NR_FOLL_PIN_ACQUIRED, /* via: pin_user_page(), gup flag: FOLL_PIN */ NR_FOLL_PIN_RELEASED, /* pages returned via unpin_user_page() */ NR_KERNEL_STACK_KB, /* measured in KiB */ #if IS_ENABLED(CONFIG_SHADOW_CALL_STACK) NR_KERNEL_SCS_KB, /* measured in KiB */ #endif NR_PAGETABLE, /* used for pagetables */ NR_SECONDARY_PAGETABLE, /* secondary pagetables, KVM & IOMMU */ #ifdef CONFIG_IOMMU_SUPPORT NR_IOMMU_PAGES, /* # of pages allocated by IOMMU */ #endif #ifdef CONFIG_SWAP NR_SWAPCACHE, #endif #ifdef CONFIG_NUMA_BALANCING PGPROMOTE_SUCCESS, /* promote successfully */ PGPROMOTE_CANDIDATE, /* candidate pages to promote */ #endif /* PGDEMOTE_*: pages demoted */ PGDEMOTE_KSWAPD, PGDEMOTE_DIRECT, PGDEMOTE_KHUGEPAGED, #ifdef CONFIG_HUGETLB_PAGE NR_HUGETLB, #endif NR_VM_NODE_STAT_ITEMS }; /* * Returns true if the item should be printed in THPs (/proc/vmstat * currently prints number of anon, file and shmem THPs. But the item * is charged in pages). */ static __always_inline bool vmstat_item_print_in_thp(enum node_stat_item item) { if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) return false; return item == NR_ANON_THPS || item == NR_FILE_THPS || item == NR_SHMEM_THPS || item == NR_SHMEM_PMDMAPPED || item == NR_FILE_PMDMAPPED; } /* * Returns true if the value is measured in bytes (most vmstat values are * measured in pages). This defines the API part, the internal representation * might be different. */ static __always_inline bool vmstat_item_in_bytes(int idx) { /* * Global and per-node slab counters track slab pages. * It's expected that changes are multiples of PAGE_SIZE. * Internally values are stored in pages. * * Per-memcg and per-lruvec counters track memory, consumed * by individual slab objects. These counters are actually * byte-precise. */ return (idx == NR_SLAB_RECLAIMABLE_B || idx == NR_SLAB_UNRECLAIMABLE_B); } /* * We do arithmetic on the LRU lists in various places in the code, * so it is important to keep the active lists LRU_ACTIVE higher in * the array than the corresponding inactive lists, and to keep * the *_FILE lists LRU_FILE higher than the corresponding _ANON lists. * * This has to be kept in sync with the statistics in zone_stat_item * above and the descriptions in vmstat_text in mm/vmstat.c */ #define LRU_BASE 0 #define LRU_ACTIVE 1 #define LRU_FILE 2 enum lru_list { LRU_INACTIVE_ANON = LRU_BASE, LRU_ACTIVE_ANON = LRU_BASE + LRU_ACTIVE, LRU_INACTIVE_FILE = LRU_BASE + LRU_FILE, LRU_ACTIVE_FILE = LRU_BASE + LRU_FILE + LRU_ACTIVE, LRU_UNEVICTABLE, NR_LRU_LISTS }; enum vmscan_throttle_state { VMSCAN_THROTTLE_WRITEBACK, VMSCAN_THROTTLE_ISOLATED, VMSCAN_THROTTLE_NOPROGRESS, VMSCAN_THROTTLE_CONGESTED, NR_VMSCAN_THROTTLE, }; #define for_each_lru(lru) for (lru = 0; lru < NR_LRU_LISTS; lru++) #define for_each_evictable_lru(lru) for (lru = 0; lru <= LRU_ACTIVE_FILE; lru++) static inline bool is_file_lru(enum lru_list lru) { return (lru == LRU_INACTIVE_FILE || lru == LRU_ACTIVE_FILE); } static inline bool is_active_lru(enum lru_list lru) { return (lru == LRU_ACTIVE_ANON || lru == LRU_ACTIVE_FILE); } #define WORKINGSET_ANON 0 #define WORKINGSET_FILE 1 #define ANON_AND_FILE 2 enum lruvec_flags { /* * An lruvec has many dirty pages backed by a congested BDI: * 1. LRUVEC_CGROUP_CONGESTED is set by cgroup-level reclaim. * It can be cleared by cgroup reclaim or kswapd. * 2. LRUVEC_NODE_CONGESTED is set by kswapd node-level reclaim. * It can only be cleared by kswapd. * * Essentially, kswapd can unthrottle an lruvec throttled by cgroup * reclaim, but not vice versa. This only applies to the root cgroup. * The goal is to prevent cgroup reclaim on the root cgroup (e.g. * memory.reclaim) to unthrottle an unbalanced node (that was throttled * by kswapd). */ LRUVEC_CGROUP_CONGESTED, LRUVEC_NODE_CONGESTED, }; #endif /* !__GENERATING_BOUNDS_H */ /* * Evictable pages are divided into multiple generations. The youngest and the * oldest generation numbers, max_seq and min_seq, are monotonically increasing. * They form a sliding window of a variable size [MIN_NR_GENS, MAX_NR_GENS]. An * offset within MAX_NR_GENS, i.e., gen, indexes the LRU list of the * corresponding generation. The gen counter in folio->flags stores gen+1 while * a page is on one of lrugen->folios[]. Otherwise it stores 0. * * A page is added to the youngest generation on faulting. The aging needs to * check the accessed bit at least twice before handing this page over to the * eviction. The first check takes care of the accessed bit set on the initial * fault; the second check makes sure this page hasn't been used since then. * This process, AKA second chance, requires a minimum of two generations, * hence MIN_NR_GENS. And to maintain ABI compatibility with the active/inactive * LRU, e.g., /proc/vmstat, these two generations are considered active; the * rest of generations, if they exist, are considered inactive. See * lru_gen_is_active(). * * PG_active is always cleared while a page is on one of lrugen->folios[] so * that the aging needs not to worry about it. And it's set again when a page * considered active is isolated for non-reclaiming purposes, e.g., migration. * See lru_gen_add_folio() and lru_gen_del_folio(). * * MAX_NR_GENS is set to 4 so that the multi-gen LRU can support twice the * number of categories of the active/inactive LRU when keeping track of * accesses through page tables. This requires order_base_2(MAX_NR_GENS+1) bits * in folio->flags. */ #define MIN_NR_GENS 2U #define MAX_NR_GENS 4U /* * Each generation is divided into multiple tiers. A page accessed N times * through file descriptors is in tier order_base_2(N). A page in the first tier * (N=0,1) is marked by PG_referenced unless it was faulted in through page * tables or read ahead. A page in any other tier (N>1) is marked by * PG_referenced and PG_workingset. This implies a minimum of two tiers is * supported without using additional bits in folio->flags. * * In contrast to moving across generations which requires the LRU lock, moving * across tiers only involves atomic operations on folio->flags and therefore * has a negligible cost in the buffered access path. In the eviction path, * comparisons of refaulted/(evicted+protected) from the first tier and the * rest infer whether pages accessed multiple times through file descriptors * are statistically hot and thus worth protecting. * * MAX_NR_TIERS is set to 4 so that the multi-gen LRU can support twice the * number of categories of the active/inactive LRU when keeping track of * accesses through file descriptors. This uses MAX_NR_TIERS-2 spare bits in * folio->flags. */ #define MAX_NR_TIERS 4U #ifndef __GENERATING_BOUNDS_H struct lruvec; struct page_vma_mapped_walk; #define LRU_GEN_MASK ((BIT(LRU_GEN_WIDTH) - 1) << LRU_GEN_PGOFF) #define LRU_REFS_MASK ((BIT(LRU_REFS_WIDTH) - 1) << LRU_REFS_PGOFF) #ifdef CONFIG_LRU_GEN enum { LRU_GEN_ANON, LRU_GEN_FILE, }; enum { LRU_GEN_CORE, LRU_GEN_MM_WALK, LRU_GEN_NONLEAF_YOUNG, NR_LRU_GEN_CAPS }; #define LRU_REFS_FLAGS (BIT(PG_referenced) | BIT(PG_workingset)) #define MIN_LRU_BATCH BITS_PER_LONG #define MAX_LRU_BATCH (MIN_LRU_BATCH * 64) /* whether to keep historical stats from evicted generations */ #ifdef CONFIG_LRU_GEN_STATS #define NR_HIST_GENS MAX_NR_GENS #else #define NR_HIST_GENS 1U #endif /* * The youngest generation number is stored in max_seq for both anon and file * types as they are aged on an equal footing. The oldest generation numbers are * stored in min_seq[] separately for anon and file types as clean file pages * can be evicted regardless of swap constraints. * * Normally anon and file min_seq are in sync. But if swapping is constrained, * e.g., out of swap space, file min_seq is allowed to advance and leave anon * min_seq behind. * * The number of pages in each generation is eventually consistent and therefore * can be transiently negative when reset_batch_size() is pending. */ struct lru_gen_folio { /* the aging increments the youngest generation number */ unsigned long max_seq; /* the eviction increments the oldest generation numbers */ unsigned long min_seq[ANON_AND_FILE]; /* the birth time of each generation in jiffies */ unsigned long timestamps[MAX_NR_GENS]; /* the multi-gen LRU lists, lazily sorted on eviction */ struct list_head folios[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES]; /* the multi-gen LRU sizes, eventually consistent */ long nr_pages[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES]; /* the exponential moving average of refaulted */ unsigned long avg_refaulted[ANON_AND_FILE][MAX_NR_TIERS]; /* the exponential moving average of evicted+protected */ unsigned long avg_total[ANON_AND_FILE][MAX_NR_TIERS]; /* the first tier doesn't need protection, hence the minus one */ unsigned long protected[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS - 1]; /* can be modified without holding the LRU lock */ atomic_long_t evicted[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS]; atomic_long_t refaulted[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS]; /* whether the multi-gen LRU is enabled */ bool enabled; /* the memcg generation this lru_gen_folio belongs to */ u8 gen; /* the list segment this lru_gen_folio belongs to */ u8 seg; /* per-node lru_gen_folio list for global reclaim */ struct hlist_nulls_node list; }; enum { MM_LEAF_TOTAL, /* total leaf entries */ MM_LEAF_YOUNG, /* young leaf entries */ MM_NONLEAF_FOUND, /* non-leaf entries found in Bloom filters */ MM_NONLEAF_ADDED, /* non-leaf entries added to Bloom filters */ NR_MM_STATS }; /* double-buffering Bloom filters */ #define NR_BLOOM_FILTERS 2 struct lru_gen_mm_state { /* synced with max_seq after each iteration */ unsigned long seq; /* where the current iteration continues after */ struct list_head *head; /* where the last iteration ended before */ struct list_head *tail; /* Bloom filters flip after each iteration */ unsigned long *filters[NR_BLOOM_FILTERS]; /* the mm stats for debugging */ unsigned long stats[NR_HIST_GENS][NR_MM_STATS]; }; struct lru_gen_mm_walk { /* the lruvec under reclaim */ struct lruvec *lruvec; /* max_seq from lru_gen_folio: can be out of date */ unsigned long seq; /* the next address within an mm to scan */ unsigned long next_addr; /* to batch promoted pages */ int nr_pages[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES]; /* to batch the mm stats */ int mm_stats[NR_MM_STATS]; /* total batched items */ int batched; bool can_swap; bool force_scan; }; /* * For each node, memcgs are divided into two generations: the old and the * young. For each generation, memcgs are randomly sharded into multiple bins * to improve scalability. For each bin, the hlist_nulls is virtually divided * into three segments: the head, the tail and the default. * * An onlining memcg is added to the tail of a random bin in the old generation. * The eviction starts at the head of a random bin in the old generation. The * per-node memcg generation counter, whose reminder (mod MEMCG_NR_GENS) indexes * the old generation, is incremented when all its bins become empty. * * There are four operations: * 1. MEMCG_LRU_HEAD, which moves a memcg to the head of a random bin in its * current generation (old or young) and updates its "seg" to "head"; * 2. MEMCG_LRU_TAIL, which moves a memcg to the tail of a random bin in its * current generation (old or young) and updates its "seg" to "tail"; * 3. MEMCG_LRU_OLD, which moves a memcg to the head of a random bin in the old * generation, updates its "gen" to "old" and resets its "seg" to "default"; * 4. MEMCG_LRU_YOUNG, which moves a memcg to the tail of a random bin in the * young generation, updates its "gen" to "young" and resets its "seg" to * "default". * * The events that trigger the above operations are: * 1. Exceeding the soft limit, which triggers MEMCG_LRU_HEAD; * 2. The first attempt to reclaim a memcg below low, which triggers * MEMCG_LRU_TAIL; * 3. The first attempt to reclaim a memcg offlined or below reclaimable size * threshold, which triggers MEMCG_LRU_TAIL; * 4. The second attempt to reclaim a memcg offlined or below reclaimable size * threshold, which triggers MEMCG_LRU_YOUNG; * 5. Attempting to reclaim a memcg below min, which triggers MEMCG_LRU_YOUNG; * 6. Finishing the aging on the eviction path, which triggers MEMCG_LRU_YOUNG; * 7. Offlining a memcg, which triggers MEMCG_LRU_OLD. * * Notes: * 1. Memcg LRU only applies to global reclaim, and the round-robin incrementing * of their max_seq counters ensures the eventual fairness to all eligible * memcgs. For memcg reclaim, it still relies on mem_cgroup_iter(). * 2. There are only two valid generations: old (seq) and young (seq+1). * MEMCG_NR_GENS is set to three so that when reading the generation counter * locklessly, a stale value (seq-1) does not wraparound to young. */ #define MEMCG_NR_GENS 3 #define MEMCG_NR_BINS 8 struct lru_gen_memcg { /* the per-node memcg generation counter */ unsigned long seq; /* each memcg has one lru_gen_folio per node */ unsigned long nr_memcgs[MEMCG_NR_GENS]; /* per-node lru_gen_folio list for global reclaim */ struct hlist_nulls_head fifo[MEMCG_NR_GENS][MEMCG_NR_BINS]; /* protects the above */ spinlock_t lock; }; void lru_gen_init_pgdat(struct pglist_data *pgdat); void lru_gen_init_lruvec(struct lruvec *lruvec); bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw); void lru_gen_init_memcg(struct mem_cgroup *memcg); void lru_gen_exit_memcg(struct mem_cgroup *memcg); void lru_gen_online_memcg(struct mem_cgroup *memcg); void lru_gen_offline_memcg(struct mem_cgroup *memcg); void lru_gen_release_memcg(struct mem_cgroup *memcg); void lru_gen_soft_reclaim(struct mem_cgroup *memcg, int nid); #else /* !CONFIG_LRU_GEN */ static inline void lru_gen_init_pgdat(struct pglist_data *pgdat) { } static inline void lru_gen_init_lruvec(struct lruvec *lruvec) { } static inline bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { return false; } static inline void lru_gen_init_memcg(struct mem_cgroup *memcg) { } static inline void lru_gen_exit_memcg(struct mem_cgroup *memcg) { } static inline void lru_gen_online_memcg(struct mem_cgroup *memcg) { } static inline void lru_gen_offline_memcg(struct mem_cgroup *memcg) { } static inline void lru_gen_release_memcg(struct mem_cgroup *memcg) { } static inline void lru_gen_soft_reclaim(struct mem_cgroup *memcg, int nid) { } #endif /* CONFIG_LRU_GEN */ struct lruvec { struct list_head lists[NR_LRU_LISTS]; /* per lruvec lru_lock for memcg */ spinlock_t lru_lock; /* * These track the cost of reclaiming one LRU - file or anon - * over the other. As the observed cost of reclaiming one LRU * increases, the reclaim scan balance tips toward the other. */ unsigned long anon_cost; unsigned long file_cost; /* Non-resident age, driven by LRU movement */ atomic_long_t nonresident_age; /* Refaults at the time of last reclaim cycle */ unsigned long refaults[ANON_AND_FILE]; /* Various lruvec state flags (enum lruvec_flags) */ unsigned long flags; #ifdef CONFIG_LRU_GEN /* evictable pages divided into generations */ struct lru_gen_folio lrugen; #ifdef CONFIG_LRU_GEN_WALKS_MMU /* to concurrently iterate lru_gen_mm_list */ struct lru_gen_mm_state mm_state; #endif #endif /* CONFIG_LRU_GEN */ #ifdef CONFIG_MEMCG struct pglist_data *pgdat; #endif struct zswap_lruvec_state zswap_lruvec_state; }; /* Isolate for asynchronous migration */ #define ISOLATE_ASYNC_MIGRATE ((__force isolate_mode_t)0x4) /* Isolate unevictable pages */ #define ISOLATE_UNEVICTABLE ((__force isolate_mode_t)0x8) /* LRU Isolation modes. */ typedef unsigned __bitwise isolate_mode_t; enum zone_watermarks { WMARK_MIN, WMARK_LOW, WMARK_HIGH, WMARK_PROMO, NR_WMARK }; /* * One per migratetype for each PAGE_ALLOC_COSTLY_ORDER. Two additional lists * are added for THP. One PCP list is used by GPF_MOVABLE, and the other PCP list * is used by GFP_UNMOVABLE and GFP_RECLAIMABLE. */ #ifdef CONFIG_TRANSPARENT_HUGEPAGE #define NR_PCP_THP 2 #else #define NR_PCP_THP 0 #endif #define NR_LOWORDER_PCP_LISTS (MIGRATE_PCPTYPES * (PAGE_ALLOC_COSTLY_ORDER + 1)) #define NR_PCP_LISTS (NR_LOWORDER_PCP_LISTS + NR_PCP_THP) /* * Flags used in pcp->flags field. * * PCPF_PREV_FREE_HIGH_ORDER: a high-order page is freed in the * previous page freeing. To avoid to drain PCP for an accident * high-order page freeing. * * PCPF_FREE_HIGH_BATCH: preserve "pcp->batch" pages in PCP before * draining PCP for consecutive high-order pages freeing without * allocation if data cache slice of CPU is large enough. To reduce * zone lock contention and keep cache-hot pages reusing. */ #define PCPF_PREV_FREE_HIGH_ORDER BIT(0) #define PCPF_FREE_HIGH_BATCH BIT(1) struct per_cpu_pages { spinlock_t lock; /* Protects lists field */ int count; /* number of pages in the list */ int high; /* high watermark, emptying needed */ int high_min; /* min high watermark */ int high_max; /* max high watermark */ int batch; /* chunk size for buddy add/remove */ u8 flags; /* protected by pcp->lock */ u8 alloc_factor; /* batch scaling factor during allocate */ #ifdef CONFIG_NUMA u8 expire; /* When 0, remote pagesets are drained */ #endif short free_count; /* consecutive free count */ /* Lists of pages, one per migrate type stored on the pcp-lists */ struct list_head lists[NR_PCP_LISTS]; } ____cacheline_aligned_in_smp; struct per_cpu_zonestat { #ifdef CONFIG_SMP s8 vm_stat_diff[NR_VM_ZONE_STAT_ITEMS]; s8 stat_threshold; #endif #ifdef CONFIG_NUMA /* * Low priority inaccurate counters that are only folded * on demand. Use a large type to avoid the overhead of * folding during refresh_cpu_vm_stats. */ unsigned long vm_numa_event[NR_VM_NUMA_EVENT_ITEMS]; #endif }; struct per_cpu_nodestat { s8 stat_threshold; s8 vm_node_stat_diff[NR_VM_NODE_STAT_ITEMS]; }; #endif /* !__GENERATING_BOUNDS.H */ enum zone_type { /* * ZONE_DMA and ZONE_DMA32 are used when there are peripherals not able * to DMA to all of the addressable memory (ZONE_NORMAL). * On architectures where this area covers the whole 32 bit address * space ZONE_DMA32 is used. ZONE_DMA is left for the ones with smaller * DMA addressing constraints. This distinction is important as a 32bit * DMA mask is assumed when ZONE_DMA32 is defined. Some 64-bit * platforms may need both zones as they support peripherals with * different DMA addressing limitations. */ #ifdef CONFIG_ZONE_DMA ZONE_DMA, #endif #ifdef CONFIG_ZONE_DMA32 ZONE_DMA32, #endif /* * Normal addressable memory is in ZONE_NORMAL. DMA operations can be * performed on pages in ZONE_NORMAL if the DMA devices support * transfers to all addressable memory. */ ZONE_NORMAL, #ifdef CONFIG_HIGHMEM /* * A memory area that is only addressable by the kernel through * mapping portions into its own address space. This is for example * used by i386 to allow the kernel to address the memory beyond * 900MB. The kernel will set up special mappings (page * table entries on i386) for each page that the kernel needs to * access. */ ZONE_HIGHMEM, #endif /* * ZONE_MOVABLE is similar to ZONE_NORMAL, except that it contains * movable pages with few exceptional cases described below. Main use * cases for ZONE_MOVABLE are to make memory offlining/unplug more * likely to succeed, and to locally limit unmovable allocations - e.g., * to increase the number of THP/huge pages. Notable special cases are: * * 1. Pinned pages: (long-term) pinning of movable pages might * essentially turn such pages unmovable. Therefore, we do not allow * pinning long-term pages in ZONE_MOVABLE. When pages are pinned and * faulted, they come from the right zone right away. However, it is * still possible that address space already has pages in * ZONE_MOVABLE at the time when pages are pinned (i.e. user has * touches that memory before pinning). In such case we migrate them * to a different zone. When migration fails - pinning fails. * 2. memblock allocations: kernelcore/movablecore setups might create * situations where ZONE_MOVABLE contains unmovable allocations * after boot. Memory offlining and allocations fail early. * 3. Memory holes: kernelcore/movablecore setups might create very rare * situations where ZONE_MOVABLE contains memory holes after boot, * for example, if we have sections that are only partially * populated. Memory offlining and allocations fail early. * 4. PG_hwpoison pages: while poisoned pages can be skipped during * memory offlining, such pages cannot be allocated. * 5. Unmovable PG_offline pages: in paravirtualized environments, * hotplugged memory blocks might only partially be managed by the * buddy (e.g., via XEN-balloon, Hyper-V balloon, virtio-mem). The * parts not manged by the buddy are unmovable PG_offline pages. In * some cases (virtio-mem), such pages can be skipped during * memory offlining, however, cannot be moved/allocated. These * techniques might use alloc_contig_range() to hide previously * exposed pages from the buddy again (e.g., to implement some sort * of memory unplug in virtio-mem). * 6. ZERO_PAGE(0), kernelcore/movablecore setups might create * situations where ZERO_PAGE(0) which is allocated differently * on different platforms may end up in a movable zone. ZERO_PAGE(0) * cannot be migrated. * 7. Memory-hotplug: when using memmap_on_memory and onlining the * memory to the MOVABLE zone, the vmemmap pages are also placed in * such zone. Such pages cannot be really moved around as they are * self-stored in the range, but they are treated as movable when * the range they describe is about to be offlined. * * In general, no unmovable allocations that degrade memory offlining * should end up in ZONE_MOVABLE. Allocators (like alloc_contig_range()) * have to expect that migrating pages in ZONE_MOVABLE can fail (even * if has_unmovable_pages() states that there are no unmovable pages, * there can be false negatives). */ ZONE_MOVABLE, #ifdef CONFIG_ZONE_DEVICE ZONE_DEVICE, #endif __MAX_NR_ZONES }; #ifndef __GENERATING_BOUNDS_H #define ASYNC_AND_SYNC 2 struct zone { /* Read-mostly fields */ /* zone watermarks, access with *_wmark_pages(zone) macros */ unsigned long _watermark[NR_WMARK]; unsigned long watermark_boost; unsigned long nr_reserved_highatomic; unsigned long nr_free_highatomic; /* * We don't know if the memory that we're going to allocate will be * freeable or/and it will be released eventually, so to avoid totally * wasting several GB of ram we must reserve some of the lower zone * memory (otherwise we risk to run OOM on the lower zones despite * there being tons of freeable ram on the higher zones). This array is * recalculated at runtime if the sysctl_lowmem_reserve_ratio sysctl * changes. */ long lowmem_reserve[MAX_NR_ZONES]; #ifdef CONFIG_NUMA int node; #endif struct pglist_data *zone_pgdat; struct per_cpu_pages __percpu *per_cpu_pageset; struct per_cpu_zonestat __percpu *per_cpu_zonestats; /* * the high and batch values are copied to individual pagesets for * faster access */ int pageset_high_min; int pageset_high_max; int pageset_batch; #ifndef CONFIG_SPARSEMEM /* * Flags for a pageblock_nr_pages block. See pageblock-flags.h. * In SPARSEMEM, this map is stored in struct mem_section */ unsigned long *pageblock_flags; #endif /* CONFIG_SPARSEMEM */ /* zone_start_pfn == zone_start_paddr >> PAGE_SHIFT */ unsigned long zone_start_pfn; /* * spanned_pages is the total pages spanned by the zone, including * holes, which is calculated as: * spanned_pages = zone_end_pfn - zone_start_pfn; * * present_pages is physical pages existing within the zone, which * is calculated as: * present_pages = spanned_pages - absent_pages(pages in holes); * * present_early_pages is present pages existing within the zone * located on memory available since early boot, excluding hotplugged * memory. * * managed_pages is present pages managed by the buddy system, which * is calculated as (reserved_pages includes pages allocated by the * bootmem allocator): * managed_pages = present_pages - reserved_pages; * * cma pages is present pages that are assigned for CMA use * (MIGRATE_CMA). * * So present_pages may be used by memory hotplug or memory power * management logic to figure out unmanaged pages by checking * (present_pages - managed_pages). And managed_pages should be used * by page allocator and vm scanner to calculate all kinds of watermarks * and thresholds. * * Locking rules: * * zone_start_pfn and spanned_pages are protected by span_seqlock. * It is a seqlock because it has to be read outside of zone->lock, * and it is done in the main allocator path. But, it is written * quite infrequently. * * The span_seq lock is declared along with zone->lock because it is * frequently read in proximity to zone->lock. It's good to * give them a chance of being in the same cacheline. * * Write access to present_pages at runtime should be protected by * mem_hotplug_begin/done(). Any reader who can't tolerant drift of * present_pages should use get_online_mems() to get a stable value. */ atomic_long_t managed_pages; unsigned long spanned_pages; unsigned long present_pages; #if defined(CONFIG_MEMORY_HOTPLUG) unsigned long present_early_pages; #endif #ifdef CONFIG_CMA unsigned long cma_pages; #endif const char *name; #ifdef CONFIG_MEMORY_ISOLATION /* * Number of isolated pageblock. It is used to solve incorrect * freepage counting problem due to racy retrieving migratetype * of pageblock. Protected by zone->lock. */ unsigned long nr_isolate_pageblock; #endif #ifdef CONFIG_MEMORY_HOTPLUG /* see spanned/present_pages for more description */ seqlock_t span_seqlock; #endif int initialized; /* Write-intensive fields used from the page allocator */ CACHELINE_PADDING(_pad1_); /* free areas of different sizes */ struct free_area free_area[NR_PAGE_ORDERS]; #ifdef CONFIG_UNACCEPTED_MEMORY /* Pages to be accepted. All pages on the list are MAX_PAGE_ORDER */ struct list_head unaccepted_pages; #endif /* zone flags, see below */ unsigned long flags; /* Primarily protects free_area */ spinlock_t lock; /* Write-intensive fields used by compaction and vmstats. */ CACHELINE_PADDING(_pad2_); /* * When free pages are below this point, additional steps are taken * when reading the number of free pages to avoid per-cpu counter * drift allowing watermarks to be breached */ unsigned long percpu_drift_mark; #if defined CONFIG_COMPACTION || defined CONFIG_CMA /* pfn where compaction free scanner should start */ unsigned long compact_cached_free_pfn; /* pfn where compaction migration scanner should start */ unsigned long compact_cached_migrate_pfn[ASYNC_AND_SYNC]; unsigned long compact_init_migrate_pfn; unsigned long compact_init_free_pfn; #endif #ifdef CONFIG_COMPACTION /* * On compaction failure, 1<<compact_defer_shift compactions * are skipped before trying again. The number attempted since * last failure is tracked with compact_considered. * compact_order_failed is the minimum compaction failed order. */ unsigned int compact_considered; unsigned int compact_defer_shift; int compact_order_failed; #endif #if defined CONFIG_COMPACTION || defined CONFIG_CMA /* Set to true when the PG_migrate_skip bits should be cleared */ bool compact_blockskip_flush; #endif bool contiguous; CACHELINE_PADDING(_pad3_); /* Zone statistics */ atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS]; atomic_long_t vm_numa_event[NR_VM_NUMA_EVENT_ITEMS]; } ____cacheline_internodealigned_in_smp; enum pgdat_flags { PGDAT_DIRTY, /* reclaim scanning has recently found * many dirty file pages at the tail * of the LRU. */ PGDAT_WRITEBACK, /* reclaim scanning has recently found * many pages under writeback */ PGDAT_RECLAIM_LOCKED, /* prevents concurrent reclaim */ }; enum zone_flags { ZONE_BOOSTED_WATERMARK, /* zone recently boosted watermarks. * Cleared when kswapd is woken. */ ZONE_RECLAIM_ACTIVE, /* kswapd may be scanning the zone. */ ZONE_BELOW_HIGH, /* zone is below high watermark. */ }; static inline unsigned long wmark_pages(const struct zone *z, enum zone_watermarks w) { return z->_watermark[w] + z->watermark_boost; } static inline unsigned long min_wmark_pages(const struct zone *z) { return wmark_pages(z, WMARK_MIN); } static inline unsigned long low_wmark_pages(const struct zone *z) { return wmark_pages(z, WMARK_LOW); } static inline unsigned long high_wmark_pages(const struct zone *z) { return wmark_pages(z, WMARK_HIGH); } static inline unsigned long promo_wmark_pages(const struct zone *z) { return wmark_pages(z, WMARK_PROMO); } static inline unsigned long zone_managed_pages(struct zone *zone) { return (unsigned long)atomic_long_read(&zone->managed_pages); } static inline unsigned long zone_cma_pages(struct zone *zone) { #ifdef CONFIG_CMA return zone->cma_pages; #else return 0; #endif } static inline unsigned long zone_end_pfn(const struct zone *zone) { return zone->zone_start_pfn + zone->spanned_pages; } static inline bool zone_spans_pfn(const struct zone *zone, unsigned long pfn) { return zone->zone_start_pfn <= pfn && pfn < zone_end_pfn(zone); } static inline bool zone_is_initialized(struct zone *zone) { return zone->initialized; } static inline bool zone_is_empty(struct zone *zone) { return zone->spanned_pages == 0; } #ifndef BUILD_VDSO32_64 /* * The zone field is never updated after free_area_init_core() * sets it, so none of the operations on it need to be atomic. */ /* Page flags: | [SECTION] | [NODE] | ZONE | [LAST_CPUPID] | ... | FLAGS | */ #define SECTIONS_PGOFF ((sizeof(unsigned long)*8) - SECTIONS_WIDTH) #define NODES_PGOFF (SECTIONS_PGOFF - NODES_WIDTH) #define ZONES_PGOFF (NODES_PGOFF - ZONES_WIDTH) #define LAST_CPUPID_PGOFF (ZONES_PGOFF - LAST_CPUPID_WIDTH) #define KASAN_TAG_PGOFF (LAST_CPUPID_PGOFF - KASAN_TAG_WIDTH) #define LRU_GEN_PGOFF (KASAN_TAG_PGOFF - LRU_GEN_WIDTH) #define LRU_REFS_PGOFF (LRU_GEN_PGOFF - LRU_REFS_WIDTH) /* * Define the bit shifts to access each section. For non-existent * sections we define the shift as 0; that plus a 0 mask ensures * the compiler will optimise away reference to them. */ #define SECTIONS_PGSHIFT (SECTIONS_PGOFF * (SECTIONS_WIDTH != 0)) #define NODES_PGSHIFT (NODES_PGOFF * (NODES_WIDTH != 0)) #define ZONES_PGSHIFT (ZONES_PGOFF * (ZONES_WIDTH != 0)) #define LAST_CPUPID_PGSHIFT (LAST_CPUPID_PGOFF * (LAST_CPUPID_WIDTH != 0)) #define KASAN_TAG_PGSHIFT (KASAN_TAG_PGOFF * (KASAN_TAG_WIDTH != 0)) /* NODE:ZONE or SECTION:ZONE is used to ID a zone for the buddy allocator */ #ifdef NODE_NOT_IN_PAGE_FLAGS #define ZONEID_SHIFT (SECTIONS_SHIFT + ZONES_SHIFT) #define ZONEID_PGOFF ((SECTIONS_PGOFF < ZONES_PGOFF) ? \ SECTIONS_PGOFF : ZONES_PGOFF) #else #define ZONEID_SHIFT (NODES_SHIFT + ZONES_SHIFT) #define ZONEID_PGOFF ((NODES_PGOFF < ZONES_PGOFF) ? \ NODES_PGOFF : ZONES_PGOFF) #endif #define ZONEID_PGSHIFT (ZONEID_PGOFF * (ZONEID_SHIFT != 0)) #define ZONES_MASK ((1UL << ZONES_WIDTH) - 1) #define NODES_MASK ((1UL << NODES_WIDTH) - 1) #define SECTIONS_MASK ((1UL << SECTIONS_WIDTH) - 1) #define LAST_CPUPID_MASK ((1UL << LAST_CPUPID_SHIFT) - 1) #define KASAN_TAG_MASK ((1UL << KASAN_TAG_WIDTH) - 1) #define ZONEID_MASK ((1UL << ZONEID_SHIFT) - 1) static inline enum zone_type page_zonenum(const struct page *page) { ASSERT_EXCLUSIVE_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT); return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK; } static inline enum zone_type folio_zonenum(const struct folio *folio) { return page_zonenum(&folio->page); } #ifdef CONFIG_ZONE_DEVICE static inline bool is_zone_device_page(const struct page *page) { return page_zonenum(page) == ZONE_DEVICE; } /* * Consecutive zone device pages should not be merged into the same sgl * or bvec segment with other types of pages or if they belong to different * pgmaps. Otherwise getting the pgmap of a given segment is not possible * without scanning the entire segment. This helper returns true either if * both pages are not zone device pages or both pages are zone device pages * with the same pgmap. */ static inline bool zone_device_pages_have_same_pgmap(const struct page *a, const struct page *b) { if (is_zone_device_page(a) != is_zone_device_page(b)) return false; if (!is_zone_device_page(a)) return true; return a->pgmap == b->pgmap; } extern void memmap_init_zone_device(struct zone *, unsigned long, unsigned long, struct dev_pagemap *); #else static inline bool is_zone_device_page(const struct page *page) { return false; } static inline bool zone_device_pages_have_same_pgmap(const struct page *a, const struct page *b) { return true; } #endif static inline bool folio_is_zone_device(const struct folio *folio) { return is_zone_device_page(&folio->page); } static inline bool is_zone_movable_page(const struct page *page) { return page_zonenum(page) == ZONE_MOVABLE; } static inline bool folio_is_zone_movable(const struct folio *folio) { return folio_zonenum(folio) == ZONE_MOVABLE; } #endif /* * Return true if [start_pfn, start_pfn + nr_pages) range has a non-empty * intersection with the given zone */ static inline bool zone_intersects(struct zone *zone, unsigned long start_pfn, unsigned long nr_pages) { if (zone_is_empty(zone)) return false; if (start_pfn >= zone_end_pfn(zone) || start_pfn + nr_pages <= zone->zone_start_pfn) return false; return true; } /* * The "priority" of VM scanning is how much of the queues we will scan in one * go. A value of 12 for DEF_PRIORITY implies that we will scan 1/4096th of the * queues ("queue_length >> 12") during an aging round. */ #define DEF_PRIORITY 12 /* Maximum number of zones on a zonelist */ #define MAX_ZONES_PER_ZONELIST (MAX_NUMNODES * MAX_NR_ZONES) enum { ZONELIST_FALLBACK, /* zonelist with fallback */ #ifdef CONFIG_NUMA /* * The NUMA zonelists are doubled because we need zonelists that * restrict the allocations to a single node for __GFP_THISNODE. */ ZONELIST_NOFALLBACK, /* zonelist without fallback (__GFP_THISNODE) */ #endif MAX_ZONELISTS }; /* * This struct contains information about a zone in a zonelist. It is stored * here to avoid dereferences into large structures and lookups of tables */ struct zoneref { struct zone *zone; /* Pointer to actual zone */ int zone_idx; /* zone_idx(zoneref->zone) */ }; /* * One allocation request operates on a zonelist. A zonelist * is a list of zones, the first one is the 'goal' of the * allocation, the other zones are fallback zones, in decreasing * priority. * * To speed the reading of the zonelist, the zonerefs contain the zone index * of the entry being read. Helper functions to access information given * a struct zoneref are * * zonelist_zone() - Return the struct zone * for an entry in _zonerefs * zonelist_zone_idx() - Return the index of the zone for an entry * zonelist_node_idx() - Return the index of the node for an entry */ struct zonelist { struct zoneref _zonerefs[MAX_ZONES_PER_ZONELIST + 1]; }; /* * The array of struct pages for flatmem. * It must be declared for SPARSEMEM as well because there are configurations * that rely on that. */ extern struct page *mem_map; #ifdef CONFIG_TRANSPARENT_HUGEPAGE struct deferred_split { spinlock_t split_queue_lock; struct list_head split_queue; unsigned long split_queue_len; }; #endif #ifdef CONFIG_MEMORY_FAILURE /* * Per NUMA node memory failure handling statistics. */ struct memory_failure_stats { /* * Number of raw pages poisoned. * Cases not accounted: memory outside kernel control, offline page, * arch-specific memory_failure (SGX), hwpoison_filter() filtered * error events, and unpoison actions from hwpoison_unpoison. */ unsigned long total; /* * Recovery results of poisoned raw pages handled by memory_failure, * in sync with mf_result. * total = ignored + failed + delayed + recovered. * total * PAGE_SIZE * #nodes = /proc/meminfo/HardwareCorrupted. */ unsigned long ignored; unsigned long failed; unsigned long delayed; unsigned long recovered; }; #endif /* * On NUMA machines, each NUMA node would have a pg_data_t to describe * it's memory layout. On UMA machines there is a single pglist_data which * describes the whole memory. * * Memory statistics and page replacement data structures are maintained on a * per-zone basis. */ typedef struct pglist_data { /* * node_zones contains just the zones for THIS node. Not all of the * zones may be populated, but it is the full list. It is referenced by * this node's node_zonelists as well as other node's node_zonelists. */ struct zone node_zones[MAX_NR_ZONES]; /* * node_zonelists contains references to all zones in all nodes. * Generally the first zones will be references to this node's * node_zones. */ struct zonelist node_zonelists[MAX_ZONELISTS]; int nr_zones; /* number of populated zones in this node */ #ifdef CONFIG_FLATMEM /* means !SPARSEMEM */ struct page *node_mem_map; #ifdef CONFIG_PAGE_EXTENSION struct page_ext *node_page_ext; #endif #endif #if defined(CONFIG_MEMORY_HOTPLUG) || defined(CONFIG_DEFERRED_STRUCT_PAGE_INIT) /* * Must be held any time you expect node_start_pfn, * node_present_pages, node_spanned_pages or nr_zones to stay constant. * Also synchronizes pgdat->first_deferred_pfn during deferred page * init. * * pgdat_resize_lock() and pgdat_resize_unlock() are provided to * manipulate node_size_lock without checking for CONFIG_MEMORY_HOTPLUG * or CONFIG_DEFERRED_STRUCT_PAGE_INIT. * * Nests above zone->lock and zone->span_seqlock */ spinlock_t node_size_lock; #endif unsigned long node_start_pfn; unsigned long node_present_pages; /* total number of physical pages */ unsigned long node_spanned_pages; /* total size of physical page range, including holes */ int node_id; wait_queue_head_t kswapd_wait; wait_queue_head_t pfmemalloc_wait; /* workqueues for throttling reclaim for different reasons. */ wait_queue_head_t reclaim_wait[NR_VMSCAN_THROTTLE]; atomic_t nr_writeback_throttled;/* nr of writeback-throttled tasks */ unsigned long nr_reclaim_start; /* nr pages written while throttled * when throttling started. */ #ifdef CONFIG_MEMORY_HOTPLUG struct mutex kswapd_lock; #endif struct task_struct *kswapd; /* Protected by kswapd_lock */ int kswapd_order; enum zone_type kswapd_highest_zoneidx; int kswapd_failures; /* Number of 'reclaimed == 0' runs */ #ifdef CONFIG_COMPACTION int kcompactd_max_order; enum zone_type kcompactd_highest_zoneidx; wait_queue_head_t kcompactd_wait; struct task_struct *kcompactd; bool proactive_compact_trigger; #endif /* * This is a per-node reserve of pages that are not available * to userspace allocations. */ unsigned long totalreserve_pages; #ifdef CONFIG_NUMA /* * node reclaim becomes active if more unmapped pages exist. */ unsigned long min_unmapped_pages; unsigned long min_slab_pages; #endif /* CONFIG_NUMA */ /* Write-intensive fields used by page reclaim */ CACHELINE_PADDING(_pad1_); #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT /* * If memory initialisation on large machines is deferred then this * is the first PFN that needs to be initialised. */ unsigned long first_deferred_pfn; #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */ #ifdef CONFIG_TRANSPARENT_HUGEPAGE struct deferred_split deferred_split_queue; #endif #ifdef CONFIG_NUMA_BALANCING /* start time in ms of current promote rate limit period */ unsigned int nbp_rl_start; /* number of promote candidate pages at start time of current rate limit period */ unsigned long nbp_rl_nr_cand; /* promote threshold in ms */ unsigned int nbp_threshold; /* start time in ms of current promote threshold adjustment period */ unsigned int nbp_th_start; /* * number of promote candidate pages at start time of current promote * threshold adjustment period */ unsigned long nbp_th_nr_cand; #endif /* Fields commonly accessed by the page reclaim scanner */ /* * NOTE: THIS IS UNUSED IF MEMCG IS ENABLED. * * Use mem_cgroup_lruvec() to look up lruvecs. */ struct lruvec __lruvec; unsigned long flags; #ifdef CONFIG_LRU_GEN /* kswap mm walk data */ struct lru_gen_mm_walk mm_walk; /* lru_gen_folio list */ struct lru_gen_memcg memcg_lru; #endif CACHELINE_PADDING(_pad2_); /* Per-node vmstats */ struct per_cpu_nodestat __percpu *per_cpu_nodestats; atomic_long_t vm_stat[NR_VM_NODE_STAT_ITEMS]; #ifdef CONFIG_NUMA struct memory_tier __rcu *memtier; #endif #ifdef CONFIG_MEMORY_FAILURE struct memory_failure_stats mf_stats; #endif } pg_data_t; #define node_present_pages(nid) (NODE_DATA(nid)->node_present_pages) #define node_spanned_pages(nid) (NODE_DATA(nid)->node_spanned_pages) #define node_start_pfn(nid) (NODE_DATA(nid)->node_start_pfn) #define node_end_pfn(nid) pgdat_end_pfn(NODE_DATA(nid)) static inline unsigned long pgdat_end_pfn(pg_data_t *pgdat) { return pgdat->node_start_pfn + pgdat->node_spanned_pages; } #include <linux/memory_hotplug.h> void build_all_zonelists(pg_data_t *pgdat); void wakeup_kswapd(struct zone *zone, gfp_t gfp_mask, int order, enum zone_type highest_zoneidx); bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark, int highest_zoneidx, unsigned int alloc_flags, long free_pages); bool zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark, int highest_zoneidx, unsigned int alloc_flags); bool zone_watermark_ok_safe(struct zone *z, unsigned int order, unsigned long mark, int highest_zoneidx); /* * Memory initialization context, use to differentiate memory added by * the platform statically or via memory hotplug interface. */ enum meminit_context { MEMINIT_EARLY, MEMINIT_HOTPLUG, }; extern void init_currently_empty_zone(struct zone *zone, unsigned long start_pfn, unsigned long size); extern void lruvec_init(struct lruvec *lruvec); static inline struct pglist_data *lruvec_pgdat(struct lruvec *lruvec) { #ifdef CONFIG_MEMCG return lruvec->pgdat; #else return container_of(lruvec, struct pglist_data, __lruvec); #endif } #ifdef CONFIG_HAVE_MEMORYLESS_NODES int local_memory_node(int node_id); #else static inline int local_memory_node(int node_id) { return node_id; }; #endif /* * zone_idx() returns 0 for the ZONE_DMA zone, 1 for the ZONE_NORMAL zone, etc. */ #define zone_idx(zone) ((zone) - (zone)->zone_pgdat->node_zones) #ifdef CONFIG_ZONE_DEVICE static inline bool zone_is_zone_device(struct zone *zone) { return zone_idx(zone) == ZONE_DEVICE; } #else static inline bool zone_is_zone_device(struct zone *zone) { return false; } #endif /* * Returns true if a zone has pages managed by the buddy allocator. * All the reclaim decisions have to use this function rather than * populated_zone(). If the whole zone is reserved then we can easily * end up with populated_zone() && !managed_zone(). */ static inline bool managed_zone(struct zone *zone) { return zone_managed_pages(zone); } /* Returns true if a zone has memory */ static inline bool populated_zone(struct zone *zone) { return zone->present_pages; } #ifdef CONFIG_NUMA static inline int zone_to_nid(struct zone *zone) { return zone->node; } static inline void zone_set_nid(struct zone *zone, int nid) { zone->node = nid; } #else static inline int zone_to_nid(struct zone *zone) { return 0; } static inline void zone_set_nid(struct zone *zone, int nid) {} #endif extern int movable_zone; static inline int is_highmem_idx(enum zone_type idx) { #ifdef CONFIG_HIGHMEM return (idx == ZONE_HIGHMEM || (idx == ZONE_MOVABLE && movable_zone == ZONE_HIGHMEM)); #else return 0; #endif } /** * is_highmem - helper function to quickly check if a struct zone is a * highmem zone or not. This is an attempt to keep references * to ZONE_{DMA/NORMAL/HIGHMEM/etc} in general code to a minimum. * @zone: pointer to struct zone variable * Return: 1 for a highmem zone, 0 otherwise */ static inline int is_highmem(struct zone *zone) { return is_highmem_idx(zone_idx(zone)); } #ifdef CONFIG_ZONE_DMA bool has_managed_dma(void); #else static inline bool has_managed_dma(void) { return false; } #endif #ifndef CONFIG_NUMA extern struct pglist_data contig_page_data; static inline struct pglist_data *NODE_DATA(int nid) { return &contig_page_data; } #else /* CONFIG_NUMA */ #include <asm/mmzone.h> #endif /* !CONFIG_NUMA */ extern struct pglist_data *first_online_pgdat(void); extern struct pglist_data *next_online_pgdat(struct pglist_data *pgdat); extern struct zone *next_zone(struct zone *zone); /** * for_each_online_pgdat - helper macro to iterate over all online nodes * @pgdat: pointer to a pg_data_t variable */ #define for_each_online_pgdat(pgdat) \ for (pgdat = first_online_pgdat(); \ pgdat; \ pgdat = next_online_pgdat(pgdat)) /** * for_each_zone - helper macro to iterate over all memory zones * @zone: pointer to struct zone variable * * The user only needs to declare the zone variable, for_each_zone * fills it in. */ #define for_each_zone(zone) \ for (zone = (first_online_pgdat())->node_zones; \ zone; \ zone = next_zone(zone)) #define for_each_populated_zone(zone) \ for (zone = (first_online_pgdat())->node_zones; \ zone; \ zone = next_zone(zone)) \ if (!populated_zone(zone)) \ ; /* do nothing */ \ else static inline struct zone *zonelist_zone(struct zoneref *zoneref) { return zoneref->zone; } static inline int zonelist_zone_idx(struct zoneref *zoneref) { return zoneref->zone_idx; } static inline int zonelist_node_idx(struct zoneref *zoneref) { return zone_to_nid(zoneref->zone); } struct zoneref *__next_zones_zonelist(struct zoneref *z, enum zone_type highest_zoneidx, nodemask_t *nodes); /** * next_zones_zonelist - Returns the next zone at or below highest_zoneidx within the allowed nodemask using a cursor within a zonelist as a starting point * @z: The cursor used as a starting point for the search * @highest_zoneidx: The zone index of the highest zone to return * @nodes: An optional nodemask to filter the zonelist with * * This function returns the next zone at or below a given zone index that is * within the allowed nodemask using a cursor as the starting point for the * search. The zoneref returned is a cursor that represents the current zone * being examined. It should be advanced by one before calling * next_zones_zonelist again. * * Return: the next zone at or below highest_zoneidx within the allowed * nodemask using a cursor within a zonelist as a starting point */ static __always_inline struct zoneref *next_zones_zonelist(struct zoneref *z, enum zone_type highest_zoneidx, nodemask_t *nodes) { if (likely(!nodes && zonelist_zone_idx(z) <= highest_zoneidx)) return z; return __next_zones_zonelist(z, highest_zoneidx, nodes); } /** * first_zones_zonelist - Returns the first zone at or below highest_zoneidx within the allowed nodemask in a zonelist * @zonelist: The zonelist to search for a suitable zone * @highest_zoneidx: The zone index of the highest zone to return * @nodes: An optional nodemask to filter the zonelist with * * This function returns the first zone at or below a given zone index that is * within the allowed nodemask. The zoneref returned is a cursor that can be * used to iterate the zonelist with next_zones_zonelist by advancing it by * one before calling. * * When no eligible zone is found, zoneref->zone is NULL (zoneref itself is * never NULL). This may happen either genuinely, or due to concurrent nodemask * update due to cpuset modification. * * Return: Zoneref pointer for the first suitable zone found */ static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist, enum zone_type highest_zoneidx, nodemask_t *nodes) { return next_zones_zonelist(zonelist->_zonerefs, highest_zoneidx, nodes); } /** * for_each_zone_zonelist_nodemask - helper macro to iterate over valid zones in a zonelist at or below a given zone index and within a nodemask * @zone: The current zone in the iterator * @z: The current pointer within zonelist->_zonerefs being iterated * @zlist: The zonelist being iterated * @highidx: The zone index of the highest zone to return * @nodemask: Nodemask allowed by the allocator * * This iterator iterates though all zones at or below a given zone index and * within a given nodemask */ #define for_each_zone_zonelist_nodemask(zone, z, zlist, highidx, nodemask) \ for (z = first_zones_zonelist(zlist, highidx, nodemask), zone = zonelist_zone(z); \ zone; \ z = next_zones_zonelist(++z, highidx, nodemask), \ zone = zonelist_zone(z)) #define for_next_zone_zonelist_nodemask(zone, z, highidx, nodemask) \ for (zone = zonelist_zone(z); \ zone; \ z = next_zones_zonelist(++z, highidx, nodemask), \ zone = zonelist_zone(z)) /** * for_each_zone_zonelist - helper macro to iterate over valid zones in a zonelist at or below a given zone index * @zone: The current zone in the iterator * @z: The current pointer within zonelist->zones being iterated * @zlist: The zonelist being iterated * @highidx: The zone index of the highest zone to return * * This iterator iterates though all zones at or below a given zone index. */ #define for_each_zone_zonelist(zone, z, zlist, highidx) \ for_each_zone_zonelist_nodemask(zone, z, zlist, highidx, NULL) /* Whether the 'nodes' are all movable nodes */ static inline bool movable_only_nodes(nodemask_t *nodes) { struct zonelist *zonelist; struct zoneref *z; int nid; if (nodes_empty(*nodes)) return false; /* * We can chose arbitrary node from the nodemask to get a * zonelist as they are interlinked. We just need to find * at least one zone that can satisfy kernel allocations. */ nid = first_node(*nodes); zonelist = &NODE_DATA(nid)->node_zonelists[ZONELIST_FALLBACK]; z = first_zones_zonelist(zonelist, ZONE_NORMAL, nodes); return (!zonelist_zone(z)) ? true : false; } #ifdef CONFIG_SPARSEMEM #include <asm/sparsemem.h> #endif #ifdef CONFIG_FLATMEM #define pfn_to_nid(pfn) (0) #endif #ifdef CONFIG_SPARSEMEM /* * PA_SECTION_SHIFT physical address to/from section number * PFN_SECTION_SHIFT pfn to/from section number */ #define PA_SECTION_SHIFT (SECTION_SIZE_BITS) #define PFN_SECTION_SHIFT (SECTION_SIZE_BITS - PAGE_SHIFT) #define NR_MEM_SECTIONS (1UL << SECTIONS_SHIFT) #define PAGES_PER_SECTION (1UL << PFN_SECTION_SHIFT) #define PAGE_SECTION_MASK (~(PAGES_PER_SECTION-1)) #define SECTION_BLOCKFLAGS_BITS \ ((1UL << (PFN_SECTION_SHIFT - pageblock_order)) * NR_PAGEBLOCK_BITS) #if (MAX_PAGE_ORDER + PAGE_SHIFT) > SECTION_SIZE_BITS #error Allocator MAX_PAGE_ORDER exceeds SECTION_SIZE #endif static inline unsigned long pfn_to_section_nr(unsigned long pfn) { return pfn >> PFN_SECTION_SHIFT; } static inline unsigned long section_nr_to_pfn(unsigned long sec) { return sec << PFN_SECTION_SHIFT; } #define SECTION_ALIGN_UP(pfn) (((pfn) + PAGES_PER_SECTION - 1) & PAGE_SECTION_MASK) #define SECTION_ALIGN_DOWN(pfn) ((pfn) & PAGE_SECTION_MASK) #define SUBSECTION_SHIFT 21 #define SUBSECTION_SIZE (1UL << SUBSECTION_SHIFT) #define PFN_SUBSECTION_SHIFT (SUBSECTION_SHIFT - PAGE_SHIFT) #define PAGES_PER_SUBSECTION (1UL << PFN_SUBSECTION_SHIFT) #define PAGE_SUBSECTION_MASK (~(PAGES_PER_SUBSECTION-1)) #if SUBSECTION_SHIFT > SECTION_SIZE_BITS #error Subsection size exceeds section size #else #define SUBSECTIONS_PER_SECTION (1UL << (SECTION_SIZE_BITS - SUBSECTION_SHIFT)) #endif #define SUBSECTION_ALIGN_UP(pfn) ALIGN((pfn), PAGES_PER_SUBSECTION) #define SUBSECTION_ALIGN_DOWN(pfn) ((pfn) & PAGE_SUBSECTION_MASK) struct mem_section_usage { struct rcu_head rcu; #ifdef CONFIG_SPARSEMEM_VMEMMAP DECLARE_BITMAP(subsection_map, SUBSECTIONS_PER_SECTION); #endif /* See declaration of similar field in struct zone */ unsigned long pageblock_flags[0]; }; void subsection_map_init(unsigned long pfn, unsigned long nr_pages); struct page; struct page_ext; struct mem_section { /* * This is, logically, a pointer to an array of struct * pages. However, it is stored with some other magic. * (see sparse.c::sparse_init_one_section()) * * Additionally during early boot we encode node id of * the location of the section here to guide allocation. * (see sparse.c::memory_present()) * * Making it a UL at least makes someone do a cast * before using it wrong. */ unsigned long section_mem_map; struct mem_section_usage *usage; #ifdef CONFIG_PAGE_EXTENSION /* * If SPARSEMEM, pgdat doesn't have page_ext pointer. We use * section. (see page_ext.h about this.) */ struct page_ext *page_ext; unsigned long pad; #endif /* * WARNING: mem_section must be a power-of-2 in size for the * calculation and use of SECTION_ROOT_MASK to make sense. */ }; #ifdef CONFIG_SPARSEMEM_EXTREME #define SECTIONS_PER_ROOT (PAGE_SIZE / sizeof (struct mem_section)) #else #define SECTIONS_PER_ROOT 1 #endif #define SECTION_NR_TO_ROOT(sec) ((sec) / SECTIONS_PER_ROOT) #define NR_SECTION_ROOTS DIV_ROUND_UP(NR_MEM_SECTIONS, SECTIONS_PER_ROOT) #define SECTION_ROOT_MASK (SECTIONS_PER_ROOT - 1) #ifdef CONFIG_SPARSEMEM_EXTREME extern struct mem_section **mem_section; #else extern struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT]; #endif static inline unsigned long *section_to_usemap(struct mem_section *ms) { return ms->usage->pageblock_flags; } static inline struct mem_section *__nr_to_section(unsigned long nr) { unsigned long root = SECTION_NR_TO_ROOT(nr); if (unlikely(root >= NR_SECTION_ROOTS)) return NULL; #ifdef CONFIG_SPARSEMEM_EXTREME if (!mem_section || !mem_section[root]) return NULL; #endif return &mem_section[root][nr & SECTION_ROOT_MASK]; } extern size_t mem_section_usage_size(void); /* * We use the lower bits of the mem_map pointer to store * a little bit of information. The pointer is calculated * as mem_map - section_nr_to_pfn(pnum). The result is * aligned to the minimum alignment of the two values: * 1. All mem_map arrays are page-aligned. * 2. section_nr_to_pfn() always clears PFN_SECTION_SHIFT * lowest bits. PFN_SECTION_SHIFT is arch-specific * (equal SECTION_SIZE_BITS - PAGE_SHIFT), and the * worst combination is powerpc with 256k pages, * which results in PFN_SECTION_SHIFT equal 6. * To sum it up, at least 6 bits are available on all architectures. * However, we can exceed 6 bits on some other architectures except * powerpc (e.g. 15 bits are available on x86_64, 13 bits are available * with the worst case of 64K pages on arm64) if we make sure the * exceeded bit is not applicable to powerpc. */ enum { SECTION_MARKED_PRESENT_BIT, SECTION_HAS_MEM_MAP_BIT, SECTION_IS_ONLINE_BIT, SECTION_IS_EARLY_BIT, #ifdef CONFIG_ZONE_DEVICE SECTION_TAINT_ZONE_DEVICE_BIT, #endif SECTION_MAP_LAST_BIT, }; #define SECTION_MARKED_PRESENT BIT(SECTION_MARKED_PRESENT_BIT) #define SECTION_HAS_MEM_MAP BIT(SECTION_HAS_MEM_MAP_BIT) #define SECTION_IS_ONLINE BIT(SECTION_IS_ONLINE_BIT) #define SECTION_IS_EARLY BIT(SECTION_IS_EARLY_BIT) #ifdef CONFIG_ZONE_DEVICE #define SECTION_TAINT_ZONE_DEVICE BIT(SECTION_TAINT_ZONE_DEVICE_BIT) #endif #define SECTION_MAP_MASK (~(BIT(SECTION_MAP_LAST_BIT) - 1)) #define SECTION_NID_SHIFT SECTION_MAP_LAST_BIT static inline struct page *__section_mem_map_addr(struct mem_section *section) { unsigned long map = section->section_mem_map; map &= SECTION_MAP_MASK; return (struct page *)map; } static inline int present_section(struct mem_section *section) { return (section && (section->section_mem_map & SECTION_MARKED_PRESENT)); } static inline int present_section_nr(unsigned long nr) { return present_section(__nr_to_section(nr)); } static inline int valid_section(struct mem_section *section) { return (section && (section->section_mem_map & SECTION_HAS_MEM_MAP)); } static inline int early_section(struct mem_section *section) { return (section && (section->section_mem_map & SECTION_IS_EARLY)); } static inline int valid_section_nr(unsigned long nr) { return valid_section(__nr_to_section(nr)); } static inline int online_section(struct mem_section *section) { return (section && (section->section_mem_map & SECTION_IS_ONLINE)); } #ifdef CONFIG_ZONE_DEVICE static inline int online_device_section(struct mem_section *section) { unsigned long flags = SECTION_IS_ONLINE | SECTION_TAINT_ZONE_DEVICE; return section && ((section->section_mem_map & flags) == flags); } #else static inline int online_device_section(struct mem_section *section) { return 0; } #endif static inline int online_section_nr(unsigned long nr) { return online_section(__nr_to_section(nr)); } #ifdef CONFIG_MEMORY_HOTPLUG void online_mem_sections(unsigned long start_pfn, unsigned long end_pfn); void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn); #endif static inline struct mem_section *__pfn_to_section(unsigned long pfn) { return __nr_to_section(pfn_to_section_nr(pfn)); } extern unsigned long __highest_present_section_nr; static inline int subsection_map_index(unsigned long pfn) { return (pfn & ~(PAGE_SECTION_MASK)) / PAGES_PER_SUBSECTION; } #ifdef CONFIG_SPARSEMEM_VMEMMAP static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn) { int idx = subsection_map_index(pfn); struct mem_section_usage *usage = READ_ONCE(ms->usage); return usage ? test_bit(idx, usage->subsection_map) : 0; } #else static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn) { return 1; } #endif #ifndef CONFIG_HAVE_ARCH_PFN_VALID /** * pfn_valid - check if there is a valid memory map entry for a PFN * @pfn: the page frame number to check * * Check if there is a valid memory map entry aka struct page for the @pfn. * Note, that availability of the memory map entry does not imply that * there is actual usable memory at that @pfn. The struct page may * represent a hole or an unusable page frame. * * Return: 1 for PFNs that have memory map entries and 0 otherwise */ static inline int pfn_valid(unsigned long pfn) { struct mem_section *ms; int ret; /* * Ensure the upper PAGE_SHIFT bits are clear in the * pfn. Else it might lead to false positives when * some of the upper bits are set, but the lower bits * match a valid pfn. */ if (PHYS_PFN(PFN_PHYS(pfn)) != pfn) return 0; if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS) return 0; ms = __pfn_to_section(pfn); rcu_read_lock_sched(); if (!valid_section(ms)) { rcu_read_unlock_sched(); return 0; } /* * Traditionally early sections always returned pfn_valid() for * the entire section-sized span. */ ret = early_section(ms) || pfn_section_valid(ms, pfn); rcu_read_unlock_sched(); return ret; } #endif static inline int pfn_in_present_section(unsigned long pfn) { if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS) return 0; return present_section(__pfn_to_section(pfn)); } static inline unsigned long next_present_section_nr(unsigned long section_nr) { while (++section_nr <= __highest_present_section_nr) { if (present_section_nr(section_nr)) return section_nr; } return -1; } /* * These are _only_ used during initialisation, therefore they * can use __initdata ... They could have names to indicate * this restriction. */ #ifdef CONFIG_NUMA #define pfn_to_nid(pfn) \ ({ \ unsigned long __pfn_to_nid_pfn = (pfn); \ page_to_nid(pfn_to_page(__pfn_to_nid_pfn)); \ }) #else #define pfn_to_nid(pfn) (0) #endif void sparse_init(void); #else #define sparse_init() do {} while (0) #define sparse_index_init(_sec, _nid) do {} while (0) #define pfn_in_present_section pfn_valid #define subsection_map_init(_pfn, _nr_pages) do {} while (0) #endif /* CONFIG_SPARSEMEM */ #endif /* !__GENERATING_BOUNDS.H */ #endif /* !__ASSEMBLY__ */ #endif /* _LINUX_MMZONE_H */ |
10 10 1 2 1 3 2 2 2 2 2 3 1 2 7 7 2 5 3 4 1 2 5 7 7 1 7 9 10 10 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 | // SPDX-License-Identifier: GPL-2.0-only /* * linux/fs/affs/inode.c * * (c) 1996 Hans-Joachim Widmaier - Rewritten * * (C) 1993 Ray Burr - Modified for Amiga FFS filesystem. * * (C) 1992 Eric Youngdale Modified for ISO 9660 filesystem. * * (C) 1991 Linus Torvalds - minix filesystem */ #include <linux/module.h> #include <linux/init.h> #include <linux/statfs.h> #include <linux/fs_parser.h> #include <linux/fs_context.h> #include <linux/magic.h> #include <linux/sched.h> #include <linux/cred.h> #include <linux/slab.h> #include <linux/writeback.h> #include <linux/blkdev.h> #include <linux/seq_file.h> #include <linux/iversion.h> #include "affs.h" static int affs_statfs(struct dentry *dentry, struct kstatfs *buf); static int affs_show_options(struct seq_file *m, struct dentry *root); static void affs_commit_super(struct super_block *sb, int wait) { struct affs_sb_info *sbi = AFFS_SB(sb); struct buffer_head *bh = sbi->s_root_bh; struct affs_root_tail *tail = AFFS_ROOT_TAIL(sb, bh); lock_buffer(bh); affs_secs_to_datestamp(ktime_get_real_seconds(), &tail->disk_change); affs_fix_checksum(sb, bh); unlock_buffer(bh); mark_buffer_dirty(bh); if (wait) sync_dirty_buffer(bh); } static void affs_put_super(struct super_block *sb) { struct affs_sb_info *sbi = AFFS_SB(sb); pr_debug("%s()\n", __func__); cancel_delayed_work_sync(&sbi->sb_work); } static int affs_sync_fs(struct super_block *sb, int wait) { affs_commit_super(sb, wait); return 0; } static void flush_superblock(struct work_struct *work) { struct affs_sb_info *sbi; struct super_block *sb; sbi = container_of(work, struct affs_sb_info, sb_work.work); sb = sbi->sb; spin_lock(&sbi->work_lock); sbi->work_queued = 0; spin_unlock(&sbi->work_lock); affs_commit_super(sb, 1); } void affs_mark_sb_dirty(struct super_block *sb) { struct affs_sb_info *sbi = AFFS_SB(sb); unsigned long delay; if (sb_rdonly(sb)) return; spin_lock(&sbi->work_lock); if (!sbi->work_queued) { delay = msecs_to_jiffies(dirty_writeback_interval * 10); queue_delayed_work(system_long_wq, &sbi->sb_work, delay); sbi->work_queued = 1; } spin_unlock(&sbi->work_lock); } static struct kmem_cache * affs_inode_cachep; static struct inode *affs_alloc_inode(struct super_block *sb) { struct affs_inode_info *i; i = alloc_inode_sb(sb, affs_inode_cachep, GFP_KERNEL); if (!i) return NULL; inode_set_iversion(&i->vfs_inode, 1); i->i_lc = NULL; i->i_ext_bh = NULL; i->i_pa_cnt = 0; return &i->vfs_inode; } static void affs_free_inode(struct inode *inode) { kmem_cache_free(affs_inode_cachep, AFFS_I(inode)); } static void init_once(void *foo) { struct affs_inode_info *ei = (struct affs_inode_info *) foo; mutex_init(&ei->i_link_lock); mutex_init(&ei->i_ext_lock); inode_init_once(&ei->vfs_inode); } static int __init init_inodecache(void) { affs_inode_cachep = kmem_cache_create("affs_inode_cache", sizeof(struct affs_inode_info), 0, (SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT), init_once); if (affs_inode_cachep == NULL) return -ENOMEM; return 0; } static void destroy_inodecache(void) { /* * Make sure all delayed rcu free inodes are flushed before we * destroy cache. */ rcu_barrier(); kmem_cache_destroy(affs_inode_cachep); } static const struct super_operations affs_sops = { .alloc_inode = affs_alloc_inode, .free_inode = affs_free_inode, .write_inode = affs_write_inode, .evict_inode = affs_evict_inode, .put_super = affs_put_super, .sync_fs = affs_sync_fs, .statfs = affs_statfs, .show_options = affs_show_options, }; enum { Opt_bs, Opt_mode, Opt_mufs, Opt_notruncate, Opt_prefix, Opt_protect, Opt_reserved, Opt_root, Opt_setgid, Opt_setuid, Opt_verbose, Opt_volume, Opt_ignore, }; struct affs_context { kuid_t uid; /* uid to override */ kgid_t gid; /* gid to override */ unsigned int mode; /* mode to override */ unsigned int reserved; /* Number of reserved blocks */ int root_block; /* FFS root block number */ int blocksize; /* Initial device blksize */ char *prefix; /* Prefix for volumes and assigns */ char volume[32]; /* Vol. prefix for absolute symlinks */ unsigned long mount_flags; /* Options */ }; static const struct fs_parameter_spec affs_param_spec[] = { fsparam_u32 ("bs", Opt_bs), fsparam_u32oct ("mode", Opt_mode), fsparam_flag ("mufs", Opt_mufs), fsparam_flag ("nofilenametruncate", Opt_notruncate), fsparam_string ("prefix", Opt_prefix), fsparam_flag ("protect", Opt_protect), fsparam_u32 ("reserved", Opt_reserved), fsparam_u32 ("root", Opt_root), fsparam_gid ("setgid", Opt_setgid), fsparam_uid ("setuid", Opt_setuid), fsparam_flag ("verbose", Opt_verbose), fsparam_string ("volume", Opt_volume), fsparam_flag ("grpquota", Opt_ignore), fsparam_flag ("noquota", Opt_ignore), fsparam_flag ("quota", Opt_ignore), fsparam_flag ("usrquota", Opt_ignore), {}, }; static int affs_parse_param(struct fs_context *fc, struct fs_parameter *param) { struct affs_context *ctx = fc->fs_private; struct fs_parse_result result; int n; int opt; opt = fs_parse(fc, affs_param_spec, param, &result); if (opt < 0) return opt; switch (opt) { case Opt_bs: n = result.uint_32; if (n != 512 && n != 1024 && n != 2048 && n != 4096) { pr_warn("Invalid blocksize (512, 1024, 2048, 4096 allowed)\n"); return -EINVAL; } ctx->blocksize = n; break; case Opt_mode: ctx->mode = result.uint_32 & 0777; affs_set_opt(ctx->mount_flags, SF_SETMODE); break; case Opt_mufs: affs_set_opt(ctx->mount_flags, SF_MUFS); break; case Opt_notruncate: affs_set_opt(ctx->mount_flags, SF_NO_TRUNCATE); break; case Opt_prefix: kfree(ctx->prefix); ctx->prefix = param->string; param->string = NULL; affs_set_opt(ctx->mount_flags, SF_PREFIX); break; case Opt_protect: affs_set_opt(ctx->mount_flags, SF_IMMUTABLE); break; case Opt_reserved: ctx->reserved = result.uint_32; break; case Opt_root: ctx->root_block = result.uint_32; break; case Opt_setgid: ctx->gid = result.gid; affs_set_opt(ctx->mount_flags, SF_SETGID); break; case Opt_setuid: ctx->uid = result.uid; affs_set_opt(ctx->mount_flags, SF_SETUID); break; case Opt_verbose: affs_set_opt(ctx->mount_flags, SF_VERBOSE); break; case Opt_volume: strscpy(ctx->volume, param->string, 32); break; case Opt_ignore: /* Silently ignore the quota options */ break; default: return -EINVAL; } return 0; } static int affs_show_options(struct seq_file *m, struct dentry *root) { struct super_block *sb = root->d_sb; struct affs_sb_info *sbi = AFFS_SB(sb); if (sb->s_blocksize) seq_printf(m, ",bs=%lu", sb->s_blocksize); if (affs_test_opt(sbi->s_flags, SF_SETMODE)) seq_printf(m, ",mode=%o", sbi->s_mode); if (affs_test_opt(sbi->s_flags, SF_MUFS)) seq_puts(m, ",mufs"); if (affs_test_opt(sbi->s_flags, SF_NO_TRUNCATE)) seq_puts(m, ",nofilenametruncate"); if (affs_test_opt(sbi->s_flags, SF_PREFIX)) seq_printf(m, ",prefix=%s", sbi->s_prefix); if (affs_test_opt(sbi->s_flags, SF_IMMUTABLE)) seq_puts(m, ",protect"); if (sbi->s_reserved != 2) seq_printf(m, ",reserved=%u", sbi->s_reserved); if (sbi->s_root_block != (sbi->s_reserved + sbi->s_partition_size - 1) / 2) seq_printf(m, ",root=%u", sbi->s_root_block); if (affs_test_opt(sbi->s_flags, SF_SETGID)) seq_printf(m, ",setgid=%u", from_kgid_munged(&init_user_ns, sbi->s_gid)); if (affs_test_opt(sbi->s_flags, SF_SETUID)) seq_printf(m, ",setuid=%u", from_kuid_munged(&init_user_ns, sbi->s_uid)); if (affs_test_opt(sbi->s_flags, SF_VERBOSE)) seq_puts(m, ",verbose"); if (sbi->s_volume[0]) seq_printf(m, ",volume=%s", sbi->s_volume); return 0; } /* This function definitely needs to be split up. Some fine day I'll * hopefully have the guts to do so. Until then: sorry for the mess. */ static int affs_fill_super(struct super_block *sb, struct fs_context *fc) { struct affs_sb_info *sbi; struct affs_context *ctx = fc->fs_private; struct buffer_head *root_bh = NULL; struct buffer_head *boot_bh; struct inode *root_inode = NULL; int silent = fc->sb_flags & SB_SILENT; int size, blocksize; u32 chksum; int num_bm; int i, j; int tmp_flags; /* fix remount prototype... */ u8 sig[4]; int ret; sb->s_magic = AFFS_SUPER_MAGIC; sb->s_op = &affs_sops; sb->s_flags |= SB_NODIRATIME; sb->s_time_gran = NSEC_PER_SEC; sb->s_time_min = sys_tz.tz_minuteswest * 60 + AFFS_EPOCH_DELTA; sb->s_time_max = 86400LL * U32_MAX + 86400 + sb->s_time_min; sbi = kzalloc(sizeof(struct affs_sb_info), GFP_KERNEL); if (!sbi) return -ENOMEM; sb->s_fs_info = sbi; sbi->sb = sb; mutex_init(&sbi->s_bmlock); spin_lock_init(&sbi->symlink_lock); spin_lock_init(&sbi->work_lock); INIT_DELAYED_WORK(&sbi->sb_work, flush_superblock); sbi->s_flags = ctx->mount_flags; sbi->s_mode = ctx->mode; sbi->s_uid = ctx->uid; sbi->s_gid = ctx->gid; sbi->s_reserved = ctx->reserved; sbi->s_prefix = ctx->prefix; ctx->prefix = NULL; memcpy(sbi->s_volume, ctx->volume, 32); /* N.B. after this point s_prefix must be released */ /* Get the size of the device in 512-byte blocks. * If we later see that the partition uses bigger * blocks, we will have to change it. */ size = bdev_nr_sectors(sb->s_bdev); pr_debug("initial blocksize=%d, #blocks=%d\n", 512, size); affs_set_blocksize(sb, PAGE_SIZE); /* Try to find root block. Its location depends on the block size. */ i = bdev_logical_block_size(sb->s_bdev); j = PAGE_SIZE; blocksize = ctx->blocksize; if (blocksize > 0) { i = j = blocksize; size = size / (blocksize / 512); } for (blocksize = i; blocksize <= j; blocksize <<= 1, size >>= 1) { sbi->s_root_block = ctx->root_block; if (ctx->root_block < 0) sbi->s_root_block = (ctx->reserved + size - 1) / 2; pr_debug("setting blocksize to %d\n", blocksize); affs_set_blocksize(sb, blocksize); sbi->s_partition_size = size; /* The root block location that was calculated above is not * correct if the partition size is an odd number of 512- * byte blocks, which will be rounded down to a number of * 1024-byte blocks, and if there were an even number of * reserved blocks. Ideally, all partition checkers should * report the real number of blocks of the real blocksize, * but since this just cannot be done, we have to try to * find the root block anyways. In the above case, it is one * block behind the calculated one. So we check this one, too. */ for (num_bm = 0; num_bm < 2; num_bm++) { pr_debug("Dev %s, trying root=%u, bs=%d, " "size=%d, reserved=%d\n", sb->s_id, sbi->s_root_block + num_bm, ctx->blocksize, size, ctx->reserved); root_bh = affs_bread(sb, sbi->s_root_block + num_bm); if (!root_bh) continue; if (!affs_checksum_block(sb, root_bh) && be32_to_cpu(AFFS_ROOT_HEAD(root_bh)->ptype) == T_SHORT && be32_to_cpu(AFFS_ROOT_TAIL(sb, root_bh)->stype) == ST_ROOT) { sbi->s_hashsize = blocksize / 4 - 56; sbi->s_root_block += num_bm; goto got_root; } affs_brelse(root_bh); root_bh = NULL; } } if (!silent) pr_err("No valid root block on device %s\n", sb->s_id); return -EINVAL; /* N.B. after this point bh must be released */ got_root: /* Keep super block in cache */ sbi->s_root_bh = root_bh; ctx->root_block = sbi->s_root_block; /* Find out which kind of FS we have */ boot_bh = sb_bread(sb, 0); if (!boot_bh) { pr_err("Cannot read boot block\n"); return -EINVAL; } memcpy(sig, boot_bh->b_data, 4); brelse(boot_bh); chksum = be32_to_cpu(*(__be32 *)sig); /* Dircache filesystems are compatible with non-dircache ones * when reading. As long as they aren't supported, writing is * not recommended. */ if ((chksum == FS_DCFFS || chksum == MUFS_DCFFS || chksum == FS_DCOFS || chksum == MUFS_DCOFS) && !sb_rdonly(sb)) { pr_notice("Dircache FS - mounting %s read only\n", sb->s_id); sb->s_flags |= SB_RDONLY; } switch (chksum) { case MUFS_FS: case MUFS_INTLFFS: case MUFS_DCFFS: affs_set_opt(sbi->s_flags, SF_MUFS); fallthrough; case FS_INTLFFS: case FS_DCFFS: affs_set_opt(sbi->s_flags, SF_INTL); break; case MUFS_FFS: affs_set_opt(sbi->s_flags, SF_MUFS); break; case FS_FFS: break; case MUFS_OFS: affs_set_opt(sbi->s_flags, SF_MUFS); fallthrough; case FS_OFS: affs_set_opt(sbi->s_flags, SF_OFS); sb->s_flags |= SB_NOEXEC; break; case MUFS_DCOFS: case MUFS_INTLOFS: affs_set_opt(sbi->s_flags, SF_MUFS); fallthrough; case FS_DCOFS: case FS_INTLOFS: affs_set_opt(sbi->s_flags, SF_INTL); affs_set_opt(sbi->s_flags, SF_OFS); sb->s_flags |= SB_NOEXEC; break; default: pr_err("Unknown filesystem on device %s: %08X\n", sb->s_id, chksum); return -EINVAL; } if (affs_test_opt(ctx->mount_flags, SF_VERBOSE)) { u8 len = AFFS_ROOT_TAIL(sb, root_bh)->disk_name[0]; pr_notice("Mounting volume \"%.*s\": Type=%.3s\\%c, Blocksize=%d\n", len > 31 ? 31 : len, AFFS_ROOT_TAIL(sb, root_bh)->disk_name + 1, sig, sig[3] + '0', blocksize); } sb->s_flags |= SB_NODEV | SB_NOSUID; sbi->s_data_blksize = sb->s_blocksize; if (affs_test_opt(sbi->s_flags, SF_OFS)) sbi->s_data_blksize -= 24; tmp_flags = sb->s_flags; ret = affs_init_bitmap(sb, &tmp_flags); if (ret) return ret; sb->s_flags = tmp_flags; /* set up enough so that it can read an inode */ root_inode = affs_iget(sb, ctx->root_block); if (IS_ERR(root_inode)) return PTR_ERR(root_inode); if (affs_test_opt(AFFS_SB(sb)->s_flags, SF_INTL)) sb->s_d_op = &affs_intl_dentry_operations; else sb->s_d_op = &affs_dentry_operations; sb->s_root = d_make_root(root_inode); if (!sb->s_root) { pr_err("AFFS: Get root inode failed\n"); return -ENOMEM; } sb->s_export_op = &affs_export_ops; pr_debug("s_flags=%lX\n", sb->s_flags); return 0; } static int affs_reconfigure(struct fs_context *fc) { struct super_block *sb = fc->root->d_sb; struct affs_context *ctx = fc->fs_private; struct affs_sb_info *sbi = AFFS_SB(sb); int res = 0; sync_filesystem(sb); fc->sb_flags |= SB_NODIRATIME; flush_delayed_work(&sbi->sb_work); /* * NB: Historically, only mount_flags, mode, uid, gic, prefix, * and volume are accepted during remount. */ sbi->s_flags = ctx->mount_flags; sbi->s_mode = ctx->mode; sbi->s_uid = ctx->uid; sbi->s_gid = ctx->gid; /* protect against readers */ spin_lock(&sbi->symlink_lock); if (ctx->prefix) { kfree(sbi->s_prefix); sbi->s_prefix = ctx->prefix; ctx->prefix = NULL; } memcpy(sbi->s_volume, ctx->volume, 32); spin_unlock(&sbi->symlink_lock); if ((bool)(fc->sb_flags & SB_RDONLY) == sb_rdonly(sb)) return 0; if (fc->sb_flags & SB_RDONLY) affs_free_bitmap(sb); else res = affs_init_bitmap(sb, &fc->sb_flags); return res; } static int affs_statfs(struct dentry *dentry, struct kstatfs *buf) { struct super_block *sb = dentry->d_sb; int free; u64 id = huge_encode_dev(sb->s_bdev->bd_dev); pr_debug("%s() partsize=%d, reserved=%d\n", __func__, AFFS_SB(sb)->s_partition_size, AFFS_SB(sb)->s_reserved); free = affs_count_free_blocks(sb); buf->f_type = AFFS_SUPER_MAGIC; buf->f_bsize = sb->s_blocksize; buf->f_blocks = AFFS_SB(sb)->s_partition_size - AFFS_SB(sb)->s_reserved; buf->f_bfree = free; buf->f_bavail = free; buf->f_fsid = u64_to_fsid(id); buf->f_namelen = AFFSNAMEMAX; return 0; } static int affs_get_tree(struct fs_context *fc) { return get_tree_bdev(fc, affs_fill_super); } static void affs_kill_sb(struct super_block *sb) { struct affs_sb_info *sbi = AFFS_SB(sb); kill_block_super(sb); if (sbi) { affs_free_bitmap(sb); affs_brelse(sbi->s_root_bh); kfree(sbi->s_prefix); mutex_destroy(&sbi->s_bmlock); kfree_rcu(sbi, rcu); } } static void affs_free_fc(struct fs_context *fc) { struct affs_context *ctx = fc->fs_private; kfree(ctx->prefix); kfree(ctx); } static const struct fs_context_operations affs_context_ops = { .parse_param = affs_parse_param, .get_tree = affs_get_tree, .reconfigure = affs_reconfigure, .free = affs_free_fc, }; static int affs_init_fs_context(struct fs_context *fc) { struct affs_context *ctx; ctx = kzalloc(sizeof(struct affs_context), GFP_KERNEL); if (!ctx) return -ENOMEM; if (fc->purpose == FS_CONTEXT_FOR_RECONFIGURE) { struct super_block *sb = fc->root->d_sb; struct affs_sb_info *sbi = AFFS_SB(sb); /* * NB: historically, no options other than volume were * preserved across a remount unless they were explicitly * passed in. */ memcpy(ctx->volume, sbi->s_volume, 32); } else { ctx->uid = current_uid(); ctx->gid = current_gid(); ctx->reserved = 2; ctx->root_block = -1; ctx->blocksize = -1; ctx->volume[0] = ':'; } fc->ops = &affs_context_ops; fc->fs_private = ctx; return 0; } static struct file_system_type affs_fs_type = { .owner = THIS_MODULE, .name = "affs", .kill_sb = affs_kill_sb, .fs_flags = FS_REQUIRES_DEV, .init_fs_context = affs_init_fs_context, .parameters = affs_param_spec, }; MODULE_ALIAS_FS("affs"); static int __init init_affs_fs(void) { int err = init_inodecache(); if (err) goto out1; err = register_filesystem(&affs_fs_type); if (err) goto out; return 0; out: destroy_inodecache(); out1: return err; } static void __exit exit_affs_fs(void) { unregister_filesystem(&affs_fs_type); destroy_inodecache(); } MODULE_DESCRIPTION("Amiga filesystem support for Linux"); MODULE_LICENSE("GPL"); module_init(init_affs_fs) module_exit(exit_affs_fs) |
4 4 1 2 1 2 2 2 2 2 36 36 2 2 2 2 2 2 2 2 2 1 2 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 | // SPDX-License-Identifier: GPL-2.0 /* * Copyright (c) 2006-2007 Silicon Graphics, Inc. * All Rights Reserved. */ #include "xfs.h" #include "xfs_mru_cache.h" /* * The MRU Cache data structure consists of a data store, an array of lists and * a lock to protect its internal state. At initialisation time, the client * supplies an element lifetime in milliseconds and a group count, as well as a * function pointer to call when deleting elements. A data structure for * queueing up work in the form of timed callbacks is also included. * * The group count controls how many lists are created, and thereby how finely * the elements are grouped in time. When reaping occurs, all the elements in * all the lists whose time has expired are deleted. * * To give an example of how this works in practice, consider a client that * initialises an MRU Cache with a lifetime of ten seconds and a group count of * five. Five internal lists will be created, each representing a two second * period in time. When the first element is added, time zero for the data * structure is initialised to the current time. * * All the elements added in the first two seconds are appended to the first * list. Elements added in the third second go into the second list, and so on. * If an element is accessed at any point, it is removed from its list and * inserted at the head of the current most-recently-used list. * * The reaper function will have nothing to do until at least twelve seconds * have elapsed since the first element was added. The reason for this is that * if it were called at t=11s, there could be elements in the first list that * have only been inactive for nine seconds, so it still does nothing. If it is * called anywhere between t=12 and t=14 seconds, it will delete all the * elements that remain in the first list. It's therefore possible for elements * to remain in the data store even after they've been inactive for up to * (t + t/g) seconds, where t is the inactive element lifetime and g is the * number of groups. * * The above example assumes that the reaper function gets called at least once * every (t/g) seconds. If it is called less frequently, unused elements will * accumulate in the reap list until the reaper function is eventually called. * The current implementation uses work queue callbacks to carefully time the * reaper function calls, so this should happen rarely, if at all. * * From a design perspective, the primary reason for the choice of a list array * representing discrete time intervals is that it's only practical to reap * expired elements in groups of some appreciable size. This automatically * introduces a granularity to element lifetimes, so there's no point storing an * individual timeout with each element that specifies a more precise reap time. * The bonus is a saving of sizeof(long) bytes of memory per element stored. * * The elements could have been stored in just one list, but an array of * counters or pointers would need to be maintained to allow them to be divided * up into discrete time groups. More critically, the process of touching or * removing an element would involve walking large portions of the entire list, * which would have a detrimental effect on performance. The additional memory * requirement for the array of list heads is minimal. * * When an element is touched or deleted, it needs to be removed from its * current list. Doubly linked lists are used to make the list maintenance * portion of these operations O(1). Since reaper timing can be imprecise, * inserts and lookups can occur when there are no free lists available. When * this happens, all the elements on the LRU list need to be migrated to the end * of the reap list. To keep the list maintenance portion of these operations * O(1) also, list tails need to be accessible without walking the entire list. * This is the reason why doubly linked list heads are used. */ /* * An MRU Cache is a dynamic data structure that stores its elements in a way * that allows efficient lookups, but also groups them into discrete time * intervals based on insertion time. This allows elements to be efficiently * and automatically reaped after a fixed period of inactivity. * * When a client data pointer is stored in the MRU Cache it needs to be added to * both the data store and to one of the lists. It must also be possible to * access each of these entries via the other, i.e. to: * * a) Walk a list, removing the corresponding data store entry for each item. * b) Look up a data store entry, then access its list entry directly. * * To achieve both of these goals, each entry must contain both a list entry and * a key, in addition to the user's data pointer. Note that it's not a good * idea to have the client embed one of these structures at the top of their own * data structure, because inserting the same item more than once would most * likely result in a loop in one of the lists. That's a sure-fire recipe for * an infinite loop in the code. */ struct xfs_mru_cache { struct radix_tree_root store; /* Core storage data structure. */ struct list_head *lists; /* Array of lists, one per grp. */ struct list_head reap_list; /* Elements overdue for reaping. */ spinlock_t lock; /* Lock to protect this struct. */ unsigned int grp_count; /* Number of discrete groups. */ unsigned int grp_time; /* Time period spanned by grps. */ unsigned int lru_grp; /* Group containing time zero. */ unsigned long time_zero; /* Time first element was added. */ xfs_mru_cache_free_func_t free_func; /* Function pointer for freeing. */ struct delayed_work work; /* Workqueue data for reaping. */ unsigned int queued; /* work has been queued */ void *data; }; static struct workqueue_struct *xfs_mru_reap_wq; /* * When inserting, destroying or reaping, it's first necessary to update the * lists relative to a particular time. In the case of destroying, that time * will be well in the future to ensure that all items are moved to the reap * list. In all other cases though, the time will be the current time. * * This function enters a loop, moving the contents of the LRU list to the reap * list again and again until either a) the lists are all empty, or b) time zero * has been advanced sufficiently to be within the immediate element lifetime. * * Case a) above is detected by counting how many groups are migrated and * stopping when they've all been moved. Case b) is detected by monitoring the * time_zero field, which is updated as each group is migrated. * * The return value is the earliest time that more migration could be needed, or * zero if there's no need to schedule more work because the lists are empty. */ STATIC unsigned long _xfs_mru_cache_migrate( struct xfs_mru_cache *mru, unsigned long now) { unsigned int grp; unsigned int migrated = 0; struct list_head *lru_list; /* Nothing to do if the data store is empty. */ if (!mru->time_zero) return 0; /* While time zero is older than the time spanned by all the lists. */ while (mru->time_zero <= now - mru->grp_count * mru->grp_time) { /* * If the LRU list isn't empty, migrate its elements to the tail * of the reap list. */ lru_list = mru->lists + mru->lru_grp; if (!list_empty(lru_list)) list_splice_init(lru_list, mru->reap_list.prev); /* * Advance the LRU group number, freeing the old LRU list to * become the new MRU list; advance time zero accordingly. */ mru->lru_grp = (mru->lru_grp + 1) % mru->grp_count; mru->time_zero += mru->grp_time; /* * If reaping is so far behind that all the elements on all the * lists have been migrated to the reap list, it's now empty. */ if (++migrated == mru->grp_count) { mru->lru_grp = 0; mru->time_zero = 0; return 0; } } /* Find the first non-empty list from the LRU end. */ for (grp = 0; grp < mru->grp_count; grp++) { /* Check the grp'th list from the LRU end. */ lru_list = mru->lists + ((mru->lru_grp + grp) % mru->grp_count); if (!list_empty(lru_list)) return mru->time_zero + (mru->grp_count + grp) * mru->grp_time; } /* All the lists must be empty. */ mru->lru_grp = 0; mru->time_zero = 0; return 0; } /* * When inserting or doing a lookup, an element needs to be inserted into the * MRU list. The lists must be migrated first to ensure that they're * up-to-date, otherwise the new element could be given a shorter lifetime in * the cache than it should. */ STATIC void _xfs_mru_cache_list_insert( struct xfs_mru_cache *mru, struct xfs_mru_cache_elem *elem) { unsigned int grp = 0; unsigned long now = jiffies; /* * If the data store is empty, initialise time zero, leave grp set to * zero and start the work queue timer if necessary. Otherwise, set grp * to the number of group times that have elapsed since time zero. */ if (!_xfs_mru_cache_migrate(mru, now)) { mru->time_zero = now; if (!mru->queued) { mru->queued = 1; queue_delayed_work(xfs_mru_reap_wq, &mru->work, mru->grp_count * mru->grp_time); } } else { grp = (now - mru->time_zero) / mru->grp_time; grp = (mru->lru_grp + grp) % mru->grp_count; } /* Insert the element at the tail of the corresponding list. */ list_add_tail(&elem->list_node, mru->lists + grp); } /* * When destroying or reaping, all the elements that were migrated to the reap * list need to be deleted. For each element this involves removing it from the * data store, removing it from the reap list, calling the client's free * function and deleting the element from the element cache. * * We get called holding the mru->lock, which we drop and then reacquire. * Sparse need special help with this to tell it we know what we are doing. */ STATIC void _xfs_mru_cache_clear_reap_list( struct xfs_mru_cache *mru) __releases(mru->lock) __acquires(mru->lock) { struct xfs_mru_cache_elem *elem, *next; LIST_HEAD(tmp); list_for_each_entry_safe(elem, next, &mru->reap_list, list_node) { /* Remove the element from the data store. */ radix_tree_delete(&mru->store, elem->key); /* * remove to temp list so it can be freed without * needing to hold the lock */ list_move(&elem->list_node, &tmp); } spin_unlock(&mru->lock); list_for_each_entry_safe(elem, next, &tmp, list_node) { list_del_init(&elem->list_node); mru->free_func(mru->data, elem); } spin_lock(&mru->lock); } /* * We fire the reap timer every group expiry interval so * we always have a reaper ready to run. This makes shutdown * and flushing of the reaper easy to do. Hence we need to * keep when the next reap must occur so we can determine * at each interval whether there is anything we need to do. */ STATIC void _xfs_mru_cache_reap( struct work_struct *work) { struct xfs_mru_cache *mru = container_of(work, struct xfs_mru_cache, work.work); unsigned long now, next; ASSERT(mru && mru->lists); if (!mru || !mru->lists) return; spin_lock(&mru->lock); next = _xfs_mru_cache_migrate(mru, jiffies); _xfs_mru_cache_clear_reap_list(mru); mru->queued = next; if ((mru->queued > 0)) { now = jiffies; if (next <= now) next = 0; else next -= now; queue_delayed_work(xfs_mru_reap_wq, &mru->work, next); } spin_unlock(&mru->lock); } int xfs_mru_cache_init(void) { xfs_mru_reap_wq = alloc_workqueue("xfs_mru_cache", XFS_WQFLAGS(WQ_MEM_RECLAIM | WQ_FREEZABLE), 1); if (!xfs_mru_reap_wq) return -ENOMEM; return 0; } void xfs_mru_cache_uninit(void) { destroy_workqueue(xfs_mru_reap_wq); } /* * To initialise a struct xfs_mru_cache pointer, call xfs_mru_cache_create() * with the address of the pointer, a lifetime value in milliseconds, a group * count and a free function to use when deleting elements. This function * returns 0 if the initialisation was successful. */ int xfs_mru_cache_create( struct xfs_mru_cache **mrup, void *data, unsigned int lifetime_ms, unsigned int grp_count, xfs_mru_cache_free_func_t free_func) { struct xfs_mru_cache *mru = NULL; int err = 0, grp; unsigned int grp_time; if (mrup) *mrup = NULL; if (!mrup || !grp_count || !lifetime_ms || !free_func) return -EINVAL; if (!(grp_time = msecs_to_jiffies(lifetime_ms) / grp_count)) return -EINVAL; mru = kzalloc(sizeof(*mru), GFP_KERNEL | __GFP_NOFAIL); if (!mru) return -ENOMEM; /* An extra list is needed to avoid reaping up to a grp_time early. */ mru->grp_count = grp_count + 1; mru->lists = kzalloc(mru->grp_count * sizeof(*mru->lists), GFP_KERNEL | __GFP_NOFAIL); if (!mru->lists) { err = -ENOMEM; goto exit; } for (grp = 0; grp < mru->grp_count; grp++) INIT_LIST_HEAD(mru->lists + grp); /* * We use GFP_KERNEL radix tree preload and do inserts under a * spinlock so GFP_ATOMIC is appropriate for the radix tree itself. */ INIT_RADIX_TREE(&mru->store, GFP_ATOMIC); INIT_LIST_HEAD(&mru->reap_list); spin_lock_init(&mru->lock); INIT_DELAYED_WORK(&mru->work, _xfs_mru_cache_reap); mru->grp_time = grp_time; mru->free_func = free_func; mru->data = data; *mrup = mru; exit: if (err && mru && mru->lists) kfree(mru->lists); if (err && mru) kfree(mru); return err; } /* * Call xfs_mru_cache_flush() to flush out all cached entries, calling their * free functions as they're deleted. When this function returns, the caller is * guaranteed that all the free functions for all the elements have finished * executing and the reaper is not running. */ static void xfs_mru_cache_flush( struct xfs_mru_cache *mru) { if (!mru || !mru->lists) return; spin_lock(&mru->lock); if (mru->queued) { spin_unlock(&mru->lock); cancel_delayed_work_sync(&mru->work); spin_lock(&mru->lock); } _xfs_mru_cache_migrate(mru, jiffies + mru->grp_count * mru->grp_time); _xfs_mru_cache_clear_reap_list(mru); spin_unlock(&mru->lock); } void xfs_mru_cache_destroy( struct xfs_mru_cache *mru) { if (!mru || !mru->lists) return; xfs_mru_cache_flush(mru); kfree(mru->lists); kfree(mru); } /* * To insert an element, call xfs_mru_cache_insert() with the data store, the * element's key and the client data pointer. This function returns 0 on * success or ENOMEM if memory for the data element couldn't be allocated. */ int xfs_mru_cache_insert( struct xfs_mru_cache *mru, unsigned long key, struct xfs_mru_cache_elem *elem) { int error; ASSERT(mru && mru->lists); if (!mru || !mru->lists) return -EINVAL; if (radix_tree_preload(GFP_KERNEL)) return -ENOMEM; INIT_LIST_HEAD(&elem->list_node); elem->key = key; spin_lock(&mru->lock); error = radix_tree_insert(&mru->store, key, elem); radix_tree_preload_end(); if (!error) _xfs_mru_cache_list_insert(mru, elem); spin_unlock(&mru->lock); return error; } /* * To remove an element without calling the free function, call * xfs_mru_cache_remove() with the data store and the element's key. On success * the client data pointer for the removed element is returned, otherwise this * function will return a NULL pointer. */ struct xfs_mru_cache_elem * xfs_mru_cache_remove( struct xfs_mru_cache *mru, unsigned long key) { struct xfs_mru_cache_elem *elem; ASSERT(mru && mru->lists); if (!mru || !mru->lists) return NULL; spin_lock(&mru->lock); elem = radix_tree_delete(&mru->store, key); if (elem) list_del(&elem->list_node); spin_unlock(&mru->lock); return elem; } /* * To remove and element and call the free function, call xfs_mru_cache_delete() * with the data store and the element's key. */ void xfs_mru_cache_delete( struct xfs_mru_cache *mru, unsigned long key) { struct xfs_mru_cache_elem *elem; elem = xfs_mru_cache_remove(mru, key); if (elem) mru->free_func(mru->data, elem); } /* * To look up an element using its key, call xfs_mru_cache_lookup() with the * data store and the element's key. If found, the element will be moved to the * head of the MRU list to indicate that it's been touched. * * The internal data structures are protected by a spinlock that is STILL HELD * when this function returns. Call xfs_mru_cache_done() to release it. Note * that it is not safe to call any function that might sleep in the interim. * * The implementation could have used reference counting to avoid this * restriction, but since most clients simply want to get, set or test a member * of the returned data structure, the extra per-element memory isn't warranted. * * If the element isn't found, this function returns NULL and the spinlock is * released. xfs_mru_cache_done() should NOT be called when this occurs. * * Because sparse isn't smart enough to know about conditional lock return * status, we need to help it get it right by annotating the path that does * not release the lock. */ struct xfs_mru_cache_elem * xfs_mru_cache_lookup( struct xfs_mru_cache *mru, unsigned long key) { struct xfs_mru_cache_elem *elem; ASSERT(mru && mru->lists); if (!mru || !mru->lists) return NULL; spin_lock(&mru->lock); elem = radix_tree_lookup(&mru->store, key); if (elem) { list_del(&elem->list_node); _xfs_mru_cache_list_insert(mru, elem); __release(mru_lock); /* help sparse not be stupid */ } else spin_unlock(&mru->lock); return elem; } /* * To release the internal data structure spinlock after having performed an * xfs_mru_cache_lookup() or an xfs_mru_cache_peek(), call xfs_mru_cache_done() * with the data store pointer. */ void xfs_mru_cache_done( struct xfs_mru_cache *mru) __releases(mru->lock) { spin_unlock(&mru->lock); } |
2 2 5 1 4 5 5 7 7 6 7 20 6 1 7 7 3 19 16 3 2 12 5 7 1 12 4 1 1 2 7 1 6 6 6 4 3 3 6 6 6 16 2 2 2 10 1 1 1 1 1 4 3 1 3 1 2 1 1 1 1 1 1 2 1 2 1 6 4 2 1 19 1 18 12 1 2 1 1 1 6 4 1 1 1 1742 1743 68 4 3 3 18 18 18 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 | // SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) /* isotp.c - ISO 15765-2 CAN transport protocol for protocol family CAN * * This implementation does not provide ISO-TP specific return values to the * userspace. * * - RX path timeout of data reception leads to -ETIMEDOUT * - RX path SN mismatch leads to -EILSEQ * - RX path data reception with wrong padding leads to -EBADMSG * - TX path flowcontrol reception timeout leads to -ECOMM * - TX path flowcontrol reception overflow leads to -EMSGSIZE * - TX path flowcontrol reception with wrong layout/padding leads to -EBADMSG * - when a transfer (tx) is on the run the next write() blocks until it's done * - use CAN_ISOTP_WAIT_TX_DONE flag to block the caller until the PDU is sent * - as we have static buffers the check whether the PDU fits into the buffer * is done at FF reception time (no support for sending 'wait frames') * * Copyright (c) 2020 Volkswagen Group Electronic Research * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of Volkswagen nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * Alternatively, provided that this notice is retained in full, this * software may be distributed under the terms of the GNU General * Public License ("GPL") version 2, in which case the provisions of the * GPL apply INSTEAD OF those given above. * * The provided data structures and external interfaces from this code * are not restricted to be used by modules with a GPL compatible license. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH * DAMAGE. */ #include <linux/module.h> #include <linux/init.h> #include <linux/interrupt.h> #include <linux/spinlock.h> #include <linux/hrtimer.h> #include <linux/wait.h> #include <linux/uio.h> #include <linux/net.h> #include <linux/netdevice.h> #include <linux/socket.h> #include <linux/if_arp.h> #include <linux/skbuff.h> #include <linux/can.h> #include <linux/can/core.h> #include <linux/can/skb.h> #include <linux/can/isotp.h> #include <linux/slab.h> #include <net/sock.h> #include <net/net_namespace.h> MODULE_DESCRIPTION("PF_CAN ISO 15765-2 transport protocol"); MODULE_LICENSE("Dual BSD/GPL"); MODULE_AUTHOR("Oliver Hartkopp <socketcan@hartkopp.net>"); MODULE_ALIAS("can-proto-6"); #define ISOTP_MIN_NAMELEN CAN_REQUIRED_SIZE(struct sockaddr_can, can_addr.tp) #define SINGLE_MASK(id) (((id) & CAN_EFF_FLAG) ? \ (CAN_EFF_MASK | CAN_EFF_FLAG | CAN_RTR_FLAG) : \ (CAN_SFF_MASK | CAN_EFF_FLAG | CAN_RTR_FLAG)) /* Since ISO 15765-2:2016 the CAN isotp protocol supports more than 4095 * byte per ISO PDU as the FF_DL can take full 32 bit values (4 Gbyte). * We would need some good concept to handle this between user space and * kernel space. For now set the static buffer to something about 8 kbyte * to be able to test this new functionality. */ #define DEFAULT_MAX_PDU_SIZE 8300 /* maximum PDU size before ISO 15765-2:2016 extension was 4095 */ #define MAX_12BIT_PDU_SIZE 4095 /* limit the isotp pdu size from the optional module parameter to 1MByte */ #define MAX_PDU_SIZE (1025 * 1024U) static unsigned int max_pdu_size __read_mostly = DEFAULT_MAX_PDU_SIZE; module_param(max_pdu_size, uint, 0444); MODULE_PARM_DESC(max_pdu_size, "maximum isotp pdu size (default " __stringify(DEFAULT_MAX_PDU_SIZE) ")"); /* N_PCI type values in bits 7-4 of N_PCI bytes */ #define N_PCI_SF 0x00 /* single frame */ #define N_PCI_FF 0x10 /* first frame */ #define N_PCI_CF 0x20 /* consecutive frame */ #define N_PCI_FC 0x30 /* flow control */ #define N_PCI_SZ 1 /* size of the PCI byte #1 */ #define SF_PCI_SZ4 1 /* size of SingleFrame PCI including 4 bit SF_DL */ #define SF_PCI_SZ8 2 /* size of SingleFrame PCI including 8 bit SF_DL */ #define FF_PCI_SZ12 2 /* size of FirstFrame PCI including 12 bit FF_DL */ #define FF_PCI_SZ32 6 /* size of FirstFrame PCI including 32 bit FF_DL */ #define FC_CONTENT_SZ 3 /* flow control content size in byte (FS/BS/STmin) */ #define ISOTP_CHECK_PADDING (CAN_ISOTP_CHK_PAD_LEN | CAN_ISOTP_CHK_PAD_DATA) #define ISOTP_ALL_BC_FLAGS (CAN_ISOTP_SF_BROADCAST | CAN_ISOTP_CF_BROADCAST) /* Flow Status given in FC frame */ #define ISOTP_FC_CTS 0 /* clear to send */ #define ISOTP_FC_WT 1 /* wait */ #define ISOTP_FC_OVFLW 2 /* overflow */ #define ISOTP_FC_TIMEOUT 1 /* 1 sec */ #define ISOTP_ECHO_TIMEOUT 2 /* 2 secs */ enum { ISOTP_IDLE = 0, ISOTP_WAIT_FIRST_FC, ISOTP_WAIT_FC, ISOTP_WAIT_DATA, ISOTP_SENDING, ISOTP_SHUTDOWN, }; struct tpcon { u8 *buf; unsigned int buflen; unsigned int len; unsigned int idx; u32 state; u8 bs; u8 sn; u8 ll_dl; u8 sbuf[DEFAULT_MAX_PDU_SIZE]; }; struct isotp_sock { struct sock sk; int bound; int ifindex; canid_t txid; canid_t rxid; ktime_t tx_gap; ktime_t lastrxcf_tstamp; struct hrtimer rxtimer, txtimer, txfrtimer; struct can_isotp_options opt; struct can_isotp_fc_options rxfc, txfc; struct can_isotp_ll_options ll; u32 frame_txtime; u32 force_tx_stmin; u32 force_rx_stmin; u32 cfecho; /* consecutive frame echo tag */ struct tpcon rx, tx; struct list_head notifier; wait_queue_head_t wait; spinlock_t rx_lock; /* protect single thread state machine */ }; static LIST_HEAD(isotp_notifier_list); static DEFINE_SPINLOCK(isotp_notifier_lock); static struct isotp_sock *isotp_busy_notifier; static inline struct isotp_sock *isotp_sk(const struct sock *sk) { return (struct isotp_sock *)sk; } static u32 isotp_bc_flags(struct isotp_sock *so) { return so->opt.flags & ISOTP_ALL_BC_FLAGS; } static bool isotp_register_rxid(struct isotp_sock *so) { /* no broadcast modes => register rx_id for FC frame reception */ return (isotp_bc_flags(so) == 0); } static enum hrtimer_restart isotp_rx_timer_handler(struct hrtimer *hrtimer) { struct isotp_sock *so = container_of(hrtimer, struct isotp_sock, rxtimer); struct sock *sk = &so->sk; if (so->rx.state == ISOTP_WAIT_DATA) { /* we did not get new data frames in time */ /* report 'connection timed out' */ sk->sk_err = ETIMEDOUT; if (!sock_flag(sk, SOCK_DEAD)) sk_error_report(sk); /* reset rx state */ so->rx.state = ISOTP_IDLE; } return HRTIMER_NORESTART; } static int isotp_send_fc(struct sock *sk, int ae, u8 flowstatus) { struct net_device *dev; struct sk_buff *nskb; struct canfd_frame *ncf; struct isotp_sock *so = isotp_sk(sk); int can_send_ret; nskb = alloc_skb(so->ll.mtu + sizeof(struct can_skb_priv), gfp_any()); if (!nskb) return 1; dev = dev_get_by_index(sock_net(sk), so->ifindex); if (!dev) { kfree_skb(nskb); return 1; } can_skb_reserve(nskb); can_skb_prv(nskb)->ifindex = dev->ifindex; can_skb_prv(nskb)->skbcnt = 0; nskb->dev = dev; can_skb_set_owner(nskb, sk); ncf = (struct canfd_frame *)nskb->data; skb_put_zero(nskb, so->ll.mtu); /* create & send flow control reply */ ncf->can_id = so->txid; if (so->opt.flags & CAN_ISOTP_TX_PADDING) { memset(ncf->data, so->opt.txpad_content, CAN_MAX_DLEN); ncf->len = CAN_MAX_DLEN; } else { ncf->len = ae + FC_CONTENT_SZ; } ncf->data[ae] = N_PCI_FC | flowstatus; ncf->data[ae + 1] = so->rxfc.bs; ncf->data[ae + 2] = so->rxfc.stmin; if (ae) ncf->data[0] = so->opt.ext_address; ncf->flags = so->ll.tx_flags; can_send_ret = can_send(nskb, 1); if (can_send_ret) pr_notice_once("can-isotp: %s: can_send_ret %pe\n", __func__, ERR_PTR(can_send_ret)); dev_put(dev); /* reset blocksize counter */ so->rx.bs = 0; /* reset last CF frame rx timestamp for rx stmin enforcement */ so->lastrxcf_tstamp = ktime_set(0, 0); /* start rx timeout watchdog */ hrtimer_start(&so->rxtimer, ktime_set(ISOTP_FC_TIMEOUT, 0), HRTIMER_MODE_REL_SOFT); return 0; } static void isotp_rcv_skb(struct sk_buff *skb, struct sock *sk) { struct sockaddr_can *addr = (struct sockaddr_can *)skb->cb; BUILD_BUG_ON(sizeof(skb->cb) < sizeof(struct sockaddr_can)); memset(addr, 0, sizeof(*addr)); addr->can_family = AF_CAN; addr->can_ifindex = skb->dev->ifindex; if (sock_queue_rcv_skb(sk, skb) < 0) kfree_skb(skb); } static u8 padlen(u8 datalen) { static const u8 plen[] = { 8, 8, 8, 8, 8, 8, 8, 8, 8, /* 0 - 8 */ 12, 12, 12, 12, /* 9 - 12 */ 16, 16, 16, 16, /* 13 - 16 */ 20, 20, 20, 20, /* 17 - 20 */ 24, 24, 24, 24, /* 21 - 24 */ 32, 32, 32, 32, 32, 32, 32, 32, /* 25 - 32 */ 48, 48, 48, 48, 48, 48, 48, 48, /* 33 - 40 */ 48, 48, 48, 48, 48, 48, 48, 48 /* 41 - 48 */ }; if (datalen > 48) return 64; return plen[datalen]; } /* check for length optimization and return 1/true when the check fails */ static int check_optimized(struct canfd_frame *cf, int start_index) { /* for CAN_DL <= 8 the start_index is equal to the CAN_DL as the * padding would start at this point. E.g. if the padding would * start at cf.data[7] cf->len has to be 7 to be optimal. * Note: The data[] index starts with zero. */ if (cf->len <= CAN_MAX_DLEN) return (cf->len != start_index); /* This relation is also valid in the non-linear DLC range, where * we need to take care of the minimal next possible CAN_DL. * The correct check would be (padlen(cf->len) != padlen(start_index)). * But as cf->len can only take discrete values from 12, .., 64 at this * point the padlen(cf->len) is always equal to cf->len. */ return (cf->len != padlen(start_index)); } /* check padding and return 1/true when the check fails */ static int check_pad(struct isotp_sock *so, struct canfd_frame *cf, int start_index, u8 content) { int i; /* no RX_PADDING value => check length of optimized frame length */ if (!(so->opt.flags & CAN_ISOTP_RX_PADDING)) { if (so->opt.flags & CAN_ISOTP_CHK_PAD_LEN) return check_optimized(cf, start_index); /* no valid test against empty value => ignore frame */ return 1; } /* check datalength of correctly padded CAN frame */ if ((so->opt.flags & CAN_ISOTP_CHK_PAD_LEN) && cf->len != padlen(cf->len)) return 1; /* check padding content */ if (so->opt.flags & CAN_ISOTP_CHK_PAD_DATA) { for (i = start_index; i < cf->len; i++) if (cf->data[i] != content) return 1; } return 0; } static void isotp_send_cframe(struct isotp_sock *so); static int isotp_rcv_fc(struct isotp_sock *so, struct canfd_frame *cf, int ae) { struct sock *sk = &so->sk; if (so->tx.state != ISOTP_WAIT_FC && so->tx.state != ISOTP_WAIT_FIRST_FC) return 0; hrtimer_cancel(&so->txtimer); if ((cf->len < ae + FC_CONTENT_SZ) || ((so->opt.flags & ISOTP_CHECK_PADDING) && check_pad(so, cf, ae + FC_CONTENT_SZ, so->opt.rxpad_content))) { /* malformed PDU - report 'not a data message' */ sk->sk_err = EBADMSG; if (!sock_flag(sk, SOCK_DEAD)) sk_error_report(sk); so->tx.state = ISOTP_IDLE; wake_up_interruptible(&so->wait); return 1; } /* get static/dynamic communication params from first/every FC frame */ if (so->tx.state == ISOTP_WAIT_FIRST_FC || so->opt.flags & CAN_ISOTP_DYN_FC_PARMS) { so->txfc.bs = cf->data[ae + 1]; so->txfc.stmin = cf->data[ae + 2]; /* fix wrong STmin values according spec */ if (so->txfc.stmin > 0x7F && (so->txfc.stmin < 0xF1 || so->txfc.stmin > 0xF9)) so->txfc.stmin = 0x7F; so->tx_gap = ktime_set(0, 0); /* add transmission time for CAN frame N_As */ so->tx_gap = ktime_add_ns(so->tx_gap, so->frame_txtime); /* add waiting time for consecutive frames N_Cs */ if (so->opt.flags & CAN_ISOTP_FORCE_TXSTMIN) so->tx_gap = ktime_add_ns(so->tx_gap, so->force_tx_stmin); else if (so->txfc.stmin < 0x80) so->tx_gap = ktime_add_ns(so->tx_gap, so->txfc.stmin * 1000000); else so->tx_gap = ktime_add_ns(so->tx_gap, (so->txfc.stmin - 0xF0) * 100000); so->tx.state = ISOTP_WAIT_FC; } switch (cf->data[ae] & 0x0F) { case ISOTP_FC_CTS: so->tx.bs = 0; so->tx.state = ISOTP_SENDING; /* send CF frame and enable echo timeout handling */ hrtimer_start(&so->txtimer, ktime_set(ISOTP_ECHO_TIMEOUT, 0), HRTIMER_MODE_REL_SOFT); isotp_send_cframe(so); break; case ISOTP_FC_WT: /* start timer to wait for next FC frame */ hrtimer_start(&so->txtimer, ktime_set(ISOTP_FC_TIMEOUT, 0), HRTIMER_MODE_REL_SOFT); break; case ISOTP_FC_OVFLW: /* overflow on receiver side - report 'message too long' */ sk->sk_err = EMSGSIZE; if (!sock_flag(sk, SOCK_DEAD)) sk_error_report(sk); fallthrough; default: /* stop this tx job */ so->tx.state = ISOTP_IDLE; wake_up_interruptible(&so->wait); } return 0; } static int isotp_rcv_sf(struct sock *sk, struct canfd_frame *cf, int pcilen, struct sk_buff *skb, int len) { struct isotp_sock *so = isotp_sk(sk); struct sk_buff *nskb; hrtimer_cancel(&so->rxtimer); so->rx.state = ISOTP_IDLE; if (!len || len > cf->len - pcilen) return 1; if ((so->opt.flags & ISOTP_CHECK_PADDING) && check_pad(so, cf, pcilen + len, so->opt.rxpad_content)) { /* malformed PDU - report 'not a data message' */ sk->sk_err = EBADMSG; if (!sock_flag(sk, SOCK_DEAD)) sk_error_report(sk); return 1; } nskb = alloc_skb(len, gfp_any()); if (!nskb) return 1; memcpy(skb_put(nskb, len), &cf->data[pcilen], len); nskb->tstamp = skb->tstamp; nskb->dev = skb->dev; isotp_rcv_skb(nskb, sk); return 0; } static int isotp_rcv_ff(struct sock *sk, struct canfd_frame *cf, int ae) { struct isotp_sock *so = isotp_sk(sk); int i; int off; int ff_pci_sz; hrtimer_cancel(&so->rxtimer); so->rx.state = ISOTP_IDLE; /* get the used sender LL_DL from the (first) CAN frame data length */ so->rx.ll_dl = padlen(cf->len); /* the first frame has to use the entire frame up to LL_DL length */ if (cf->len != so->rx.ll_dl) return 1; /* get the FF_DL */ so->rx.len = (cf->data[ae] & 0x0F) << 8; so->rx.len += cf->data[ae + 1]; /* Check for FF_DL escape sequence supporting 32 bit PDU length */ if (so->rx.len) { ff_pci_sz = FF_PCI_SZ12; } else { /* FF_DL = 0 => get real length from next 4 bytes */ so->rx.len = cf->data[ae + 2] << 24; so->rx.len += cf->data[ae + 3] << 16; so->rx.len += cf->data[ae + 4] << 8; so->rx.len += cf->data[ae + 5]; ff_pci_sz = FF_PCI_SZ32; } /* take care of a potential SF_DL ESC offset for TX_DL > 8 */ off = (so->rx.ll_dl > CAN_MAX_DLEN) ? 1 : 0; if (so->rx.len + ae + off + ff_pci_sz < so->rx.ll_dl) return 1; /* PDU size > default => try max_pdu_size */ if (so->rx.len > so->rx.buflen && so->rx.buflen < max_pdu_size) { u8 *newbuf = kmalloc(max_pdu_size, GFP_ATOMIC); if (newbuf) { so->rx.buf = newbuf; so->rx.buflen = max_pdu_size; } } if (so->rx.len > so->rx.buflen) { /* send FC frame with overflow status */ isotp_send_fc(sk, ae, ISOTP_FC_OVFLW); return 1; } /* copy the first received data bytes */ so->rx.idx = 0; for (i = ae + ff_pci_sz; i < so->rx.ll_dl; i++) so->rx.buf[so->rx.idx++] = cf->data[i]; /* initial setup for this pdu reception */ so->rx.sn = 1; so->rx.state = ISOTP_WAIT_DATA; /* no creation of flow control frames */ if (so->opt.flags & CAN_ISOTP_LISTEN_MODE) return 0; /* send our first FC frame */ isotp_send_fc(sk, ae, ISOTP_FC_CTS); return 0; } static int isotp_rcv_cf(struct sock *sk, struct canfd_frame *cf, int ae, struct sk_buff *skb) { struct isotp_sock *so = isotp_sk(sk); struct sk_buff *nskb; int i; if (so->rx.state != ISOTP_WAIT_DATA) return 0; /* drop if timestamp gap is less than force_rx_stmin nano secs */ if (so->opt.flags & CAN_ISOTP_FORCE_RXSTMIN) { if (ktime_to_ns(ktime_sub(skb->tstamp, so->lastrxcf_tstamp)) < so->force_rx_stmin) return 0; so->lastrxcf_tstamp = skb->tstamp; } hrtimer_cancel(&so->rxtimer); /* CFs are never longer than the FF */ if (cf->len > so->rx.ll_dl) return 1; /* CFs have usually the LL_DL length */ if (cf->len < so->rx.ll_dl) { /* this is only allowed for the last CF */ if (so->rx.len - so->rx.idx > so->rx.ll_dl - ae - N_PCI_SZ) return 1; } if ((cf->data[ae] & 0x0F) != so->rx.sn) { /* wrong sn detected - report 'illegal byte sequence' */ sk->sk_err = EILSEQ; if (!sock_flag(sk, SOCK_DEAD)) sk_error_report(sk); /* reset rx state */ so->rx.state = ISOTP_IDLE; return 1; } so->rx.sn++; so->rx.sn %= 16; for (i = ae + N_PCI_SZ; i < cf->len; i++) { so->rx.buf[so->rx.idx++] = cf->data[i]; if (so->rx.idx >= so->rx.len) break; } if (so->rx.idx >= so->rx.len) { /* we are done */ so->rx.state = ISOTP_IDLE; if ((so->opt.flags & ISOTP_CHECK_PADDING) && check_pad(so, cf, i + 1, so->opt.rxpad_content)) { /* malformed PDU - report 'not a data message' */ sk->sk_err = EBADMSG; if (!sock_flag(sk, SOCK_DEAD)) sk_error_report(sk); return 1; } nskb = alloc_skb(so->rx.len, gfp_any()); if (!nskb) return 1; memcpy(skb_put(nskb, so->rx.len), so->rx.buf, so->rx.len); nskb->tstamp = skb->tstamp; nskb->dev = skb->dev; isotp_rcv_skb(nskb, sk); return 0; } /* perform blocksize handling, if enabled */ if (!so->rxfc.bs || ++so->rx.bs < so->rxfc.bs) { /* start rx timeout watchdog */ hrtimer_start(&so->rxtimer, ktime_set(ISOTP_FC_TIMEOUT, 0), HRTIMER_MODE_REL_SOFT); return 0; } /* no creation of flow control frames */ if (so->opt.flags & CAN_ISOTP_LISTEN_MODE) return 0; /* we reached the specified blocksize so->rxfc.bs */ isotp_send_fc(sk, ae, ISOTP_FC_CTS); return 0; } static void isotp_rcv(struct sk_buff *skb, void *data) { struct sock *sk = (struct sock *)data; struct isotp_sock *so = isotp_sk(sk); struct canfd_frame *cf; int ae = (so->opt.flags & CAN_ISOTP_EXTEND_ADDR) ? 1 : 0; u8 n_pci_type, sf_dl; /* Strictly receive only frames with the configured MTU size * => clear separation of CAN2.0 / CAN FD transport channels */ if (skb->len != so->ll.mtu) return; cf = (struct canfd_frame *)skb->data; /* if enabled: check reception of my configured extended address */ if (ae && cf->data[0] != so->opt.rx_ext_address) return; n_pci_type = cf->data[ae] & 0xF0; /* Make sure the state changes and data structures stay consistent at * CAN frame reception time. This locking is not needed in real world * use cases but the inconsistency can be triggered with syzkaller. */ spin_lock(&so->rx_lock); if (so->opt.flags & CAN_ISOTP_HALF_DUPLEX) { /* check rx/tx path half duplex expectations */ if ((so->tx.state != ISOTP_IDLE && n_pci_type != N_PCI_FC) || (so->rx.state != ISOTP_IDLE && n_pci_type == N_PCI_FC)) goto out_unlock; } switch (n_pci_type) { case N_PCI_FC: /* tx path: flow control frame containing the FC parameters */ isotp_rcv_fc(so, cf, ae); break; case N_PCI_SF: /* rx path: single frame * * As we do not have a rx.ll_dl configuration, we can only test * if the CAN frames payload length matches the LL_DL == 8 * requirements - no matter if it's CAN 2.0 or CAN FD */ /* get the SF_DL from the N_PCI byte */ sf_dl = cf->data[ae] & 0x0F; if (cf->len <= CAN_MAX_DLEN) { isotp_rcv_sf(sk, cf, SF_PCI_SZ4 + ae, skb, sf_dl); } else { if (can_is_canfd_skb(skb)) { /* We have a CAN FD frame and CAN_DL is greater than 8: * Only frames with the SF_DL == 0 ESC value are valid. * * If so take care of the increased SF PCI size * (SF_PCI_SZ8) to point to the message content behind * the extended SF PCI info and get the real SF_DL * length value from the formerly first data byte. */ if (sf_dl == 0) isotp_rcv_sf(sk, cf, SF_PCI_SZ8 + ae, skb, cf->data[SF_PCI_SZ4 + ae]); } } break; case N_PCI_FF: /* rx path: first frame */ isotp_rcv_ff(sk, cf, ae); break; case N_PCI_CF: /* rx path: consecutive frame */ isotp_rcv_cf(sk, cf, ae, skb); break; } out_unlock: spin_unlock(&so->rx_lock); } static void isotp_fill_dataframe(struct canfd_frame *cf, struct isotp_sock *so, int ae, int off) { int pcilen = N_PCI_SZ + ae + off; int space = so->tx.ll_dl - pcilen; int num = min_t(int, so->tx.len - so->tx.idx, space); int i; cf->can_id = so->txid; cf->len = num + pcilen; if (num < space) { if (so->opt.flags & CAN_ISOTP_TX_PADDING) { /* user requested padding */ cf->len = padlen(cf->len); memset(cf->data, so->opt.txpad_content, cf->len); } else if (cf->len > CAN_MAX_DLEN) { /* mandatory padding for CAN FD frames */ cf->len = padlen(cf->len); memset(cf->data, CAN_ISOTP_DEFAULT_PAD_CONTENT, cf->len); } } for (i = 0; i < num; i++) cf->data[pcilen + i] = so->tx.buf[so->tx.idx++]; if (ae) cf->data[0] = so->opt.ext_address; } static void isotp_send_cframe(struct isotp_sock *so) { struct sock *sk = &so->sk; struct sk_buff *skb; struct net_device *dev; struct canfd_frame *cf; int can_send_ret; int ae = (so->opt.flags & CAN_ISOTP_EXTEND_ADDR) ? 1 : 0; dev = dev_get_by_index(sock_net(sk), so->ifindex); if (!dev) return; skb = alloc_skb(so->ll.mtu + sizeof(struct can_skb_priv), GFP_ATOMIC); if (!skb) { dev_put(dev); return; } can_skb_reserve(skb); can_skb_prv(skb)->ifindex = dev->ifindex; can_skb_prv(skb)->skbcnt = 0; cf = (struct canfd_frame *)skb->data; skb_put_zero(skb, so->ll.mtu); /* create consecutive frame */ isotp_fill_dataframe(cf, so, ae, 0); /* place consecutive frame N_PCI in appropriate index */ cf->data[ae] = N_PCI_CF | so->tx.sn++; so->tx.sn %= 16; so->tx.bs++; cf->flags = so->ll.tx_flags; skb->dev = dev; can_skb_set_owner(skb, sk); /* cfecho should have been zero'ed by init/isotp_rcv_echo() */ if (so->cfecho) pr_notice_once("can-isotp: cfecho is %08X != 0\n", so->cfecho); /* set consecutive frame echo tag */ so->cfecho = *(u32 *)cf->data; /* send frame with local echo enabled */ can_send_ret = can_send(skb, 1); if (can_send_ret) { pr_notice_once("can-isotp: %s: can_send_ret %pe\n", __func__, ERR_PTR(can_send_ret)); if (can_send_ret == -ENOBUFS) pr_notice_once("can-isotp: tx queue is full\n"); } dev_put(dev); } static void isotp_create_fframe(struct canfd_frame *cf, struct isotp_sock *so, int ae) { int i; int ff_pci_sz; cf->can_id = so->txid; cf->len = so->tx.ll_dl; if (ae) cf->data[0] = so->opt.ext_address; /* create N_PCI bytes with 12/32 bit FF_DL data length */ if (so->tx.len > MAX_12BIT_PDU_SIZE) { /* use 32 bit FF_DL notation */ cf->data[ae] = N_PCI_FF; cf->data[ae + 1] = 0; cf->data[ae + 2] = (u8)(so->tx.len >> 24) & 0xFFU; cf->data[ae + 3] = (u8)(so->tx.len >> 16) & 0xFFU; cf->data[ae + 4] = (u8)(so->tx.len >> 8) & 0xFFU; cf->data[ae + 5] = (u8)so->tx.len & 0xFFU; ff_pci_sz = FF_PCI_SZ32; } else { /* use 12 bit FF_DL notation */ cf->data[ae] = (u8)(so->tx.len >> 8) | N_PCI_FF; cf->data[ae + 1] = (u8)so->tx.len & 0xFFU; ff_pci_sz = FF_PCI_SZ12; } /* add first data bytes depending on ae */ for (i = ae + ff_pci_sz; i < so->tx.ll_dl; i++) cf->data[i] = so->tx.buf[so->tx.idx++]; so->tx.sn = 1; } static void isotp_rcv_echo(struct sk_buff *skb, void *data) { struct sock *sk = (struct sock *)data; struct isotp_sock *so = isotp_sk(sk); struct canfd_frame *cf = (struct canfd_frame *)skb->data; /* only handle my own local echo CF/SF skb's (no FF!) */ if (skb->sk != sk || so->cfecho != *(u32 *)cf->data) return; /* cancel local echo timeout */ hrtimer_cancel(&so->txtimer); /* local echo skb with consecutive frame has been consumed */ so->cfecho = 0; if (so->tx.idx >= so->tx.len) { /* we are done */ so->tx.state = ISOTP_IDLE; wake_up_interruptible(&so->wait); return; } if (so->txfc.bs && so->tx.bs >= so->txfc.bs) { /* stop and wait for FC with timeout */ so->tx.state = ISOTP_WAIT_FC; hrtimer_start(&so->txtimer, ktime_set(ISOTP_FC_TIMEOUT, 0), HRTIMER_MODE_REL_SOFT); return; } /* no gap between data frames needed => use burst mode */ if (!so->tx_gap) { /* enable echo timeout handling */ hrtimer_start(&so->txtimer, ktime_set(ISOTP_ECHO_TIMEOUT, 0), HRTIMER_MODE_REL_SOFT); isotp_send_cframe(so); return; } /* start timer to send next consecutive frame with correct delay */ hrtimer_start(&so->txfrtimer, so->tx_gap, HRTIMER_MODE_REL_SOFT); } static enum hrtimer_restart isotp_tx_timer_handler(struct hrtimer *hrtimer) { struct isotp_sock *so = container_of(hrtimer, struct isotp_sock, txtimer); struct sock *sk = &so->sk; /* don't handle timeouts in IDLE or SHUTDOWN state */ if (so->tx.state == ISOTP_IDLE || so->tx.state == ISOTP_SHUTDOWN) return HRTIMER_NORESTART; /* we did not get any flow control or echo frame in time */ /* report 'communication error on send' */ sk->sk_err = ECOMM; if (!sock_flag(sk, SOCK_DEAD)) sk_error_report(sk); /* reset tx state */ so->tx.state = ISOTP_IDLE; wake_up_interruptible(&so->wait); return HRTIMER_NORESTART; } static enum hrtimer_restart isotp_txfr_timer_handler(struct hrtimer *hrtimer) { struct isotp_sock *so = container_of(hrtimer, struct isotp_sock, txfrtimer); /* start echo timeout handling and cover below protocol error */ hrtimer_start(&so->txtimer, ktime_set(ISOTP_ECHO_TIMEOUT, 0), HRTIMER_MODE_REL_SOFT); /* cfecho should be consumed by isotp_rcv_echo() here */ if (so->tx.state == ISOTP_SENDING && !so->cfecho) isotp_send_cframe(so); return HRTIMER_NORESTART; } static int isotp_sendmsg(struct socket *sock, struct msghdr *msg, size_t size) { struct sock *sk = sock->sk; struct isotp_sock *so = isotp_sk(sk); struct sk_buff *skb; struct net_device *dev; struct canfd_frame *cf; int ae = (so->opt.flags & CAN_ISOTP_EXTEND_ADDR) ? 1 : 0; int wait_tx_done = (so->opt.flags & CAN_ISOTP_WAIT_TX_DONE) ? 1 : 0; s64 hrtimer_sec = ISOTP_ECHO_TIMEOUT; int off; int err; if (!so->bound || so->tx.state == ISOTP_SHUTDOWN) return -EADDRNOTAVAIL; while (cmpxchg(&so->tx.state, ISOTP_IDLE, ISOTP_SENDING) != ISOTP_IDLE) { /* we do not support multiple buffers - for now */ if (msg->msg_flags & MSG_DONTWAIT) return -EAGAIN; if (so->tx.state == ISOTP_SHUTDOWN) return -EADDRNOTAVAIL; /* wait for complete transmission of current pdu */ err = wait_event_interruptible(so->wait, so->tx.state == ISOTP_IDLE); if (err) goto err_event_drop; } /* PDU size > default => try max_pdu_size */ if (size > so->tx.buflen && so->tx.buflen < max_pdu_size) { u8 *newbuf = kmalloc(max_pdu_size, GFP_KERNEL); if (newbuf) { so->tx.buf = newbuf; so->tx.buflen = max_pdu_size; } } if (!size || size > so->tx.buflen) { err = -EINVAL; goto err_out_drop; } /* take care of a potential SF_DL ESC offset for TX_DL > 8 */ off = (so->tx.ll_dl > CAN_MAX_DLEN) ? 1 : 0; /* does the given data fit into a single frame for SF_BROADCAST? */ if ((isotp_bc_flags(so) == CAN_ISOTP_SF_BROADCAST) && (size > so->tx.ll_dl - SF_PCI_SZ4 - ae - off)) { err = -EINVAL; goto err_out_drop; } err = memcpy_from_msg(so->tx.buf, msg, size); if (err < 0) goto err_out_drop; dev = dev_get_by_index(sock_net(sk), so->ifindex); if (!dev) { err = -ENXIO; goto err_out_drop; } skb = sock_alloc_send_skb(sk, so->ll.mtu + sizeof(struct can_skb_priv), msg->msg_flags & MSG_DONTWAIT, &err); if (!skb) { dev_put(dev); goto err_out_drop; } can_skb_reserve(skb); can_skb_prv(skb)->ifindex = dev->ifindex; can_skb_prv(skb)->skbcnt = 0; so->tx.len = size; so->tx.idx = 0; cf = (struct canfd_frame *)skb->data; skb_put_zero(skb, so->ll.mtu); /* cfecho should have been zero'ed by init / former isotp_rcv_echo() */ if (so->cfecho) pr_notice_once("can-isotp: uninit cfecho %08X\n", so->cfecho); /* check for single frame transmission depending on TX_DL */ if (size <= so->tx.ll_dl - SF_PCI_SZ4 - ae - off) { /* The message size generally fits into a SingleFrame - good. * * SF_DL ESC offset optimization: * * When TX_DL is greater 8 but the message would still fit * into a 8 byte CAN frame, we can omit the offset. * This prevents a protocol caused length extension from * CAN_DL = 8 to CAN_DL = 12 due to the SF_SL ESC handling. */ if (size <= CAN_MAX_DLEN - SF_PCI_SZ4 - ae) off = 0; isotp_fill_dataframe(cf, so, ae, off); /* place single frame N_PCI w/o length in appropriate index */ cf->data[ae] = N_PCI_SF; /* place SF_DL size value depending on the SF_DL ESC offset */ if (off) cf->data[SF_PCI_SZ4 + ae] = size; else cf->data[ae] |= size; /* set CF echo tag for isotp_rcv_echo() (SF-mode) */ so->cfecho = *(u32 *)cf->data; } else { /* send first frame */ isotp_create_fframe(cf, so, ae); if (isotp_bc_flags(so) == CAN_ISOTP_CF_BROADCAST) { /* set timer for FC-less operation (STmin = 0) */ if (so->opt.flags & CAN_ISOTP_FORCE_TXSTMIN) so->tx_gap = ktime_set(0, so->force_tx_stmin); else so->tx_gap = ktime_set(0, so->frame_txtime); /* disable wait for FCs due to activated block size */ so->txfc.bs = 0; /* set CF echo tag for isotp_rcv_echo() (CF-mode) */ so->cfecho = *(u32 *)cf->data; } else { /* standard flow control check */ so->tx.state = ISOTP_WAIT_FIRST_FC; /* start timeout for FC */ hrtimer_sec = ISOTP_FC_TIMEOUT; /* no CF echo tag for isotp_rcv_echo() (FF-mode) */ so->cfecho = 0; } } hrtimer_start(&so->txtimer, ktime_set(hrtimer_sec, 0), HRTIMER_MODE_REL_SOFT); /* send the first or only CAN frame */ cf->flags = so->ll.tx_flags; skb->dev = dev; skb->sk = sk; err = can_send(skb, 1); dev_put(dev); if (err) { pr_notice_once("can-isotp: %s: can_send_ret %pe\n", __func__, ERR_PTR(err)); /* no transmission -> no timeout monitoring */ hrtimer_cancel(&so->txtimer); /* reset consecutive frame echo tag */ so->cfecho = 0; goto err_out_drop; } if (wait_tx_done) { /* wait for complete transmission of current pdu */ err = wait_event_interruptible(so->wait, so->tx.state == ISOTP_IDLE); if (err) goto err_event_drop; err = sock_error(sk); if (err) return err; } return size; err_event_drop: /* got signal: force tx state machine to be idle */ so->tx.state = ISOTP_IDLE; hrtimer_cancel(&so->txfrtimer); hrtimer_cancel(&so->txtimer); err_out_drop: /* drop this PDU and unlock a potential wait queue */ so->tx.state = ISOTP_IDLE; wake_up_interruptible(&so->wait); return err; } static int isotp_recvmsg(struct socket *sock, struct msghdr *msg, size_t size, int flags) { struct sock *sk = sock->sk; struct sk_buff *skb; struct isotp_sock *so = isotp_sk(sk); int ret = 0; if (flags & ~(MSG_DONTWAIT | MSG_TRUNC | MSG_PEEK | MSG_CMSG_COMPAT)) return -EINVAL; if (!so->bound) return -EADDRNOTAVAIL; skb = skb_recv_datagram(sk, flags, &ret); if (!skb) return ret; if (size < skb->len) msg->msg_flags |= MSG_TRUNC; else size = skb->len; ret = memcpy_to_msg(msg, skb->data, size); if (ret < 0) goto out_err; sock_recv_cmsgs(msg, sk, skb); if (msg->msg_name) { __sockaddr_check_size(ISOTP_MIN_NAMELEN); msg->msg_namelen = ISOTP_MIN_NAMELEN; memcpy(msg->msg_name, skb->cb, msg->msg_namelen); } /* set length of return value */ ret = (flags & MSG_TRUNC) ? skb->len : size; out_err: skb_free_datagram(sk, skb); return ret; } static int isotp_release(struct socket *sock) { struct sock *sk = sock->sk; struct isotp_sock *so; struct net *net; if (!sk) return 0; so = isotp_sk(sk); net = sock_net(sk); /* wait for complete transmission of current pdu */ while (wait_event_interruptible(so->wait, so->tx.state == ISOTP_IDLE) == 0 && cmpxchg(&so->tx.state, ISOTP_IDLE, ISOTP_SHUTDOWN) != ISOTP_IDLE) ; /* force state machines to be idle also when a signal occurred */ so->tx.state = ISOTP_SHUTDOWN; so->rx.state = ISOTP_IDLE; spin_lock(&isotp_notifier_lock); while (isotp_busy_notifier == so) { spin_unlock(&isotp_notifier_lock); schedule_timeout_uninterruptible(1); spin_lock(&isotp_notifier_lock); } list_del(&so->notifier); spin_unlock(&isotp_notifier_lock); lock_sock(sk); /* remove current filters & unregister */ if (so->bound) { if (so->ifindex) { struct net_device *dev; dev = dev_get_by_index(net, so->ifindex); if (dev) { if (isotp_register_rxid(so)) can_rx_unregister(net, dev, so->rxid, SINGLE_MASK(so->rxid), isotp_rcv, sk); can_rx_unregister(net, dev, so->txid, SINGLE_MASK(so->txid), isotp_rcv_echo, sk); dev_put(dev); synchronize_rcu(); } } } hrtimer_cancel(&so->txfrtimer); hrtimer_cancel(&so->txtimer); hrtimer_cancel(&so->rxtimer); so->ifindex = 0; so->bound = 0; if (so->rx.buf != so->rx.sbuf) kfree(so->rx.buf); if (so->tx.buf != so->tx.sbuf) kfree(so->tx.buf); sock_orphan(sk); sock->sk = NULL; release_sock(sk); sock_put(sk); return 0; } static int isotp_bind(struct socket *sock, struct sockaddr *uaddr, int len) { struct sockaddr_can *addr = (struct sockaddr_can *)uaddr; struct sock *sk = sock->sk; struct isotp_sock *so = isotp_sk(sk); struct net *net = sock_net(sk); int ifindex; struct net_device *dev; canid_t tx_id = addr->can_addr.tp.tx_id; canid_t rx_id = addr->can_addr.tp.rx_id; int err = 0; int notify_enetdown = 0; if (len < ISOTP_MIN_NAMELEN) return -EINVAL; if (addr->can_family != AF_CAN) return -EINVAL; /* sanitize tx CAN identifier */ if (tx_id & CAN_EFF_FLAG) tx_id &= (CAN_EFF_FLAG | CAN_EFF_MASK); else tx_id &= CAN_SFF_MASK; /* give feedback on wrong CAN-ID value */ if (tx_id != addr->can_addr.tp.tx_id) return -EINVAL; /* sanitize rx CAN identifier (if needed) */ if (isotp_register_rxid(so)) { if (rx_id & CAN_EFF_FLAG) rx_id &= (CAN_EFF_FLAG | CAN_EFF_MASK); else rx_id &= CAN_SFF_MASK; /* give feedback on wrong CAN-ID value */ if (rx_id != addr->can_addr.tp.rx_id) return -EINVAL; } if (!addr->can_ifindex) return -ENODEV; lock_sock(sk); if (so->bound) { err = -EINVAL; goto out; } /* ensure different CAN IDs when the rx_id is to be registered */ if (isotp_register_rxid(so) && rx_id == tx_id) { err = -EADDRNOTAVAIL; goto out; } dev = dev_get_by_index(net, addr->can_ifindex); if (!dev) { err = -ENODEV; goto out; } if (dev->type != ARPHRD_CAN) { dev_put(dev); err = -ENODEV; goto out; } if (dev->mtu < so->ll.mtu) { dev_put(dev); err = -EINVAL; goto out; } if (!(dev->flags & IFF_UP)) notify_enetdown = 1; ifindex = dev->ifindex; if (isotp_register_rxid(so)) can_rx_register(net, dev, rx_id, SINGLE_MASK(rx_id), isotp_rcv, sk, "isotp", sk); /* no consecutive frame echo skb in flight */ so->cfecho = 0; /* register for echo skb's */ can_rx_register(net, dev, tx_id, SINGLE_MASK(tx_id), isotp_rcv_echo, sk, "isotpe", sk); dev_put(dev); /* switch to new settings */ so->ifindex = ifindex; so->rxid = rx_id; so->txid = tx_id; so->bound = 1; out: release_sock(sk); if (notify_enetdown) { sk->sk_err = ENETDOWN; if (!sock_flag(sk, SOCK_DEAD)) sk_error_report(sk); } return err; } static int isotp_getname(struct socket *sock, struct sockaddr *uaddr, int peer) { struct sockaddr_can *addr = (struct sockaddr_can *)uaddr; struct sock *sk = sock->sk; struct isotp_sock *so = isotp_sk(sk); if (peer) return -EOPNOTSUPP; memset(addr, 0, ISOTP_MIN_NAMELEN); addr->can_family = AF_CAN; addr->can_ifindex = so->ifindex; addr->can_addr.tp.rx_id = so->rxid; addr->can_addr.tp.tx_id = so->txid; return ISOTP_MIN_NAMELEN; } static int isotp_setsockopt_locked(struct socket *sock, int level, int optname, sockptr_t optval, unsigned int optlen) { struct sock *sk = sock->sk; struct isotp_sock *so = isotp_sk(sk); int ret = 0; if (so->bound) return -EISCONN; switch (optname) { case CAN_ISOTP_OPTS: if (optlen != sizeof(struct can_isotp_options)) return -EINVAL; if (copy_from_sockptr(&so->opt, optval, optlen)) return -EFAULT; /* no separate rx_ext_address is given => use ext_address */ if (!(so->opt.flags & CAN_ISOTP_RX_EXT_ADDR)) so->opt.rx_ext_address = so->opt.ext_address; /* these broadcast flags are not allowed together */ if (isotp_bc_flags(so) == ISOTP_ALL_BC_FLAGS) { /* CAN_ISOTP_SF_BROADCAST is prioritized */ so->opt.flags &= ~CAN_ISOTP_CF_BROADCAST; /* give user feedback on wrong config attempt */ ret = -EINVAL; } /* check for frame_txtime changes (0 => no changes) */ if (so->opt.frame_txtime) { if (so->opt.frame_txtime == CAN_ISOTP_FRAME_TXTIME_ZERO) so->frame_txtime = 0; else so->frame_txtime = so->opt.frame_txtime; } break; case CAN_ISOTP_RECV_FC: if (optlen != sizeof(struct can_isotp_fc_options)) return -EINVAL; if (copy_from_sockptr(&so->rxfc, optval, optlen)) return -EFAULT; break; case CAN_ISOTP_TX_STMIN: if (optlen != sizeof(u32)) return -EINVAL; if (copy_from_sockptr(&so->force_tx_stmin, optval, optlen)) return -EFAULT; break; case CAN_ISOTP_RX_STMIN: if (optlen != sizeof(u32)) return -EINVAL; if (copy_from_sockptr(&so->force_rx_stmin, optval, optlen)) return -EFAULT; break; case CAN_ISOTP_LL_OPTS: if (optlen == sizeof(struct can_isotp_ll_options)) { struct can_isotp_ll_options ll; if (copy_from_sockptr(&ll, optval, optlen)) return -EFAULT; /* check for correct ISO 11898-1 DLC data length */ if (ll.tx_dl != padlen(ll.tx_dl)) return -EINVAL; if (ll.mtu != CAN_MTU && ll.mtu != CANFD_MTU) return -EINVAL; if (ll.mtu == CAN_MTU && (ll.tx_dl > CAN_MAX_DLEN || ll.tx_flags != 0)) return -EINVAL; memcpy(&so->ll, &ll, sizeof(ll)); /* set ll_dl for tx path to similar place as for rx */ so->tx.ll_dl = ll.tx_dl; } else { return -EINVAL; } break; default: ret = -ENOPROTOOPT; } return ret; } static int isotp_setsockopt(struct socket *sock, int level, int optname, sockptr_t optval, unsigned int optlen) { struct sock *sk = sock->sk; int ret; if (level != SOL_CAN_ISOTP) return -EINVAL; lock_sock(sk); ret = isotp_setsockopt_locked(sock, level, optname, optval, optlen); release_sock(sk); return ret; } static int isotp_getsockopt(struct socket *sock, int level, int optname, char __user *optval, int __user *optlen) { struct sock *sk = sock->sk; struct isotp_sock *so = isotp_sk(sk); int len; void *val; if (level != SOL_CAN_ISOTP) return -EINVAL; if (get_user(len, optlen)) return -EFAULT; if (len < 0) return -EINVAL; switch (optname) { case CAN_ISOTP_OPTS: len = min_t(int, len, sizeof(struct can_isotp_options)); val = &so->opt; break; case CAN_ISOTP_RECV_FC: len = min_t(int, len, sizeof(struct can_isotp_fc_options)); val = &so->rxfc; break; case CAN_ISOTP_TX_STMIN: len = min_t(int, len, sizeof(u32)); val = &so->force_tx_stmin; break; case CAN_ISOTP_RX_STMIN: len = min_t(int, len, sizeof(u32)); val = &so->force_rx_stmin; break; case CAN_ISOTP_LL_OPTS: len = min_t(int, len, sizeof(struct can_isotp_ll_options)); val = &so->ll; break; default: return -ENOPROTOOPT; } if (put_user(len, optlen)) return -EFAULT; if (copy_to_user(optval, val, len)) return -EFAULT; return 0; } static void isotp_notify(struct isotp_sock *so, unsigned long msg, struct net_device *dev) { struct sock *sk = &so->sk; if (!net_eq(dev_net(dev), sock_net(sk))) return; if (so->ifindex != dev->ifindex) return; switch (msg) { case NETDEV_UNREGISTER: lock_sock(sk); /* remove current filters & unregister */ if (so->bound) { if (isotp_register_rxid(so)) can_rx_unregister(dev_net(dev), dev, so->rxid, SINGLE_MASK(so->rxid), isotp_rcv, sk); can_rx_unregister(dev_net(dev), dev, so->txid, SINGLE_MASK(so->txid), isotp_rcv_echo, sk); } so->ifindex = 0; so->bound = 0; release_sock(sk); sk->sk_err = ENODEV; if (!sock_flag(sk, SOCK_DEAD)) sk_error_report(sk); break; case NETDEV_DOWN: sk->sk_err = ENETDOWN; if (!sock_flag(sk, SOCK_DEAD)) sk_error_report(sk); break; } } static int isotp_notifier(struct notifier_block *nb, unsigned long msg, void *ptr) { struct net_device *dev = netdev_notifier_info_to_dev(ptr); if (dev->type != ARPHRD_CAN) return NOTIFY_DONE; if (msg != NETDEV_UNREGISTER && msg != NETDEV_DOWN) return NOTIFY_DONE; if (unlikely(isotp_busy_notifier)) /* Check for reentrant bug. */ return NOTIFY_DONE; spin_lock(&isotp_notifier_lock); list_for_each_entry(isotp_busy_notifier, &isotp_notifier_list, notifier) { spin_unlock(&isotp_notifier_lock); isotp_notify(isotp_busy_notifier, msg, dev); spin_lock(&isotp_notifier_lock); } isotp_busy_notifier = NULL; spin_unlock(&isotp_notifier_lock); return NOTIFY_DONE; } static int isotp_init(struct sock *sk) { struct isotp_sock *so = isotp_sk(sk); so->ifindex = 0; so->bound = 0; so->opt.flags = CAN_ISOTP_DEFAULT_FLAGS; so->opt.ext_address = CAN_ISOTP_DEFAULT_EXT_ADDRESS; so->opt.rx_ext_address = CAN_ISOTP_DEFAULT_EXT_ADDRESS; so->opt.rxpad_content = CAN_ISOTP_DEFAULT_PAD_CONTENT; so->opt.txpad_content = CAN_ISOTP_DEFAULT_PAD_CONTENT; so->opt.frame_txtime = CAN_ISOTP_DEFAULT_FRAME_TXTIME; so->frame_txtime = CAN_ISOTP_DEFAULT_FRAME_TXTIME; so->rxfc.bs = CAN_ISOTP_DEFAULT_RECV_BS; so->rxfc.stmin = CAN_ISOTP_DEFAULT_RECV_STMIN; so->rxfc.wftmax = CAN_ISOTP_DEFAULT_RECV_WFTMAX; so->ll.mtu = CAN_ISOTP_DEFAULT_LL_MTU; so->ll.tx_dl = CAN_ISOTP_DEFAULT_LL_TX_DL; so->ll.tx_flags = CAN_ISOTP_DEFAULT_LL_TX_FLAGS; /* set ll_dl for tx path to similar place as for rx */ so->tx.ll_dl = so->ll.tx_dl; so->rx.state = ISOTP_IDLE; so->tx.state = ISOTP_IDLE; so->rx.buf = so->rx.sbuf; so->tx.buf = so->tx.sbuf; so->rx.buflen = ARRAY_SIZE(so->rx.sbuf); so->tx.buflen = ARRAY_SIZE(so->tx.sbuf); hrtimer_init(&so->rxtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_SOFT); so->rxtimer.function = isotp_rx_timer_handler; hrtimer_init(&so->txtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_SOFT); so->txtimer.function = isotp_tx_timer_handler; hrtimer_init(&so->txfrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_SOFT); so->txfrtimer.function = isotp_txfr_timer_handler; init_waitqueue_head(&so->wait); spin_lock_init(&so->rx_lock); spin_lock(&isotp_notifier_lock); list_add_tail(&so->notifier, &isotp_notifier_list); spin_unlock(&isotp_notifier_lock); return 0; } static __poll_t isotp_poll(struct file *file, struct socket *sock, poll_table *wait) { struct sock *sk = sock->sk; struct isotp_sock *so = isotp_sk(sk); __poll_t mask = datagram_poll(file, sock, wait); poll_wait(file, &so->wait, wait); /* Check for false positives due to TX state */ if ((mask & EPOLLWRNORM) && (so->tx.state != ISOTP_IDLE)) mask &= ~(EPOLLOUT | EPOLLWRNORM); return mask; } static int isotp_sock_no_ioctlcmd(struct socket *sock, unsigned int cmd, unsigned long arg) { /* no ioctls for socket layer -> hand it down to NIC layer */ return -ENOIOCTLCMD; } static const struct proto_ops isotp_ops = { .family = PF_CAN, .release = isotp_release, .bind = isotp_bind, .connect = sock_no_connect, .socketpair = sock_no_socketpair, .accept = sock_no_accept, .getname = isotp_getname, .poll = isotp_poll, .ioctl = isotp_sock_no_ioctlcmd, .gettstamp = sock_gettstamp, .listen = sock_no_listen, .shutdown = sock_no_shutdown, .setsockopt = isotp_setsockopt, .getsockopt = isotp_getsockopt, .sendmsg = isotp_sendmsg, .recvmsg = isotp_recvmsg, .mmap = sock_no_mmap, }; static struct proto isotp_proto __read_mostly = { .name = "CAN_ISOTP", .owner = THIS_MODULE, .obj_size = sizeof(struct isotp_sock), .init = isotp_init, }; static const struct can_proto isotp_can_proto = { .type = SOCK_DGRAM, .protocol = CAN_ISOTP, .ops = &isotp_ops, .prot = &isotp_proto, }; static struct notifier_block canisotp_notifier = { .notifier_call = isotp_notifier }; static __init int isotp_module_init(void) { int err; max_pdu_size = max_t(unsigned int, max_pdu_size, MAX_12BIT_PDU_SIZE); max_pdu_size = min_t(unsigned int, max_pdu_size, MAX_PDU_SIZE); pr_info("can: isotp protocol (max_pdu_size %d)\n", max_pdu_size); err = can_proto_register(&isotp_can_proto); if (err < 0) pr_err("can: registration of isotp protocol failed %pe\n", ERR_PTR(err)); else register_netdevice_notifier(&canisotp_notifier); return err; } static __exit void isotp_module_exit(void) { can_proto_unregister(&isotp_can_proto); unregister_netdevice_notifier(&canisotp_notifier); } module_init(isotp_module_init); module_exit(isotp_module_exit); |
15 490 117 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 | #ifndef IOU_ALLOC_CACHE_H #define IOU_ALLOC_CACHE_H /* * Don't allow the cache to grow beyond this size. */ #define IO_ALLOC_CACHE_MAX 128 static inline bool io_alloc_cache_put(struct io_alloc_cache *cache, void *entry) { if (cache->nr_cached < cache->max_cached) { if (!kasan_mempool_poison_object(entry)) return false; cache->entries[cache->nr_cached++] = entry; return true; } return false; } static inline void *io_alloc_cache_get(struct io_alloc_cache *cache) { if (cache->nr_cached) { void *entry = cache->entries[--cache->nr_cached]; kasan_mempool_unpoison_object(entry, cache->elem_size); return entry; } return NULL; } static inline void *io_cache_alloc(struct io_alloc_cache *cache, gfp_t gfp, void (*init_once)(void *obj)) { if (unlikely(!cache->nr_cached)) { void *obj = kmalloc(cache->elem_size, gfp); if (obj && init_once) init_once(obj); return obj; } return io_alloc_cache_get(cache); } /* returns false if the cache was initialized properly */ static inline bool io_alloc_cache_init(struct io_alloc_cache *cache, unsigned max_nr, size_t size) { cache->entries = kvmalloc_array(max_nr, sizeof(void *), GFP_KERNEL); if (cache->entries) { cache->nr_cached = 0; cache->max_cached = max_nr; cache->elem_size = size; return false; } return true; } static inline void io_alloc_cache_free(struct io_alloc_cache *cache, void (*free)(const void *)) { void *entry; if (!cache->entries) return; while ((entry = io_alloc_cache_get(cache)) != NULL) free(entry); kvfree(cache->entries); cache->entries = NULL; } #endif |
49 98 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 | // SPDX-License-Identifier: GPL-2.0-only /* * Unified UUID/GUID definition * * Copyright (C) 2009, 2016 Intel Corp. * Huang Ying <ying.huang@intel.com> */ #include <linux/kernel.h> #include <linux/ctype.h> #include <linux/errno.h> #include <linux/export.h> #include <linux/uuid.h> #include <linux/random.h> const guid_t guid_null; EXPORT_SYMBOL(guid_null); const uuid_t uuid_null; EXPORT_SYMBOL(uuid_null); const u8 guid_index[16] = {3,2,1,0,5,4,7,6,8,9,10,11,12,13,14,15}; const u8 uuid_index[16] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}; /** * generate_random_uuid - generate a random UUID * @uuid: where to put the generated UUID * * Random UUID interface * * Used to create a Boot ID or a filesystem UUID/GUID, but can be * useful for other kernel drivers. */ void generate_random_uuid(unsigned char uuid[16]) { get_random_bytes(uuid, 16); /* Set UUID version to 4 --- truly random generation */ uuid[6] = (uuid[6] & 0x0F) | 0x40; /* Set the UUID variant to DCE */ uuid[8] = (uuid[8] & 0x3F) | 0x80; } EXPORT_SYMBOL(generate_random_uuid); void generate_random_guid(unsigned char guid[16]) { get_random_bytes(guid, 16); /* Set GUID version to 4 --- truly random generation */ guid[7] = (guid[7] & 0x0F) | 0x40; /* Set the GUID variant to DCE */ guid[8] = (guid[8] & 0x3F) | 0x80; } EXPORT_SYMBOL(generate_random_guid); static void __uuid_gen_common(__u8 b[16]) { get_random_bytes(b, 16); /* reversion 0b10 */ b[8] = (b[8] & 0x3F) | 0x80; } void guid_gen(guid_t *lu) { __uuid_gen_common(lu->b); /* version 4 : random generation */ lu->b[7] = (lu->b[7] & 0x0F) | 0x40; } EXPORT_SYMBOL_GPL(guid_gen); void uuid_gen(uuid_t *bu) { __uuid_gen_common(bu->b); /* version 4 : random generation */ bu->b[6] = (bu->b[6] & 0x0F) | 0x40; } EXPORT_SYMBOL_GPL(uuid_gen); /** * uuid_is_valid - checks if a UUID string is valid * @uuid: UUID string to check * * Description: * It checks if the UUID string is following the format: * xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx * * where x is a hex digit. * * Return: true if input is valid UUID string. */ bool uuid_is_valid(const char *uuid) { unsigned int i; for (i = 0; i < UUID_STRING_LEN; i++) { if (i == 8 || i == 13 || i == 18 || i == 23) { if (uuid[i] != '-') return false; } else if (!isxdigit(uuid[i])) { return false; } } return true; } EXPORT_SYMBOL(uuid_is_valid); static int __uuid_parse(const char *uuid, __u8 b[16], const u8 ei[16]) { static const u8 si[16] = {0,2,4,6,9,11,14,16,19,21,24,26,28,30,32,34}; unsigned int i; if (!uuid_is_valid(uuid)) return -EINVAL; for (i = 0; i < 16; i++) { int hi = hex_to_bin(uuid[si[i] + 0]); int lo = hex_to_bin(uuid[si[i] + 1]); b[ei[i]] = (hi << 4) | lo; } return 0; } int guid_parse(const char *uuid, guid_t *u) { return __uuid_parse(uuid, u->b, guid_index); } EXPORT_SYMBOL(guid_parse); int uuid_parse(const char *uuid, uuid_t *u) { return __uuid_parse(uuid, u->b, uuid_index); } EXPORT_SYMBOL(uuid_parse); |
43 43 43 43 3 3 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 | // SPDX-License-Identifier: GPL-2.0 /* * f2fs debugging statistics * * Copyright (c) 2012 Samsung Electronics Co., Ltd. * http://www.samsung.com/ * Copyright (c) 2012 Linux Foundation * Copyright (c) 2012 Greg Kroah-Hartman <gregkh@linuxfoundation.org> */ #include <linux/fs.h> #include <linux/backing-dev.h> #include <linux/f2fs_fs.h> #include <linux/blkdev.h> #include <linux/debugfs.h> #include <linux/seq_file.h> #include "f2fs.h" #include "node.h" #include "segment.h" #include "gc.h" static LIST_HEAD(f2fs_stat_list); static DEFINE_RAW_SPINLOCK(f2fs_stat_lock); #ifdef CONFIG_DEBUG_FS static struct dentry *f2fs_debugfs_root; #endif /* * This function calculates BDF of every segments */ void f2fs_update_sit_info(struct f2fs_sb_info *sbi) { struct f2fs_stat_info *si = F2FS_STAT(sbi); unsigned long long blks_per_sec, hblks_per_sec, total_vblocks; unsigned long long bimodal, dist; unsigned int segno, vblocks; int ndirty = 0; bimodal = 0; total_vblocks = 0; blks_per_sec = CAP_BLKS_PER_SEC(sbi); hblks_per_sec = blks_per_sec / 2; for (segno = 0; segno < MAIN_SEGS(sbi); segno += SEGS_PER_SEC(sbi)) { vblocks = get_valid_blocks(sbi, segno, true); dist = abs(vblocks - hblks_per_sec); bimodal += dist * dist; if (vblocks > 0 && vblocks < blks_per_sec) { total_vblocks += vblocks; ndirty++; } } dist = div_u64(MAIN_SECS(sbi) * hblks_per_sec * hblks_per_sec, 100); si->bimodal = div64_u64(bimodal, dist); if (si->dirty_count) si->avg_vblocks = div_u64(total_vblocks, ndirty); else si->avg_vblocks = 0; } #ifdef CONFIG_DEBUG_FS static void update_multidevice_stats(struct f2fs_sb_info *sbi) { struct f2fs_stat_info *si = F2FS_STAT(sbi); struct f2fs_dev_stats *dev_stats = si->dev_stats; int i, j; if (!f2fs_is_multi_device(sbi)) return; memset(dev_stats, 0, sizeof(struct f2fs_dev_stats) * sbi->s_ndevs); for (i = 0; i < sbi->s_ndevs; i++) { unsigned int start_segno, end_segno; block_t start_blk, end_blk; if (i == 0) { start_blk = MAIN_BLKADDR(sbi); end_blk = FDEV(i).end_blk + 1 - SEG0_BLKADDR(sbi); } else { start_blk = FDEV(i).start_blk; end_blk = FDEV(i).end_blk + 1; } start_segno = GET_SEGNO(sbi, start_blk); end_segno = GET_SEGNO(sbi, end_blk); for (j = start_segno; j < end_segno; j++) { unsigned int seg_blks, sec_blks; seg_blks = get_seg_entry(sbi, j)->valid_blocks; /* update segment stats */ if (IS_CURSEG(sbi, j)) dev_stats[i].devstats[0][DEVSTAT_INUSE]++; else if (seg_blks == BLKS_PER_SEG(sbi)) dev_stats[i].devstats[0][DEVSTAT_FULL]++; else if (seg_blks != 0) dev_stats[i].devstats[0][DEVSTAT_DIRTY]++; else if (!test_bit(j, FREE_I(sbi)->free_segmap)) dev_stats[i].devstats[0][DEVSTAT_FREE]++; else dev_stats[i].devstats[0][DEVSTAT_PREFREE]++; if (!__is_large_section(sbi) || (j % SEGS_PER_SEC(sbi)) != 0) continue; sec_blks = get_sec_entry(sbi, j)->valid_blocks; /* update section stats */ if (IS_CURSEC(sbi, GET_SEC_FROM_SEG(sbi, j))) dev_stats[i].devstats[1][DEVSTAT_INUSE]++; else if (sec_blks == BLKS_PER_SEC(sbi)) dev_stats[i].devstats[1][DEVSTAT_FULL]++; else if (sec_blks != 0) dev_stats[i].devstats[1][DEVSTAT_DIRTY]++; else if (!test_bit(GET_SEC_FROM_SEG(sbi, j), FREE_I(sbi)->free_secmap)) dev_stats[i].devstats[1][DEVSTAT_FREE]++; else dev_stats[i].devstats[1][DEVSTAT_PREFREE]++; } } } static void update_general_status(struct f2fs_sb_info *sbi) { struct f2fs_stat_info *si = F2FS_STAT(sbi); struct f2fs_super_block *raw_super = F2FS_RAW_SUPER(sbi); int i; /* these will be changed if online resize is done */ si->main_area_segs = le32_to_cpu(raw_super->segment_count_main); si->main_area_sections = le32_to_cpu(raw_super->section_count); si->main_area_zones = si->main_area_sections / le32_to_cpu(raw_super->secs_per_zone); /* general extent cache stats */ for (i = 0; i < NR_EXTENT_CACHES; i++) { struct extent_tree_info *eti = &sbi->extent_tree[i]; si->hit_cached[i] = atomic64_read(&sbi->read_hit_cached[i]); si->hit_rbtree[i] = atomic64_read(&sbi->read_hit_rbtree[i]); si->total_ext[i] = atomic64_read(&sbi->total_hit_ext[i]); si->hit_total[i] = si->hit_cached[i] + si->hit_rbtree[i]; si->ext_tree[i] = atomic_read(&eti->total_ext_tree); si->zombie_tree[i] = atomic_read(&eti->total_zombie_tree); si->ext_node[i] = atomic_read(&eti->total_ext_node); } /* read extent_cache only */ si->hit_largest = atomic64_read(&sbi->read_hit_largest); si->hit_total[EX_READ] += si->hit_largest; /* block age extent_cache only */ si->allocated_data_blocks = atomic64_read(&sbi->allocated_data_blocks); /* validation check of the segment numbers */ si->ndirty_node = get_pages(sbi, F2FS_DIRTY_NODES); si->ndirty_dent = get_pages(sbi, F2FS_DIRTY_DENTS); si->ndirty_meta = get_pages(sbi, F2FS_DIRTY_META); si->ndirty_data = get_pages(sbi, F2FS_DIRTY_DATA); si->ndirty_qdata = get_pages(sbi, F2FS_DIRTY_QDATA); si->ndirty_imeta = get_pages(sbi, F2FS_DIRTY_IMETA); si->ndirty_dirs = sbi->ndirty_inode[DIR_INODE]; si->ndirty_files = sbi->ndirty_inode[FILE_INODE]; si->nquota_files = sbi->nquota_files; si->ndirty_all = sbi->ndirty_inode[DIRTY_META]; si->aw_cnt = atomic_read(&sbi->atomic_files); si->max_aw_cnt = atomic_read(&sbi->max_aw_cnt); si->nr_dio_read = get_pages(sbi, F2FS_DIO_READ); si->nr_dio_write = get_pages(sbi, F2FS_DIO_WRITE); si->nr_wb_cp_data = get_pages(sbi, F2FS_WB_CP_DATA); si->nr_wb_data = get_pages(sbi, F2FS_WB_DATA); si->nr_rd_data = get_pages(sbi, F2FS_RD_DATA); si->nr_rd_node = get_pages(sbi, F2FS_RD_NODE); si->nr_rd_meta = get_pages(sbi, F2FS_RD_META); if (SM_I(sbi)->fcc_info) { si->nr_flushed = atomic_read(&SM_I(sbi)->fcc_info->issued_flush); si->nr_flushing = atomic_read(&SM_I(sbi)->fcc_info->queued_flush); si->flush_list_empty = llist_empty(&SM_I(sbi)->fcc_info->issue_list); } if (SM_I(sbi)->dcc_info) { si->nr_discarded = atomic_read(&SM_I(sbi)->dcc_info->issued_discard); si->nr_discarding = atomic_read(&SM_I(sbi)->dcc_info->queued_discard); si->nr_discard_cmd = atomic_read(&SM_I(sbi)->dcc_info->discard_cmd_cnt); si->undiscard_blks = SM_I(sbi)->dcc_info->undiscard_blks; } si->nr_issued_ckpt = atomic_read(&sbi->cprc_info.issued_ckpt); si->nr_total_ckpt = atomic_read(&sbi->cprc_info.total_ckpt); si->nr_queued_ckpt = atomic_read(&sbi->cprc_info.queued_ckpt); spin_lock(&sbi->cprc_info.stat_lock); si->cur_ckpt_time = sbi->cprc_info.cur_time; si->peak_ckpt_time = sbi->cprc_info.peak_time; spin_unlock(&sbi->cprc_info.stat_lock); si->total_count = BLKS_TO_SEGS(sbi, (int)sbi->user_block_count); si->rsvd_segs = reserved_segments(sbi); si->overp_segs = overprovision_segments(sbi); si->valid_count = valid_user_blocks(sbi); si->discard_blks = discard_blocks(sbi); si->valid_node_count = valid_node_count(sbi); si->valid_inode_count = valid_inode_count(sbi); si->inline_xattr = atomic_read(&sbi->inline_xattr); si->inline_inode = atomic_read(&sbi->inline_inode); si->inline_dir = atomic_read(&sbi->inline_dir); si->compr_inode = atomic_read(&sbi->compr_inode); si->swapfile_inode = atomic_read(&sbi->swapfile_inode); si->compr_blocks = atomic64_read(&sbi->compr_blocks); si->append = sbi->im[APPEND_INO].ino_num; si->update = sbi->im[UPDATE_INO].ino_num; si->orphans = sbi->im[ORPHAN_INO].ino_num; si->utilization = utilization(sbi); si->free_segs = free_segments(sbi); si->free_secs = free_sections(sbi); si->prefree_count = prefree_segments(sbi); si->dirty_count = dirty_segments(sbi); if (sbi->node_inode) si->node_pages = NODE_MAPPING(sbi)->nrpages; if (sbi->meta_inode) si->meta_pages = META_MAPPING(sbi)->nrpages; #ifdef CONFIG_F2FS_FS_COMPRESSION if (sbi->compress_inode) { si->compress_pages = COMPRESS_MAPPING(sbi)->nrpages; si->compress_page_hit = atomic_read(&sbi->compress_page_hit); } #endif si->nats = NM_I(sbi)->nat_cnt[TOTAL_NAT]; si->dirty_nats = NM_I(sbi)->nat_cnt[DIRTY_NAT]; si->sits = MAIN_SEGS(sbi); si->dirty_sits = SIT_I(sbi)->dirty_sentries; si->free_nids = NM_I(sbi)->nid_cnt[FREE_NID]; si->avail_nids = NM_I(sbi)->available_nids; si->alloc_nids = NM_I(sbi)->nid_cnt[PREALLOC_NID]; si->io_skip_bggc = sbi->io_skip_bggc; si->other_skip_bggc = sbi->other_skip_bggc; si->util_free = (int)(BLKS_TO_SEGS(sbi, free_user_blocks(sbi))) * 100 / (int)(sbi->user_block_count >> sbi->log_blocks_per_seg) / 2; si->util_valid = (int)(BLKS_TO_SEGS(sbi, written_block_count(sbi))) * 100 / (int)(sbi->user_block_count >> sbi->log_blocks_per_seg) / 2; si->util_invalid = 50 - si->util_free - si->util_valid; for (i = CURSEG_HOT_DATA; i < NO_CHECK_TYPE; i++) { struct curseg_info *curseg = CURSEG_I(sbi, i); si->curseg[i] = curseg->segno; si->cursec[i] = GET_SEC_FROM_SEG(sbi, curseg->segno); si->curzone[i] = GET_ZONE_FROM_SEC(sbi, si->cursec[i]); } for (i = META_CP; i < META_MAX; i++) si->meta_count[i] = atomic_read(&sbi->meta_count[i]); for (i = 0; i < NO_CHECK_TYPE; i++) { si->dirty_seg[i] = 0; si->full_seg[i] = 0; si->valid_blks[i] = 0; } for (i = 0; i < MAIN_SEGS(sbi); i++) { int blks = get_seg_entry(sbi, i)->valid_blocks; int type = get_seg_entry(sbi, i)->type; if (!blks) continue; if (blks == BLKS_PER_SEG(sbi)) si->full_seg[type]++; else si->dirty_seg[type]++; si->valid_blks[type] += blks; } update_multidevice_stats(sbi); for (i = 0; i < MAX_CALL_TYPE; i++) si->cp_call_count[i] = atomic_read(&sbi->cp_call_count[i]); for (i = 0; i < 2; i++) { si->segment_count[i] = sbi->segment_count[i]; si->block_count[i] = sbi->block_count[i]; } si->inplace_count = atomic_read(&sbi->inplace_count); } /* * This function calculates memory footprint. */ static void update_mem_info(struct f2fs_sb_info *sbi) { struct f2fs_stat_info *si = F2FS_STAT(sbi); int i; if (si->base_mem) goto get_cache; /* build stat */ si->base_mem = sizeof(struct f2fs_stat_info); /* build superblock */ si->base_mem += sizeof(struct f2fs_sb_info) + sbi->sb->s_blocksize; si->base_mem += 2 * sizeof(struct f2fs_inode_info); si->base_mem += sizeof(*sbi->ckpt); /* build sm */ si->base_mem += sizeof(struct f2fs_sm_info); /* build sit */ si->base_mem += sizeof(struct sit_info); si->base_mem += MAIN_SEGS(sbi) * sizeof(struct seg_entry); si->base_mem += f2fs_bitmap_size(MAIN_SEGS(sbi)); si->base_mem += 2 * SIT_VBLOCK_MAP_SIZE * MAIN_SEGS(sbi); si->base_mem += SIT_VBLOCK_MAP_SIZE * MAIN_SEGS(sbi); si->base_mem += SIT_VBLOCK_MAP_SIZE; if (__is_large_section(sbi)) si->base_mem += MAIN_SECS(sbi) * sizeof(struct sec_entry); si->base_mem += __bitmap_size(sbi, SIT_BITMAP); /* build free segmap */ si->base_mem += sizeof(struct free_segmap_info); si->base_mem += f2fs_bitmap_size(MAIN_SEGS(sbi)); si->base_mem += f2fs_bitmap_size(MAIN_SECS(sbi)); /* build curseg */ si->base_mem += sizeof(struct curseg_info) * NR_CURSEG_TYPE; si->base_mem += PAGE_SIZE * NR_CURSEG_TYPE; /* build dirty segmap */ si->base_mem += sizeof(struct dirty_seglist_info); si->base_mem += NR_DIRTY_TYPE * f2fs_bitmap_size(MAIN_SEGS(sbi)); si->base_mem += f2fs_bitmap_size(MAIN_SECS(sbi)); /* build nm */ si->base_mem += sizeof(struct f2fs_nm_info); si->base_mem += __bitmap_size(sbi, NAT_BITMAP); si->base_mem += F2FS_BLK_TO_BYTES(NM_I(sbi)->nat_bits_blocks); si->base_mem += NM_I(sbi)->nat_blocks * f2fs_bitmap_size(NAT_ENTRY_PER_BLOCK); si->base_mem += NM_I(sbi)->nat_blocks / 8; si->base_mem += NM_I(sbi)->nat_blocks * sizeof(unsigned short); get_cache: si->cache_mem = 0; /* build gc */ if (sbi->gc_thread) si->cache_mem += sizeof(struct f2fs_gc_kthread); /* build merge flush thread */ if (SM_I(sbi)->fcc_info) si->cache_mem += sizeof(struct flush_cmd_control); if (SM_I(sbi)->dcc_info) { si->cache_mem += sizeof(struct discard_cmd_control); si->cache_mem += sizeof(struct discard_cmd) * atomic_read(&SM_I(sbi)->dcc_info->discard_cmd_cnt); } /* free nids */ si->cache_mem += (NM_I(sbi)->nid_cnt[FREE_NID] + NM_I(sbi)->nid_cnt[PREALLOC_NID]) * sizeof(struct free_nid); si->cache_mem += NM_I(sbi)->nat_cnt[TOTAL_NAT] * sizeof(struct nat_entry); si->cache_mem += NM_I(sbi)->nat_cnt[DIRTY_NAT] * sizeof(struct nat_entry_set); for (i = 0; i < MAX_INO_ENTRY; i++) si->cache_mem += sbi->im[i].ino_num * sizeof(struct ino_entry); for (i = 0; i < NR_EXTENT_CACHES; i++) { struct extent_tree_info *eti = &sbi->extent_tree[i]; si->ext_mem[i] = atomic_read(&eti->total_ext_tree) * sizeof(struct extent_tree); si->ext_mem[i] += atomic_read(&eti->total_ext_node) * sizeof(struct extent_node); si->cache_mem += si->ext_mem[i]; } si->page_mem = 0; if (sbi->node_inode) { unsigned long npages = NODE_MAPPING(sbi)->nrpages; si->page_mem += (unsigned long long)npages << PAGE_SHIFT; } if (sbi->meta_inode) { unsigned long npages = META_MAPPING(sbi)->nrpages; si->page_mem += (unsigned long long)npages << PAGE_SHIFT; } #ifdef CONFIG_F2FS_FS_COMPRESSION if (sbi->compress_inode) { unsigned long npages = COMPRESS_MAPPING(sbi)->nrpages; si->page_mem += (unsigned long long)npages << PAGE_SHIFT; } #endif } static const char *s_flag[MAX_SBI_FLAG] = { [SBI_IS_DIRTY] = "fs_dirty", [SBI_IS_CLOSE] = "closing", [SBI_NEED_FSCK] = "need_fsck", [SBI_POR_DOING] = "recovering", [SBI_NEED_SB_WRITE] = "sb_dirty", [SBI_NEED_CP] = "need_cp", [SBI_IS_SHUTDOWN] = "shutdown", [SBI_IS_RECOVERED] = "recovered", [SBI_CP_DISABLED] = "cp_disabled", [SBI_CP_DISABLED_QUICK] = "cp_disabled_quick", [SBI_QUOTA_NEED_FLUSH] = "quota_need_flush", [SBI_QUOTA_SKIP_FLUSH] = "quota_skip_flush", [SBI_QUOTA_NEED_REPAIR] = "quota_need_repair", [SBI_IS_RESIZEFS] = "resizefs", [SBI_IS_FREEZING] = "freezefs", [SBI_IS_WRITABLE] = "writable", }; static const char *ipu_mode_names[F2FS_IPU_MAX] = { [F2FS_IPU_FORCE] = "FORCE", [F2FS_IPU_SSR] = "SSR", [F2FS_IPU_UTIL] = "UTIL", [F2FS_IPU_SSR_UTIL] = "SSR_UTIL", [F2FS_IPU_FSYNC] = "FSYNC", [F2FS_IPU_ASYNC] = "ASYNC", [F2FS_IPU_NOCACHE] = "NOCACHE", [F2FS_IPU_HONOR_OPU_WRITE] = "HONOR_OPU_WRITE", }; static int stat_show(struct seq_file *s, void *v) { struct f2fs_stat_info *si; int i = 0, j = 0; unsigned long flags; raw_spin_lock_irqsave(&f2fs_stat_lock, flags); list_for_each_entry(si, &f2fs_stat_list, stat_list) { struct f2fs_sb_info *sbi = si->sbi; update_general_status(sbi); seq_printf(s, "\n=====[ partition info(%pg). #%d, %s, CP: %s]=====\n", sbi->sb->s_bdev, i++, f2fs_readonly(sbi->sb) ? "RO" : "RW", is_set_ckpt_flags(sbi, CP_DISABLED_FLAG) ? "Disabled" : (f2fs_cp_error(sbi) ? "Error" : "Good")); if (sbi->s_flag) { seq_puts(s, "[SBI:"); for_each_set_bit(j, &sbi->s_flag, MAX_SBI_FLAG) seq_printf(s, " %s", s_flag[j]); seq_puts(s, "]\n"); } seq_printf(s, "[SB: 1] [CP: 2] [SIT: %d] [NAT: %d] ", si->sit_area_segs, si->nat_area_segs); seq_printf(s, "[SSA: %d] [MAIN: %d", si->ssa_area_segs, si->main_area_segs); seq_printf(s, "(OverProv:%d Resv:%d)]\n\n", si->overp_segs, si->rsvd_segs); seq_printf(s, "Current Time Sec: %llu / Mounted Time Sec: %llu\n\n", ktime_get_boottime_seconds(), SIT_I(sbi)->mounted_time); seq_puts(s, "Policy:\n"); seq_puts(s, " - IPU: ["); if (IS_F2FS_IPU_DISABLE(sbi)) { seq_puts(s, " DISABLE"); } else { unsigned long policy = SM_I(sbi)->ipu_policy; for_each_set_bit(j, &policy, F2FS_IPU_MAX) seq_printf(s, " %s", ipu_mode_names[j]); } seq_puts(s, " ]\n\n"); if (test_opt(sbi, DISCARD)) seq_printf(s, "Utilization: %u%% (%u valid blocks, %u discard blocks)\n", si->utilization, si->valid_count, si->discard_blks); else seq_printf(s, "Utilization: %u%% (%u valid blocks)\n", si->utilization, si->valid_count); seq_printf(s, " - Node: %u (Inode: %u, ", si->valid_node_count, si->valid_inode_count); seq_printf(s, "Other: %u)\n - Data: %u\n", si->valid_node_count - si->valid_inode_count, si->valid_count - si->valid_node_count); seq_printf(s, " - Inline_xattr Inode: %u\n", si->inline_xattr); seq_printf(s, " - Inline_data Inode: %u\n", si->inline_inode); seq_printf(s, " - Inline_dentry Inode: %u\n", si->inline_dir); seq_printf(s, " - Compressed Inode: %u, Blocks: %llu\n", si->compr_inode, si->compr_blocks); seq_printf(s, " - Swapfile Inode: %u\n", si->swapfile_inode); seq_printf(s, " - Orphan/Append/Update Inode: %u, %u, %u\n", si->orphans, si->append, si->update); seq_printf(s, "\nMain area: %d segs, %d secs %d zones\n", si->main_area_segs, si->main_area_sections, si->main_area_zones); seq_printf(s, " TYPE %8s %8s %8s %10s %10s %10s\n", "segno", "secno", "zoneno", "dirty_seg", "full_seg", "valid_blk"); seq_printf(s, " - COLD data: %8d %8d %8d %10u %10u %10u\n", si->curseg[CURSEG_COLD_DATA], si->cursec[CURSEG_COLD_DATA], si->curzone[CURSEG_COLD_DATA], si->dirty_seg[CURSEG_COLD_DATA], si->full_seg[CURSEG_COLD_DATA], si->valid_blks[CURSEG_COLD_DATA]); seq_printf(s, " - WARM data: %8d %8d %8d %10u %10u %10u\n", si->curseg[CURSEG_WARM_DATA], si->cursec[CURSEG_WARM_DATA], si->curzone[CURSEG_WARM_DATA], si->dirty_seg[CURSEG_WARM_DATA], si->full_seg[CURSEG_WARM_DATA], si->valid_blks[CURSEG_WARM_DATA]); seq_printf(s, " - HOT data: %8d %8d %8d %10u %10u %10u\n", si->curseg[CURSEG_HOT_DATA], si->cursec[CURSEG_HOT_DATA], si->curzone[CURSEG_HOT_DATA], si->dirty_seg[CURSEG_HOT_DATA], si->full_seg[CURSEG_HOT_DATA], si->valid_blks[CURSEG_HOT_DATA]); seq_printf(s, " - Dir dnode: %8d %8d %8d %10u %10u %10u\n", si->curseg[CURSEG_HOT_NODE], si->cursec[CURSEG_HOT_NODE], si->curzone[CURSEG_HOT_NODE], si->dirty_seg[CURSEG_HOT_NODE], si->full_seg[CURSEG_HOT_NODE], si->valid_blks[CURSEG_HOT_NODE]); seq_printf(s, " - File dnode: %8d %8d %8d %10u %10u %10u\n", si->curseg[CURSEG_WARM_NODE], si->cursec[CURSEG_WARM_NODE], si->curzone[CURSEG_WARM_NODE], si->dirty_seg[CURSEG_WARM_NODE], si->full_seg[CURSEG_WARM_NODE], si->valid_blks[CURSEG_WARM_NODE]); seq_printf(s, " - Indir nodes: %8d %8d %8d %10u %10u %10u\n", si->curseg[CURSEG_COLD_NODE], si->cursec[CURSEG_COLD_NODE], si->curzone[CURSEG_COLD_NODE], si->dirty_seg[CURSEG_COLD_NODE], si->full_seg[CURSEG_COLD_NODE], si->valid_blks[CURSEG_COLD_NODE]); seq_printf(s, " - Pinned file: %8d %8d %8d\n", si->curseg[CURSEG_COLD_DATA_PINNED], si->cursec[CURSEG_COLD_DATA_PINNED], si->curzone[CURSEG_COLD_DATA_PINNED]); seq_printf(s, " - ATGC data: %8d %8d %8d\n", si->curseg[CURSEG_ALL_DATA_ATGC], si->cursec[CURSEG_ALL_DATA_ATGC], si->curzone[CURSEG_ALL_DATA_ATGC]); seq_printf(s, "\n - Valid: %d\n - Dirty: %d\n", si->main_area_segs - si->dirty_count - si->prefree_count - si->free_segs, si->dirty_count); seq_printf(s, " - Prefree: %d\n - Free: %d (%d)\n\n", si->prefree_count, si->free_segs, si->free_secs); if (f2fs_is_multi_device(sbi)) { seq_puts(s, "Multidevice stats:\n"); seq_printf(s, " [seg: %8s %8s %8s %8s %8s]", "inuse", "dirty", "full", "free", "prefree"); if (__is_large_section(sbi)) seq_printf(s, " [sec: %8s %8s %8s %8s %8s]\n", "inuse", "dirty", "full", "free", "prefree"); else seq_puts(s, "\n"); for (i = 0; i < sbi->s_ndevs; i++) { seq_printf(s, " #%-2d %8u %8u %8u %8u %8u", i, si->dev_stats[i].devstats[0][DEVSTAT_INUSE], si->dev_stats[i].devstats[0][DEVSTAT_DIRTY], si->dev_stats[i].devstats[0][DEVSTAT_FULL], si->dev_stats[i].devstats[0][DEVSTAT_FREE], si->dev_stats[i].devstats[0][DEVSTAT_PREFREE]); if (!__is_large_section(sbi)) { seq_puts(s, "\n"); continue; } seq_printf(s, " %8u %8u %8u %8u %8u\n", si->dev_stats[i].devstats[1][DEVSTAT_INUSE], si->dev_stats[i].devstats[1][DEVSTAT_DIRTY], si->dev_stats[i].devstats[1][DEVSTAT_FULL], si->dev_stats[i].devstats[1][DEVSTAT_FREE], si->dev_stats[i].devstats[1][DEVSTAT_PREFREE]); } seq_puts(s, "\n"); } seq_printf(s, "CP calls: %d (BG: %d)\n", si->cp_call_count[TOTAL_CALL], si->cp_call_count[BACKGROUND]); seq_printf(s, "CP count: %d\n", si->cp_count); seq_printf(s, " - cp blocks : %u\n", si->meta_count[META_CP]); seq_printf(s, " - sit blocks : %u\n", si->meta_count[META_SIT]); seq_printf(s, " - nat blocks : %u\n", si->meta_count[META_NAT]); seq_printf(s, " - ssa blocks : %u\n", si->meta_count[META_SSA]); seq_puts(s, "CP merge:\n"); seq_printf(s, " - Queued : %4d\n", si->nr_queued_ckpt); seq_printf(s, " - Issued : %4d\n", si->nr_issued_ckpt); seq_printf(s, " - Total : %4d\n", si->nr_total_ckpt); seq_printf(s, " - Cur time : %4d(ms)\n", si->cur_ckpt_time); seq_printf(s, " - Peak time : %4d(ms)\n", si->peak_ckpt_time); seq_printf(s, "GC calls: %d (gc_thread: %d)\n", si->gc_call_count[BACKGROUND] + si->gc_call_count[FOREGROUND], si->gc_call_count[BACKGROUND]); if (__is_large_section(sbi)) { seq_printf(s, " - data sections : %d (BG: %d)\n", si->gc_secs[DATA][BG_GC] + si->gc_secs[DATA][FG_GC], si->gc_secs[DATA][BG_GC]); seq_printf(s, " - node sections : %d (BG: %d)\n", si->gc_secs[NODE][BG_GC] + si->gc_secs[NODE][FG_GC], si->gc_secs[NODE][BG_GC]); } seq_printf(s, " - data segments : %d (BG: %d)\n", si->gc_segs[DATA][BG_GC] + si->gc_segs[DATA][FG_GC], si->gc_segs[DATA][BG_GC]); seq_printf(s, " - node segments : %d (BG: %d)\n", si->gc_segs[NODE][BG_GC] + si->gc_segs[NODE][FG_GC], si->gc_segs[NODE][BG_GC]); seq_puts(s, " - Reclaimed segs :\n"); seq_printf(s, " - Normal : %d\n", sbi->gc_reclaimed_segs[GC_NORMAL]); seq_printf(s, " - Idle CB : %d\n", sbi->gc_reclaimed_segs[GC_IDLE_CB]); seq_printf(s, " - Idle Greedy : %d\n", sbi->gc_reclaimed_segs[GC_IDLE_GREEDY]); seq_printf(s, " - Idle AT : %d\n", sbi->gc_reclaimed_segs[GC_IDLE_AT]); seq_printf(s, " - Urgent High : %d\n", sbi->gc_reclaimed_segs[GC_URGENT_HIGH]); seq_printf(s, " - Urgent Mid : %d\n", sbi->gc_reclaimed_segs[GC_URGENT_MID]); seq_printf(s, " - Urgent Low : %d\n", sbi->gc_reclaimed_segs[GC_URGENT_LOW]); seq_printf(s, "Try to move %d blocks (BG: %d)\n", si->tot_blks, si->bg_data_blks + si->bg_node_blks); seq_printf(s, " - data blocks : %d (%d)\n", si->data_blks, si->bg_data_blks); seq_printf(s, " - node blocks : %d (%d)\n", si->node_blks, si->bg_node_blks); seq_printf(s, "BG skip : IO: %u, Other: %u\n", si->io_skip_bggc, si->other_skip_bggc); seq_puts(s, "\nExtent Cache (Read):\n"); seq_printf(s, " - Hit Count: L1-1:%llu L1-2:%llu L2:%llu\n", si->hit_largest, si->hit_cached[EX_READ], si->hit_rbtree[EX_READ]); seq_printf(s, " - Hit Ratio: %llu%% (%llu / %llu)\n", !si->total_ext[EX_READ] ? 0 : div64_u64(si->hit_total[EX_READ] * 100, si->total_ext[EX_READ]), si->hit_total[EX_READ], si->total_ext[EX_READ]); seq_printf(s, " - Inner Struct Count: tree: %d(%d), node: %d\n", si->ext_tree[EX_READ], si->zombie_tree[EX_READ], si->ext_node[EX_READ]); seq_puts(s, "\nExtent Cache (Block Age):\n"); seq_printf(s, " - Allocated Data Blocks: %llu\n", si->allocated_data_blocks); seq_printf(s, " - Hit Count: L1:%llu L2:%llu\n", si->hit_cached[EX_BLOCK_AGE], si->hit_rbtree[EX_BLOCK_AGE]); seq_printf(s, " - Hit Ratio: %llu%% (%llu / %llu)\n", !si->total_ext[EX_BLOCK_AGE] ? 0 : div64_u64(si->hit_total[EX_BLOCK_AGE] * 100, si->total_ext[EX_BLOCK_AGE]), si->hit_total[EX_BLOCK_AGE], si->total_ext[EX_BLOCK_AGE]); seq_printf(s, " - Inner Struct Count: tree: %d(%d), node: %d\n", si->ext_tree[EX_BLOCK_AGE], si->zombie_tree[EX_BLOCK_AGE], si->ext_node[EX_BLOCK_AGE]); seq_puts(s, "\nBalancing F2FS Async:\n"); seq_printf(s, " - DIO (R: %4d, W: %4d)\n", si->nr_dio_read, si->nr_dio_write); seq_printf(s, " - IO_R (Data: %4d, Node: %4d, Meta: %4d\n", si->nr_rd_data, si->nr_rd_node, si->nr_rd_meta); seq_printf(s, " - IO_W (CP: %4d, Data: %4d, Flush: (%4d %4d %4d), ", si->nr_wb_cp_data, si->nr_wb_data, si->nr_flushing, si->nr_flushed, si->flush_list_empty); seq_printf(s, "Discard: (%4d %4d)) cmd: %4d undiscard:%4u\n", si->nr_discarding, si->nr_discarded, si->nr_discard_cmd, si->undiscard_blks); seq_printf(s, " - atomic IO: %4d (Max. %4d)\n", si->aw_cnt, si->max_aw_cnt); seq_printf(s, " - compress: %4d, hit:%8d\n", si->compress_pages, si->compress_page_hit); seq_printf(s, " - nodes: %4d in %4d\n", si->ndirty_node, si->node_pages); seq_printf(s, " - dents: %4d in dirs:%4d (%4d)\n", si->ndirty_dent, si->ndirty_dirs, si->ndirty_all); seq_printf(s, " - data: %4d in files:%4d\n", si->ndirty_data, si->ndirty_files); seq_printf(s, " - quota data: %4d in quota files:%4d\n", si->ndirty_qdata, si->nquota_files); seq_printf(s, " - meta: %4d in %4d\n", si->ndirty_meta, si->meta_pages); seq_printf(s, " - imeta: %4d\n", si->ndirty_imeta); seq_printf(s, " - fsync mark: %4lld\n", percpu_counter_sum_positive( &sbi->rf_node_block_count)); seq_printf(s, " - NATs: %9d/%9d\n - SITs: %9d/%9d\n", si->dirty_nats, si->nats, si->dirty_sits, si->sits); seq_printf(s, " - free_nids: %9d/%9d\n - alloc_nids: %9d\n", si->free_nids, si->avail_nids, si->alloc_nids); seq_puts(s, "\nDistribution of User Blocks:"); seq_puts(s, " [ valid | invalid | free ]\n"); seq_puts(s, " ["); for (j = 0; j < si->util_valid; j++) seq_putc(s, '-'); seq_putc(s, '|'); for (j = 0; j < si->util_invalid; j++) seq_putc(s, '-'); seq_putc(s, '|'); for (j = 0; j < si->util_free; j++) seq_putc(s, '-'); seq_puts(s, "]\n\n"); seq_printf(s, "IPU: %u blocks\n", si->inplace_count); seq_printf(s, "SSR: %u blocks in %u segments\n", si->block_count[SSR], si->segment_count[SSR]); seq_printf(s, "LFS: %u blocks in %u segments\n", si->block_count[LFS], si->segment_count[LFS]); /* segment usage info */ f2fs_update_sit_info(sbi); seq_printf(s, "\nBDF: %u, avg. vblocks: %u\n", si->bimodal, si->avg_vblocks); /* memory footprint */ update_mem_info(sbi); seq_printf(s, "\nMemory: %llu KB\n", (si->base_mem + si->cache_mem + si->page_mem) >> 10); seq_printf(s, " - static: %llu KB\n", si->base_mem >> 10); seq_printf(s, " - cached all: %llu KB\n", si->cache_mem >> 10); seq_printf(s, " - read extent cache: %llu KB\n", si->ext_mem[EX_READ] >> 10); seq_printf(s, " - block age extent cache: %llu KB\n", si->ext_mem[EX_BLOCK_AGE] >> 10); seq_printf(s, " - paged : %llu KB\n", si->page_mem >> 10); } raw_spin_unlock_irqrestore(&f2fs_stat_lock, flags); return 0; } DEFINE_SHOW_ATTRIBUTE(stat); #endif int f2fs_build_stats(struct f2fs_sb_info *sbi) { struct f2fs_super_block *raw_super = F2FS_RAW_SUPER(sbi); struct f2fs_stat_info *si; struct f2fs_dev_stats *dev_stats; unsigned long flags; int i; si = f2fs_kzalloc(sbi, sizeof(struct f2fs_stat_info), GFP_KERNEL); if (!si) return -ENOMEM; dev_stats = f2fs_kzalloc(sbi, sizeof(struct f2fs_dev_stats) * sbi->s_ndevs, GFP_KERNEL); if (!dev_stats) { kfree(si); return -ENOMEM; } si->dev_stats = dev_stats; si->all_area_segs = le32_to_cpu(raw_super->segment_count); si->sit_area_segs = le32_to_cpu(raw_super->segment_count_sit); si->nat_area_segs = le32_to_cpu(raw_super->segment_count_nat); si->ssa_area_segs = le32_to_cpu(raw_super->segment_count_ssa); si->main_area_segs = le32_to_cpu(raw_super->segment_count_main); si->main_area_sections = le32_to_cpu(raw_super->section_count); si->main_area_zones = si->main_area_sections / le32_to_cpu(raw_super->secs_per_zone); si->sbi = sbi; sbi->stat_info = si; /* general extent cache stats */ for (i = 0; i < NR_EXTENT_CACHES; i++) { atomic64_set(&sbi->total_hit_ext[i], 0); atomic64_set(&sbi->read_hit_rbtree[i], 0); atomic64_set(&sbi->read_hit_cached[i], 0); } /* read extent_cache only */ atomic64_set(&sbi->read_hit_largest, 0); atomic_set(&sbi->inline_xattr, 0); atomic_set(&sbi->inline_inode, 0); atomic_set(&sbi->inline_dir, 0); atomic_set(&sbi->compr_inode, 0); atomic64_set(&sbi->compr_blocks, 0); atomic_set(&sbi->swapfile_inode, 0); atomic_set(&sbi->atomic_files, 0); atomic_set(&sbi->inplace_count, 0); for (i = META_CP; i < META_MAX; i++) atomic_set(&sbi->meta_count[i], 0); for (i = 0; i < MAX_CALL_TYPE; i++) atomic_set(&sbi->cp_call_count[i], 0); atomic_set(&sbi->max_aw_cnt, 0); raw_spin_lock_irqsave(&f2fs_stat_lock, flags); list_add_tail(&si->stat_list, &f2fs_stat_list); raw_spin_unlock_irqrestore(&f2fs_stat_lock, flags); return 0; } void f2fs_destroy_stats(struct f2fs_sb_info *sbi) { struct f2fs_stat_info *si = F2FS_STAT(sbi); unsigned long flags; raw_spin_lock_irqsave(&f2fs_stat_lock, flags); list_del(&si->stat_list); raw_spin_unlock_irqrestore(&f2fs_stat_lock, flags); kfree(si->dev_stats); kfree(si); } void __init f2fs_create_root_stats(void) { #ifdef CONFIG_DEBUG_FS f2fs_debugfs_root = debugfs_create_dir("f2fs", NULL); debugfs_create_file("status", 0444, f2fs_debugfs_root, NULL, &stat_fops); #endif } void f2fs_destroy_root_stats(void) { #ifdef CONFIG_DEBUG_FS debugfs_remove_recursive(f2fs_debugfs_root); f2fs_debugfs_root = NULL; #endif } |
4 4 8 1 7 3 4 11 11 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 | /* * Route Plug-In * Copyright (c) 2000 by Abramo Bagnara <abramo@alsa-project.org> * * * This library is free software; you can redistribute it and/or modify * it under the terms of the GNU Library General Public License as * published by the Free Software Foundation; either version 2 of * the License, or (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU Library General Public License for more details. * * You should have received a copy of the GNU Library General Public * License along with this library; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA * */ #include <linux/time.h> #include <sound/core.h> #include <sound/pcm.h> #include "pcm_plugin.h" static void zero_areas(struct snd_pcm_plugin_channel *dvp, int ndsts, snd_pcm_uframes_t frames, snd_pcm_format_t format) { int dst = 0; for (; dst < ndsts; ++dst) { if (dvp->wanted) snd_pcm_area_silence(&dvp->area, 0, frames, format); dvp->enabled = 0; dvp++; } } static inline void copy_area(const struct snd_pcm_plugin_channel *src_channel, struct snd_pcm_plugin_channel *dst_channel, snd_pcm_uframes_t frames, snd_pcm_format_t format) { dst_channel->enabled = 1; snd_pcm_area_copy(&src_channel->area, 0, &dst_channel->area, 0, frames, format); } static snd_pcm_sframes_t route_transfer(struct snd_pcm_plugin *plugin, const struct snd_pcm_plugin_channel *src_channels, struct snd_pcm_plugin_channel *dst_channels, snd_pcm_uframes_t frames) { int nsrcs, ndsts, dst; struct snd_pcm_plugin_channel *dvp; snd_pcm_format_t format; if (snd_BUG_ON(!plugin || !src_channels || !dst_channels)) return -ENXIO; if (frames == 0) return 0; if (frames > dst_channels[0].frames) frames = dst_channels[0].frames; nsrcs = plugin->src_format.channels; ndsts = plugin->dst_format.channels; format = plugin->dst_format.format; dvp = dst_channels; if (nsrcs <= 1) { /* expand to all channels */ for (dst = 0; dst < ndsts; ++dst) { copy_area(src_channels, dvp, frames, format); dvp++; } return frames; } for (dst = 0; dst < ndsts && dst < nsrcs; ++dst) { copy_area(src_channels, dvp, frames, format); dvp++; src_channels++; } if (dst < ndsts) zero_areas(dvp, ndsts - dst, frames, format); return frames; } int snd_pcm_plugin_build_route(struct snd_pcm_substream *plug, struct snd_pcm_plugin_format *src_format, struct snd_pcm_plugin_format *dst_format, struct snd_pcm_plugin **r_plugin) { struct snd_pcm_plugin *plugin; int err; if (snd_BUG_ON(!r_plugin)) return -ENXIO; *r_plugin = NULL; if (snd_BUG_ON(src_format->rate != dst_format->rate)) return -ENXIO; if (snd_BUG_ON(src_format->format != dst_format->format)) return -ENXIO; err = snd_pcm_plugin_build(plug, "route conversion", src_format, dst_format, 0, &plugin); if (err < 0) return err; plugin->transfer = route_transfer; *r_plugin = plugin; return 0; } |
112 112 17280 3516 16394 1505 1520 1428 654 1429 1426 586 16 445 581 582 329 329 32 30 2 283 81 275 273 1 84 2 2 84 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 | // SPDX-License-Identifier: GPL-2.0 /* * Fast batching percpu counters. */ #include <linux/percpu_counter.h> #include <linux/mutex.h> #include <linux/init.h> #include <linux/cpu.h> #include <linux/module.h> #include <linux/debugobjects.h> #ifdef CONFIG_HOTPLUG_CPU static LIST_HEAD(percpu_counters); static DEFINE_SPINLOCK(percpu_counters_lock); #endif #ifdef CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER static const struct debug_obj_descr percpu_counter_debug_descr; static bool percpu_counter_fixup_free(void *addr, enum debug_obj_state state) { struct percpu_counter *fbc = addr; switch (state) { case ODEBUG_STATE_ACTIVE: percpu_counter_destroy(fbc); debug_object_free(fbc, &percpu_counter_debug_descr); return true; default: return false; } } static const struct debug_obj_descr percpu_counter_debug_descr = { .name = "percpu_counter", .fixup_free = percpu_counter_fixup_free, }; static inline void debug_percpu_counter_activate(struct percpu_counter *fbc) { debug_object_init(fbc, &percpu_counter_debug_descr); debug_object_activate(fbc, &percpu_counter_debug_descr); } static inline void debug_percpu_counter_deactivate(struct percpu_counter *fbc) { debug_object_deactivate(fbc, &percpu_counter_debug_descr); debug_object_free(fbc, &percpu_counter_debug_descr); } #else /* CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER */ static inline void debug_percpu_counter_activate(struct percpu_counter *fbc) { } static inline void debug_percpu_counter_deactivate(struct percpu_counter *fbc) { } #endif /* CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER */ void percpu_counter_set(struct percpu_counter *fbc, s64 amount) { int cpu; unsigned long flags; raw_spin_lock_irqsave(&fbc->lock, flags); for_each_possible_cpu(cpu) { s32 *pcount = per_cpu_ptr(fbc->counters, cpu); *pcount = 0; } fbc->count = amount; raw_spin_unlock_irqrestore(&fbc->lock, flags); } EXPORT_SYMBOL(percpu_counter_set); /* * Add to a counter while respecting batch size. * * There are 2 implementations, both dealing with the following problem: * * The decision slow path/fast path and the actual update must be atomic. * Otherwise a call in process context could check the current values and * decide that the fast path can be used. If now an interrupt occurs before * the this_cpu_add(), and the interrupt updates this_cpu(*fbc->counters), * then the this_cpu_add() that is executed after the interrupt has completed * can produce values larger than "batch" or even overflows. */ #ifdef CONFIG_HAVE_CMPXCHG_LOCAL /* * Safety against interrupts is achieved in 2 ways: * 1. the fast path uses local cmpxchg (note: no lock prefix) * 2. the slow path operates with interrupts disabled */ void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch) { s64 count; unsigned long flags; count = this_cpu_read(*fbc->counters); do { if (unlikely(abs(count + amount) >= batch)) { raw_spin_lock_irqsave(&fbc->lock, flags); /* * Note: by now we might have migrated to another CPU * or the value might have changed. */ count = __this_cpu_read(*fbc->counters); fbc->count += count + amount; __this_cpu_sub(*fbc->counters, count); raw_spin_unlock_irqrestore(&fbc->lock, flags); return; } } while (!this_cpu_try_cmpxchg(*fbc->counters, &count, count + amount)); } #else /* * local_irq_save() is used to make the function irq safe: * - The slow path would be ok as protected by an irq-safe spinlock. * - this_cpu_add would be ok as it is irq-safe by definition. */ void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch) { s64 count; unsigned long flags; local_irq_save(flags); count = __this_cpu_read(*fbc->counters) + amount; if (abs(count) >= batch) { raw_spin_lock(&fbc->lock); fbc->count += count; __this_cpu_sub(*fbc->counters, count - amount); raw_spin_unlock(&fbc->lock); } else { this_cpu_add(*fbc->counters, amount); } local_irq_restore(flags); } #endif EXPORT_SYMBOL(percpu_counter_add_batch); /* * For percpu_counter with a big batch, the devication of its count could * be big, and there is requirement to reduce the deviation, like when the * counter's batch could be runtime decreased to get a better accuracy, * which can be achieved by running this sync function on each CPU. */ void percpu_counter_sync(struct percpu_counter *fbc) { unsigned long flags; s64 count; raw_spin_lock_irqsave(&fbc->lock, flags); count = __this_cpu_read(*fbc->counters); fbc->count += count; __this_cpu_sub(*fbc->counters, count); raw_spin_unlock_irqrestore(&fbc->lock, flags); } EXPORT_SYMBOL(percpu_counter_sync); /* * Add up all the per-cpu counts, return the result. This is a more accurate * but much slower version of percpu_counter_read_positive(). * * We use the cpu mask of (cpu_online_mask | cpu_dying_mask) to capture sums * from CPUs that are in the process of being taken offline. Dying cpus have * been removed from the online mask, but may not have had the hotplug dead * notifier called to fold the percpu count back into the global counter sum. * By including dying CPUs in the iteration mask, we avoid this race condition * so __percpu_counter_sum() just does the right thing when CPUs are being taken * offline. */ s64 __percpu_counter_sum(struct percpu_counter *fbc) { s64 ret; int cpu; unsigned long flags; raw_spin_lock_irqsave(&fbc->lock, flags); ret = fbc->count; for_each_cpu_or(cpu, cpu_online_mask, cpu_dying_mask) { s32 *pcount = per_cpu_ptr(fbc->counters, cpu); ret += *pcount; } raw_spin_unlock_irqrestore(&fbc->lock, flags); return ret; } EXPORT_SYMBOL(__percpu_counter_sum); int __percpu_counter_init_many(struct percpu_counter *fbc, s64 amount, gfp_t gfp, u32 nr_counters, struct lock_class_key *key) { unsigned long flags __maybe_unused; size_t counter_size; s32 __percpu *counters; u32 i; counter_size = ALIGN(sizeof(*counters), __alignof__(*counters)); counters = __alloc_percpu_gfp(nr_counters * counter_size, __alignof__(*counters), gfp); if (!counters) { fbc[0].counters = NULL; return -ENOMEM; } for (i = 0; i < nr_counters; i++) { raw_spin_lock_init(&fbc[i].lock); lockdep_set_class(&fbc[i].lock, key); #ifdef CONFIG_HOTPLUG_CPU INIT_LIST_HEAD(&fbc[i].list); #endif fbc[i].count = amount; fbc[i].counters = (void __percpu *)counters + i * counter_size; debug_percpu_counter_activate(&fbc[i]); } #ifdef CONFIG_HOTPLUG_CPU spin_lock_irqsave(&percpu_counters_lock, flags); for (i = 0; i < nr_counters; i++) list_add(&fbc[i].list, &percpu_counters); spin_unlock_irqrestore(&percpu_counters_lock, flags); #endif return 0; } EXPORT_SYMBOL(__percpu_counter_init_many); void percpu_counter_destroy_many(struct percpu_counter *fbc, u32 nr_counters) { unsigned long flags __maybe_unused; u32 i; if (WARN_ON_ONCE(!fbc)) return; if (!fbc[0].counters) return; for (i = 0; i < nr_counters; i++) debug_percpu_counter_deactivate(&fbc[i]); #ifdef CONFIG_HOTPLUG_CPU spin_lock_irqsave(&percpu_counters_lock, flags); for (i = 0; i < nr_counters; i++) list_del(&fbc[i].list); spin_unlock_irqrestore(&percpu_counters_lock, flags); #endif free_percpu(fbc[0].counters); for (i = 0; i < nr_counters; i++) fbc[i].counters = NULL; } EXPORT_SYMBOL(percpu_counter_destroy_many); int percpu_counter_batch __read_mostly = 32; EXPORT_SYMBOL(percpu_counter_batch); static int compute_batch_value(unsigned int cpu) { int nr = num_online_cpus(); percpu_counter_batch = max(32, nr*2); return 0; } static int percpu_counter_cpu_dead(unsigned int cpu) { #ifdef CONFIG_HOTPLUG_CPU struct percpu_counter *fbc; compute_batch_value(cpu); spin_lock_irq(&percpu_counters_lock); list_for_each_entry(fbc, &percpu_counters, list) { s32 *pcount; raw_spin_lock(&fbc->lock); pcount = per_cpu_ptr(fbc->counters, cpu); fbc->count += *pcount; *pcount = 0; raw_spin_unlock(&fbc->lock); } spin_unlock_irq(&percpu_counters_lock); #endif return 0; } /* * Compare counter against given value. * Return 1 if greater, 0 if equal and -1 if less */ int __percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch) { s64 count; count = percpu_counter_read(fbc); /* Check to see if rough count will be sufficient for comparison */ if (abs(count - rhs) > (batch * num_online_cpus())) { if (count > rhs) return 1; else return -1; } /* Need to use precise count */ count = percpu_counter_sum(fbc); if (count > rhs) return 1; else if (count < rhs) return -1; else return 0; } EXPORT_SYMBOL(__percpu_counter_compare); /* * Compare counter, and add amount if total is: less than or equal to limit if * amount is positive, or greater than or equal to limit if amount is negative. * Return true if amount is added, or false if total would be beyond the limit. * * Negative limit is allowed, but unusual. * When negative amounts (subs) are given to percpu_counter_limited_add(), * the limit would most naturally be 0 - but other limits are also allowed. * * Overflow beyond S64_MAX is not allowed for: counter, limit and amount * are all assumed to be sane (far from S64_MIN and S64_MAX). */ bool __percpu_counter_limited_add(struct percpu_counter *fbc, s64 limit, s64 amount, s32 batch) { s64 count; s64 unknown; unsigned long flags; bool good = false; if (amount == 0) return true; local_irq_save(flags); unknown = batch * num_online_cpus(); count = __this_cpu_read(*fbc->counters); /* Skip taking the lock when safe */ if (abs(count + amount) <= batch && ((amount > 0 && fbc->count + unknown <= limit) || (amount < 0 && fbc->count - unknown >= limit))) { this_cpu_add(*fbc->counters, amount); local_irq_restore(flags); return true; } raw_spin_lock(&fbc->lock); count = fbc->count + amount; /* Skip percpu_counter_sum() when safe */ if (amount > 0) { if (count - unknown > limit) goto out; if (count + unknown <= limit) good = true; } else { if (count + unknown < limit) goto out; if (count - unknown >= limit) good = true; } if (!good) { s32 *pcount; int cpu; for_each_cpu_or(cpu, cpu_online_mask, cpu_dying_mask) { pcount = per_cpu_ptr(fbc->counters, cpu); count += *pcount; } if (amount > 0) { if (count > limit) goto out; } else { if (count < limit) goto out; } good = true; } count = __this_cpu_read(*fbc->counters); fbc->count += count + amount; __this_cpu_sub(*fbc->counters, count); out: raw_spin_unlock(&fbc->lock); local_irq_restore(flags); return good; } static int __init percpu_counter_startup(void) { int ret; ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "lib/percpu_cnt:online", compute_batch_value, NULL); WARN_ON(ret < 0); ret = cpuhp_setup_state_nocalls(CPUHP_PERCPU_CNT_DEAD, "lib/percpu_cnt:dead", NULL, percpu_counter_cpu_dead); WARN_ON(ret < 0); return 0; } module_init(percpu_counter_startup); |
6 5 1 5 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 2 1 1 7 2 8 3 2 1 1 4 2 2 3 1 2 4 2 2 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 | // SPDX-License-Identifier: GPL-2.0-only /* * vivid-vbi-cap.c - vbi capture support functions. * * Copyright 2014 Cisco Systems, Inc. and/or its affiliates. All rights reserved. */ #include <linux/errno.h> #include <linux/kernel.h> #include <linux/videodev2.h> #include <media/v4l2-common.h> #include "vivid-core.h" #include "vivid-kthread-cap.h" #include "vivid-vbi-cap.h" #include "vivid-vbi-gen.h" #include "vivid-vid-common.h" static void vivid_sliced_vbi_cap_fill(struct vivid_dev *dev, unsigned seqnr) { struct vivid_vbi_gen_data *vbi_gen = &dev->vbi_gen; bool is_60hz = dev->std_cap[dev->input] & V4L2_STD_525_60; vivid_vbi_gen_sliced(vbi_gen, is_60hz, seqnr); if (!is_60hz) { if (vivid_vid_can_loop(dev)) { if (dev->vbi_out_have_wss) { vbi_gen->data[12].data[0] = dev->vbi_out_wss[0]; vbi_gen->data[12].data[1] = dev->vbi_out_wss[1]; } else { vbi_gen->data[12].id = 0; } } else { switch (tpg_g_video_aspect(&dev->tpg)) { case TPG_VIDEO_ASPECT_14X9_CENTRE: vbi_gen->data[12].data[0] = 0x01; break; case TPG_VIDEO_ASPECT_16X9_CENTRE: vbi_gen->data[12].data[0] = 0x0b; break; case TPG_VIDEO_ASPECT_16X9_ANAMORPHIC: vbi_gen->data[12].data[0] = 0x07; break; case TPG_VIDEO_ASPECT_4X3: default: vbi_gen->data[12].data[0] = 0x08; break; } } } else if (vivid_vid_can_loop(dev) && is_60hz) { if (dev->vbi_out_have_cc[0]) { vbi_gen->data[0].data[0] = dev->vbi_out_cc[0][0]; vbi_gen->data[0].data[1] = dev->vbi_out_cc[0][1]; } else { vbi_gen->data[0].id = 0; } if (dev->vbi_out_have_cc[1]) { vbi_gen->data[1].data[0] = dev->vbi_out_cc[1][0]; vbi_gen->data[1].data[1] = dev->vbi_out_cc[1][1]; } else { vbi_gen->data[1].id = 0; } } } static void vivid_g_fmt_vbi_cap(struct vivid_dev *dev, struct v4l2_vbi_format *vbi) { bool is_60hz = dev->std_cap[dev->input] & V4L2_STD_525_60; vbi->sampling_rate = 27000000; vbi->offset = 24; vbi->samples_per_line = 1440; vbi->sample_format = V4L2_PIX_FMT_GREY; vbi->start[0] = is_60hz ? V4L2_VBI_ITU_525_F1_START + 9 : V4L2_VBI_ITU_625_F1_START + 5; vbi->start[1] = is_60hz ? V4L2_VBI_ITU_525_F2_START + 9 : V4L2_VBI_ITU_625_F2_START + 5; vbi->count[0] = vbi->count[1] = is_60hz ? 12 : 18; vbi->flags = dev->vbi_cap_interlaced ? V4L2_VBI_INTERLACED : 0; vbi->reserved[0] = 0; vbi->reserved[1] = 0; } void vivid_raw_vbi_cap_process(struct vivid_dev *dev, struct vivid_buffer *buf) { struct v4l2_vbi_format vbi; u8 *vbuf = vb2_plane_vaddr(&buf->vb.vb2_buf, 0); vivid_g_fmt_vbi_cap(dev, &vbi); buf->vb.sequence = dev->vbi_cap_seq_count; if (dev->field_cap == V4L2_FIELD_ALTERNATE) buf->vb.sequence /= 2; vivid_sliced_vbi_cap_fill(dev, buf->vb.sequence); memset(vbuf, 0x10, vb2_plane_size(&buf->vb.vb2_buf, 0)); if (!VIVID_INVALID_SIGNAL(dev->std_signal_mode[dev->input])) vivid_vbi_gen_raw(&dev->vbi_gen, &vbi, vbuf); } void vivid_sliced_vbi_cap_process(struct vivid_dev *dev, struct vivid_buffer *buf) { struct v4l2_sliced_vbi_data *vbuf = vb2_plane_vaddr(&buf->vb.vb2_buf, 0); buf->vb.sequence = dev->vbi_cap_seq_count; if (dev->field_cap == V4L2_FIELD_ALTERNATE) buf->vb.sequence /= 2; vivid_sliced_vbi_cap_fill(dev, buf->vb.sequence); memset(vbuf, 0, vb2_plane_size(&buf->vb.vb2_buf, 0)); if (!VIVID_INVALID_SIGNAL(dev->std_signal_mode[dev->input])) { unsigned i; for (i = 0; i < 25; i++) vbuf[i] = dev->vbi_gen.data[i]; } } static int vbi_cap_queue_setup(struct vb2_queue *vq, unsigned *nbuffers, unsigned *nplanes, unsigned sizes[], struct device *alloc_devs[]) { struct vivid_dev *dev = vb2_get_drv_priv(vq); bool is_60hz = dev->std_cap[dev->input] & V4L2_STD_525_60; unsigned size = vq->type == V4L2_BUF_TYPE_SLICED_VBI_CAPTURE ? 36 * sizeof(struct v4l2_sliced_vbi_data) : 1440 * 2 * (is_60hz ? 12 : 18); if (!vivid_is_sdtv_cap(dev)) return -EINVAL; if (*nplanes) return sizes[0] < size ? -EINVAL : 0; sizes[0] = size; *nplanes = 1; return 0; } static int vbi_cap_buf_prepare(struct vb2_buffer *vb) { struct vivid_dev *dev = vb2_get_drv_priv(vb->vb2_queue); bool is_60hz = dev->std_cap[dev->input] & V4L2_STD_525_60; unsigned size = vb->vb2_queue->type == V4L2_BUF_TYPE_SLICED_VBI_CAPTURE ? 36 * sizeof(struct v4l2_sliced_vbi_data) : 1440 * 2 * (is_60hz ? 12 : 18); dprintk(dev, 1, "%s\n", __func__); if (dev->buf_prepare_error) { /* * Error injection: test what happens if buf_prepare() returns * an error. */ dev->buf_prepare_error = false; return -EINVAL; } if (vb2_plane_size(vb, 0) < size) { dprintk(dev, 1, "%s data will not fit into plane (%lu < %u)\n", __func__, vb2_plane_size(vb, 0), size); return -EINVAL; } vb2_set_plane_payload(vb, 0, size); return 0; } static void vbi_cap_buf_queue(struct vb2_buffer *vb) { struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb); struct vivid_dev *dev = vb2_get_drv_priv(vb->vb2_queue); struct vivid_buffer *buf = container_of(vbuf, struct vivid_buffer, vb); dprintk(dev, 1, "%s\n", __func__); spin_lock(&dev->slock); list_add_tail(&buf->list, &dev->vbi_cap_active); spin_unlock(&dev->slock); } static int vbi_cap_start_streaming(struct vb2_queue *vq, unsigned count) { struct vivid_dev *dev = vb2_get_drv_priv(vq); int err; dprintk(dev, 1, "%s\n", __func__); dev->vbi_cap_seq_count = 0; if (dev->start_streaming_error) { dev->start_streaming_error = false; err = -EINVAL; } else { err = vivid_start_generating_vid_cap(dev, &dev->vbi_cap_streaming); } if (err) { struct vivid_buffer *buf, *tmp; list_for_each_entry_safe(buf, tmp, &dev->vbi_cap_active, list) { list_del(&buf->list); vb2_buffer_done(&buf->vb.vb2_buf, VB2_BUF_STATE_QUEUED); } } return err; } /* abort streaming and wait for last buffer */ static void vbi_cap_stop_streaming(struct vb2_queue *vq) { struct vivid_dev *dev = vb2_get_drv_priv(vq); dprintk(dev, 1, "%s\n", __func__); vivid_stop_generating_vid_cap(dev, &dev->vbi_cap_streaming); } static void vbi_cap_buf_request_complete(struct vb2_buffer *vb) { struct vivid_dev *dev = vb2_get_drv_priv(vb->vb2_queue); v4l2_ctrl_request_complete(vb->req_obj.req, &dev->ctrl_hdl_vbi_cap); } const struct vb2_ops vivid_vbi_cap_qops = { .queue_setup = vbi_cap_queue_setup, .buf_prepare = vbi_cap_buf_prepare, .buf_queue = vbi_cap_buf_queue, .start_streaming = vbi_cap_start_streaming, .stop_streaming = vbi_cap_stop_streaming, .buf_request_complete = vbi_cap_buf_request_complete, }; int vidioc_g_fmt_vbi_cap(struct file *file, void *priv, struct v4l2_format *f) { struct vivid_dev *dev = video_drvdata(file); struct v4l2_vbi_format *vbi = &f->fmt.vbi; if (!vivid_is_sdtv_cap(dev) || !dev->has_raw_vbi_cap) return -EINVAL; vivid_g_fmt_vbi_cap(dev, vbi); return 0; } int vidioc_s_fmt_vbi_cap(struct file *file, void *priv, struct v4l2_format *f) { struct vivid_dev *dev = video_drvdata(file); int ret = vidioc_g_fmt_vbi_cap(file, priv, f); if (ret) return ret; if (f->type != V4L2_BUF_TYPE_VBI_CAPTURE && vb2_is_busy(&dev->vb_vbi_cap_q)) return -EBUSY; return 0; } void vivid_fill_service_lines(struct v4l2_sliced_vbi_format *vbi, u32 service_set) { vbi->io_size = sizeof(struct v4l2_sliced_vbi_data) * 36; vbi->service_set = service_set; memset(vbi->service_lines, 0, sizeof(vbi->service_lines)); memset(vbi->reserved, 0, sizeof(vbi->reserved)); if (vbi->service_set == 0) return; if (vbi->service_set & V4L2_SLICED_CAPTION_525) { vbi->service_lines[0][21] = V4L2_SLICED_CAPTION_525; vbi->service_lines[1][21] = V4L2_SLICED_CAPTION_525; } if (vbi->service_set & V4L2_SLICED_WSS_625) { unsigned i; for (i = 7; i <= 18; i++) vbi->service_lines[0][i] = vbi->service_lines[1][i] = V4L2_SLICED_TELETEXT_B; vbi->service_lines[0][23] = V4L2_SLICED_WSS_625; } } int vidioc_g_fmt_sliced_vbi_cap(struct file *file, void *fh, struct v4l2_format *fmt) { struct vivid_dev *dev = video_drvdata(file); struct v4l2_sliced_vbi_format *vbi = &fmt->fmt.sliced; if (!vivid_is_sdtv_cap(dev) || !dev->has_sliced_vbi_cap) return -EINVAL; vivid_fill_service_lines(vbi, dev->service_set_cap); return 0; } int vidioc_try_fmt_sliced_vbi_cap(struct file *file, void *fh, struct v4l2_format *fmt) { struct vivid_dev *dev = video_drvdata(file); struct v4l2_sliced_vbi_format *vbi = &fmt->fmt.sliced; bool is_60hz = dev->std_cap[dev->input] & V4L2_STD_525_60; u32 service_set = vbi->service_set; if (!vivid_is_sdtv_cap(dev) || !dev->has_sliced_vbi_cap) return -EINVAL; service_set &= is_60hz ? V4L2_SLICED_CAPTION_525 : V4L2_SLICED_WSS_625 | V4L2_SLICED_TELETEXT_B; vivid_fill_service_lines(vbi, service_set); return 0; } int vidioc_s_fmt_sliced_vbi_cap(struct file *file, void *fh, struct v4l2_format *fmt) { struct vivid_dev *dev = video_drvdata(file); struct v4l2_sliced_vbi_format *vbi = &fmt->fmt.sliced; int ret = vidioc_try_fmt_sliced_vbi_cap(file, fh, fmt); if (ret) return ret; if (fmt->type != V4L2_BUF_TYPE_SLICED_VBI_CAPTURE && vb2_is_busy(&dev->vb_vbi_cap_q)) return -EBUSY; dev->service_set_cap = vbi->service_set; return 0; } int vidioc_g_sliced_vbi_cap(struct file *file, void *fh, struct v4l2_sliced_vbi_cap *cap) { struct vivid_dev *dev = video_drvdata(file); struct video_device *vdev = video_devdata(file); bool is_60hz; if (vdev->vfl_dir == VFL_DIR_RX) { is_60hz = dev->std_cap[dev->input] & V4L2_STD_525_60; if (!vivid_is_sdtv_cap(dev) || !dev->has_sliced_vbi_cap || cap->type != V4L2_BUF_TYPE_SLICED_VBI_CAPTURE) return -EINVAL; } else { is_60hz = dev->std_out & V4L2_STD_525_60; if (!vivid_is_svid_out(dev) || !dev->has_sliced_vbi_out || cap->type != V4L2_BUF_TYPE_SLICED_VBI_OUTPUT) return -EINVAL; } cap->service_set = is_60hz ? V4L2_SLICED_CAPTION_525 : V4L2_SLICED_WSS_625 | V4L2_SLICED_TELETEXT_B; if (is_60hz) { cap->service_lines[0][21] = V4L2_SLICED_CAPTION_525; cap->service_lines[1][21] = V4L2_SLICED_CAPTION_525; } else { unsigned i; for (i = 7; i <= 18; i++) cap->service_lines[0][i] = cap->service_lines[1][i] = V4L2_SLICED_TELETEXT_B; cap->service_lines[0][23] = V4L2_SLICED_WSS_625; } return 0; } |
2 2 6 6 1 1 1 162 163 1 1 1 1 1 163 163 6 6 163 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 | // SPDX-License-Identifier: GPL-2.0 /* * media.c - Media Controller specific ALSA driver code * * Copyright (c) 2019 Shuah Khan <shuah@kernel.org> * */ /* * This file adds Media Controller support to the ALSA driver * to use the Media Controller API to share the tuner with DVB * and V4L2 drivers that control the media device. * * The media device is created based on the existing quirks framework. * Using this approach, the media controller API usage can be added for * a specific device. */ #include <linux/init.h> #include <linux/list.h> #include <linux/mutex.h> #include <linux/slab.h> #include <linux/usb.h> #include <sound/pcm.h> #include <sound/core.h> #include "usbaudio.h" #include "card.h" #include "mixer.h" #include "media.h" int snd_media_stream_init(struct snd_usb_substream *subs, struct snd_pcm *pcm, int stream) { struct media_device *mdev; struct media_ctl *mctl; struct device *pcm_dev = pcm->streams[stream].dev; u32 intf_type; int ret = 0; u16 mixer_pad; struct media_entity *entity; mdev = subs->stream->chip->media_dev; if (!mdev) return 0; if (subs->media_ctl) return 0; /* allocate media_ctl */ mctl = kzalloc(sizeof(*mctl), GFP_KERNEL); if (!mctl) return -ENOMEM; mctl->media_dev = mdev; if (stream == SNDRV_PCM_STREAM_PLAYBACK) { intf_type = MEDIA_INTF_T_ALSA_PCM_PLAYBACK; mctl->media_entity.function = MEDIA_ENT_F_AUDIO_PLAYBACK; mctl->media_pad.flags = MEDIA_PAD_FL_SOURCE; mixer_pad = 1; } else { intf_type = MEDIA_INTF_T_ALSA_PCM_CAPTURE; mctl->media_entity.function = MEDIA_ENT_F_AUDIO_CAPTURE; mctl->media_pad.flags = MEDIA_PAD_FL_SINK; mixer_pad = 2; } mctl->media_entity.name = pcm->name; media_entity_pads_init(&mctl->media_entity, 1, &mctl->media_pad); ret = media_device_register_entity(mctl->media_dev, &mctl->media_entity); if (ret) goto free_mctl; mctl->intf_devnode = media_devnode_create(mdev, intf_type, 0, MAJOR(pcm_dev->devt), MINOR(pcm_dev->devt)); if (!mctl->intf_devnode) { ret = -ENOMEM; goto unregister_entity; } mctl->intf_link = media_create_intf_link(&mctl->media_entity, &mctl->intf_devnode->intf, MEDIA_LNK_FL_ENABLED); if (!mctl->intf_link) { ret = -ENOMEM; goto devnode_remove; } /* create link between mixer and audio */ media_device_for_each_entity(entity, mdev) { switch (entity->function) { case MEDIA_ENT_F_AUDIO_MIXER: ret = media_create_pad_link(entity, mixer_pad, &mctl->media_entity, 0, MEDIA_LNK_FL_ENABLED); if (ret) goto remove_intf_link; break; } } subs->media_ctl = mctl; return 0; remove_intf_link: media_remove_intf_link(mctl->intf_link); devnode_remove: media_devnode_remove(mctl->intf_devnode); unregister_entity: media_device_unregister_entity(&mctl->media_entity); free_mctl: kfree(mctl); return ret; } void snd_media_stream_delete(struct snd_usb_substream *subs) { struct media_ctl *mctl = subs->media_ctl; if (mctl) { struct media_device *mdev; mdev = mctl->media_dev; if (mdev && media_devnode_is_registered(mdev->devnode)) { media_devnode_remove(mctl->intf_devnode); media_device_unregister_entity(&mctl->media_entity); media_entity_cleanup(&mctl->media_entity); } kfree(mctl); subs->media_ctl = NULL; } } int snd_media_start_pipeline(struct snd_usb_substream *subs) { struct media_ctl *mctl = subs->media_ctl; int ret = 0; if (!mctl) return 0; mutex_lock(&mctl->media_dev->graph_mutex); if (mctl->media_dev->enable_source) ret = mctl->media_dev->enable_source(&mctl->media_entity, &mctl->media_pipe); mutex_unlock(&mctl->media_dev->graph_mutex); return ret; } void snd_media_stop_pipeline(struct snd_usb_substream *subs) { struct media_ctl *mctl = subs->media_ctl; if (!mctl) return; mutex_lock(&mctl->media_dev->graph_mutex); if (mctl->media_dev->disable_source) mctl->media_dev->disable_source(&mctl->media_entity); mutex_unlock(&mctl->media_dev->graph_mutex); } static int snd_media_mixer_init(struct snd_usb_audio *chip) { struct device *ctl_dev = chip->card->ctl_dev; struct media_intf_devnode *ctl_intf; struct usb_mixer_interface *mixer; struct media_device *mdev = chip->media_dev; struct media_mixer_ctl *mctl; u32 intf_type = MEDIA_INTF_T_ALSA_CONTROL; int ret; if (!mdev) return -ENODEV; ctl_intf = chip->ctl_intf_media_devnode; if (!ctl_intf) { ctl_intf = media_devnode_create(mdev, intf_type, 0, MAJOR(ctl_dev->devt), MINOR(ctl_dev->devt)); if (!ctl_intf) return -ENOMEM; chip->ctl_intf_media_devnode = ctl_intf; } list_for_each_entry(mixer, &chip->mixer_list, list) { if (mixer->media_mixer_ctl) continue; /* allocate media_mixer_ctl */ mctl = kzalloc(sizeof(*mctl), GFP_KERNEL); if (!mctl) return -ENOMEM; mctl->media_dev = mdev; mctl->media_entity.function = MEDIA_ENT_F_AUDIO_MIXER; mctl->media_entity.name = chip->card->mixername; mctl->media_pad[0].flags = MEDIA_PAD_FL_SINK; mctl->media_pad[1].flags = MEDIA_PAD_FL_SOURCE; mctl->media_pad[2].flags = MEDIA_PAD_FL_SOURCE; media_entity_pads_init(&mctl->media_entity, MEDIA_MIXER_PAD_MAX, mctl->media_pad); ret = media_device_register_entity(mctl->media_dev, &mctl->media_entity); if (ret) { kfree(mctl); return ret; } mctl->intf_link = media_create_intf_link(&mctl->media_entity, &ctl_intf->intf, MEDIA_LNK_FL_ENABLED); if (!mctl->intf_link) { media_device_unregister_entity(&mctl->media_entity); media_entity_cleanup(&mctl->media_entity); kfree(mctl); return -ENOMEM; } mctl->intf_devnode = ctl_intf; mixer->media_mixer_ctl = mctl; } return 0; } static void snd_media_mixer_delete(struct snd_usb_audio *chip) { struct usb_mixer_interface *mixer; struct media_device *mdev = chip->media_dev; if (!mdev) return; list_for_each_entry(mixer, &chip->mixer_list, list) { struct media_mixer_ctl *mctl; mctl = mixer->media_mixer_ctl; if (!mixer->media_mixer_ctl) continue; if (media_devnode_is_registered(mdev->devnode)) { media_device_unregister_entity(&mctl->media_entity); media_entity_cleanup(&mctl->media_entity); } kfree(mctl); mixer->media_mixer_ctl = NULL; } if (media_devnode_is_registered(mdev->devnode)) media_devnode_remove(chip->ctl_intf_media_devnode); chip->ctl_intf_media_devnode = NULL; } int snd_media_device_create(struct snd_usb_audio *chip, struct usb_interface *iface) { struct media_device *mdev; struct usb_device *usbdev = interface_to_usbdev(iface); int ret = 0; /* usb-audio driver is probed for each usb interface, and * there are multiple interfaces per device. Avoid calling * media_device_usb_allocate() each time usb_audio_probe() * is called. Do it only once. */ if (chip->media_dev) { mdev = chip->media_dev; goto snd_mixer_init; } mdev = media_device_usb_allocate(usbdev, KBUILD_MODNAME, THIS_MODULE); if (IS_ERR(mdev)) return -ENOMEM; /* save media device - avoid lookups */ chip->media_dev = mdev; snd_mixer_init: /* Create media entities for mixer and control dev */ ret = snd_media_mixer_init(chip); /* media_device might be registered, print error and continue */ if (ret) dev_err(&usbdev->dev, "Couldn't create media mixer entities. Error: %d\n", ret); if (!media_devnode_is_registered(mdev->devnode)) { /* don't register if snd_media_mixer_init() failed */ if (ret) goto create_fail; /* register media_device */ ret = media_device_register(mdev); create_fail: if (ret) { snd_media_mixer_delete(chip); media_device_delete(mdev, KBUILD_MODNAME, THIS_MODULE); /* clear saved media_dev */ chip->media_dev = NULL; dev_err(&usbdev->dev, "Couldn't register media device. Error: %d\n", ret); return ret; } } return ret; } void snd_media_device_delete(struct snd_usb_audio *chip) { struct media_device *mdev = chip->media_dev; struct snd_usb_stream *stream; /* release resources */ list_for_each_entry(stream, &chip->pcm_list, list) { snd_media_stream_delete(&stream->substream[0]); snd_media_stream_delete(&stream->substream[1]); } snd_media_mixer_delete(chip); if (mdev) { media_device_delete(mdev, KBUILD_MODNAME, THIS_MODULE); chip->media_dev = NULL; } } |
83 53 331 57 318 98 359 360 356 113 44 13 304 15 313 315 58 363 65 345 12 348 382 47 22 27 3 3 3 349 4 347 347 3 349 350 347 47 144 76 115 113 111 1 115 115 2 1 82 85 35 33 84 3 1 305 8 2 2 9 2 308 307 308 304 4 17 18 12 350 4 309 87 350 14 2 12 348 305 53 309 309 9 308 309 244 49 85 15 71 73 7 75 75 86 72 72 16 15 5 85 85 4 136 50 137 51 149 9 1 148 120 54 137 151 150 6 3 9 1 138 5 149 147 55 132 19 18 1 19 18 19 4 19 18 17 50 96 124 150 86 86 18 87 70 10 3 3 3 3 2 2 1 6 2 2 2 2 1 1 2 1 1 3 2 2 2 65 5 56 1 24 23 6 49 3 1 51 1 1 50 1 1 48 1 2 3 63 5 61 5 1 1 4 1 2 1 5 1 3 1 2 11 1 1 9 1 7 1 2 1 4 2 2 1 1 1 2 1 2 5 1 7 1 4 4 1 11 3 8 1 3 3 3 3 3 4 3 6 4 8 12 11 4 4 1 75 75 56 2 17 137 4 132 5 52 1 1 1 1 1 1 1 77 3 75 69 9 78 126 3 126 14 1 12 1 4 10 2 11 8 3 11 11 42 15 29 6 25 56 34 56 59 55 56 56 56 56 56 56 56 3 56 56 11 22 22 33 7 26 26 26 26 1 26 3 2 1 1 1 1 2 1 2 2 5 11 1 8 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 | /* FUSE: Filesystem in Userspace Copyright (C) 2001-2008 Miklos Szeredi <miklos@szeredi.hu> This program can be distributed under the terms of the GNU GPL. See the file COPYING. */ #include "fuse_i.h" #include <linux/init.h> #include <linux/module.h> #include <linux/poll.h> #include <linux/sched/signal.h> #include <linux/uio.h> #include <linux/miscdevice.h> #include <linux/pagemap.h> #include <linux/file.h> #include <linux/slab.h> #include <linux/pipe_fs_i.h> #include <linux/swap.h> #include <linux/splice.h> #include <linux/sched.h> #define CREATE_TRACE_POINTS #include "fuse_trace.h" MODULE_ALIAS_MISCDEV(FUSE_MINOR); MODULE_ALIAS("devname:fuse"); /* Ordinary requests have even IDs, while interrupts IDs are odd */ #define FUSE_INT_REQ_BIT (1ULL << 0) #define FUSE_REQ_ID_STEP (1ULL << 1) static struct kmem_cache *fuse_req_cachep; static void end_requests(struct list_head *head); static struct fuse_dev *fuse_get_dev(struct file *file) { /* * Lockless access is OK, because file->private data is set * once during mount and is valid until the file is released. */ return READ_ONCE(file->private_data); } static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req) { INIT_LIST_HEAD(&req->list); INIT_LIST_HEAD(&req->intr_entry); init_waitqueue_head(&req->waitq); refcount_set(&req->count, 1); __set_bit(FR_PENDING, &req->flags); req->fm = fm; } static struct fuse_req *fuse_request_alloc(struct fuse_mount *fm, gfp_t flags) { struct fuse_req *req = kmem_cache_zalloc(fuse_req_cachep, flags); if (req) fuse_request_init(fm, req); return req; } static void fuse_request_free(struct fuse_req *req) { kmem_cache_free(fuse_req_cachep, req); } static void __fuse_get_request(struct fuse_req *req) { refcount_inc(&req->count); } /* Must be called with > 1 refcount */ static void __fuse_put_request(struct fuse_req *req) { refcount_dec(&req->count); } void fuse_set_initialized(struct fuse_conn *fc) { /* Make sure stores before this are seen on another CPU */ smp_wmb(); fc->initialized = 1; } static bool fuse_block_alloc(struct fuse_conn *fc, bool for_background) { return !fc->initialized || (for_background && fc->blocked); } static void fuse_drop_waiting(struct fuse_conn *fc) { /* * lockess check of fc->connected is okay, because atomic_dec_and_test() * provides a memory barrier matched with the one in fuse_wait_aborted() * to ensure no wake-up is missed. */ if (atomic_dec_and_test(&fc->num_waiting) && !READ_ONCE(fc->connected)) { /* wake up aborters */ wake_up_all(&fc->blocked_waitq); } } static void fuse_put_request(struct fuse_req *req); static struct fuse_req *fuse_get_req(struct mnt_idmap *idmap, struct fuse_mount *fm, bool for_background) { struct fuse_conn *fc = fm->fc; struct fuse_req *req; bool no_idmap = !fm->sb || (fm->sb->s_iflags & SB_I_NOIDMAP); kuid_t fsuid; kgid_t fsgid; int err; atomic_inc(&fc->num_waiting); if (fuse_block_alloc(fc, for_background)) { err = -EINTR; if (wait_event_killable_exclusive(fc->blocked_waitq, !fuse_block_alloc(fc, for_background))) goto out; } /* Matches smp_wmb() in fuse_set_initialized() */ smp_rmb(); err = -ENOTCONN; if (!fc->connected) goto out; err = -ECONNREFUSED; if (fc->conn_error) goto out; req = fuse_request_alloc(fm, GFP_KERNEL); err = -ENOMEM; if (!req) { if (for_background) wake_up(&fc->blocked_waitq); goto out; } req->in.h.pid = pid_nr_ns(task_pid(current), fc->pid_ns); __set_bit(FR_WAITING, &req->flags); if (for_background) __set_bit(FR_BACKGROUND, &req->flags); /* * Keep the old behavior when idmappings support was not * declared by a FUSE server. * * For those FUSE servers who support idmapped mounts, * we send UID/GID only along with "inode creation" * fuse requests, otherwise idmap == &invalid_mnt_idmap and * req->in.h.{u,g}id will be equal to FUSE_INVALID_UIDGID. */ fsuid = no_idmap ? current_fsuid() : mapped_fsuid(idmap, fc->user_ns); fsgid = no_idmap ? current_fsgid() : mapped_fsgid(idmap, fc->user_ns); req->in.h.uid = from_kuid(fc->user_ns, fsuid); req->in.h.gid = from_kgid(fc->user_ns, fsgid); if (no_idmap && unlikely(req->in.h.uid == ((uid_t)-1) || req->in.h.gid == ((gid_t)-1))) { fuse_put_request(req); return ERR_PTR(-EOVERFLOW); } return req; out: fuse_drop_waiting(fc); return ERR_PTR(err); } static void fuse_put_request(struct fuse_req *req) { struct fuse_conn *fc = req->fm->fc; if (refcount_dec_and_test(&req->count)) { if (test_bit(FR_BACKGROUND, &req->flags)) { /* * We get here in the unlikely case that a background * request was allocated but not sent */ spin_lock(&fc->bg_lock); if (!fc->blocked) wake_up(&fc->blocked_waitq); spin_unlock(&fc->bg_lock); } if (test_bit(FR_WAITING, &req->flags)) { __clear_bit(FR_WAITING, &req->flags); fuse_drop_waiting(fc); } fuse_request_free(req); } } unsigned int fuse_len_args(unsigned int numargs, struct fuse_arg *args) { unsigned nbytes = 0; unsigned i; for (i = 0; i < numargs; i++) nbytes += args[i].size; return nbytes; } EXPORT_SYMBOL_GPL(fuse_len_args); static u64 fuse_get_unique_locked(struct fuse_iqueue *fiq) { fiq->reqctr += FUSE_REQ_ID_STEP; return fiq->reqctr; } u64 fuse_get_unique(struct fuse_iqueue *fiq) { u64 ret; spin_lock(&fiq->lock); ret = fuse_get_unique_locked(fiq); spin_unlock(&fiq->lock); return ret; } EXPORT_SYMBOL_GPL(fuse_get_unique); static unsigned int fuse_req_hash(u64 unique) { return hash_long(unique & ~FUSE_INT_REQ_BIT, FUSE_PQ_HASH_BITS); } /* * A new request is available, wake fiq->waitq */ static void fuse_dev_wake_and_unlock(struct fuse_iqueue *fiq) __releases(fiq->lock) { wake_up(&fiq->waitq); kill_fasync(&fiq->fasync, SIGIO, POLL_IN); spin_unlock(&fiq->lock); } static void fuse_dev_queue_forget(struct fuse_iqueue *fiq, struct fuse_forget_link *forget) { spin_lock(&fiq->lock); if (fiq->connected) { fiq->forget_list_tail->next = forget; fiq->forget_list_tail = forget; fuse_dev_wake_and_unlock(fiq); } else { kfree(forget); spin_unlock(&fiq->lock); } } static void fuse_dev_queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req) { spin_lock(&fiq->lock); if (list_empty(&req->intr_entry)) { list_add_tail(&req->intr_entry, &fiq->interrupts); /* * Pairs with smp_mb() implied by test_and_set_bit() * from fuse_request_end(). */ smp_mb(); if (test_bit(FR_FINISHED, &req->flags)) { list_del_init(&req->intr_entry); spin_unlock(&fiq->lock); } else { fuse_dev_wake_and_unlock(fiq); } } else { spin_unlock(&fiq->lock); } } static void fuse_dev_queue_req(struct fuse_iqueue *fiq, struct fuse_req *req) { spin_lock(&fiq->lock); if (fiq->connected) { if (req->in.h.opcode != FUSE_NOTIFY_REPLY) req->in.h.unique = fuse_get_unique_locked(fiq); list_add_tail(&req->list, &fiq->pending); fuse_dev_wake_and_unlock(fiq); } else { spin_unlock(&fiq->lock); req->out.h.error = -ENOTCONN; clear_bit(FR_PENDING, &req->flags); fuse_request_end(req); } } const struct fuse_iqueue_ops fuse_dev_fiq_ops = { .send_forget = fuse_dev_queue_forget, .send_interrupt = fuse_dev_queue_interrupt, .send_req = fuse_dev_queue_req, }; EXPORT_SYMBOL_GPL(fuse_dev_fiq_ops); static void fuse_send_one(struct fuse_iqueue *fiq, struct fuse_req *req) { req->in.h.len = sizeof(struct fuse_in_header) + fuse_len_args(req->args->in_numargs, (struct fuse_arg *) req->args->in_args); trace_fuse_request_send(req); fiq->ops->send_req(fiq, req); } void fuse_queue_forget(struct fuse_conn *fc, struct fuse_forget_link *forget, u64 nodeid, u64 nlookup) { struct fuse_iqueue *fiq = &fc->iq; forget->forget_one.nodeid = nodeid; forget->forget_one.nlookup = nlookup; fiq->ops->send_forget(fiq, forget); } static void flush_bg_queue(struct fuse_conn *fc) { struct fuse_iqueue *fiq = &fc->iq; while (fc->active_background < fc->max_background && !list_empty(&fc->bg_queue)) { struct fuse_req *req; req = list_first_entry(&fc->bg_queue, struct fuse_req, list); list_del(&req->list); fc->active_background++; fuse_send_one(fiq, req); } } /* * This function is called when a request is finished. Either a reply * has arrived or it was aborted (and not yet sent) or some error * occurred during communication with userspace, or the device file * was closed. The requester thread is woken up (if still waiting), * the 'end' callback is called if given, else the reference to the * request is released */ void fuse_request_end(struct fuse_req *req) { struct fuse_mount *fm = req->fm; struct fuse_conn *fc = fm->fc; struct fuse_iqueue *fiq = &fc->iq; if (test_and_set_bit(FR_FINISHED, &req->flags)) goto put_request; trace_fuse_request_end(req); /* * test_and_set_bit() implies smp_mb() between bit * changing and below FR_INTERRUPTED check. Pairs with * smp_mb() from queue_interrupt(). */ if (test_bit(FR_INTERRUPTED, &req->flags)) { spin_lock(&fiq->lock); list_del_init(&req->intr_entry); spin_unlock(&fiq->lock); } WARN_ON(test_bit(FR_PENDING, &req->flags)); WARN_ON(test_bit(FR_SENT, &req->flags)); if (test_bit(FR_BACKGROUND, &req->flags)) { spin_lock(&fc->bg_lock); clear_bit(FR_BACKGROUND, &req->flags); if (fc->num_background == fc->max_background) { fc->blocked = 0; wake_up(&fc->blocked_waitq); } else if (!fc->blocked) { /* * Wake up next waiter, if any. It's okay to use * waitqueue_active(), as we've already synced up * fc->blocked with waiters with the wake_up() call * above. */ if (waitqueue_active(&fc->blocked_waitq)) wake_up(&fc->blocked_waitq); } fc->num_background--; fc->active_background--; flush_bg_queue(fc); spin_unlock(&fc->bg_lock); } else { /* Wake up waiter sleeping in request_wait_answer() */ wake_up(&req->waitq); } if (test_bit(FR_ASYNC, &req->flags)) req->args->end(fm, req->args, req->out.h.error); put_request: fuse_put_request(req); } EXPORT_SYMBOL_GPL(fuse_request_end); static int queue_interrupt(struct fuse_req *req) { struct fuse_iqueue *fiq = &req->fm->fc->iq; /* Check for we've sent request to interrupt this req */ if (unlikely(!test_bit(FR_INTERRUPTED, &req->flags))) return -EINVAL; fiq->ops->send_interrupt(fiq, req); return 0; } static void request_wait_answer(struct fuse_req *req) { struct fuse_conn *fc = req->fm->fc; struct fuse_iqueue *fiq = &fc->iq; int err; if (!fc->no_interrupt) { /* Any signal may interrupt this */ err = wait_event_interruptible(req->waitq, test_bit(FR_FINISHED, &req->flags)); if (!err) return; set_bit(FR_INTERRUPTED, &req->flags); /* matches barrier in fuse_dev_do_read() */ smp_mb__after_atomic(); if (test_bit(FR_SENT, &req->flags)) queue_interrupt(req); } if (!test_bit(FR_FORCE, &req->flags)) { /* Only fatal signals may interrupt this */ err = wait_event_killable(req->waitq, test_bit(FR_FINISHED, &req->flags)); if (!err) return; spin_lock(&fiq->lock); /* Request is not yet in userspace, bail out */ if (test_bit(FR_PENDING, &req->flags)) { list_del(&req->list); spin_unlock(&fiq->lock); __fuse_put_request(req); req->out.h.error = -EINTR; return; } spin_unlock(&fiq->lock); } /* * Either request is already in userspace, or it was forced. * Wait it out. */ wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags)); } static void __fuse_request_send(struct fuse_req *req) { struct fuse_iqueue *fiq = &req->fm->fc->iq; BUG_ON(test_bit(FR_BACKGROUND, &req->flags)); /* acquire extra reference, since request is still needed after fuse_request_end() */ __fuse_get_request(req); fuse_send_one(fiq, req); request_wait_answer(req); /* Pairs with smp_wmb() in fuse_request_end() */ smp_rmb(); } static void fuse_adjust_compat(struct fuse_conn *fc, struct fuse_args *args) { if (fc->minor < 4 && args->opcode == FUSE_STATFS) args->out_args[0].size = FUSE_COMPAT_STATFS_SIZE; if (fc->minor < 9) { switch (args->opcode) { case FUSE_LOOKUP: case FUSE_CREATE: case FUSE_MKNOD: case FUSE_MKDIR: case FUSE_SYMLINK: case FUSE_LINK: args->out_args[0].size = FUSE_COMPAT_ENTRY_OUT_SIZE; break; case FUSE_GETATTR: case FUSE_SETATTR: args->out_args[0].size = FUSE_COMPAT_ATTR_OUT_SIZE; break; } } if (fc->minor < 12) { switch (args->opcode) { case FUSE_CREATE: args->in_args[0].size = sizeof(struct fuse_open_in); break; case FUSE_MKNOD: args->in_args[0].size = FUSE_COMPAT_MKNOD_IN_SIZE; break; } } } static void fuse_force_creds(struct fuse_req *req) { struct fuse_conn *fc = req->fm->fc; if (!req->fm->sb || req->fm->sb->s_iflags & SB_I_NOIDMAP) { req->in.h.uid = from_kuid_munged(fc->user_ns, current_fsuid()); req->in.h.gid = from_kgid_munged(fc->user_ns, current_fsgid()); } else { req->in.h.uid = FUSE_INVALID_UIDGID; req->in.h.gid = FUSE_INVALID_UIDGID; } req->in.h.pid = pid_nr_ns(task_pid(current), fc->pid_ns); } static void fuse_args_to_req(struct fuse_req *req, struct fuse_args *args) { req->in.h.opcode = args->opcode; req->in.h.nodeid = args->nodeid; req->args = args; if (args->is_ext) req->in.h.total_extlen = args->in_args[args->ext_idx].size / 8; if (args->end) __set_bit(FR_ASYNC, &req->flags); } ssize_t __fuse_simple_request(struct mnt_idmap *idmap, struct fuse_mount *fm, struct fuse_args *args) { struct fuse_conn *fc = fm->fc; struct fuse_req *req; ssize_t ret; if (args->force) { atomic_inc(&fc->num_waiting); req = fuse_request_alloc(fm, GFP_KERNEL | __GFP_NOFAIL); if (!args->nocreds) fuse_force_creds(req); __set_bit(FR_WAITING, &req->flags); __set_bit(FR_FORCE, &req->flags); } else { WARN_ON(args->nocreds); req = fuse_get_req(idmap, fm, false); if (IS_ERR(req)) return PTR_ERR(req); } /* Needs to be done after fuse_get_req() so that fc->minor is valid */ fuse_adjust_compat(fc, args); fuse_args_to_req(req, args); if (!args->noreply) __set_bit(FR_ISREPLY, &req->flags); __fuse_request_send(req); ret = req->out.h.error; if (!ret && args->out_argvar) { BUG_ON(args->out_numargs == 0); ret = args->out_args[args->out_numargs - 1].size; } fuse_put_request(req); return ret; } static bool fuse_request_queue_background(struct fuse_req *req) { struct fuse_mount *fm = req->fm; struct fuse_conn *fc = fm->fc; bool queued = false; WARN_ON(!test_bit(FR_BACKGROUND, &req->flags)); if (!test_bit(FR_WAITING, &req->flags)) { __set_bit(FR_WAITING, &req->flags); atomic_inc(&fc->num_waiting); } __set_bit(FR_ISREPLY, &req->flags); spin_lock(&fc->bg_lock); if (likely(fc->connected)) { fc->num_background++; if (fc->num_background == fc->max_background) fc->blocked = 1; list_add_tail(&req->list, &fc->bg_queue); flush_bg_queue(fc); queued = true; } spin_unlock(&fc->bg_lock); return queued; } int fuse_simple_background(struct fuse_mount *fm, struct fuse_args *args, gfp_t gfp_flags) { struct fuse_req *req; if (args->force) { WARN_ON(!args->nocreds); req = fuse_request_alloc(fm, gfp_flags); if (!req) return -ENOMEM; __set_bit(FR_BACKGROUND, &req->flags); } else { WARN_ON(args->nocreds); req = fuse_get_req(&invalid_mnt_idmap, fm, true); if (IS_ERR(req)) return PTR_ERR(req); } fuse_args_to_req(req, args); if (!fuse_request_queue_background(req)) { fuse_put_request(req); return -ENOTCONN; } return 0; } EXPORT_SYMBOL_GPL(fuse_simple_background); static int fuse_simple_notify_reply(struct fuse_mount *fm, struct fuse_args *args, u64 unique) { struct fuse_req *req; struct fuse_iqueue *fiq = &fm->fc->iq; req = fuse_get_req(&invalid_mnt_idmap, fm, false); if (IS_ERR(req)) return PTR_ERR(req); __clear_bit(FR_ISREPLY, &req->flags); req->in.h.unique = unique; fuse_args_to_req(req, args); fuse_send_one(fiq, req); return 0; } /* * Lock the request. Up to the next unlock_request() there mustn't be * anything that could cause a page-fault. If the request was already * aborted bail out. */ static int lock_request(struct fuse_req *req) { int err = 0; if (req) { spin_lock(&req->waitq.lock); if (test_bit(FR_ABORTED, &req->flags)) err = -ENOENT; else set_bit(FR_LOCKED, &req->flags); spin_unlock(&req->waitq.lock); } return err; } /* * Unlock request. If it was aborted while locked, caller is responsible * for unlocking and ending the request. */ static int unlock_request(struct fuse_req *req) { int err = 0; if (req) { spin_lock(&req->waitq.lock); if (test_bit(FR_ABORTED, &req->flags)) err = -ENOENT; else clear_bit(FR_LOCKED, &req->flags); spin_unlock(&req->waitq.lock); } return err; } struct fuse_copy_state { int write; struct fuse_req *req; struct iov_iter *iter; struct pipe_buffer *pipebufs; struct pipe_buffer *currbuf; struct pipe_inode_info *pipe; unsigned long nr_segs; struct page *pg; unsigned len; unsigned offset; unsigned move_pages:1; }; static void fuse_copy_init(struct fuse_copy_state *cs, int write, struct iov_iter *iter) { memset(cs, 0, sizeof(*cs)); cs->write = write; cs->iter = iter; } /* Unmap and put previous page of userspace buffer */ static void fuse_copy_finish(struct fuse_copy_state *cs) { if (cs->currbuf) { struct pipe_buffer *buf = cs->currbuf; if (cs->write) buf->len = PAGE_SIZE - cs->len; cs->currbuf = NULL; } else if (cs->pg) { if (cs->write) { flush_dcache_page(cs->pg); set_page_dirty_lock(cs->pg); } put_page(cs->pg); } cs->pg = NULL; } /* * Get another pagefull of userspace buffer, and map it to kernel * address space, and lock request */ static int fuse_copy_fill(struct fuse_copy_state *cs) { struct page *page; int err; err = unlock_request(cs->req); if (err) return err; fuse_copy_finish(cs); if (cs->pipebufs) { struct pipe_buffer *buf = cs->pipebufs; if (!cs->write) { err = pipe_buf_confirm(cs->pipe, buf); if (err) return err; BUG_ON(!cs->nr_segs); cs->currbuf = buf; cs->pg = buf->page; cs->offset = buf->offset; cs->len = buf->len; cs->pipebufs++; cs->nr_segs--; } else { if (cs->nr_segs >= cs->pipe->max_usage) return -EIO; page = alloc_page(GFP_HIGHUSER); if (!page) return -ENOMEM; buf->page = page; buf->offset = 0; buf->len = 0; cs->currbuf = buf; cs->pg = page; cs->offset = 0; cs->len = PAGE_SIZE; cs->pipebufs++; cs->nr_segs++; } } else { size_t off; err = iov_iter_get_pages2(cs->iter, &page, PAGE_SIZE, 1, &off); if (err < 0) return err; BUG_ON(!err); cs->len = err; cs->offset = off; cs->pg = page; } return lock_request(cs->req); } /* Do as much copy to/from userspace buffer as we can */ static int fuse_copy_do(struct fuse_copy_state *cs, void **val, unsigned *size) { unsigned ncpy = min(*size, cs->len); if (val) { void *pgaddr = kmap_local_page(cs->pg); void *buf = pgaddr + cs->offset; if (cs->write) memcpy(buf, *val, ncpy); else memcpy(*val, buf, ncpy); kunmap_local(pgaddr); *val += ncpy; } *size -= ncpy; cs->len -= ncpy; cs->offset += ncpy; return ncpy; } static int fuse_check_folio(struct folio *folio) { if (folio_mapped(folio) || folio->mapping != NULL || (folio->flags & PAGE_FLAGS_CHECK_AT_PREP & ~(1 << PG_locked | 1 << PG_referenced | 1 << PG_lru | 1 << PG_active | 1 << PG_workingset | 1 << PG_reclaim | 1 << PG_waiters | LRU_GEN_MASK | LRU_REFS_MASK))) { dump_page(&folio->page, "fuse: trying to steal weird page"); return 1; } return 0; } static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep) { int err; struct folio *oldfolio = page_folio(*pagep); struct folio *newfolio; struct pipe_buffer *buf = cs->pipebufs; folio_get(oldfolio); err = unlock_request(cs->req); if (err) goto out_put_old; fuse_copy_finish(cs); err = pipe_buf_confirm(cs->pipe, buf); if (err) goto out_put_old; BUG_ON(!cs->nr_segs); cs->currbuf = buf; cs->len = buf->len; cs->pipebufs++; cs->nr_segs--; if (cs->len != PAGE_SIZE) goto out_fallback; if (!pipe_buf_try_steal(cs->pipe, buf)) goto out_fallback; newfolio = page_folio(buf->page); folio_clear_uptodate(newfolio); folio_clear_mappedtodisk(newfolio); if (fuse_check_folio(newfolio) != 0) goto out_fallback_unlock; /* * This is a new and locked page, it shouldn't be mapped or * have any special flags on it */ if (WARN_ON(folio_mapped(oldfolio))) goto out_fallback_unlock; if (WARN_ON(folio_has_private(oldfolio))) goto out_fallback_unlock; if (WARN_ON(folio_test_dirty(oldfolio) || folio_test_writeback(oldfolio))) goto out_fallback_unlock; if (WARN_ON(folio_test_mlocked(oldfolio))) goto out_fallback_unlock; replace_page_cache_folio(oldfolio, newfolio); folio_get(newfolio); if (!(buf->flags & PIPE_BUF_FLAG_LRU)) folio_add_lru(newfolio); /* * Release while we have extra ref on stolen page. Otherwise * anon_pipe_buf_release() might think the page can be reused. */ pipe_buf_release(cs->pipe, buf); err = 0; spin_lock(&cs->req->waitq.lock); if (test_bit(FR_ABORTED, &cs->req->flags)) err = -ENOENT; else *pagep = &newfolio->page; spin_unlock(&cs->req->waitq.lock); if (err) { folio_unlock(newfolio); folio_put(newfolio); goto out_put_old; } folio_unlock(oldfolio); /* Drop ref for ap->pages[] array */ folio_put(oldfolio); cs->len = 0; err = 0; out_put_old: /* Drop ref obtained in this function */ folio_put(oldfolio); return err; out_fallback_unlock: folio_unlock(newfolio); out_fallback: cs->pg = buf->page; cs->offset = buf->offset; err = lock_request(cs->req); if (!err) err = 1; goto out_put_old; } static int fuse_ref_page(struct fuse_copy_state *cs, struct page *page, unsigned offset, unsigned count) { struct pipe_buffer *buf; int err; if (cs->nr_segs >= cs->pipe->max_usage) return -EIO; get_page(page); err = unlock_request(cs->req); if (err) { put_page(page); return err; } fuse_copy_finish(cs); buf = cs->pipebufs; buf->page = page; buf->offset = offset; buf->len = count; cs->pipebufs++; cs->nr_segs++; cs->len = 0; return 0; } /* * Copy a page in the request to/from the userspace buffer. Must be * done atomically */ static int fuse_copy_page(struct fuse_copy_state *cs, struct page **pagep, unsigned offset, unsigned count, int zeroing) { int err; struct page *page = *pagep; if (page && zeroing && count < PAGE_SIZE) clear_highpage(page); while (count) { if (cs->write && cs->pipebufs && page) { /* * Can't control lifetime of pipe buffers, so always * copy user pages. */ if (cs->req->args->user_pages) { err = fuse_copy_fill(cs); if (err) return err; } else { return fuse_ref_page(cs, page, offset, count); } } else if (!cs->len) { if (cs->move_pages && page && offset == 0 && count == PAGE_SIZE) { err = fuse_try_move_page(cs, pagep); if (err <= 0) return err; } else { err = fuse_copy_fill(cs); if (err) return err; } } if (page) { void *mapaddr = kmap_local_page(page); void *buf = mapaddr + offset; offset += fuse_copy_do(cs, &buf, &count); kunmap_local(mapaddr); } else offset += fuse_copy_do(cs, NULL, &count); } if (page && !cs->write) flush_dcache_page(page); return 0; } /* Copy pages in the request to/from userspace buffer */ static int fuse_copy_pages(struct fuse_copy_state *cs, unsigned nbytes, int zeroing) { unsigned i; struct fuse_req *req = cs->req; struct fuse_args_pages *ap = container_of(req->args, typeof(*ap), args); for (i = 0; i < ap->num_folios && (nbytes || zeroing); i++) { int err; unsigned int offset = ap->descs[i].offset; unsigned int count = min(nbytes, ap->descs[i].length); struct page *orig, *pagep; orig = pagep = &ap->folios[i]->page; err = fuse_copy_page(cs, &pagep, offset, count, zeroing); if (err) return err; nbytes -= count; /* * fuse_copy_page may have moved a page from a pipe instead of * copying into our given page, so update the folios if it was * replaced. */ if (pagep != orig) ap->folios[i] = page_folio(pagep); } return 0; } /* Copy a single argument in the request to/from userspace buffer */ static int fuse_copy_one(struct fuse_copy_state *cs, void *val, unsigned size) { while (size) { if (!cs->len) { int err = fuse_copy_fill(cs); if (err) return err; } fuse_copy_do(cs, &val, &size); } return 0; } /* Copy request arguments to/from userspace buffer */ static int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs, unsigned argpages, struct fuse_arg *args, int zeroing) { int err = 0; unsigned i; for (i = 0; !err && i < numargs; i++) { struct fuse_arg *arg = &args[i]; if (i == numargs - 1 && argpages) err = fuse_copy_pages(cs, arg->size, zeroing); else err = fuse_copy_one(cs, arg->value, arg->size); } return err; } static int forget_pending(struct fuse_iqueue *fiq) { return fiq->forget_list_head.next != NULL; } static int request_pending(struct fuse_iqueue *fiq) { return !list_empty(&fiq->pending) || !list_empty(&fiq->interrupts) || forget_pending(fiq); } /* * Transfer an interrupt request to userspace * * Unlike other requests this is assembled on demand, without a need * to allocate a separate fuse_req structure. * * Called with fiq->lock held, releases it */ static int fuse_read_interrupt(struct fuse_iqueue *fiq, struct fuse_copy_state *cs, size_t nbytes, struct fuse_req *req) __releases(fiq->lock) { struct fuse_in_header ih; struct fuse_interrupt_in arg; unsigned reqsize = sizeof(ih) + sizeof(arg); int err; list_del_init(&req->intr_entry); memset(&ih, 0, sizeof(ih)); memset(&arg, 0, sizeof(arg)); ih.len = reqsize; ih.opcode = FUSE_INTERRUPT; ih.unique = (req->in.h.unique | FUSE_INT_REQ_BIT); arg.unique = req->in.h.unique; spin_unlock(&fiq->lock); if (nbytes < reqsize) return -EINVAL; err = fuse_copy_one(cs, &ih, sizeof(ih)); if (!err) err = fuse_copy_one(cs, &arg, sizeof(arg)); fuse_copy_finish(cs); return err ? err : reqsize; } static struct fuse_forget_link *fuse_dequeue_forget(struct fuse_iqueue *fiq, unsigned int max, unsigned int *countp) { struct fuse_forget_link *head = fiq->forget_list_head.next; struct fuse_forget_link **newhead = &head; unsigned count; for (count = 0; *newhead != NULL && count < max; count++) newhead = &(*newhead)->next; fiq->forget_list_head.next = *newhead; *newhead = NULL; if (fiq->forget_list_head.next == NULL) fiq->forget_list_tail = &fiq->forget_list_head; if (countp != NULL) *countp = count; return head; } static int fuse_read_single_forget(struct fuse_iqueue *fiq, struct fuse_copy_state *cs, size_t nbytes) __releases(fiq->lock) { int err; struct fuse_forget_link *forget = fuse_dequeue_forget(fiq, 1, NULL); struct fuse_forget_in arg = { .nlookup = forget->forget_one.nlookup, }; struct fuse_in_header ih = { .opcode = FUSE_FORGET, .nodeid = forget->forget_one.nodeid, .unique = fuse_get_unique_locked(fiq), .len = sizeof(ih) + sizeof(arg), }; spin_unlock(&fiq->lock); kfree(forget); if (nbytes < ih.len) return -EINVAL; err = fuse_copy_one(cs, &ih, sizeof(ih)); if (!err) err = fuse_copy_one(cs, &arg, sizeof(arg)); fuse_copy_finish(cs); if (err) return err; return ih.len; } static int fuse_read_batch_forget(struct fuse_iqueue *fiq, struct fuse_copy_state *cs, size_t nbytes) __releases(fiq->lock) { int err; unsigned max_forgets; unsigned count; struct fuse_forget_link *head; struct fuse_batch_forget_in arg = { .count = 0 }; struct fuse_in_header ih = { .opcode = FUSE_BATCH_FORGET, .unique = fuse_get_unique_locked(fiq), .len = sizeof(ih) + sizeof(arg), }; if (nbytes < ih.len) { spin_unlock(&fiq->lock); return -EINVAL; } max_forgets = (nbytes - ih.len) / sizeof(struct fuse_forget_one); head = fuse_dequeue_forget(fiq, max_forgets, &count); spin_unlock(&fiq->lock); arg.count = count; ih.len += count * sizeof(struct fuse_forget_one); err = fuse_copy_one(cs, &ih, sizeof(ih)); if (!err) err = fuse_copy_one(cs, &arg, sizeof(arg)); while (head) { struct fuse_forget_link *forget = head; if (!err) { err = fuse_copy_one(cs, &forget->forget_one, sizeof(forget->forget_one)); } head = forget->next; kfree(forget); } fuse_copy_finish(cs); if (err) return err; return ih.len; } static int fuse_read_forget(struct fuse_conn *fc, struct fuse_iqueue *fiq, struct fuse_copy_state *cs, size_t nbytes) __releases(fiq->lock) { if (fc->minor < 16 || fiq->forget_list_head.next->next == NULL) return fuse_read_single_forget(fiq, cs, nbytes); else return fuse_read_batch_forget(fiq, cs, nbytes); } /* * Read a single request into the userspace filesystem's buffer. This * function waits until a request is available, then removes it from * the pending list and copies request data to userspace buffer. If * no reply is needed (FORGET) or request has been aborted or there * was an error during the copying then it's finished by calling * fuse_request_end(). Otherwise add it to the processing list, and set * the 'sent' flag. */ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, struct fuse_copy_state *cs, size_t nbytes) { ssize_t err; struct fuse_conn *fc = fud->fc; struct fuse_iqueue *fiq = &fc->iq; struct fuse_pqueue *fpq = &fud->pq; struct fuse_req *req; struct fuse_args *args; unsigned reqsize; unsigned int hash; /* * Require sane minimum read buffer - that has capacity for fixed part * of any request header + negotiated max_write room for data. * * Historically libfuse reserves 4K for fixed header room, but e.g. * GlusterFS reserves only 80 bytes * * = `sizeof(fuse_in_header) + sizeof(fuse_write_in)` * * which is the absolute minimum any sane filesystem should be using * for header room. */ if (nbytes < max_t(size_t, FUSE_MIN_READ_BUFFER, sizeof(struct fuse_in_header) + sizeof(struct fuse_write_in) + fc->max_write)) return -EINVAL; restart: for (;;) { spin_lock(&fiq->lock); if (!fiq->connected || request_pending(fiq)) break; spin_unlock(&fiq->lock); if (file->f_flags & O_NONBLOCK) return -EAGAIN; err = wait_event_interruptible_exclusive(fiq->waitq, !fiq->connected || request_pending(fiq)); if (err) return err; } if (!fiq->connected) { err = fc->aborted ? -ECONNABORTED : -ENODEV; goto err_unlock; } if (!list_empty(&fiq->interrupts)) { req = list_entry(fiq->interrupts.next, struct fuse_req, intr_entry); return fuse_read_interrupt(fiq, cs, nbytes, req); } if (forget_pending(fiq)) { if (list_empty(&fiq->pending) || fiq->forget_batch-- > 0) return fuse_read_forget(fc, fiq, cs, nbytes); if (fiq->forget_batch <= -8) fiq->forget_batch = 16; } req = list_entry(fiq->pending.next, struct fuse_req, list); clear_bit(FR_PENDING, &req->flags); list_del_init(&req->list); spin_unlock(&fiq->lock); args = req->args; reqsize = req->in.h.len; /* If request is too large, reply with an error and restart the read */ if (nbytes < reqsize) { req->out.h.error = -EIO; /* SETXATTR is special, since it may contain too large data */ if (args->opcode == FUSE_SETXATTR) req->out.h.error = -E2BIG; fuse_request_end(req); goto restart; } spin_lock(&fpq->lock); /* * Must not put request on fpq->io queue after having been shut down by * fuse_abort_conn() */ if (!fpq->connected) { req->out.h.error = err = -ECONNABORTED; goto out_end; } list_add(&req->list, &fpq->io); spin_unlock(&fpq->lock); cs->req = req; err = fuse_copy_one(cs, &req->in.h, sizeof(req->in.h)); if (!err) err = fuse_copy_args(cs, args->in_numargs, args->in_pages, (struct fuse_arg *) args->in_args, 0); fuse_copy_finish(cs); spin_lock(&fpq->lock); clear_bit(FR_LOCKED, &req->flags); if (!fpq->connected) { err = fc->aborted ? -ECONNABORTED : -ENODEV; goto out_end; } if (err) { req->out.h.error = -EIO; goto out_end; } if (!test_bit(FR_ISREPLY, &req->flags)) { err = reqsize; goto out_end; } hash = fuse_req_hash(req->in.h.unique); list_move_tail(&req->list, &fpq->processing[hash]); __fuse_get_request(req); set_bit(FR_SENT, &req->flags); spin_unlock(&fpq->lock); /* matches barrier in request_wait_answer() */ smp_mb__after_atomic(); if (test_bit(FR_INTERRUPTED, &req->flags)) queue_interrupt(req); fuse_put_request(req); return reqsize; out_end: if (!test_bit(FR_PRIVATE, &req->flags)) list_del_init(&req->list); spin_unlock(&fpq->lock); fuse_request_end(req); return err; err_unlock: spin_unlock(&fiq->lock); return err; } static int fuse_dev_open(struct inode *inode, struct file *file) { /* * The fuse device's file's private_data is used to hold * the fuse_conn(ection) when it is mounted, and is used to * keep track of whether the file has been mounted already. */ file->private_data = NULL; return 0; } static ssize_t fuse_dev_read(struct kiocb *iocb, struct iov_iter *to) { struct fuse_copy_state cs; struct file *file = iocb->ki_filp; struct fuse_dev *fud = fuse_get_dev(file); if (!fud) return -EPERM; if (!user_backed_iter(to)) return -EINVAL; fuse_copy_init(&cs, 1, to); return fuse_dev_do_read(fud, file, &cs, iov_iter_count(to)); } static ssize_t fuse_dev_splice_read(struct file *in, loff_t *ppos, struct pipe_inode_info *pipe, size_t len, unsigned int flags) { int total, ret; int page_nr = 0; struct pipe_buffer *bufs; struct fuse_copy_state cs; struct fuse_dev *fud = fuse_get_dev(in); if (!fud) return -EPERM; bufs = kvmalloc_array(pipe->max_usage, sizeof(struct pipe_buffer), GFP_KERNEL); if (!bufs) return -ENOMEM; fuse_copy_init(&cs, 1, NULL); cs.pipebufs = bufs; cs.pipe = pipe; ret = fuse_dev_do_read(fud, in, &cs, len); if (ret < 0) goto out; if (pipe_occupancy(pipe->head, pipe->tail) + cs.nr_segs > pipe->max_usage) { ret = -EIO; goto out; } for (ret = total = 0; page_nr < cs.nr_segs; total += ret) { /* * Need to be careful about this. Having buf->ops in module * code can Oops if the buffer persists after module unload. */ bufs[page_nr].ops = &nosteal_pipe_buf_ops; bufs[page_nr].flags = 0; ret = add_to_pipe(pipe, &bufs[page_nr++]); if (unlikely(ret < 0)) break; } if (total) ret = total; out: for (; page_nr < cs.nr_segs; page_nr++) put_page(bufs[page_nr].page); kvfree(bufs); return ret; } static int fuse_notify_poll(struct fuse_conn *fc, unsigned int size, struct fuse_copy_state *cs) { struct fuse_notify_poll_wakeup_out outarg; int err = -EINVAL; if (size != sizeof(outarg)) goto err; err = fuse_copy_one(cs, &outarg, sizeof(outarg)); if (err) goto err; fuse_copy_finish(cs); return fuse_notify_poll_wakeup(fc, &outarg); err: fuse_copy_finish(cs); return err; } static int fuse_notify_inval_inode(struct fuse_conn *fc, unsigned int size, struct fuse_copy_state *cs) { struct fuse_notify_inval_inode_out outarg; int err = -EINVAL; if (size != sizeof(outarg)) goto err; err = fuse_copy_one(cs, &outarg, sizeof(outarg)); if (err) goto err; fuse_copy_finish(cs); down_read(&fc->killsb); err = fuse_reverse_inval_inode(fc, outarg.ino, outarg.off, outarg.len); up_read(&fc->killsb); return err; err: fuse_copy_finish(cs); return err; } static int fuse_notify_inval_entry(struct fuse_conn *fc, unsigned int size, struct fuse_copy_state *cs) { struct fuse_notify_inval_entry_out outarg; int err = -ENOMEM; char *buf; struct qstr name; buf = kzalloc(FUSE_NAME_MAX + 1, GFP_KERNEL); if (!buf) goto err; err = -EINVAL; if (size < sizeof(outarg)) goto err; err = fuse_copy_one(cs, &outarg, sizeof(outarg)); if (err) goto err; err = -ENAMETOOLONG; if (outarg.namelen > FUSE_NAME_MAX) goto err; err = -EINVAL; if (size != sizeof(outarg) + outarg.namelen + 1) goto err; name.name = buf; name.len = outarg.namelen; err = fuse_copy_one(cs, buf, outarg.namelen + 1); if (err) goto err; fuse_copy_finish(cs); buf[outarg.namelen] = 0; down_read(&fc->killsb); err = fuse_reverse_inval_entry(fc, outarg.parent, 0, &name, outarg.flags); up_read(&fc->killsb); kfree(buf); return err; err: kfree(buf); fuse_copy_finish(cs); return err; } static int fuse_notify_delete(struct fuse_conn *fc, unsigned int size, struct fuse_copy_state *cs) { struct fuse_notify_delete_out outarg; int err = -ENOMEM; char *buf; struct qstr name; buf = kzalloc(FUSE_NAME_MAX + 1, GFP_KERNEL); if (!buf) goto err; err = -EINVAL; if (size < sizeof(outarg)) goto err; err = fuse_copy_one(cs, &outarg, sizeof(outarg)); if (err) goto err; err = -ENAMETOOLONG; if (outarg.namelen > FUSE_NAME_MAX) goto err; err = -EINVAL; if (size != sizeof(outarg) + outarg.namelen + 1) goto err; name.name = buf; name.len = outarg.namelen; err = fuse_copy_one(cs, buf, outarg.namelen + 1); if (err) goto err; fuse_copy_finish(cs); buf[outarg.namelen] = 0; down_read(&fc->killsb); err = fuse_reverse_inval_entry(fc, outarg.parent, outarg.child, &name, 0); up_read(&fc->killsb); kfree(buf); return err; err: kfree(buf); fuse_copy_finish(cs); return err; } static int fuse_notify_store(struct fuse_conn *fc, unsigned int size, struct fuse_copy_state *cs) { struct fuse_notify_store_out outarg; struct inode *inode; struct address_space *mapping; u64 nodeid; int err; pgoff_t index; unsigned int offset; unsigned int num; loff_t file_size; loff_t end; err = -EINVAL; if (size < sizeof(outarg)) goto out_finish; err = fuse_copy_one(cs, &outarg, sizeof(outarg)); if (err) goto out_finish; err = -EINVAL; if (size - sizeof(outarg) != outarg.size) goto out_finish; nodeid = outarg.nodeid; down_read(&fc->killsb); err = -ENOENT; inode = fuse_ilookup(fc, nodeid, NULL); if (!inode) goto out_up_killsb; mapping = inode->i_mapping; index = outarg.offset >> PAGE_SHIFT; offset = outarg.offset & ~PAGE_MASK; file_size = i_size_read(inode); end = outarg.offset + outarg.size; if (end > file_size) { file_size = end; fuse_write_update_attr(inode, file_size, outarg.size); } num = outarg.size; while (num) { struct folio *folio; struct page *page; unsigned int this_num; folio = filemap_grab_folio(mapping, index); err = PTR_ERR(folio); if (IS_ERR(folio)) goto out_iput; page = &folio->page; this_num = min_t(unsigned, num, folio_size(folio) - offset); err = fuse_copy_page(cs, &page, offset, this_num, 0); if (!folio_test_uptodate(folio) && !err && offset == 0 && (this_num == folio_size(folio) || file_size == end)) { folio_zero_segment(folio, this_num, folio_size(folio)); folio_mark_uptodate(folio); } folio_unlock(folio); folio_put(folio); if (err) goto out_iput; num -= this_num; offset = 0; index++; } err = 0; out_iput: iput(inode); out_up_killsb: up_read(&fc->killsb); out_finish: fuse_copy_finish(cs); return err; } struct fuse_retrieve_args { struct fuse_args_pages ap; struct fuse_notify_retrieve_in inarg; }; static void fuse_retrieve_end(struct fuse_mount *fm, struct fuse_args *args, int error) { struct fuse_retrieve_args *ra = container_of(args, typeof(*ra), ap.args); release_pages(ra->ap.folios, ra->ap.num_folios); kfree(ra); } static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode, struct fuse_notify_retrieve_out *outarg) { int err; struct address_space *mapping = inode->i_mapping; pgoff_t index; loff_t file_size; unsigned int num; unsigned int offset; size_t total_len = 0; unsigned int num_pages, cur_pages = 0; struct fuse_conn *fc = fm->fc; struct fuse_retrieve_args *ra; size_t args_size = sizeof(*ra); struct fuse_args_pages *ap; struct fuse_args *args; offset = outarg->offset & ~PAGE_MASK; file_size = i_size_read(inode); num = min(outarg->size, fc->max_write); if (outarg->offset > file_size) num = 0; else if (outarg->offset + num > file_size) num = file_size - outarg->offset; num_pages = (num + offset + PAGE_SIZE - 1) >> PAGE_SHIFT; num_pages = min(num_pages, fc->max_pages); args_size += num_pages * (sizeof(ap->folios[0]) + sizeof(ap->descs[0])); ra = kzalloc(args_size, GFP_KERNEL); if (!ra) return -ENOMEM; ap = &ra->ap; ap->folios = (void *) (ra + 1); ap->descs = (void *) (ap->folios + num_pages); args = &ap->args; args->nodeid = outarg->nodeid; args->opcode = FUSE_NOTIFY_REPLY; args->in_numargs = 2; args->in_pages = true; args->end = fuse_retrieve_end; index = outarg->offset >> PAGE_SHIFT; while (num && cur_pages < num_pages) { struct folio *folio; unsigned int this_num; folio = filemap_get_folio(mapping, index); if (IS_ERR(folio)) break; this_num = min_t(unsigned, num, PAGE_SIZE - offset); ap->folios[ap->num_folios] = folio; ap->descs[ap->num_folios].offset = offset; ap->descs[ap->num_folios].length = this_num; ap->num_folios++; cur_pages++; offset = 0; num -= this_num; total_len += this_num; index++; } ra->inarg.offset = outarg->offset; ra->inarg.size = total_len; args->in_args[0].size = sizeof(ra->inarg); args->in_args[0].value = &ra->inarg; args->in_args[1].size = total_len; err = fuse_simple_notify_reply(fm, args, outarg->notify_unique); if (err) fuse_retrieve_end(fm, args, err); return err; } static int fuse_notify_retrieve(struct fuse_conn *fc, unsigned int size, struct fuse_copy_state *cs) { struct fuse_notify_retrieve_out outarg; struct fuse_mount *fm; struct inode *inode; u64 nodeid; int err; err = -EINVAL; if (size != sizeof(outarg)) goto copy_finish; err = fuse_copy_one(cs, &outarg, sizeof(outarg)); if (err) goto copy_finish; fuse_copy_finish(cs); down_read(&fc->killsb); err = -ENOENT; nodeid = outarg.nodeid; inode = fuse_ilookup(fc, nodeid, &fm); if (inode) { err = fuse_retrieve(fm, inode, &outarg); iput(inode); } up_read(&fc->killsb); return err; copy_finish: fuse_copy_finish(cs); return err; } /* * Resending all processing queue requests. * * During a FUSE daemon panics and failover, it is possible for some inflight * requests to be lost and never returned. As a result, applications awaiting * replies would become stuck forever. To address this, we can use notification * to trigger resending of these pending requests to the FUSE daemon, ensuring * they are properly processed again. * * Please note that this strategy is applicable only to idempotent requests or * if the FUSE daemon takes careful measures to avoid processing duplicated * non-idempotent requests. */ static void fuse_resend(struct fuse_conn *fc) { struct fuse_dev *fud; struct fuse_req *req, *next; struct fuse_iqueue *fiq = &fc->iq; LIST_HEAD(to_queue); unsigned int i; spin_lock(&fc->lock); if (!fc->connected) { spin_unlock(&fc->lock); return; } list_for_each_entry(fud, &fc->devices, entry) { struct fuse_pqueue *fpq = &fud->pq; spin_lock(&fpq->lock); for (i = 0; i < FUSE_PQ_HASH_SIZE; i++) list_splice_tail_init(&fpq->processing[i], &to_queue); spin_unlock(&fpq->lock); } spin_unlock(&fc->lock); list_for_each_entry_safe(req, next, &to_queue, list) { set_bit(FR_PENDING, &req->flags); clear_bit(FR_SENT, &req->flags); /* mark the request as resend request */ req->in.h.unique |= FUSE_UNIQUE_RESEND; } spin_lock(&fiq->lock); if (!fiq->connected) { spin_unlock(&fiq->lock); list_for_each_entry(req, &to_queue, list) clear_bit(FR_PENDING, &req->flags); end_requests(&to_queue); return; } /* iq and pq requests are both oldest to newest */ list_splice(&to_queue, &fiq->pending); fuse_dev_wake_and_unlock(fiq); } static int fuse_notify_resend(struct fuse_conn *fc) { fuse_resend(fc); return 0; } static int fuse_notify(struct fuse_conn *fc, enum fuse_notify_code code, unsigned int size, struct fuse_copy_state *cs) { /* Don't try to move pages (yet) */ cs->move_pages = 0; switch (code) { case FUSE_NOTIFY_POLL: return fuse_notify_poll(fc, size, cs); case FUSE_NOTIFY_INVAL_INODE: return fuse_notify_inval_inode(fc, size, cs); case FUSE_NOTIFY_INVAL_ENTRY: return fuse_notify_inval_entry(fc, size, cs); case FUSE_NOTIFY_STORE: return fuse_notify_store(fc, size, cs); case FUSE_NOTIFY_RETRIEVE: return fuse_notify_retrieve(fc, size, cs); case FUSE_NOTIFY_DELETE: return fuse_notify_delete(fc, size, cs); case FUSE_NOTIFY_RESEND: return fuse_notify_resend(fc); default: fuse_copy_finish(cs); return -EINVAL; } } /* Look up request on processing list by unique ID */ static struct fuse_req *request_find(struct fuse_pqueue *fpq, u64 unique) { unsigned int hash = fuse_req_hash(unique); struct fuse_req *req; list_for_each_entry(req, &fpq->processing[hash], list) { if (req->in.h.unique == unique) return req; } return NULL; } static int copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args, unsigned nbytes) { unsigned reqsize = sizeof(struct fuse_out_header); reqsize += fuse_len_args(args->out_numargs, args->out_args); if (reqsize < nbytes || (reqsize > nbytes && !args->out_argvar)) return -EINVAL; else if (reqsize > nbytes) { struct fuse_arg *lastarg = &args->out_args[args->out_numargs-1]; unsigned diffsize = reqsize - nbytes; if (diffsize > lastarg->size) return -EINVAL; lastarg->size -= diffsize; } return fuse_copy_args(cs, args->out_numargs, args->out_pages, args->out_args, args->page_zeroing); } /* * Write a single reply to a request. First the header is copied from * the write buffer. The request is then searched on the processing * list by the unique ID found in the header. If found, then remove * it from the list and copy the rest of the buffer to the request. * The request is finished by calling fuse_request_end(). */ static ssize_t fuse_dev_do_write(struct fuse_dev *fud, struct fuse_copy_state *cs, size_t nbytes) { int err; struct fuse_conn *fc = fud->fc; struct fuse_pqueue *fpq = &fud->pq; struct fuse_req *req; struct fuse_out_header oh; err = -EINVAL; if (nbytes < sizeof(struct fuse_out_header)) goto out; err = fuse_copy_one(cs, &oh, sizeof(oh)); if (err) goto copy_finish; err = -EINVAL; if (oh.len != nbytes) goto copy_finish; /* * Zero oh.unique indicates unsolicited notification message * and error contains notification code. */ if (!oh.unique) { err = fuse_notify(fc, oh.error, nbytes - sizeof(oh), cs); goto out; } err = -EINVAL; if (oh.error <= -512 || oh.error > 0) goto copy_finish; spin_lock(&fpq->lock); req = NULL; if (fpq->connected) req = request_find(fpq, oh.unique & ~FUSE_INT_REQ_BIT); err = -ENOENT; if (!req) { spin_unlock(&fpq->lock); goto copy_finish; } /* Is it an interrupt reply ID? */ if (oh.unique & FUSE_INT_REQ_BIT) { __fuse_get_request(req); spin_unlock(&fpq->lock); err = 0; if (nbytes != sizeof(struct fuse_out_header)) err = -EINVAL; else if (oh.error == -ENOSYS) fc->no_interrupt = 1; else if (oh.error == -EAGAIN) err = queue_interrupt(req); fuse_put_request(req); goto copy_finish; } clear_bit(FR_SENT, &req->flags); list_move(&req->list, &fpq->io); req->out.h = oh; set_bit(FR_LOCKED, &req->flags); spin_unlock(&fpq->lock); cs->req = req; if (!req->args->page_replace) cs->move_pages = 0; if (oh.error) err = nbytes != sizeof(oh) ? -EINVAL : 0; else err = copy_out_args(cs, req->args, nbytes); fuse_copy_finish(cs); spin_lock(&fpq->lock); clear_bit(FR_LOCKED, &req->flags); if (!fpq->connected) err = -ENOENT; else if (err) req->out.h.error = -EIO; if (!test_bit(FR_PRIVATE, &req->flags)) list_del_init(&req->list); spin_unlock(&fpq->lock); fuse_request_end(req); out: return err ? err : nbytes; copy_finish: fuse_copy_finish(cs); goto out; } static ssize_t fuse_dev_write(struct kiocb *iocb, struct iov_iter *from) { struct fuse_copy_state cs; struct fuse_dev *fud = fuse_get_dev(iocb->ki_filp); if (!fud) return -EPERM; if (!user_backed_iter(from)) return -EINVAL; fuse_copy_init(&cs, 0, from); return fuse_dev_do_write(fud, &cs, iov_iter_count(from)); } static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe, struct file *out, loff_t *ppos, size_t len, unsigned int flags) { unsigned int head, tail, mask, count; unsigned nbuf; unsigned idx; struct pipe_buffer *bufs; struct fuse_copy_state cs; struct fuse_dev *fud; size_t rem; ssize_t ret; fud = fuse_get_dev(out); if (!fud) return -EPERM; pipe_lock(pipe); head = pipe->head; tail = pipe->tail; mask = pipe->ring_size - 1; count = head - tail; bufs = kvmalloc_array(count, sizeof(struct pipe_buffer), GFP_KERNEL); if (!bufs) { pipe_unlock(pipe); return -ENOMEM; } nbuf = 0; rem = 0; for (idx = tail; idx != head && rem < len; idx++) rem += pipe->bufs[idx & mask].len; ret = -EINVAL; if (rem < len) goto out_free; rem = len; while (rem) { struct pipe_buffer *ibuf; struct pipe_buffer *obuf; if (WARN_ON(nbuf >= count || tail == head)) goto out_free; ibuf = &pipe->bufs[tail & mask]; obuf = &bufs[nbuf]; if (rem >= ibuf->len) { *obuf = *ibuf; ibuf->ops = NULL; tail++; pipe->tail = tail; } else { if (!pipe_buf_get(pipe, ibuf)) goto out_free; *obuf = *ibuf; obuf->flags &= ~PIPE_BUF_FLAG_GIFT; obuf->len = rem; ibuf->offset += obuf->len; ibuf->len -= obuf->len; } nbuf++; rem -= obuf->len; } pipe_unlock(pipe); fuse_copy_init(&cs, 0, NULL); cs.pipebufs = bufs; cs.nr_segs = nbuf; cs.pipe = pipe; if (flags & SPLICE_F_MOVE) cs.move_pages = 1; ret = fuse_dev_do_write(fud, &cs, len); pipe_lock(pipe); out_free: for (idx = 0; idx < nbuf; idx++) { struct pipe_buffer *buf = &bufs[idx]; if (buf->ops) pipe_buf_release(pipe, buf); } pipe_unlock(pipe); kvfree(bufs); return ret; } static __poll_t fuse_dev_poll(struct file *file, poll_table *wait) { __poll_t mask = EPOLLOUT | EPOLLWRNORM; struct fuse_iqueue *fiq; struct fuse_dev *fud = fuse_get_dev(file); if (!fud) return EPOLLERR; fiq = &fud->fc->iq; poll_wait(file, &fiq->waitq, wait); spin_lock(&fiq->lock); if (!fiq->connected) mask = EPOLLERR; else if (request_pending(fiq)) mask |= EPOLLIN | EPOLLRDNORM; spin_unlock(&fiq->lock); return mask; } /* Abort all requests on the given list (pending or processing) */ static void end_requests(struct list_head *head) { while (!list_empty(head)) { struct fuse_req *req; req = list_entry(head->next, struct fuse_req, list); req->out.h.error = -ECONNABORTED; clear_bit(FR_SENT, &req->flags); list_del_init(&req->list); fuse_request_end(req); } } static void end_polls(struct fuse_conn *fc) { struct rb_node *p; p = rb_first(&fc->polled_files); while (p) { struct fuse_file *ff; ff = rb_entry(p, struct fuse_file, polled_node); wake_up_interruptible_all(&ff->poll_wait); p = rb_next(p); } } /* * Abort all requests. * * Emergency exit in case of a malicious or accidental deadlock, or just a hung * filesystem. * * The same effect is usually achievable through killing the filesystem daemon * and all users of the filesystem. The exception is the combination of an * asynchronous request and the tricky deadlock (see * Documentation/filesystems/fuse.rst). * * Aborting requests under I/O goes as follows: 1: Separate out unlocked * requests, they should be finished off immediately. Locked requests will be * finished after unlock; see unlock_request(). 2: Finish off the unlocked * requests. It is possible that some request will finish before we can. This * is OK, the request will in that case be removed from the list before we touch * it. */ void fuse_abort_conn(struct fuse_conn *fc) { struct fuse_iqueue *fiq = &fc->iq; spin_lock(&fc->lock); if (fc->connected) { struct fuse_dev *fud; struct fuse_req *req, *next; LIST_HEAD(to_end); unsigned int i; /* Background queuing checks fc->connected under bg_lock */ spin_lock(&fc->bg_lock); fc->connected = 0; spin_unlock(&fc->bg_lock); fuse_set_initialized(fc); list_for_each_entry(fud, &fc->devices, entry) { struct fuse_pqueue *fpq = &fud->pq; spin_lock(&fpq->lock); fpq->connected = 0; list_for_each_entry_safe(req, next, &fpq->io, list) { req->out.h.error = -ECONNABORTED; spin_lock(&req->waitq.lock); set_bit(FR_ABORTED, &req->flags); if (!test_bit(FR_LOCKED, &req->flags)) { set_bit(FR_PRIVATE, &req->flags); __fuse_get_request(req); list_move(&req->list, &to_end); } spin_unlock(&req->waitq.lock); } for (i = 0; i < FUSE_PQ_HASH_SIZE; i++) list_splice_tail_init(&fpq->processing[i], &to_end); spin_unlock(&fpq->lock); } spin_lock(&fc->bg_lock); fc->blocked = 0; fc->max_background = UINT_MAX; flush_bg_queue(fc); spin_unlock(&fc->bg_lock); spin_lock(&fiq->lock); fiq->connected = 0; list_for_each_entry(req, &fiq->pending, list) clear_bit(FR_PENDING, &req->flags); list_splice_tail_init(&fiq->pending, &to_end); while (forget_pending(fiq)) kfree(fuse_dequeue_forget(fiq, 1, NULL)); wake_up_all(&fiq->waitq); spin_unlock(&fiq->lock); kill_fasync(&fiq->fasync, SIGIO, POLL_IN); end_polls(fc); wake_up_all(&fc->blocked_waitq); spin_unlock(&fc->lock); end_requests(&to_end); } else { spin_unlock(&fc->lock); } } EXPORT_SYMBOL_GPL(fuse_abort_conn); void fuse_wait_aborted(struct fuse_conn *fc) { /* matches implicit memory barrier in fuse_drop_waiting() */ smp_mb(); wait_event(fc->blocked_waitq, atomic_read(&fc->num_waiting) == 0); } int fuse_dev_release(struct inode *inode, struct file *file) { struct fuse_dev *fud = fuse_get_dev(file); if (fud) { struct fuse_conn *fc = fud->fc; struct fuse_pqueue *fpq = &fud->pq; LIST_HEAD(to_end); unsigned int i; spin_lock(&fpq->lock); WARN_ON(!list_empty(&fpq->io)); for (i = 0; i < FUSE_PQ_HASH_SIZE; i++) list_splice_init(&fpq->processing[i], &to_end); spin_unlock(&fpq->lock); end_requests(&to_end); /* Are we the last open device? */ if (atomic_dec_and_test(&fc->dev_count)) { WARN_ON(fc->iq.fasync != NULL); fuse_abort_conn(fc); } fuse_dev_free(fud); } return 0; } EXPORT_SYMBOL_GPL(fuse_dev_release); static int fuse_dev_fasync(int fd, struct file *file, int on) { struct fuse_dev *fud = fuse_get_dev(file); if (!fud) return -EPERM; /* No locking - fasync_helper does its own locking */ return fasync_helper(fd, file, on, &fud->fc->iq.fasync); } static int fuse_device_clone(struct fuse_conn *fc, struct file *new) { struct fuse_dev *fud; if (new->private_data) return -EINVAL; fud = fuse_dev_alloc_install(fc); if (!fud) return -ENOMEM; new->private_data = fud; atomic_inc(&fc->dev_count); return 0; } static long fuse_dev_ioctl_clone(struct file *file, __u32 __user *argp) { int res; int oldfd; struct fuse_dev *fud = NULL; if (get_user(oldfd, argp)) return -EFAULT; CLASS(fd, f)(oldfd); if (fd_empty(f)) return -EINVAL; /* * Check against file->f_op because CUSE * uses the same ioctl handler. */ if (fd_file(f)->f_op == file->f_op) fud = fuse_get_dev(fd_file(f)); res = -EINVAL; if (fud) { mutex_lock(&fuse_mutex); res = fuse_device_clone(fud->fc, file); mutex_unlock(&fuse_mutex); } return res; } static long fuse_dev_ioctl_backing_open(struct file *file, struct fuse_backing_map __user *argp) { struct fuse_dev *fud = fuse_get_dev(file); struct fuse_backing_map map; if (!fud) return -EPERM; if (!IS_ENABLED(CONFIG_FUSE_PASSTHROUGH)) return -EOPNOTSUPP; if (copy_from_user(&map, argp, sizeof(map))) return -EFAULT; return fuse_backing_open(fud->fc, &map); } static long fuse_dev_ioctl_backing_close(struct file *file, __u32 __user *argp) { struct fuse_dev *fud = fuse_get_dev(file); int backing_id; if (!fud) return -EPERM; if (!IS_ENABLED(CONFIG_FUSE_PASSTHROUGH)) return -EOPNOTSUPP; if (get_user(backing_id, argp)) return -EFAULT; return fuse_backing_close(fud->fc, backing_id); } static long fuse_dev_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { void __user *argp = (void __user *)arg; switch (cmd) { case FUSE_DEV_IOC_CLONE: return fuse_dev_ioctl_clone(file, argp); case FUSE_DEV_IOC_BACKING_OPEN: return fuse_dev_ioctl_backing_open(file, argp); case FUSE_DEV_IOC_BACKING_CLOSE: return fuse_dev_ioctl_backing_close(file, argp); default: return -ENOTTY; } } const struct file_operations fuse_dev_operations = { .owner = THIS_MODULE, .open = fuse_dev_open, .read_iter = fuse_dev_read, .splice_read = fuse_dev_splice_read, .write_iter = fuse_dev_write, .splice_write = fuse_dev_splice_write, .poll = fuse_dev_poll, .release = fuse_dev_release, .fasync = fuse_dev_fasync, .unlocked_ioctl = fuse_dev_ioctl, .compat_ioctl = compat_ptr_ioctl, }; EXPORT_SYMBOL_GPL(fuse_dev_operations); static struct miscdevice fuse_miscdevice = { .minor = FUSE_MINOR, .name = "fuse", .fops = &fuse_dev_operations, }; int __init fuse_dev_init(void) { int err = -ENOMEM; fuse_req_cachep = kmem_cache_create("fuse_request", sizeof(struct fuse_req), 0, 0, NULL); if (!fuse_req_cachep) goto out; err = misc_register(&fuse_miscdevice); if (err) goto out_cache_clean; return 0; out_cache_clean: kmem_cache_destroy(fuse_req_cachep); out: return err; } void fuse_dev_cleanup(void) { misc_deregister(&fuse_miscdevice); kmem_cache_destroy(fuse_req_cachep); } |
28 28 256 23 28 462 490 370 6 305 269 268 84 84 83 82 18 2 83 21 17 17 84 84 83 83 84 82 6 18 2 10 7 17 48 68 16 66 17 84 83 82 82 84 84 66 378 378 70 310 380 259 286 365 321 41 5 26 10 1 3 1 4 367 517 4 513 20 20 35 4 4 2 1 1 2 1 20 20 2 2 21 2 20 526 475 12 4 34 352 341 3 316 297 2 1 1 1 1 10 1 10 222 9 214 323 2 1 3 1 313 311 4 313 219 23 29 19 26 141 175 23 292 7 295 14 1 138 88 1 217 224 5 219 233 3 160 150 1 1 3 1 1 1 1 124 120 3 1 175 2 1 1 79 16 76 1 155 9 1 4 1 1 4 135 15 1 2 18 81 72 2 4 118 84 43 131 2 1 1 29 4 2 1 22 1 1 10 2 3 15 18 4 4 1 1 1 1 1 2 3 5 5 1 2 2 43 3 2 2 1 33 1 1 34 1 3 1 3 2 10 3 2 1 1 4 1 1 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 | // SPDX-License-Identifier: GPL-2.0-only /* Copyright (c) 2017 Facebook */ #include <linux/bpf.h> #include <linux/btf.h> #include <linux/btf_ids.h> #include <linux/slab.h> #include <linux/init.h> #include <linux/vmalloc.h> #include <linux/etherdevice.h> #include <linux/filter.h> #include <linux/rcupdate_trace.h> #include <linux/sched/signal.h> #include <net/bpf_sk_storage.h> #include <net/hotdata.h> #include <net/sock.h> #include <net/tcp.h> #include <net/net_namespace.h> #include <net/page_pool/helpers.h> #include <linux/error-injection.h> #include <linux/smp.h> #include <linux/sock_diag.h> #include <linux/netfilter.h> #include <net/netdev_rx_queue.h> #include <net/xdp.h> #include <net/netfilter/nf_bpf_link.h> #define CREATE_TRACE_POINTS #include <trace/events/bpf_test_run.h> struct bpf_test_timer { enum { NO_PREEMPT, NO_MIGRATE } mode; u32 i; u64 time_start, time_spent; }; static void bpf_test_timer_enter(struct bpf_test_timer *t) __acquires(rcu) { rcu_read_lock(); if (t->mode == NO_PREEMPT) preempt_disable(); else migrate_disable(); t->time_start = ktime_get_ns(); } static void bpf_test_timer_leave(struct bpf_test_timer *t) __releases(rcu) { t->time_start = 0; if (t->mode == NO_PREEMPT) preempt_enable(); else migrate_enable(); rcu_read_unlock(); } static bool bpf_test_timer_continue(struct bpf_test_timer *t, int iterations, u32 repeat, int *err, u32 *duration) __must_hold(rcu) { t->i += iterations; if (t->i >= repeat) { /* We're done. */ t->time_spent += ktime_get_ns() - t->time_start; do_div(t->time_spent, t->i); *duration = t->time_spent > U32_MAX ? U32_MAX : (u32)t->time_spent; *err = 0; goto reset; } if (signal_pending(current)) { /* During iteration: we've been cancelled, abort. */ *err = -EINTR; goto reset; } if (need_resched()) { /* During iteration: we need to reschedule between runs. */ t->time_spent += ktime_get_ns() - t->time_start; bpf_test_timer_leave(t); cond_resched(); bpf_test_timer_enter(t); } /* Do another round. */ return true; reset: t->i = 0; return false; } /* We put this struct at the head of each page with a context and frame * initialised when the page is allocated, so we don't have to do this on each * repetition of the test run. */ struct xdp_page_head { struct xdp_buff orig_ctx; struct xdp_buff ctx; union { /* ::data_hard_start starts here */ DECLARE_FLEX_ARRAY(struct xdp_frame, frame); DECLARE_FLEX_ARRAY(u8, data); }; }; struct xdp_test_data { struct xdp_buff *orig_ctx; struct xdp_rxq_info rxq; struct net_device *dev; struct page_pool *pp; struct xdp_frame **frames; struct sk_buff **skbs; struct xdp_mem_info mem; u32 batch_size; u32 frame_cnt; }; /* tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c:%MAX_PKT_SIZE * must be updated accordingly this gets changed, otherwise BPF selftests * will fail. */ #define TEST_XDP_FRAME_SIZE (PAGE_SIZE - sizeof(struct xdp_page_head)) #define TEST_XDP_MAX_BATCH 256 static void xdp_test_run_init_page(netmem_ref netmem, void *arg) { struct xdp_page_head *head = phys_to_virt(page_to_phys(netmem_to_page(netmem))); struct xdp_buff *new_ctx, *orig_ctx; u32 headroom = XDP_PACKET_HEADROOM; struct xdp_test_data *xdp = arg; size_t frm_len, meta_len; struct xdp_frame *frm; void *data; orig_ctx = xdp->orig_ctx; frm_len = orig_ctx->data_end - orig_ctx->data_meta; meta_len = orig_ctx->data - orig_ctx->data_meta; headroom -= meta_len; new_ctx = &head->ctx; frm = head->frame; data = head->data; memcpy(data + headroom, orig_ctx->data_meta, frm_len); xdp_init_buff(new_ctx, TEST_XDP_FRAME_SIZE, &xdp->rxq); xdp_prepare_buff(new_ctx, data, headroom, frm_len, true); new_ctx->data = new_ctx->data_meta + meta_len; xdp_update_frame_from_buff(new_ctx, frm); frm->mem_type = new_ctx->rxq->mem.type; memcpy(&head->orig_ctx, new_ctx, sizeof(head->orig_ctx)); } static int xdp_test_run_setup(struct xdp_test_data *xdp, struct xdp_buff *orig_ctx) { struct page_pool *pp; int err = -ENOMEM; struct page_pool_params pp_params = { .order = 0, .flags = 0, .pool_size = xdp->batch_size, .nid = NUMA_NO_NODE, .init_callback = xdp_test_run_init_page, .init_arg = xdp, }; xdp->frames = kvmalloc_array(xdp->batch_size, sizeof(void *), GFP_KERNEL); if (!xdp->frames) return -ENOMEM; xdp->skbs = kvmalloc_array(xdp->batch_size, sizeof(void *), GFP_KERNEL); if (!xdp->skbs) goto err_skbs; pp = page_pool_create(&pp_params); if (IS_ERR(pp)) { err = PTR_ERR(pp); goto err_pp; } /* will copy 'mem.id' into pp->xdp_mem_id */ err = xdp_reg_mem_model(&xdp->mem, MEM_TYPE_PAGE_POOL, pp); if (err) goto err_mmodel; xdp->pp = pp; /* We create a 'fake' RXQ referencing the original dev, but with an * xdp_mem_info pointing to our page_pool */ xdp_rxq_info_reg(&xdp->rxq, orig_ctx->rxq->dev, 0, 0); xdp->rxq.mem.type = MEM_TYPE_PAGE_POOL; xdp->rxq.mem.id = pp->xdp_mem_id; xdp->dev = orig_ctx->rxq->dev; xdp->orig_ctx = orig_ctx; return 0; err_mmodel: page_pool_destroy(pp); err_pp: kvfree(xdp->skbs); err_skbs: kvfree(xdp->frames); return err; } static void xdp_test_run_teardown(struct xdp_test_data *xdp) { xdp_unreg_mem_model(&xdp->mem); page_pool_destroy(xdp->pp); kfree(xdp->frames); kfree(xdp->skbs); } static bool frame_was_changed(const struct xdp_page_head *head) { /* xdp_scrub_frame() zeroes the data pointer, flags is the last field, * i.e. has the highest chances to be overwritten. If those two are * untouched, it's most likely safe to skip the context reset. */ return head->frame->data != head->orig_ctx.data || head->frame->flags != head->orig_ctx.flags; } static bool ctx_was_changed(struct xdp_page_head *head) { return head->orig_ctx.data != head->ctx.data || head->orig_ctx.data_meta != head->ctx.data_meta || head->orig_ctx.data_end != head->ctx.data_end; } static void reset_ctx(struct xdp_page_head *head) { if (likely(!frame_was_changed(head) && !ctx_was_changed(head))) return; head->ctx.data = head->orig_ctx.data; head->ctx.data_meta = head->orig_ctx.data_meta; head->ctx.data_end = head->orig_ctx.data_end; xdp_update_frame_from_buff(&head->ctx, head->frame); head->frame->mem_type = head->orig_ctx.rxq->mem.type; } static int xdp_recv_frames(struct xdp_frame **frames, int nframes, struct sk_buff **skbs, struct net_device *dev) { gfp_t gfp = __GFP_ZERO | GFP_ATOMIC; int i, n; LIST_HEAD(list); n = kmem_cache_alloc_bulk(net_hotdata.skbuff_cache, gfp, nframes, (void **)skbs); if (unlikely(n == 0)) { for (i = 0; i < nframes; i++) xdp_return_frame(frames[i]); return -ENOMEM; } for (i = 0; i < nframes; i++) { struct xdp_frame *xdpf = frames[i]; struct sk_buff *skb = skbs[i]; skb = __xdp_build_skb_from_frame(xdpf, skb, dev); if (!skb) { xdp_return_frame(xdpf); continue; } list_add_tail(&skb->list, &list); } netif_receive_skb_list(&list); return 0; } static int xdp_test_run_batch(struct xdp_test_data *xdp, struct bpf_prog *prog, u32 repeat) { struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx; int err = 0, act, ret, i, nframes = 0, batch_sz; struct xdp_frame **frames = xdp->frames; struct bpf_redirect_info *ri; struct xdp_page_head *head; struct xdp_frame *frm; bool redirect = false; struct xdp_buff *ctx; struct page *page; batch_sz = min_t(u32, repeat, xdp->batch_size); local_bh_disable(); bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx); ri = bpf_net_ctx_get_ri(); xdp_set_return_frame_no_direct(); for (i = 0; i < batch_sz; i++) { page = page_pool_dev_alloc_pages(xdp->pp); if (!page) { err = -ENOMEM; goto out; } head = phys_to_virt(page_to_phys(page)); reset_ctx(head); ctx = &head->ctx; frm = head->frame; xdp->frame_cnt++; act = bpf_prog_run_xdp(prog, ctx); /* if program changed pkt bounds we need to update the xdp_frame */ if (unlikely(ctx_was_changed(head))) { ret = xdp_update_frame_from_buff(ctx, frm); if (ret) { xdp_return_buff(ctx); continue; } } switch (act) { case XDP_TX: /* we can't do a real XDP_TX since we're not in the * driver, so turn it into a REDIRECT back to the same * index */ ri->tgt_index = xdp->dev->ifindex; ri->map_id = INT_MAX; ri->map_type = BPF_MAP_TYPE_UNSPEC; fallthrough; case XDP_REDIRECT: redirect = true; ret = xdp_do_redirect_frame(xdp->dev, ctx, frm, prog); if (ret) xdp_return_buff(ctx); break; case XDP_PASS: frames[nframes++] = frm; break; default: bpf_warn_invalid_xdp_action(NULL, prog, act); fallthrough; case XDP_DROP: xdp_return_buff(ctx); break; } } out: if (redirect) xdp_do_flush(); if (nframes) { ret = xdp_recv_frames(frames, nframes, xdp->skbs, xdp->dev); if (ret) err = ret; } xdp_clear_return_frame_no_direct(); bpf_net_ctx_clear(bpf_net_ctx); local_bh_enable(); return err; } static int bpf_test_run_xdp_live(struct bpf_prog *prog, struct xdp_buff *ctx, u32 repeat, u32 batch_size, u32 *time) { struct xdp_test_data xdp = { .batch_size = batch_size }; struct bpf_test_timer t = { .mode = NO_MIGRATE }; int ret; if (!repeat) repeat = 1; ret = xdp_test_run_setup(&xdp, ctx); if (ret) return ret; bpf_test_timer_enter(&t); do { xdp.frame_cnt = 0; ret = xdp_test_run_batch(&xdp, prog, repeat - t.i); if (unlikely(ret < 0)) break; } while (bpf_test_timer_continue(&t, xdp.frame_cnt, repeat, &ret, time)); bpf_test_timer_leave(&t); xdp_test_run_teardown(&xdp); return ret; } static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 *retval, u32 *time, bool xdp) { struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx; struct bpf_prog_array_item item = {.prog = prog}; struct bpf_run_ctx *old_ctx; struct bpf_cg_run_ctx run_ctx; struct bpf_test_timer t = { NO_MIGRATE }; enum bpf_cgroup_storage_type stype; int ret; for_each_cgroup_storage_type(stype) { item.cgroup_storage[stype] = bpf_cgroup_storage_alloc(prog, stype); if (IS_ERR(item.cgroup_storage[stype])) { item.cgroup_storage[stype] = NULL; for_each_cgroup_storage_type(stype) bpf_cgroup_storage_free(item.cgroup_storage[stype]); return -ENOMEM; } } if (!repeat) repeat = 1; bpf_test_timer_enter(&t); old_ctx = bpf_set_run_ctx(&run_ctx.run_ctx); do { run_ctx.prog_item = &item; local_bh_disable(); bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx); if (xdp) *retval = bpf_prog_run_xdp(prog, ctx); else *retval = bpf_prog_run(prog, ctx); bpf_net_ctx_clear(bpf_net_ctx); local_bh_enable(); } while (bpf_test_timer_continue(&t, 1, repeat, &ret, time)); bpf_reset_run_ctx(old_ctx); bpf_test_timer_leave(&t); for_each_cgroup_storage_type(stype) bpf_cgroup_storage_free(item.cgroup_storage[stype]); return ret; } static int bpf_test_finish(const union bpf_attr *kattr, union bpf_attr __user *uattr, const void *data, struct skb_shared_info *sinfo, u32 size, u32 retval, u32 duration) { void __user *data_out = u64_to_user_ptr(kattr->test.data_out); int err = -EFAULT; u32 copy_size = size; /* Clamp copy if the user has provided a size hint, but copy the full * buffer if not to retain old behaviour. */ if (kattr->test.data_size_out && copy_size > kattr->test.data_size_out) { copy_size = kattr->test.data_size_out; err = -ENOSPC; } if (data_out) { int len = sinfo ? copy_size - sinfo->xdp_frags_size : copy_size; if (len < 0) { err = -ENOSPC; goto out; } if (copy_to_user(data_out, data, len)) goto out; if (sinfo) { int i, offset = len; u32 data_len; for (i = 0; i < sinfo->nr_frags; i++) { skb_frag_t *frag = &sinfo->frags[i]; if (offset >= copy_size) { err = -ENOSPC; break; } data_len = min_t(u32, copy_size - offset, skb_frag_size(frag)); if (copy_to_user(data_out + offset, skb_frag_address(frag), data_len)) goto out; offset += data_len; } } } if (copy_to_user(&uattr->test.data_size_out, &size, sizeof(size))) goto out; if (copy_to_user(&uattr->test.retval, &retval, sizeof(retval))) goto out; if (copy_to_user(&uattr->test.duration, &duration, sizeof(duration))) goto out; if (err != -ENOSPC) err = 0; out: trace_bpf_test_finish(&err); return err; } /* Integer types of various sizes and pointer combinations cover variety of * architecture dependent calling conventions. 7+ can be supported in the * future. */ __bpf_kfunc_start_defs(); __bpf_kfunc int bpf_fentry_test1(int a) { return a + 1; } EXPORT_SYMBOL_GPL(bpf_fentry_test1); int noinline bpf_fentry_test2(int a, u64 b) { return a + b; } int noinline bpf_fentry_test3(char a, int b, u64 c) { return a + b + c; } int noinline bpf_fentry_test4(void *a, char b, int c, u64 d) { return (long)a + b + c + d; } int noinline bpf_fentry_test5(u64 a, void *b, short c, int d, u64 e) { return a + (long)b + c + d + e; } int noinline bpf_fentry_test6(u64 a, void *b, short c, int d, void *e, u64 f) { return a + (long)b + c + d + (long)e + f; } struct bpf_fentry_test_t { struct bpf_fentry_test_t *a; }; int noinline bpf_fentry_test7(struct bpf_fentry_test_t *arg) { asm volatile ("": "+r"(arg)); return (long)arg; } int noinline bpf_fentry_test8(struct bpf_fentry_test_t *arg) { return (long)arg->a; } __bpf_kfunc u32 bpf_fentry_test9(u32 *a) { return *a; } void noinline bpf_fentry_test_sinfo(struct skb_shared_info *sinfo) { } __bpf_kfunc int bpf_modify_return_test(int a, int *b) { *b += 1; return a + *b; } __bpf_kfunc int bpf_modify_return_test2(int a, int *b, short c, int d, void *e, char f, int g) { *b += 1; return a + *b + c + d + (long)e + f + g; } __bpf_kfunc int bpf_modify_return_test_tp(int nonce) { trace_bpf_trigger_tp(nonce); return nonce; } int noinline bpf_fentry_shadow_test(int a) { return a + 1; } struct prog_test_member1 { int a; }; struct prog_test_member { struct prog_test_member1 m; int c; }; struct prog_test_ref_kfunc { int a; int b; struct prog_test_member memb; struct prog_test_ref_kfunc *next; refcount_t cnt; }; __bpf_kfunc void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p) { refcount_dec(&p->cnt); } __bpf_kfunc void bpf_kfunc_call_test_release_dtor(void *p) { bpf_kfunc_call_test_release(p); } CFI_NOSEAL(bpf_kfunc_call_test_release_dtor); __bpf_kfunc void bpf_kfunc_call_memb_release(struct prog_test_member *p) { } __bpf_kfunc void bpf_kfunc_call_memb_release_dtor(void *p) { } CFI_NOSEAL(bpf_kfunc_call_memb_release_dtor); __bpf_kfunc_end_defs(); BTF_KFUNCS_START(bpf_test_modify_return_ids) BTF_ID_FLAGS(func, bpf_modify_return_test) BTF_ID_FLAGS(func, bpf_modify_return_test2) BTF_ID_FLAGS(func, bpf_modify_return_test_tp) BTF_ID_FLAGS(func, bpf_fentry_test1, KF_SLEEPABLE) BTF_KFUNCS_END(bpf_test_modify_return_ids) static const struct btf_kfunc_id_set bpf_test_modify_return_set = { .owner = THIS_MODULE, .set = &bpf_test_modify_return_ids, }; BTF_KFUNCS_START(test_sk_check_kfunc_ids) BTF_ID_FLAGS(func, bpf_kfunc_call_test_release, KF_RELEASE) BTF_ID_FLAGS(func, bpf_kfunc_call_memb_release, KF_RELEASE) BTF_KFUNCS_END(test_sk_check_kfunc_ids) static void *bpf_test_init(const union bpf_attr *kattr, u32 user_size, u32 size, u32 headroom, u32 tailroom) { void __user *data_in = u64_to_user_ptr(kattr->test.data_in); void *data; if (size < ETH_HLEN || size > PAGE_SIZE - headroom - tailroom) return ERR_PTR(-EINVAL); if (user_size > size) return ERR_PTR(-EMSGSIZE); size = SKB_DATA_ALIGN(size); data = kzalloc(size + headroom + tailroom, GFP_USER); if (!data) return ERR_PTR(-ENOMEM); if (copy_from_user(data + headroom, data_in, user_size)) { kfree(data); return ERR_PTR(-EFAULT); } return data; } int bpf_prog_test_run_tracing(struct bpf_prog *prog, const union bpf_attr *kattr, union bpf_attr __user *uattr) { struct bpf_fentry_test_t arg = {}; u16 side_effect = 0, ret = 0; int b = 2, err = -EFAULT; u32 retval = 0; if (kattr->test.flags || kattr->test.cpu || kattr->test.batch_size) return -EINVAL; switch (prog->expected_attach_type) { case BPF_TRACE_FENTRY: case BPF_TRACE_FEXIT: if (bpf_fentry_test1(1) != 2 || bpf_fentry_test2(2, 3) != 5 || bpf_fentry_test3(4, 5, 6) != 15 || bpf_fentry_test4((void *)7, 8, 9, 10) != 34 || bpf_fentry_test5(11, (void *)12, 13, 14, 15) != 65 || bpf_fentry_test6(16, (void *)17, 18, 19, (void *)20, 21) != 111 || bpf_fentry_test7((struct bpf_fentry_test_t *)0) != 0 || bpf_fentry_test8(&arg) != 0 || bpf_fentry_test9(&retval) != 0) goto out; break; case BPF_MODIFY_RETURN: ret = bpf_modify_return_test(1, &b); if (b != 2) side_effect++; b = 2; ret += bpf_modify_return_test2(1, &b, 3, 4, (void *)5, 6, 7); if (b != 2) side_effect++; break; default: goto out; } retval = ((u32)side_effect << 16) | ret; if (copy_to_user(&uattr->test.retval, &retval, sizeof(retval))) goto out; err = 0; out: trace_bpf_test_finish(&err); return err; } struct bpf_raw_tp_test_run_info { struct bpf_prog *prog; void *ctx; u32 retval; }; static void __bpf_prog_test_run_raw_tp(void *data) { struct bpf_raw_tp_test_run_info *info = data; struct bpf_trace_run_ctx run_ctx = {}; struct bpf_run_ctx *old_run_ctx; old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx); rcu_read_lock(); info->retval = bpf_prog_run(info->prog, info->ctx); rcu_read_unlock(); bpf_reset_run_ctx(old_run_ctx); } int bpf_prog_test_run_raw_tp(struct bpf_prog *prog, const union bpf_attr *kattr, union bpf_attr __user *uattr) { void __user *ctx_in = u64_to_user_ptr(kattr->test.ctx_in); __u32 ctx_size_in = kattr->test.ctx_size_in; struct bpf_raw_tp_test_run_info info; int cpu = kattr->test.cpu, err = 0; int current_cpu; /* doesn't support data_in/out, ctx_out, duration, or repeat */ if (kattr->test.data_in || kattr->test.data_out || kattr->test.ctx_out || kattr->test.duration || kattr->test.repeat || kattr->test.batch_size) return -EINVAL; if (ctx_size_in < prog->aux->max_ctx_offset || ctx_size_in > MAX_BPF_FUNC_ARGS * sizeof(u64)) return -EINVAL; if ((kattr->test.flags & BPF_F_TEST_RUN_ON_CPU) == 0 && cpu != 0) return -EINVAL; if (ctx_size_in) { info.ctx = memdup_user(ctx_in, ctx_size_in); if (IS_ERR(info.ctx)) return PTR_ERR(info.ctx); } else { info.ctx = NULL; } info.prog = prog; current_cpu = get_cpu(); if ((kattr->test.flags & BPF_F_TEST_RUN_ON_CPU) == 0 || cpu == current_cpu) { __bpf_prog_test_run_raw_tp(&info); } else if (cpu >= nr_cpu_ids || !cpu_online(cpu)) { /* smp_call_function_single() also checks cpu_online() * after csd_lock(). However, since cpu is from user * space, let's do an extra quick check to filter out * invalid value before smp_call_function_single(). */ err = -ENXIO; } else { err = smp_call_function_single(cpu, __bpf_prog_test_run_raw_tp, &info, 1); } put_cpu(); if (!err && copy_to_user(&uattr->test.retval, &info.retval, sizeof(u32))) err = -EFAULT; kfree(info.ctx); return err; } static void *bpf_ctx_init(const union bpf_attr *kattr, u32 max_size) { void __user *data_in = u64_to_user_ptr(kattr->test.ctx_in); void __user *data_out = u64_to_user_ptr(kattr->test.ctx_out); u32 size = kattr->test.ctx_size_in; void *data; int err; if (!data_in && !data_out) return NULL; data = kzalloc(max_size, GFP_USER); if (!data) return ERR_PTR(-ENOMEM); if (data_in) { err = bpf_check_uarg_tail_zero(USER_BPFPTR(data_in), max_size, size); if (err) { kfree(data); return ERR_PTR(err); } size = min_t(u32, max_size, size); if (copy_from_user(data, data_in, size)) { kfree(data); return ERR_PTR(-EFAULT); } } return data; } static int bpf_ctx_finish(const union bpf_attr *kattr, union bpf_attr __user *uattr, const void *data, u32 size) { void __user *data_out = u64_to_user_ptr(kattr->test.ctx_out); int err = -EFAULT; u32 copy_size = size; if (!data || !data_out) return 0; if (copy_size > kattr->test.ctx_size_out) { copy_size = kattr->test.ctx_size_out; err = -ENOSPC; } if (copy_to_user(data_out, data, copy_size)) goto out; if (copy_to_user(&uattr->test.ctx_size_out, &size, sizeof(size))) goto out; if (err != -ENOSPC) err = 0; out: return err; } /** * range_is_zero - test whether buffer is initialized * @buf: buffer to check * @from: check from this position * @to: check up until (excluding) this position * * This function returns true if the there is a non-zero byte * in the buf in the range [from,to). */ static inline bool range_is_zero(void *buf, size_t from, size_t to) { return !memchr_inv((u8 *)buf + from, 0, to - from); } static int convert___skb_to_skb(struct sk_buff *skb, struct __sk_buff *__skb) { struct qdisc_skb_cb *cb = (struct qdisc_skb_cb *)skb->cb; if (!__skb) return 0; /* make sure the fields we don't use are zeroed */ if (!range_is_zero(__skb, 0, offsetof(struct __sk_buff, mark))) return -EINVAL; /* mark is allowed */ if (!range_is_zero(__skb, offsetofend(struct __sk_buff, mark), offsetof(struct __sk_buff, priority))) return -EINVAL; /* priority is allowed */ /* ingress_ifindex is allowed */ /* ifindex is allowed */ if (!range_is_zero(__skb, offsetofend(struct __sk_buff, ifindex), offsetof(struct __sk_buff, cb))) return -EINVAL; /* cb is allowed */ if (!range_is_zero(__skb, offsetofend(struct __sk_buff, cb), offsetof(struct __sk_buff, tstamp))) return -EINVAL; /* tstamp is allowed */ /* wire_len is allowed */ /* gso_segs is allowed */ if (!range_is_zero(__skb, offsetofend(struct __sk_buff, gso_segs), offsetof(struct __sk_buff, gso_size))) return -EINVAL; /* gso_size is allowed */ if (!range_is_zero(__skb, offsetofend(struct __sk_buff, gso_size), offsetof(struct __sk_buff, hwtstamp))) return -EINVAL; /* hwtstamp is allowed */ if (!range_is_zero(__skb, offsetofend(struct __sk_buff, hwtstamp), sizeof(struct __sk_buff))) return -EINVAL; skb->mark = __skb->mark; skb->priority = __skb->priority; skb->skb_iif = __skb->ingress_ifindex; skb->tstamp = __skb->tstamp; memcpy(&cb->data, __skb->cb, QDISC_CB_PRIV_LEN); if (__skb->wire_len == 0) { cb->pkt_len = skb->len; } else { if (__skb->wire_len < skb->len || __skb->wire_len > GSO_LEGACY_MAX_SIZE) return -EINVAL; cb->pkt_len = __skb->wire_len; } if (__skb->gso_segs > GSO_MAX_SEGS) return -EINVAL; skb_shinfo(skb)->gso_segs = __skb->gso_segs; skb_shinfo(skb)->gso_size = __skb->gso_size; skb_shinfo(skb)->hwtstamps.hwtstamp = __skb->hwtstamp; return 0; } static void convert_skb_to___skb(struct sk_buff *skb, struct __sk_buff *__skb) { struct qdisc_skb_cb *cb = (struct qdisc_skb_cb *)skb->cb; if (!__skb) return; __skb->mark = skb->mark; __skb->priority = skb->priority; __skb->ingress_ifindex = skb->skb_iif; __skb->ifindex = skb->dev->ifindex; __skb->tstamp = skb->tstamp; memcpy(__skb->cb, &cb->data, QDISC_CB_PRIV_LEN); __skb->wire_len = cb->pkt_len; __skb->gso_segs = skb_shinfo(skb)->gso_segs; __skb->hwtstamp = skb_shinfo(skb)->hwtstamps.hwtstamp; } static struct proto bpf_dummy_proto = { .name = "bpf_dummy", .owner = THIS_MODULE, .obj_size = sizeof(struct sock), }; int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr, union bpf_attr __user *uattr) { bool is_l2 = false, is_direct_pkt_access = false; struct net *net = current->nsproxy->net_ns; struct net_device *dev = net->loopback_dev; u32 size = kattr->test.data_size_in; u32 repeat = kattr->test.repeat; struct __sk_buff *ctx = NULL; u32 retval, duration; int hh_len = ETH_HLEN; struct sk_buff *skb; struct sock *sk; void *data; int ret; if ((kattr->test.flags & ~BPF_F_TEST_SKB_CHECKSUM_COMPLETE) || kattr->test.cpu || kattr->test.batch_size) return -EINVAL; data = bpf_test_init(kattr, kattr->test.data_size_in, size, NET_SKB_PAD + NET_IP_ALIGN, SKB_DATA_ALIGN(sizeof(struct skb_shared_info))); if (IS_ERR(data)) return PTR_ERR(data); ctx = bpf_ctx_init(kattr, sizeof(struct __sk_buff)); if (IS_ERR(ctx)) { kfree(data); return PTR_ERR(ctx); } switch (prog->type) { case BPF_PROG_TYPE_SCHED_CLS: case BPF_PROG_TYPE_SCHED_ACT: is_l2 = true; fallthrough; case BPF_PROG_TYPE_LWT_IN: case BPF_PROG_TYPE_LWT_OUT: case BPF_PROG_TYPE_LWT_XMIT: is_direct_pkt_access = true; break; default: break; } sk = sk_alloc(net, AF_UNSPEC, GFP_USER, &bpf_dummy_proto, 1); if (!sk) { kfree(data); kfree(ctx); return -ENOMEM; } sock_init_data(NULL, sk); skb = slab_build_skb(data); if (!skb) { kfree(data); kfree(ctx); sk_free(sk); return -ENOMEM; } skb->sk = sk; skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN); __skb_put(skb, size); if (ctx && ctx->ifindex > 1) { dev = dev_get_by_index(net, ctx->ifindex); if (!dev) { ret = -ENODEV; goto out; } } skb->protocol = eth_type_trans(skb, dev); skb_reset_network_header(skb); switch (skb->protocol) { case htons(ETH_P_IP): sk->sk_family = AF_INET; if (sizeof(struct iphdr) <= skb_headlen(skb)) { sk->sk_rcv_saddr = ip_hdr(skb)->saddr; sk->sk_daddr = ip_hdr(skb)->daddr; } break; #if IS_ENABLED(CONFIG_IPV6) case htons(ETH_P_IPV6): sk->sk_family = AF_INET6; if (sizeof(struct ipv6hdr) <= skb_headlen(skb)) { sk->sk_v6_rcv_saddr = ipv6_hdr(skb)->saddr; sk->sk_v6_daddr = ipv6_hdr(skb)->daddr; } break; #endif default: break; } if (is_l2) __skb_push(skb, hh_len); if (is_direct_pkt_access) bpf_compute_data_pointers(skb); ret = convert___skb_to_skb(skb, ctx); if (ret) goto out; if (kattr->test.flags & BPF_F_TEST_SKB_CHECKSUM_COMPLETE) { const int off = skb_network_offset(skb); int len = skb->len - off; skb->csum = skb_checksum(skb, off, len, 0); skb->ip_summed = CHECKSUM_COMPLETE; } ret = bpf_test_run(prog, skb, repeat, &retval, &duration, false); if (ret) goto out; if (!is_l2) { if (skb_headroom(skb) < hh_len) { int nhead = HH_DATA_ALIGN(hh_len - skb_headroom(skb)); if (pskb_expand_head(skb, nhead, 0, GFP_USER)) { ret = -ENOMEM; goto out; } } memset(__skb_push(skb, hh_len), 0, hh_len); } if (kattr->test.flags & BPF_F_TEST_SKB_CHECKSUM_COMPLETE) { const int off = skb_network_offset(skb); int len = skb->len - off; __wsum csum; csum = skb_checksum(skb, off, len, 0); if (csum_fold(skb->csum) != csum_fold(csum)) { ret = -EBADMSG; goto out; } } convert_skb_to___skb(skb, ctx); size = skb->len; /* bpf program can never convert linear skb to non-linear */ if (WARN_ON_ONCE(skb_is_nonlinear(skb))) size = skb_headlen(skb); ret = bpf_test_finish(kattr, uattr, skb->data, NULL, size, retval, duration); if (!ret) ret = bpf_ctx_finish(kattr, uattr, ctx, sizeof(struct __sk_buff)); out: if (dev && dev != net->loopback_dev) dev_put(dev); kfree_skb(skb); sk_free(sk); kfree(ctx); return ret; } static int xdp_convert_md_to_buff(struct xdp_md *xdp_md, struct xdp_buff *xdp) { unsigned int ingress_ifindex, rx_queue_index; struct netdev_rx_queue *rxqueue; struct net_device *device; if (!xdp_md) return 0; if (xdp_md->egress_ifindex != 0) return -EINVAL; ingress_ifindex = xdp_md->ingress_ifindex; rx_queue_index = xdp_md->rx_queue_index; if (!ingress_ifindex && rx_queue_index) return -EINVAL; if (ingress_ifindex) { device = dev_get_by_index(current->nsproxy->net_ns, ingress_ifindex); if (!device) return -ENODEV; if (rx_queue_index >= device->real_num_rx_queues) goto free_dev; rxqueue = __netif_get_rx_queue(device, rx_queue_index); if (!xdp_rxq_info_is_reg(&rxqueue->xdp_rxq)) goto free_dev; xdp->rxq = &rxqueue->xdp_rxq; /* The device is now tracked in the xdp->rxq for later * dev_put() */ } xdp->data = xdp->data_meta + xdp_md->data; return 0; free_dev: dev_put(device); return -EINVAL; } static void xdp_convert_buff_to_md(struct xdp_buff *xdp, struct xdp_md *xdp_md) { if (!xdp_md) return; xdp_md->data = xdp->data - xdp->data_meta; xdp_md->data_end = xdp->data_end - xdp->data_meta; if (xdp_md->ingress_ifindex) dev_put(xdp->rxq->dev); } int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, union bpf_attr __user *uattr) { bool do_live = (kattr->test.flags & BPF_F_TEST_XDP_LIVE_FRAMES); u32 tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); u32 batch_size = kattr->test.batch_size; u32 retval = 0, duration, max_data_sz; u32 size = kattr->test.data_size_in; u32 headroom = XDP_PACKET_HEADROOM; u32 repeat = kattr->test.repeat; struct netdev_rx_queue *rxqueue; struct skb_shared_info *sinfo; struct xdp_buff xdp = {}; int i, ret = -EINVAL; struct xdp_md *ctx; void *data; if (prog->expected_attach_type == BPF_XDP_DEVMAP || prog->expected_attach_type == BPF_XDP_CPUMAP) return -EINVAL; if (kattr->test.flags & ~BPF_F_TEST_XDP_LIVE_FRAMES) return -EINVAL; if (bpf_prog_is_dev_bound(prog->aux)) return -EINVAL; if (do_live) { if (!batch_size) batch_size = NAPI_POLL_WEIGHT; else if (batch_size > TEST_XDP_MAX_BATCH) return -E2BIG; headroom += sizeof(struct xdp_page_head); } else if (batch_size) { return -EINVAL; } ctx = bpf_ctx_init(kattr, sizeof(struct xdp_md)); if (IS_ERR(ctx)) return PTR_ERR(ctx); if (ctx) { /* There can't be user provided data before the meta data */ if (ctx->data_meta || ctx->data_end != size || ctx->data > ctx->data_end || unlikely(xdp_metalen_invalid(ctx->data)) || (do_live && (kattr->test.data_out || kattr->test.ctx_out))) goto free_ctx; /* Meta data is allocated from the headroom */ headroom -= ctx->data; } max_data_sz = 4096 - headroom - tailroom; if (size > max_data_sz) { /* disallow live data mode for jumbo frames */ if (do_live) goto free_ctx; size = max_data_sz; } data = bpf_test_init(kattr, size, max_data_sz, headroom, tailroom); if (IS_ERR(data)) { ret = PTR_ERR(data); goto free_ctx; } rxqueue = __netif_get_rx_queue(current->nsproxy->net_ns->loopback_dev, 0); rxqueue->xdp_rxq.frag_size = headroom + max_data_sz + tailroom; xdp_init_buff(&xdp, rxqueue->xdp_rxq.frag_size, &rxqueue->xdp_rxq); xdp_prepare_buff(&xdp, data, headroom, size, true); sinfo = xdp_get_shared_info_from_buff(&xdp); ret = xdp_convert_md_to_buff(ctx, &xdp); if (ret) goto free_data; if (unlikely(kattr->test.data_size_in > size)) { void __user *data_in = u64_to_user_ptr(kattr->test.data_in); while (size < kattr->test.data_size_in) { struct page *page; skb_frag_t *frag; u32 data_len; if (sinfo->nr_frags == MAX_SKB_FRAGS) { ret = -ENOMEM; goto out; } page = alloc_page(GFP_KERNEL); if (!page) { ret = -ENOMEM; goto out; } frag = &sinfo->frags[sinfo->nr_frags++]; data_len = min_t(u32, kattr->test.data_size_in - size, PAGE_SIZE); skb_frag_fill_page_desc(frag, page, 0, data_len); if (copy_from_user(page_address(page), data_in + size, data_len)) { ret = -EFAULT; goto out; } sinfo->xdp_frags_size += data_len; size += data_len; } xdp_buff_set_frags_flag(&xdp); } if (repeat > 1) bpf_prog_change_xdp(NULL, prog); if (do_live) ret = bpf_test_run_xdp_live(prog, &xdp, repeat, batch_size, &duration); else ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration, true); /* We convert the xdp_buff back to an xdp_md before checking the return * code so the reference count of any held netdevice will be decremented * even if the test run failed. */ xdp_convert_buff_to_md(&xdp, ctx); if (ret) goto out; size = xdp.data_end - xdp.data_meta + sinfo->xdp_frags_size; ret = bpf_test_finish(kattr, uattr, xdp.data_meta, sinfo, size, retval, duration); if (!ret) ret = bpf_ctx_finish(kattr, uattr, ctx, sizeof(struct xdp_md)); out: if (repeat > 1) bpf_prog_change_xdp(prog, NULL); free_data: for (i = 0; i < sinfo->nr_frags; i++) __free_page(skb_frag_page(&sinfo->frags[i])); kfree(data); free_ctx: kfree(ctx); return ret; } static int verify_user_bpf_flow_keys(struct bpf_flow_keys *ctx) { /* make sure the fields we don't use are zeroed */ if (!range_is_zero(ctx, 0, offsetof(struct bpf_flow_keys, flags))) return -EINVAL; /* flags is allowed */ if (!range_is_zero(ctx, offsetofend(struct bpf_flow_keys, flags), sizeof(struct bpf_flow_keys))) return -EINVAL; return 0; } int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog, const union bpf_attr *kattr, union bpf_attr __user *uattr) { struct bpf_test_timer t = { NO_PREEMPT }; u32 size = kattr->test.data_size_in; struct bpf_flow_dissector ctx = {}; u32 repeat = kattr->test.repeat; struct bpf_flow_keys *user_ctx; struct bpf_flow_keys flow_keys; const struct ethhdr *eth; unsigned int flags = 0; u32 retval, duration; void *data; int ret; if (kattr->test.flags || kattr->test.cpu || kattr->test.batch_size) return -EINVAL; if (size < ETH_HLEN) return -EINVAL; data = bpf_test_init(kattr, kattr->test.data_size_in, size, 0, 0); if (IS_ERR(data)) return PTR_ERR(data); eth = (struct ethhdr *)data; if (!repeat) repeat = 1; user_ctx = bpf_ctx_init(kattr, sizeof(struct bpf_flow_keys)); if (IS_ERR(user_ctx)) { kfree(data); return PTR_ERR(user_ctx); } if (user_ctx) { ret = verify_user_bpf_flow_keys(user_ctx); if (ret) goto out; flags = user_ctx->flags; } ctx.flow_keys = &flow_keys; ctx.data = data; ctx.data_end = (__u8 *)data + size; bpf_test_timer_enter(&t); do { retval = bpf_flow_dissect(prog, &ctx, eth->h_proto, ETH_HLEN, size, flags); } while (bpf_test_timer_continue(&t, 1, repeat, &ret, &duration)); bpf_test_timer_leave(&t); if (ret < 0) goto out; ret = bpf_test_finish(kattr, uattr, &flow_keys, NULL, sizeof(flow_keys), retval, duration); if (!ret) ret = bpf_ctx_finish(kattr, uattr, user_ctx, sizeof(struct bpf_flow_keys)); out: kfree(user_ctx); kfree(data); return ret; } int bpf_prog_test_run_sk_lookup(struct bpf_prog *prog, const union bpf_attr *kattr, union bpf_attr __user *uattr) { struct bpf_test_timer t = { NO_PREEMPT }; struct bpf_prog_array *progs = NULL; struct bpf_sk_lookup_kern ctx = {}; u32 repeat = kattr->test.repeat; struct bpf_sk_lookup *user_ctx; u32 retval, duration; int ret = -EINVAL; if (kattr->test.flags || kattr->test.cpu || kattr->test.batch_size) return -EINVAL; if (kattr->test.data_in || kattr->test.data_size_in || kattr->test.data_out || kattr->test.data_size_out) return -EINVAL; if (!repeat) repeat = 1; user_ctx = bpf_ctx_init(kattr, sizeof(*user_ctx)); if (IS_ERR(user_ctx)) return PTR_ERR(user_ctx); if (!user_ctx) return -EINVAL; if (user_ctx->sk) goto out; if (!range_is_zero(user_ctx, offsetofend(typeof(*user_ctx), local_port), sizeof(*user_ctx))) goto out; if (user_ctx->local_port > U16_MAX) { ret = -ERANGE; goto out; } ctx.family = (u16)user_ctx->family; ctx.protocol = (u16)user_ctx->protocol; ctx.dport = (u16)user_ctx->local_port; ctx.sport = user_ctx->remote_port; switch (ctx.family) { case AF_INET: ctx.v4.daddr = (__force __be32)user_ctx->local_ip4; ctx.v4.saddr = (__force __be32)user_ctx->remote_ip4; break; #if IS_ENABLED(CONFIG_IPV6) case AF_INET6: ctx.v6.daddr = (struct in6_addr *)user_ctx->local_ip6; ctx.v6.saddr = (struct in6_addr *)user_ctx->remote_ip6; break; #endif default: ret = -EAFNOSUPPORT; goto out; } progs = bpf_prog_array_alloc(1, GFP_KERNEL); if (!progs) { ret = -ENOMEM; goto out; } progs->items[0].prog = prog; bpf_test_timer_enter(&t); do { ctx.selected_sk = NULL; retval = BPF_PROG_SK_LOOKUP_RUN_ARRAY(progs, ctx, bpf_prog_run); } while (bpf_test_timer_continue(&t, 1, repeat, &ret, &duration)); bpf_test_timer_leave(&t); if (ret < 0) goto out; user_ctx->cookie = 0; if (ctx.selected_sk) { if (ctx.selected_sk->sk_reuseport && !ctx.no_reuseport) { ret = -EOPNOTSUPP; goto out; } user_ctx->cookie = sock_gen_cookie(ctx.selected_sk); } ret = bpf_test_finish(kattr, uattr, NULL, NULL, 0, retval, duration); if (!ret) ret = bpf_ctx_finish(kattr, uattr, user_ctx, sizeof(*user_ctx)); out: bpf_prog_array_free(progs); kfree(user_ctx); return ret; } int bpf_prog_test_run_syscall(struct bpf_prog *prog, const union bpf_attr *kattr, union bpf_attr __user *uattr) { void __user *ctx_in = u64_to_user_ptr(kattr->test.ctx_in); __u32 ctx_size_in = kattr->test.ctx_size_in; void *ctx = NULL; u32 retval; int err = 0; /* doesn't support data_in/out, ctx_out, duration, or repeat or flags */ if (kattr->test.data_in || kattr->test.data_out || kattr->test.ctx_out || kattr->test.duration || kattr->test.repeat || kattr->test.flags || kattr->test.batch_size) return -EINVAL; if (ctx_size_in < prog->aux->max_ctx_offset || ctx_size_in > U16_MAX) return -EINVAL; if (ctx_size_in) { ctx = memdup_user(ctx_in, ctx_size_in); if (IS_ERR(ctx)) return PTR_ERR(ctx); } rcu_read_lock_trace(); retval = bpf_prog_run_pin_on_cpu(prog, ctx); rcu_read_unlock_trace(); if (copy_to_user(&uattr->test.retval, &retval, sizeof(u32))) { err = -EFAULT; goto out; } if (ctx_size_in) if (copy_to_user(ctx_in, ctx, ctx_size_in)) err = -EFAULT; out: kfree(ctx); return err; } static int verify_and_copy_hook_state(struct nf_hook_state *state, const struct nf_hook_state *user, struct net_device *dev) { if (user->in || user->out) return -EINVAL; if (user->net || user->sk || user->okfn) return -EINVAL; switch (user->pf) { case NFPROTO_IPV4: case NFPROTO_IPV6: switch (state->hook) { case NF_INET_PRE_ROUTING: state->in = dev; break; case NF_INET_LOCAL_IN: state->in = dev; break; case NF_INET_FORWARD: state->in = dev; state->out = dev; break; case NF_INET_LOCAL_OUT: state->out = dev; break; case NF_INET_POST_ROUTING: state->out = dev; break; } break; default: return -EINVAL; } state->pf = user->pf; state->hook = user->hook; return 0; } static __be16 nfproto_eth(int nfproto) { switch (nfproto) { case NFPROTO_IPV4: return htons(ETH_P_IP); case NFPROTO_IPV6: break; } return htons(ETH_P_IPV6); } int bpf_prog_test_run_nf(struct bpf_prog *prog, const union bpf_attr *kattr, union bpf_attr __user *uattr) { struct net *net = current->nsproxy->net_ns; struct net_device *dev = net->loopback_dev; struct nf_hook_state *user_ctx, hook_state = { .pf = NFPROTO_IPV4, .hook = NF_INET_LOCAL_OUT, }; u32 size = kattr->test.data_size_in; u32 repeat = kattr->test.repeat; struct bpf_nf_ctx ctx = { .state = &hook_state, }; struct sk_buff *skb = NULL; u32 retval, duration; void *data; int ret; if (kattr->test.flags || kattr->test.cpu || kattr->test.batch_size) return -EINVAL; if (size < sizeof(struct iphdr)) return -EINVAL; data = bpf_test_init(kattr, kattr->test.data_size_in, size, NET_SKB_PAD + NET_IP_ALIGN, SKB_DATA_ALIGN(sizeof(struct skb_shared_info))); if (IS_ERR(data)) return PTR_ERR(data); if (!repeat) repeat = 1; user_ctx = bpf_ctx_init(kattr, sizeof(struct nf_hook_state)); if (IS_ERR(user_ctx)) { kfree(data); return PTR_ERR(user_ctx); } if (user_ctx) { ret = verify_and_copy_hook_state(&hook_state, user_ctx, dev); if (ret) goto out; } skb = slab_build_skb(data); if (!skb) { ret = -ENOMEM; goto out; } data = NULL; /* data released via kfree_skb */ skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN); __skb_put(skb, size); ret = -EINVAL; if (hook_state.hook != NF_INET_LOCAL_OUT) { if (size < ETH_HLEN + sizeof(struct iphdr)) goto out; skb->protocol = eth_type_trans(skb, dev); switch (skb->protocol) { case htons(ETH_P_IP): if (hook_state.pf == NFPROTO_IPV4) break; goto out; case htons(ETH_P_IPV6): if (size < ETH_HLEN + sizeof(struct ipv6hdr)) goto out; if (hook_state.pf == NFPROTO_IPV6) break; goto out; default: ret = -EPROTO; goto out; } skb_reset_network_header(skb); } else { skb->protocol = nfproto_eth(hook_state.pf); } ctx.skb = skb; ret = bpf_test_run(prog, &ctx, repeat, &retval, &duration, false); if (ret) goto out; ret = bpf_test_finish(kattr, uattr, NULL, NULL, 0, retval, duration); out: kfree(user_ctx); kfree_skb(skb); kfree(data); return ret; } static const struct btf_kfunc_id_set bpf_prog_test_kfunc_set = { .owner = THIS_MODULE, .set = &test_sk_check_kfunc_ids, }; BTF_ID_LIST(bpf_prog_test_dtor_kfunc_ids) BTF_ID(struct, prog_test_ref_kfunc) BTF_ID(func, bpf_kfunc_call_test_release_dtor) BTF_ID(struct, prog_test_member) BTF_ID(func, bpf_kfunc_call_memb_release_dtor) static int __init bpf_prog_test_run_init(void) { const struct btf_id_dtor_kfunc bpf_prog_test_dtor_kfunc[] = { { .btf_id = bpf_prog_test_dtor_kfunc_ids[0], .kfunc_btf_id = bpf_prog_test_dtor_kfunc_ids[1] }, { .btf_id = bpf_prog_test_dtor_kfunc_ids[2], .kfunc_btf_id = bpf_prog_test_dtor_kfunc_ids[3], }, }; int ret; ret = register_btf_fmodret_id_set(&bpf_test_modify_return_set); ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_prog_test_kfunc_set); ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_prog_test_kfunc_set); ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SYSCALL, &bpf_prog_test_kfunc_set); return ret ?: register_btf_id_dtor_kfuncs(bpf_prog_test_dtor_kfunc, ARRAY_SIZE(bpf_prog_test_dtor_kfunc), THIS_MODULE); } late_initcall(bpf_prog_test_run_init); |
1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 | // SPDX-License-Identifier: GPL-2.0-or-later /* * vimc-lens.c Virtual Media Controller Driver * Copyright (C) 2022 Google, Inc * Author: yunkec@google.com (Yunke Cao) */ #include <media/v4l2-ctrls.h> #include <media/v4l2-event.h> #include <media/v4l2-subdev.h> #include "vimc-common.h" #define VIMC_LENS_MAX_FOCUS_POS 1023 #define VIMC_LENS_MAX_FOCUS_STEP 1 struct vimc_lens_device { struct vimc_ent_device ved; struct v4l2_subdev sd; struct v4l2_ctrl_handler hdl; u32 focus_absolute; }; static const struct v4l2_subdev_core_ops vimc_lens_core_ops = { .log_status = v4l2_ctrl_subdev_log_status, .subscribe_event = v4l2_ctrl_subdev_subscribe_event, .unsubscribe_event = v4l2_event_subdev_unsubscribe, }; static const struct v4l2_subdev_ops vimc_lens_ops = { .core = &vimc_lens_core_ops }; static int vimc_lens_s_ctrl(struct v4l2_ctrl *ctrl) { struct vimc_lens_device *vlens = container_of(ctrl->handler, struct vimc_lens_device, hdl); if (ctrl->id == V4L2_CID_FOCUS_ABSOLUTE) { vlens->focus_absolute = ctrl->val; return 0; } return -EINVAL; } static const struct v4l2_ctrl_ops vimc_lens_ctrl_ops = { .s_ctrl = vimc_lens_s_ctrl, }; static struct vimc_ent_device *vimc_lens_add(struct vimc_device *vimc, const char *vcfg_name) { struct v4l2_device *v4l2_dev = &vimc->v4l2_dev; struct vimc_lens_device *vlens; int ret; /* Allocate the vlens struct */ vlens = kzalloc(sizeof(*vlens), GFP_KERNEL); if (!vlens) return ERR_PTR(-ENOMEM); v4l2_ctrl_handler_init(&vlens->hdl, 1); v4l2_ctrl_new_std(&vlens->hdl, &vimc_lens_ctrl_ops, V4L2_CID_FOCUS_ABSOLUTE, 0, VIMC_LENS_MAX_FOCUS_POS, VIMC_LENS_MAX_FOCUS_STEP, 0); vlens->sd.ctrl_handler = &vlens->hdl; if (vlens->hdl.error) { ret = vlens->hdl.error; goto err_free_vlens; } vlens->ved.dev = vimc->mdev.dev; ret = vimc_ent_sd_register(&vlens->ved, &vlens->sd, v4l2_dev, vcfg_name, MEDIA_ENT_F_LENS, 0, NULL, NULL, &vimc_lens_ops); if (ret) goto err_free_hdl; return &vlens->ved; err_free_hdl: v4l2_ctrl_handler_free(&vlens->hdl); err_free_vlens: kfree(vlens); return ERR_PTR(ret); } static void vimc_lens_release(struct vimc_ent_device *ved) { struct vimc_lens_device *vlens = container_of(ved, struct vimc_lens_device, ved); v4l2_ctrl_handler_free(&vlens->hdl); v4l2_subdev_cleanup(&vlens->sd); media_entity_cleanup(vlens->ved.ent); kfree(vlens); } const struct vimc_ent_type vimc_lens_type = { .add = vimc_lens_add, .release = vimc_lens_release }; |
49 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 | // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) /* Copyright (C) 2019 Netronome Systems, Inc. */ #include <linux/proc_fs.h> #include <linux/seq_file.h> #include <net/snmp.h> #include <net/tls.h> #include "tls.h" #ifdef CONFIG_PROC_FS static const struct snmp_mib tls_mib_list[] = { SNMP_MIB_ITEM("TlsCurrTxSw", LINUX_MIB_TLSCURRTXSW), SNMP_MIB_ITEM("TlsCurrRxSw", LINUX_MIB_TLSCURRRXSW), SNMP_MIB_ITEM("TlsCurrTxDevice", LINUX_MIB_TLSCURRTXDEVICE), SNMP_MIB_ITEM("TlsCurrRxDevice", LINUX_MIB_TLSCURRRXDEVICE), SNMP_MIB_ITEM("TlsTxSw", LINUX_MIB_TLSTXSW), SNMP_MIB_ITEM("TlsRxSw", LINUX_MIB_TLSRXSW), SNMP_MIB_ITEM("TlsTxDevice", LINUX_MIB_TLSTXDEVICE), SNMP_MIB_ITEM("TlsRxDevice", LINUX_MIB_TLSRXDEVICE), SNMP_MIB_ITEM("TlsDecryptError", LINUX_MIB_TLSDECRYPTERROR), SNMP_MIB_ITEM("TlsRxDeviceResync", LINUX_MIB_TLSRXDEVICERESYNC), SNMP_MIB_ITEM("TlsDecryptRetry", LINUX_MIB_TLSDECRYPTRETRY), SNMP_MIB_ITEM("TlsRxNoPadViolation", LINUX_MIB_TLSRXNOPADVIOL), SNMP_MIB_ITEM("TlsRxRekeyOk", LINUX_MIB_TLSRXREKEYOK), SNMP_MIB_ITEM("TlsRxRekeyError", LINUX_MIB_TLSRXREKEYERROR), SNMP_MIB_ITEM("TlsTxRekeyOk", LINUX_MIB_TLSTXREKEYOK), SNMP_MIB_ITEM("TlsTxRekeyError", LINUX_MIB_TLSTXREKEYERROR), SNMP_MIB_ITEM("TlsRxRekeyReceived", LINUX_MIB_TLSRXREKEYRECEIVED), SNMP_MIB_SENTINEL }; static int tls_statistics_seq_show(struct seq_file *seq, void *v) { unsigned long buf[LINUX_MIB_TLSMAX] = {}; struct net *net = seq->private; int i; snmp_get_cpu_field_batch(buf, tls_mib_list, net->mib.tls_statistics); for (i = 0; tls_mib_list[i].name; i++) seq_printf(seq, "%-32s\t%lu\n", tls_mib_list[i].name, buf[i]); return 0; } #endif int __net_init tls_proc_init(struct net *net) { #ifdef CONFIG_PROC_FS if (!proc_create_net_single("tls_stat", 0444, net->proc_net, tls_statistics_seq_show, NULL)) return -ENOMEM; #endif /* CONFIG_PROC_FS */ return 0; } void __net_exit tls_proc_fini(struct net *net) { remove_proc_entry("tls_stat", net->proc_net); } |
1 1 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 | // SPDX-License-Identifier: GPL-2.0 #include <linux/utsname.h> #include <net/cfg80211.h> #include "core.h" #include "rdev-ops.h" void cfg80211_get_drvinfo(struct net_device *dev, struct ethtool_drvinfo *info) { struct wireless_dev *wdev = dev->ieee80211_ptr; struct device *pdev = wiphy_dev(wdev->wiphy); if (pdev->driver) strscpy(info->driver, pdev->driver->name, sizeof(info->driver)); else strscpy(info->driver, "N/A", sizeof(info->driver)); strscpy(info->version, init_utsname()->release, sizeof(info->version)); if (wdev->wiphy->fw_version[0]) strscpy(info->fw_version, wdev->wiphy->fw_version, sizeof(info->fw_version)); else strscpy(info->fw_version, "N/A", sizeof(info->fw_version)); strscpy(info->bus_info, dev_name(wiphy_dev(wdev->wiphy)), sizeof(info->bus_info)); } EXPORT_SYMBOL(cfg80211_get_drvinfo); |
8 8 16 430 98 11 20 67 15 72 12 75 87 163 140 7 149 1 73 11 1 43 37 62 10 4 11 61 2 2 5 1 4 1 113 112 91 90 90 1 3 107 3 2 3 105 5 108 105 3 113 40 40 1 39 1 2 11 29 40 2 2 2 139 1 138 113 110 25 1 38 39 39 39 28 353 111 127 4 88 91 4 180 71 406 22 407 27 28 28 28 7 28 27 28 28 25 4 28 28 440 178 19 1 422 207 436 12 436 197 435 432 139 2 14 3 440 21 433 18 1 13 13 4 17 4 1 244 6 239 239 1 239 48 239 238 238 98 204 1 1 1 1 1 1 1 1 1 1 1 1 88 89 110 2 2 1 1 2 1 1 100 2 10 96 71 71 70 3 5 4 9 1 52 15 12 12 4 4 4 87 12 9 2 64 71 16 1 1 53 14 13 1 5 2 1 2 43 41 4 4 4 4 4 13 32 26 2 1 14 42 19 20 4 16 2 3 13 3 12 2 20 19 13 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 2533 2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569 2570 2571 2572 2573 2574 2575 2576 2577 2578 2579 2580 2581 2582 2583 2584 2585 2586 2587 2588 2589 2590 2591 2592 2593 2594 2595 2596 2597 2598 2599 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 2701 2702 | // SPDX-License-Identifier: GPL-2.0 /* * * Copyright (C) 2019-2021 Paragon Software GmbH, All rights reserved. * */ #include <linux/blkdev.h> #include <linux/buffer_head.h> #include <linux/fs.h> #include <linux/kernel.h> #include <linux/nls.h> #include "debug.h" #include "ntfs.h" #include "ntfs_fs.h" // clang-format off const struct cpu_str NAME_MFT = { 4, 0, { '$', 'M', 'F', 'T' }, }; const struct cpu_str NAME_MIRROR = { 8, 0, { '$', 'M', 'F', 'T', 'M', 'i', 'r', 'r' }, }; const struct cpu_str NAME_LOGFILE = { 8, 0, { '$', 'L', 'o', 'g', 'F', 'i', 'l', 'e' }, }; const struct cpu_str NAME_VOLUME = { 7, 0, { '$', 'V', 'o', 'l', 'u', 'm', 'e' }, }; const struct cpu_str NAME_ATTRDEF = { 8, 0, { '$', 'A', 't', 't', 'r', 'D', 'e', 'f' }, }; const struct cpu_str NAME_ROOT = { 1, 0, { '.' }, }; const struct cpu_str NAME_BITMAP = { 7, 0, { '$', 'B', 'i', 't', 'm', 'a', 'p' }, }; const struct cpu_str NAME_BOOT = { 5, 0, { '$', 'B', 'o', 'o', 't' }, }; const struct cpu_str NAME_BADCLUS = { 8, 0, { '$', 'B', 'a', 'd', 'C', 'l', 'u', 's' }, }; const struct cpu_str NAME_QUOTA = { 6, 0, { '$', 'Q', 'u', 'o', 't', 'a' }, }; const struct cpu_str NAME_SECURE = { 7, 0, { '$', 'S', 'e', 'c', 'u', 'r', 'e' }, }; const struct cpu_str NAME_UPCASE = { 7, 0, { '$', 'U', 'p', 'C', 'a', 's', 'e' }, }; const struct cpu_str NAME_EXTEND = { 7, 0, { '$', 'E', 'x', 't', 'e', 'n', 'd' }, }; const struct cpu_str NAME_OBJID = { 6, 0, { '$', 'O', 'b', 'j', 'I', 'd' }, }; const struct cpu_str NAME_REPARSE = { 8, 0, { '$', 'R', 'e', 'p', 'a', 'r', 's', 'e' }, }; const struct cpu_str NAME_USNJRNL = { 8, 0, { '$', 'U', 's', 'n', 'J', 'r', 'n', 'l' }, }; const __le16 BAD_NAME[4] = { cpu_to_le16('$'), cpu_to_le16('B'), cpu_to_le16('a'), cpu_to_le16('d'), }; const __le16 I30_NAME[4] = { cpu_to_le16('$'), cpu_to_le16('I'), cpu_to_le16('3'), cpu_to_le16('0'), }; const __le16 SII_NAME[4] = { cpu_to_le16('$'), cpu_to_le16('S'), cpu_to_le16('I'), cpu_to_le16('I'), }; const __le16 SDH_NAME[4] = { cpu_to_le16('$'), cpu_to_le16('S'), cpu_to_le16('D'), cpu_to_le16('H'), }; const __le16 SDS_NAME[4] = { cpu_to_le16('$'), cpu_to_le16('S'), cpu_to_le16('D'), cpu_to_le16('S'), }; const __le16 SO_NAME[2] = { cpu_to_le16('$'), cpu_to_le16('O'), }; const __le16 SQ_NAME[2] = { cpu_to_le16('$'), cpu_to_le16('Q'), }; const __le16 SR_NAME[2] = { cpu_to_le16('$'), cpu_to_le16('R'), }; #ifdef CONFIG_NTFS3_LZX_XPRESS const __le16 WOF_NAME[17] = { cpu_to_le16('W'), cpu_to_le16('o'), cpu_to_le16('f'), cpu_to_le16('C'), cpu_to_le16('o'), cpu_to_le16('m'), cpu_to_le16('p'), cpu_to_le16('r'), cpu_to_le16('e'), cpu_to_le16('s'), cpu_to_le16('s'), cpu_to_le16('e'), cpu_to_le16('d'), cpu_to_le16('D'), cpu_to_le16('a'), cpu_to_le16('t'), cpu_to_le16('a'), }; #endif static const __le16 CON_NAME[3] = { cpu_to_le16('C'), cpu_to_le16('O'), cpu_to_le16('N'), }; static const __le16 NUL_NAME[3] = { cpu_to_le16('N'), cpu_to_le16('U'), cpu_to_le16('L'), }; static const __le16 AUX_NAME[3] = { cpu_to_le16('A'), cpu_to_le16('U'), cpu_to_le16('X'), }; static const __le16 PRN_NAME[3] = { cpu_to_le16('P'), cpu_to_le16('R'), cpu_to_le16('N'), }; static const __le16 COM_NAME[3] = { cpu_to_le16('C'), cpu_to_le16('O'), cpu_to_le16('M'), }; static const __le16 LPT_NAME[3] = { cpu_to_le16('L'), cpu_to_le16('P'), cpu_to_le16('T'), }; // clang-format on /* * ntfs_fix_pre_write - Insert fixups into @rhdr before writing to disk. */ bool ntfs_fix_pre_write(struct NTFS_RECORD_HEADER *rhdr, size_t bytes) { u16 *fixup, *ptr; u16 sample; u16 fo = le16_to_cpu(rhdr->fix_off); u16 fn = le16_to_cpu(rhdr->fix_num); if ((fo & 1) || fo + fn * sizeof(short) > SECTOR_SIZE || !fn-- || fn * SECTOR_SIZE > bytes) { return false; } /* Get fixup pointer. */ fixup = Add2Ptr(rhdr, fo); if (*fixup >= 0x7FFF) *fixup = 1; else *fixup += 1; sample = *fixup; ptr = Add2Ptr(rhdr, SECTOR_SIZE - sizeof(short)); while (fn--) { *++fixup = *ptr; *ptr = sample; ptr += SECTOR_SIZE / sizeof(short); } return true; } /* * ntfs_fix_post_read - Remove fixups after reading from disk. * * Return: < 0 if error, 0 if ok, 1 if need to update fixups. */ int ntfs_fix_post_read(struct NTFS_RECORD_HEADER *rhdr, size_t bytes, bool simple) { int ret; u16 *fixup, *ptr; u16 sample, fo, fn; fo = le16_to_cpu(rhdr->fix_off); fn = simple ? ((bytes >> SECTOR_SHIFT) + 1) : le16_to_cpu(rhdr->fix_num); /* Check errors. */ if ((fo & 1) || fo + fn * sizeof(short) > SECTOR_SIZE || !fn-- || fn * SECTOR_SIZE > bytes) { return -E_NTFS_CORRUPT; } /* Get fixup pointer. */ fixup = Add2Ptr(rhdr, fo); sample = *fixup; ptr = Add2Ptr(rhdr, SECTOR_SIZE - sizeof(short)); ret = 0; while (fn--) { /* Test current word. */ if (*ptr != sample) { /* Fixup does not match! Is it serious error? */ ret = -E_NTFS_FIXUP; } /* Replace fixup. */ *ptr = *++fixup; ptr += SECTOR_SIZE / sizeof(short); } return ret; } /* * ntfs_extend_init - Load $Extend file. */ int ntfs_extend_init(struct ntfs_sb_info *sbi) { int err; struct super_block *sb = sbi->sb; struct inode *inode, *inode2; struct MFT_REF ref; if (sbi->volume.major_ver < 3) { ntfs_notice(sb, "Skip $Extend 'cause NTFS version"); return 0; } ref.low = cpu_to_le32(MFT_REC_EXTEND); ref.high = 0; ref.seq = cpu_to_le16(MFT_REC_EXTEND); inode = ntfs_iget5(sb, &ref, &NAME_EXTEND); if (IS_ERR(inode)) { err = PTR_ERR(inode); ntfs_err(sb, "Failed to load $Extend (%d).", err); inode = NULL; goto out; } /* If ntfs_iget5() reads from disk it never returns bad inode. */ if (!S_ISDIR(inode->i_mode)) { err = -EINVAL; goto out; } /* Try to find $ObjId */ inode2 = dir_search_u(inode, &NAME_OBJID, NULL); if (inode2 && !IS_ERR(inode2)) { if (is_bad_inode(inode2)) { iput(inode2); } else { sbi->objid.ni = ntfs_i(inode2); sbi->objid_no = inode2->i_ino; } } /* Try to find $Quota */ inode2 = dir_search_u(inode, &NAME_QUOTA, NULL); if (inode2 && !IS_ERR(inode2)) { sbi->quota_no = inode2->i_ino; iput(inode2); } /* Try to find $Reparse */ inode2 = dir_search_u(inode, &NAME_REPARSE, NULL); if (inode2 && !IS_ERR(inode2)) { sbi->reparse.ni = ntfs_i(inode2); sbi->reparse_no = inode2->i_ino; } /* Try to find $UsnJrnl */ inode2 = dir_search_u(inode, &NAME_USNJRNL, NULL); if (inode2 && !IS_ERR(inode2)) { sbi->usn_jrnl_no = inode2->i_ino; iput(inode2); } err = 0; out: iput(inode); return err; } int ntfs_loadlog_and_replay(struct ntfs_inode *ni, struct ntfs_sb_info *sbi) { int err = 0; struct super_block *sb = sbi->sb; bool initialized = false; struct MFT_REF ref; struct inode *inode; /* Check for 4GB. */ if (ni->vfs_inode.i_size >= 0x100000000ull) { ntfs_err(sb, "\x24LogFile is large than 4G."); err = -EINVAL; goto out; } sbi->flags |= NTFS_FLAGS_LOG_REPLAYING; ref.low = cpu_to_le32(MFT_REC_MFT); ref.high = 0; ref.seq = cpu_to_le16(1); inode = ntfs_iget5(sb, &ref, NULL); if (IS_ERR(inode)) inode = NULL; if (!inode) { /* Try to use MFT copy. */ u64 t64 = sbi->mft.lbo; sbi->mft.lbo = sbi->mft.lbo2; inode = ntfs_iget5(sb, &ref, NULL); sbi->mft.lbo = t64; if (IS_ERR(inode)) inode = NULL; } if (!inode) { err = -EINVAL; ntfs_err(sb, "Failed to load $MFT."); goto out; } sbi->mft.ni = ntfs_i(inode); /* LogFile should not contains attribute list. */ err = ni_load_all_mi(sbi->mft.ni); if (!err) err = log_replay(ni, &initialized); iput(inode); sbi->mft.ni = NULL; sync_blockdev(sb->s_bdev); invalidate_bdev(sb->s_bdev); if (sbi->flags & NTFS_FLAGS_NEED_REPLAY) { err = 0; goto out; } if (sb_rdonly(sb) || !initialized) goto out; /* Fill LogFile by '-1' if it is initialized. */ err = ntfs_bio_fill_1(sbi, &ni->file.run); out: sbi->flags &= ~NTFS_FLAGS_LOG_REPLAYING; return err; } /* * ntfs_look_for_free_space - Look for a free space in bitmap. */ int ntfs_look_for_free_space(struct ntfs_sb_info *sbi, CLST lcn, CLST len, CLST *new_lcn, CLST *new_len, enum ALLOCATE_OPT opt) { int err; CLST alen; struct super_block *sb = sbi->sb; size_t alcn, zlen, zeroes, zlcn, zlen2, ztrim, new_zlen; struct wnd_bitmap *wnd = &sbi->used.bitmap; down_write_nested(&wnd->rw_lock, BITMAP_MUTEX_CLUSTERS); if (opt & ALLOCATE_MFT) { zlen = wnd_zone_len(wnd); if (!zlen) { err = ntfs_refresh_zone(sbi); if (err) goto up_write; zlen = wnd_zone_len(wnd); } if (!zlen) { ntfs_err(sbi->sb, "no free space to extend mft"); err = -ENOSPC; goto up_write; } lcn = wnd_zone_bit(wnd); alen = min_t(CLST, len, zlen); wnd_zone_set(wnd, lcn + alen, zlen - alen); err = wnd_set_used(wnd, lcn, alen); if (err) goto up_write; alcn = lcn; goto space_found; } /* * 'Cause cluster 0 is always used this value means that we should use * cached value of 'next_free_lcn' to improve performance. */ if (!lcn) lcn = sbi->used.next_free_lcn; if (lcn >= wnd->nbits) lcn = 0; alen = wnd_find(wnd, len, lcn, BITMAP_FIND_MARK_AS_USED, &alcn); if (alen) goto space_found; /* Try to use clusters from MftZone. */ zlen = wnd_zone_len(wnd); zeroes = wnd_zeroes(wnd); /* Check too big request */ if (len > zeroes + zlen || zlen <= NTFS_MIN_MFT_ZONE) { err = -ENOSPC; goto up_write; } /* How many clusters to cat from zone. */ zlcn = wnd_zone_bit(wnd); zlen2 = zlen >> 1; ztrim = clamp_val(len, zlen2, zlen); new_zlen = max_t(size_t, zlen - ztrim, NTFS_MIN_MFT_ZONE); wnd_zone_set(wnd, zlcn, new_zlen); /* Allocate continues clusters. */ alen = wnd_find(wnd, len, 0, BITMAP_FIND_MARK_AS_USED | BITMAP_FIND_FULL, &alcn); if (!alen) { err = -ENOSPC; goto up_write; } space_found: err = 0; *new_len = alen; *new_lcn = alcn; ntfs_unmap_meta(sb, alcn, alen); /* Set hint for next requests. */ if (!(opt & ALLOCATE_MFT)) sbi->used.next_free_lcn = alcn + alen; up_write: up_write(&wnd->rw_lock); return err; } /* * ntfs_check_for_free_space * * Check if it is possible to allocate 'clen' clusters and 'mlen' Mft records */ bool ntfs_check_for_free_space(struct ntfs_sb_info *sbi, CLST clen, CLST mlen) { size_t free, zlen, avail; struct wnd_bitmap *wnd; wnd = &sbi->used.bitmap; down_read_nested(&wnd->rw_lock, BITMAP_MUTEX_CLUSTERS); free = wnd_zeroes(wnd); zlen = min_t(size_t, NTFS_MIN_MFT_ZONE, wnd_zone_len(wnd)); up_read(&wnd->rw_lock); if (free < zlen + clen) return false; avail = free - (zlen + clen); wnd = &sbi->mft.bitmap; down_read_nested(&wnd->rw_lock, BITMAP_MUTEX_MFT); free = wnd_zeroes(wnd); zlen = wnd_zone_len(wnd); up_read(&wnd->rw_lock); if (free >= zlen + mlen) return true; return avail >= bytes_to_cluster(sbi, mlen << sbi->record_bits); } /* * ntfs_extend_mft - Allocate additional MFT records. * * sbi->mft.bitmap is locked for write. * * NOTE: recursive: * ntfs_look_free_mft -> * ntfs_extend_mft -> * attr_set_size -> * ni_insert_nonresident -> * ni_insert_attr -> * ni_ins_attr_ext -> * ntfs_look_free_mft -> * ntfs_extend_mft * * To avoid recursive always allocate space for two new MFT records * see attrib.c: "at least two MFT to avoid recursive loop". */ static int ntfs_extend_mft(struct ntfs_sb_info *sbi) { int err; struct ntfs_inode *ni = sbi->mft.ni; size_t new_mft_total; u64 new_mft_bytes, new_bitmap_bytes; struct ATTRIB *attr; struct wnd_bitmap *wnd = &sbi->mft.bitmap; new_mft_total = ALIGN(wnd->nbits + NTFS_MFT_INCREASE_STEP, 128); new_mft_bytes = (u64)new_mft_total << sbi->record_bits; /* Step 1: Resize $MFT::DATA. */ down_write(&ni->file.run_lock); err = attr_set_size(ni, ATTR_DATA, NULL, 0, &ni->file.run, new_mft_bytes, NULL, false, &attr); if (err) { up_write(&ni->file.run_lock); goto out; } attr->nres.valid_size = attr->nres.data_size; new_mft_total = le64_to_cpu(attr->nres.alloc_size) >> sbi->record_bits; ni->mi.dirty = true; /* Step 2: Resize $MFT::BITMAP. */ new_bitmap_bytes = ntfs3_bitmap_size(new_mft_total); err = attr_set_size(ni, ATTR_BITMAP, NULL, 0, &sbi->mft.bitmap.run, new_bitmap_bytes, &new_bitmap_bytes, true, NULL); /* Refresh MFT Zone if necessary. */ down_write_nested(&sbi->used.bitmap.rw_lock, BITMAP_MUTEX_CLUSTERS); ntfs_refresh_zone(sbi); up_write(&sbi->used.bitmap.rw_lock); up_write(&ni->file.run_lock); if (err) goto out; err = wnd_extend(wnd, new_mft_total); if (err) goto out; ntfs_clear_mft_tail(sbi, sbi->mft.used, new_mft_total); err = _ni_write_inode(&ni->vfs_inode, 0); out: return err; } /* * ntfs_look_free_mft - Look for a free MFT record. */ int ntfs_look_free_mft(struct ntfs_sb_info *sbi, CLST *rno, bool mft, struct ntfs_inode *ni, struct mft_inode **mi) { int err = 0; size_t zbit, zlen, from, to, fr; size_t mft_total; struct MFT_REF ref; struct super_block *sb = sbi->sb; struct wnd_bitmap *wnd = &sbi->mft.bitmap; u32 ir; static_assert(sizeof(sbi->mft.reserved_bitmap) * 8 >= MFT_REC_FREE - MFT_REC_RESERVED); if (!mft) down_write_nested(&wnd->rw_lock, BITMAP_MUTEX_MFT); zlen = wnd_zone_len(wnd); /* Always reserve space for MFT. */ if (zlen) { if (mft) { zbit = wnd_zone_bit(wnd); *rno = zbit; wnd_zone_set(wnd, zbit + 1, zlen - 1); } goto found; } /* No MFT zone. Find the nearest to '0' free MFT. */ if (!wnd_find(wnd, 1, MFT_REC_FREE, 0, &zbit)) { /* Resize MFT */ mft_total = wnd->nbits; err = ntfs_extend_mft(sbi); if (!err) { zbit = mft_total; goto reserve_mft; } if (!mft || MFT_REC_FREE == sbi->mft.next_reserved) goto out; err = 0; /* * Look for free record reserved area [11-16) == * [MFT_REC_RESERVED, MFT_REC_FREE ) MFT bitmap always * marks it as used. */ if (!sbi->mft.reserved_bitmap) { /* Once per session create internal bitmap for 5 bits. */ sbi->mft.reserved_bitmap = 0xFF; ref.high = 0; for (ir = MFT_REC_RESERVED; ir < MFT_REC_FREE; ir++) { struct inode *i; struct ntfs_inode *ni; struct MFT_REC *mrec; ref.low = cpu_to_le32(ir); ref.seq = cpu_to_le16(ir); i = ntfs_iget5(sb, &ref, NULL); if (IS_ERR(i)) { next: ntfs_notice( sb, "Invalid reserved record %x", ref.low); continue; } if (is_bad_inode(i)) { iput(i); goto next; } ni = ntfs_i(i); mrec = ni->mi.mrec; if (!is_rec_base(mrec)) goto next; if (mrec->hard_links) goto next; if (!ni_std(ni)) goto next; if (ni_find_attr(ni, NULL, NULL, ATTR_NAME, NULL, 0, NULL, NULL)) goto next; __clear_bit(ir - MFT_REC_RESERVED, &sbi->mft.reserved_bitmap); } } /* Scan 5 bits for zero. Bit 0 == MFT_REC_RESERVED */ zbit = find_next_zero_bit(&sbi->mft.reserved_bitmap, MFT_REC_FREE, MFT_REC_RESERVED); if (zbit >= MFT_REC_FREE) { sbi->mft.next_reserved = MFT_REC_FREE; goto out; } zlen = 1; sbi->mft.next_reserved = zbit; } else { reserve_mft: zlen = zbit == MFT_REC_FREE ? (MFT_REC_USER - MFT_REC_FREE) : 4; if (zbit + zlen > wnd->nbits) zlen = wnd->nbits - zbit; while (zlen > 1 && !wnd_is_free(wnd, zbit, zlen)) zlen -= 1; /* [zbit, zbit + zlen) will be used for MFT itself. */ from = sbi->mft.used; if (from < zbit) from = zbit; to = zbit + zlen; if (from < to) { ntfs_clear_mft_tail(sbi, from, to); sbi->mft.used = to; } } if (mft) { *rno = zbit; zbit += 1; zlen -= 1; } wnd_zone_set(wnd, zbit, zlen); found: if (!mft) { /* The request to get record for general purpose. */ if (sbi->mft.next_free < MFT_REC_USER) sbi->mft.next_free = MFT_REC_USER; for (;;) { if (sbi->mft.next_free >= sbi->mft.bitmap.nbits) { } else if (!wnd_find(wnd, 1, MFT_REC_USER, 0, &fr)) { sbi->mft.next_free = sbi->mft.bitmap.nbits; } else { *rno = fr; sbi->mft.next_free = *rno + 1; break; } err = ntfs_extend_mft(sbi); if (err) goto out; } } if (ni && !ni_add_subrecord(ni, *rno, mi)) { err = -ENOMEM; goto out; } /* We have found a record that are not reserved for next MFT. */ if (*rno >= MFT_REC_FREE) wnd_set_used(wnd, *rno, 1); else if (*rno >= MFT_REC_RESERVED && sbi->mft.reserved_bitmap_inited) __set_bit(*rno - MFT_REC_RESERVED, &sbi->mft.reserved_bitmap); out: if (!mft) up_write(&wnd->rw_lock); return err; } /* * ntfs_mark_rec_free - Mark record as free. * is_mft - true if we are changing MFT */ void ntfs_mark_rec_free(struct ntfs_sb_info *sbi, CLST rno, bool is_mft) { struct wnd_bitmap *wnd = &sbi->mft.bitmap; if (!is_mft) down_write_nested(&wnd->rw_lock, BITMAP_MUTEX_MFT); if (rno >= wnd->nbits) goto out; if (rno >= MFT_REC_FREE) { if (!wnd_is_used(wnd, rno, 1)) ntfs_set_state(sbi, NTFS_DIRTY_ERROR); else wnd_set_free(wnd, rno, 1); } else if (rno >= MFT_REC_RESERVED && sbi->mft.reserved_bitmap_inited) { __clear_bit(rno - MFT_REC_RESERVED, &sbi->mft.reserved_bitmap); } if (rno < wnd_zone_bit(wnd)) wnd_zone_set(wnd, rno, 1); else if (rno < sbi->mft.next_free && rno >= MFT_REC_USER) sbi->mft.next_free = rno; out: if (!is_mft) up_write(&wnd->rw_lock); } /* * ntfs_clear_mft_tail - Format empty records [from, to). * * sbi->mft.bitmap is locked for write. */ int ntfs_clear_mft_tail(struct ntfs_sb_info *sbi, size_t from, size_t to) { int err; u32 rs; u64 vbo; struct runs_tree *run; struct ntfs_inode *ni; if (from >= to) return 0; rs = sbi->record_size; ni = sbi->mft.ni; run = &ni->file.run; down_read(&ni->file.run_lock); vbo = (u64)from * rs; for (; from < to; from++, vbo += rs) { struct ntfs_buffers nb; err = ntfs_get_bh(sbi, run, vbo, rs, &nb); if (err) goto out; err = ntfs_write_bh(sbi, &sbi->new_rec->rhdr, &nb, 0); nb_put(&nb); if (err) goto out; } out: sbi->mft.used = from; up_read(&ni->file.run_lock); return err; } /* * ntfs_refresh_zone - Refresh MFT zone. * * sbi->used.bitmap is locked for rw. * sbi->mft.bitmap is locked for write. * sbi->mft.ni->file.run_lock for write. */ int ntfs_refresh_zone(struct ntfs_sb_info *sbi) { CLST lcn, vcn, len; size_t lcn_s, zlen; struct wnd_bitmap *wnd = &sbi->used.bitmap; struct ntfs_inode *ni = sbi->mft.ni; /* Do not change anything unless we have non empty MFT zone. */ if (wnd_zone_len(wnd)) return 0; vcn = bytes_to_cluster(sbi, (u64)sbi->mft.bitmap.nbits << sbi->record_bits); if (!run_lookup_entry(&ni->file.run, vcn - 1, &lcn, &len, NULL)) lcn = SPARSE_LCN; /* We should always find Last Lcn for MFT. */ if (lcn == SPARSE_LCN) return -EINVAL; lcn_s = lcn + 1; /* Try to allocate clusters after last MFT run. */ zlen = wnd_find(wnd, sbi->zone_max, lcn_s, 0, &lcn_s); wnd_zone_set(wnd, lcn_s, zlen); return 0; } /* * ntfs_update_mftmirr - Update $MFTMirr data. */ void ntfs_update_mftmirr(struct ntfs_sb_info *sbi, int wait) { int err; struct super_block *sb = sbi->sb; u32 blocksize, bytes; sector_t block1, block2; /* * sb can be NULL here. In this case sbi->flags should be 0 too. */ if (!sb || !(sbi->flags & NTFS_FLAGS_MFTMIRR) || unlikely(ntfs3_forced_shutdown(sb))) return; blocksize = sb->s_blocksize; bytes = sbi->mft.recs_mirr << sbi->record_bits; block1 = sbi->mft.lbo >> sb->s_blocksize_bits; block2 = sbi->mft.lbo2 >> sb->s_blocksize_bits; for (; bytes >= blocksize; bytes -= blocksize) { struct buffer_head *bh1, *bh2; bh1 = sb_bread(sb, block1++); if (!bh1) return; bh2 = sb_getblk(sb, block2++); if (!bh2) { put_bh(bh1); return; } if (buffer_locked(bh2)) __wait_on_buffer(bh2); lock_buffer(bh2); memcpy(bh2->b_data, bh1->b_data, blocksize); set_buffer_uptodate(bh2); mark_buffer_dirty(bh2); unlock_buffer(bh2); put_bh(bh1); bh1 = NULL; err = wait ? sync_dirty_buffer(bh2) : 0; put_bh(bh2); if (err) return; } sbi->flags &= ~NTFS_FLAGS_MFTMIRR; } /* * ntfs_bad_inode * * Marks inode as bad and marks fs as 'dirty' */ void ntfs_bad_inode(struct inode *inode, const char *hint) { struct ntfs_sb_info *sbi = inode->i_sb->s_fs_info; ntfs_inode_err(inode, "%s", hint); make_bad_inode(inode); ntfs_set_state(sbi, NTFS_DIRTY_ERROR); } /* * ntfs_set_state * * Mount: ntfs_set_state(NTFS_DIRTY_DIRTY) * Umount: ntfs_set_state(NTFS_DIRTY_CLEAR) * NTFS error: ntfs_set_state(NTFS_DIRTY_ERROR) */ int ntfs_set_state(struct ntfs_sb_info *sbi, enum NTFS_DIRTY_FLAGS dirty) { int err; struct ATTRIB *attr; struct VOLUME_INFO *info; struct mft_inode *mi; struct ntfs_inode *ni; __le16 info_flags; /* * Do not change state if fs was real_dirty. * Do not change state if fs already dirty(clear). * Do not change any thing if mounted read only. */ if (sbi->volume.real_dirty || sb_rdonly(sbi->sb)) return 0; /* Check cached value. */ if ((dirty == NTFS_DIRTY_CLEAR ? 0 : VOLUME_FLAG_DIRTY) == (sbi->volume.flags & VOLUME_FLAG_DIRTY)) return 0; ni = sbi->volume.ni; if (!ni) return -EINVAL; mutex_lock_nested(&ni->ni_lock, NTFS_INODE_MUTEX_DIRTY); attr = ni_find_attr(ni, NULL, NULL, ATTR_VOL_INFO, NULL, 0, NULL, &mi); if (!attr) { err = -EINVAL; goto out; } info = resident_data_ex(attr, SIZEOF_ATTRIBUTE_VOLUME_INFO); if (!info) { err = -EINVAL; goto out; } info_flags = info->flags; switch (dirty) { case NTFS_DIRTY_ERROR: ntfs_notice(sbi->sb, "Mark volume as dirty due to NTFS errors"); sbi->volume.real_dirty = true; fallthrough; case NTFS_DIRTY_DIRTY: info->flags |= VOLUME_FLAG_DIRTY; break; case NTFS_DIRTY_CLEAR: info->flags &= ~VOLUME_FLAG_DIRTY; break; } /* Cache current volume flags. */ if (info_flags != info->flags) { sbi->volume.flags = info->flags; mi->dirty = true; } err = 0; out: ni_unlock(ni); if (err) return err; mark_inode_dirty_sync(&ni->vfs_inode); /* verify(!ntfs_update_mftmirr()); */ /* write mft record on disk. */ err = _ni_write_inode(&ni->vfs_inode, 1); return err; } /* * security_hash - Calculates a hash of security descriptor. */ static inline __le32 security_hash(const void *sd, size_t bytes) { u32 hash = 0; const __le32 *ptr = sd; bytes >>= 2; while (bytes--) hash = ((hash >> 0x1D) | (hash << 3)) + le32_to_cpu(*ptr++); return cpu_to_le32(hash); } /* * simple wrapper for sb_bread_unmovable. */ struct buffer_head *ntfs_bread(struct super_block *sb, sector_t block) { struct ntfs_sb_info *sbi = sb->s_fs_info; struct buffer_head *bh; if (unlikely(block >= sbi->volume.blocks)) { /* prevent generic message "attempt to access beyond end of device" */ ntfs_err(sb, "try to read out of volume at offset 0x%llx", (u64)block << sb->s_blocksize_bits); return NULL; } bh = sb_bread_unmovable(sb, block); if (bh) return bh; ntfs_err(sb, "failed to read volume at offset 0x%llx", (u64)block << sb->s_blocksize_bits); return NULL; } int ntfs_sb_read(struct super_block *sb, u64 lbo, size_t bytes, void *buffer) { struct block_device *bdev = sb->s_bdev; u32 blocksize = sb->s_blocksize; u64 block = lbo >> sb->s_blocksize_bits; u32 off = lbo & (blocksize - 1); u32 op = blocksize - off; for (; bytes; block += 1, off = 0, op = blocksize) { struct buffer_head *bh = __bread(bdev, block, blocksize); if (!bh) return -EIO; if (op > bytes) op = bytes; memcpy(buffer, bh->b_data + off, op); put_bh(bh); bytes -= op; buffer = Add2Ptr(buffer, op); } return 0; } int ntfs_sb_write(struct super_block *sb, u64 lbo, size_t bytes, const void *buf, int wait) { u32 blocksize = sb->s_blocksize; struct block_device *bdev = sb->s_bdev; sector_t block = lbo >> sb->s_blocksize_bits; u32 off = lbo & (blocksize - 1); u32 op = blocksize - off; struct buffer_head *bh; if (!wait && (sb->s_flags & SB_SYNCHRONOUS)) wait = 1; for (; bytes; block += 1, off = 0, op = blocksize) { if (op > bytes) op = bytes; if (op < blocksize) { bh = __bread(bdev, block, blocksize); if (!bh) { ntfs_err(sb, "failed to read block %llx", (u64)block); return -EIO; } } else { bh = __getblk(bdev, block, blocksize); if (!bh) return -ENOMEM; } if (buffer_locked(bh)) __wait_on_buffer(bh); lock_buffer(bh); if (buf) { memcpy(bh->b_data + off, buf, op); buf = Add2Ptr(buf, op); } else { memset(bh->b_data + off, -1, op); } set_buffer_uptodate(bh); mark_buffer_dirty(bh); unlock_buffer(bh); if (wait) { int err = sync_dirty_buffer(bh); if (err) { ntfs_err( sb, "failed to sync buffer at block %llx, error %d", (u64)block, err); put_bh(bh); return err; } } put_bh(bh); bytes -= op; } return 0; } int ntfs_sb_write_run(struct ntfs_sb_info *sbi, const struct runs_tree *run, u64 vbo, const void *buf, size_t bytes, int sync) { struct super_block *sb = sbi->sb; u8 cluster_bits = sbi->cluster_bits; u32 off = vbo & sbi->cluster_mask; CLST lcn, clen, vcn = vbo >> cluster_bits, vcn_next; u64 lbo, len; size_t idx; if (!run_lookup_entry(run, vcn, &lcn, &clen, &idx)) return -ENOENT; if (lcn == SPARSE_LCN) return -EINVAL; lbo = ((u64)lcn << cluster_bits) + off; len = ((u64)clen << cluster_bits) - off; for (;;) { u32 op = min_t(u64, len, bytes); int err = ntfs_sb_write(sb, lbo, op, buf, sync); if (err) return err; bytes -= op; if (!bytes) break; vcn_next = vcn + clen; if (!run_get_entry(run, ++idx, &vcn, &lcn, &clen) || vcn != vcn_next) return -ENOENT; if (lcn == SPARSE_LCN) return -EINVAL; if (buf) buf = Add2Ptr(buf, op); lbo = ((u64)lcn << cluster_bits); len = ((u64)clen << cluster_bits); } return 0; } struct buffer_head *ntfs_bread_run(struct ntfs_sb_info *sbi, const struct runs_tree *run, u64 vbo) { struct super_block *sb = sbi->sb; u8 cluster_bits = sbi->cluster_bits; CLST lcn; u64 lbo; if (!run_lookup_entry(run, vbo >> cluster_bits, &lcn, NULL, NULL)) return ERR_PTR(-ENOENT); lbo = ((u64)lcn << cluster_bits) + (vbo & sbi->cluster_mask); return ntfs_bread(sb, lbo >> sb->s_blocksize_bits); } int ntfs_read_run_nb(struct ntfs_sb_info *sbi, const struct runs_tree *run, u64 vbo, void *buf, u32 bytes, struct ntfs_buffers *nb) { int err; struct super_block *sb = sbi->sb; u32 blocksize = sb->s_blocksize; u8 cluster_bits = sbi->cluster_bits; u32 off = vbo & sbi->cluster_mask; u32 nbh = 0; CLST vcn_next, vcn = vbo >> cluster_bits; CLST lcn, clen; u64 lbo, len; size_t idx; struct buffer_head *bh; if (!run) { /* First reading of $Volume + $MFTMirr + $LogFile goes here. */ if (vbo > MFT_REC_VOL * sbi->record_size) { err = -ENOENT; goto out; } /* Use absolute boot's 'MFTCluster' to read record. */ lbo = vbo + sbi->mft.lbo; len = sbi->record_size; } else if (!run_lookup_entry(run, vcn, &lcn, &clen, &idx)) { err = -ENOENT; goto out; } else { if (lcn == SPARSE_LCN) { err = -EINVAL; goto out; } lbo = ((u64)lcn << cluster_bits) + off; len = ((u64)clen << cluster_bits) - off; } off = lbo & (blocksize - 1); if (nb) { nb->off = off; nb->bytes = bytes; } for (;;) { u32 len32 = len >= bytes ? bytes : len; sector_t block = lbo >> sb->s_blocksize_bits; do { u32 op = blocksize - off; if (op > len32) op = len32; bh = ntfs_bread(sb, block); if (!bh) { err = -EIO; goto out; } if (buf) { memcpy(buf, bh->b_data + off, op); buf = Add2Ptr(buf, op); } if (!nb) { put_bh(bh); } else if (nbh >= ARRAY_SIZE(nb->bh)) { err = -EINVAL; goto out; } else { nb->bh[nbh++] = bh; nb->nbufs = nbh; } bytes -= op; if (!bytes) return 0; len32 -= op; block += 1; off = 0; } while (len32); vcn_next = vcn + clen; if (!run_get_entry(run, ++idx, &vcn, &lcn, &clen) || vcn != vcn_next) { err = -ENOENT; goto out; } if (lcn == SPARSE_LCN) { err = -EINVAL; goto out; } lbo = ((u64)lcn << cluster_bits); len = ((u64)clen << cluster_bits); } out: if (!nbh) return err; while (nbh) { put_bh(nb->bh[--nbh]); nb->bh[nbh] = NULL; } nb->nbufs = 0; return err; } /* * ntfs_read_bh * * Return: < 0 if error, 0 if ok, -E_NTFS_FIXUP if need to update fixups. */ int ntfs_read_bh(struct ntfs_sb_info *sbi, const struct runs_tree *run, u64 vbo, struct NTFS_RECORD_HEADER *rhdr, u32 bytes, struct ntfs_buffers *nb) { int err = ntfs_read_run_nb(sbi, run, vbo, rhdr, bytes, nb); if (err) return err; return ntfs_fix_post_read(rhdr, nb->bytes, true); } int ntfs_get_bh(struct ntfs_sb_info *sbi, const struct runs_tree *run, u64 vbo, u32 bytes, struct ntfs_buffers *nb) { int err = 0; struct super_block *sb = sbi->sb; u32 blocksize = sb->s_blocksize; u8 cluster_bits = sbi->cluster_bits; CLST vcn_next, vcn = vbo >> cluster_bits; u32 off; u32 nbh = 0; CLST lcn, clen; u64 lbo, len; size_t idx; nb->bytes = bytes; if (!run_lookup_entry(run, vcn, &lcn, &clen, &idx)) { err = -ENOENT; goto out; } off = vbo & sbi->cluster_mask; lbo = ((u64)lcn << cluster_bits) + off; len = ((u64)clen << cluster_bits) - off; nb->off = off = lbo & (blocksize - 1); for (;;) { u32 len32 = min_t(u64, len, bytes); sector_t block = lbo >> sb->s_blocksize_bits; do { u32 op; struct buffer_head *bh; if (nbh >= ARRAY_SIZE(nb->bh)) { err = -EINVAL; goto out; } op = blocksize - off; if (op > len32) op = len32; if (op == blocksize) { bh = sb_getblk(sb, block); if (!bh) { err = -ENOMEM; goto out; } if (buffer_locked(bh)) __wait_on_buffer(bh); set_buffer_uptodate(bh); } else { bh = ntfs_bread(sb, block); if (!bh) { err = -EIO; goto out; } } nb->bh[nbh++] = bh; bytes -= op; if (!bytes) { nb->nbufs = nbh; return 0; } block += 1; len32 -= op; off = 0; } while (len32); vcn_next = vcn + clen; if (!run_get_entry(run, ++idx, &vcn, &lcn, &clen) || vcn != vcn_next) { err = -ENOENT; goto out; } lbo = ((u64)lcn << cluster_bits); len = ((u64)clen << cluster_bits); } out: while (nbh) { put_bh(nb->bh[--nbh]); nb->bh[nbh] = NULL; } nb->nbufs = 0; return err; } int ntfs_write_bh(struct ntfs_sb_info *sbi, struct NTFS_RECORD_HEADER *rhdr, struct ntfs_buffers *nb, int sync) { int err = 0; struct super_block *sb = sbi->sb; u32 block_size = sb->s_blocksize; u32 bytes = nb->bytes; u32 off = nb->off; u16 fo = le16_to_cpu(rhdr->fix_off); u16 fn = le16_to_cpu(rhdr->fix_num); u32 idx; __le16 *fixup; __le16 sample; if ((fo & 1) || fo + fn * sizeof(short) > SECTOR_SIZE || !fn-- || fn * SECTOR_SIZE > bytes) { return -EINVAL; } for (idx = 0; bytes && idx < nb->nbufs; idx += 1, off = 0) { u32 op = block_size - off; char *bh_data; struct buffer_head *bh = nb->bh[idx]; __le16 *ptr, *end_data; if (op > bytes) op = bytes; if (buffer_locked(bh)) __wait_on_buffer(bh); lock_buffer(bh); bh_data = bh->b_data + off; end_data = Add2Ptr(bh_data, op); memcpy(bh_data, rhdr, op); if (!idx) { u16 t16; fixup = Add2Ptr(bh_data, fo); sample = *fixup; t16 = le16_to_cpu(sample); if (t16 >= 0x7FFF) { sample = *fixup = cpu_to_le16(1); } else { sample = cpu_to_le16(t16 + 1); *fixup = sample; } *(__le16 *)Add2Ptr(rhdr, fo) = sample; } ptr = Add2Ptr(bh_data, SECTOR_SIZE - sizeof(short)); do { *++fixup = *ptr; *ptr = sample; ptr += SECTOR_SIZE / sizeof(short); } while (ptr < end_data); set_buffer_uptodate(bh); mark_buffer_dirty(bh); unlock_buffer(bh); if (sync) { int err2 = sync_dirty_buffer(bh); if (!err && err2) err = err2; } bytes -= op; rhdr = Add2Ptr(rhdr, op); } return err; } /* * ntfs_bio_pages - Read/write pages from/to disk. */ int ntfs_bio_pages(struct ntfs_sb_info *sbi, const struct runs_tree *run, struct page **pages, u32 nr_pages, u64 vbo, u32 bytes, enum req_op op) { int err = 0; struct bio *new, *bio = NULL; struct super_block *sb = sbi->sb; struct block_device *bdev = sb->s_bdev; struct page *page; u8 cluster_bits = sbi->cluster_bits; CLST lcn, clen, vcn, vcn_next; u32 add, off, page_idx; u64 lbo, len; size_t run_idx; struct blk_plug plug; if (!bytes) return 0; blk_start_plug(&plug); /* Align vbo and bytes to be 512 bytes aligned. */ lbo = (vbo + bytes + 511) & ~511ull; vbo = vbo & ~511ull; bytes = lbo - vbo; vcn = vbo >> cluster_bits; if (!run_lookup_entry(run, vcn, &lcn, &clen, &run_idx)) { err = -ENOENT; goto out; } off = vbo & sbi->cluster_mask; page_idx = 0; page = pages[0]; for (;;) { lbo = ((u64)lcn << cluster_bits) + off; len = ((u64)clen << cluster_bits) - off; new_bio: new = bio_alloc(bdev, nr_pages - page_idx, op, GFP_NOFS); if (bio) { bio_chain(bio, new); submit_bio(bio); } bio = new; bio->bi_iter.bi_sector = lbo >> 9; while (len) { off = vbo & (PAGE_SIZE - 1); add = off + len > PAGE_SIZE ? (PAGE_SIZE - off) : len; if (bio_add_page(bio, page, add, off) < add) goto new_bio; if (bytes <= add) goto out; bytes -= add; vbo += add; if (add + off == PAGE_SIZE) { page_idx += 1; if (WARN_ON(page_idx >= nr_pages)) { err = -EINVAL; goto out; } page = pages[page_idx]; } if (len <= add) break; len -= add; lbo += add; } vcn_next = vcn + clen; if (!run_get_entry(run, ++run_idx, &vcn, &lcn, &clen) || vcn != vcn_next) { err = -ENOENT; goto out; } off = 0; } out: if (bio) { if (!err) err = submit_bio_wait(bio); bio_put(bio); } blk_finish_plug(&plug); return err; } /* * ntfs_bio_fill_1 - Helper for ntfs_loadlog_and_replay(). * * Fill on-disk logfile range by (-1) * this means empty logfile. */ int ntfs_bio_fill_1(struct ntfs_sb_info *sbi, const struct runs_tree *run) { int err = 0; struct super_block *sb = sbi->sb; struct block_device *bdev = sb->s_bdev; u8 cluster_bits = sbi->cluster_bits; struct bio *new, *bio = NULL; CLST lcn, clen; u64 lbo, len; size_t run_idx; struct page *fill; void *kaddr; struct blk_plug plug; fill = alloc_page(GFP_KERNEL); if (!fill) return -ENOMEM; kaddr = kmap_atomic(fill); memset(kaddr, -1, PAGE_SIZE); kunmap_atomic(kaddr); flush_dcache_page(fill); lock_page(fill); if (!run_lookup_entry(run, 0, &lcn, &clen, &run_idx)) { err = -ENOENT; goto out; } /* * TODO: Try blkdev_issue_write_same. */ blk_start_plug(&plug); do { lbo = (u64)lcn << cluster_bits; len = (u64)clen << cluster_bits; new_bio: new = bio_alloc(bdev, BIO_MAX_VECS, REQ_OP_WRITE, GFP_NOFS); if (bio) { bio_chain(bio, new); submit_bio(bio); } bio = new; bio->bi_iter.bi_sector = lbo >> 9; for (;;) { u32 add = len > PAGE_SIZE ? PAGE_SIZE : len; if (bio_add_page(bio, fill, add, 0) < add) goto new_bio; lbo += add; if (len <= add) break; len -= add; } } while (run_get_entry(run, ++run_idx, NULL, &lcn, &clen)); if (!err) err = submit_bio_wait(bio); bio_put(bio); blk_finish_plug(&plug); out: unlock_page(fill); put_page(fill); return err; } int ntfs_vbo_to_lbo(struct ntfs_sb_info *sbi, const struct runs_tree *run, u64 vbo, u64 *lbo, u64 *bytes) { u32 off; CLST lcn, len; u8 cluster_bits = sbi->cluster_bits; if (!run_lookup_entry(run, vbo >> cluster_bits, &lcn, &len, NULL)) return -ENOENT; off = vbo & sbi->cluster_mask; *lbo = lcn == SPARSE_LCN ? -1 : (((u64)lcn << cluster_bits) + off); *bytes = ((u64)len << cluster_bits) - off; return 0; } struct ntfs_inode *ntfs_new_inode(struct ntfs_sb_info *sbi, CLST rno, enum RECORD_FLAG flag) { int err = 0; struct super_block *sb = sbi->sb; struct inode *inode = new_inode(sb); struct ntfs_inode *ni; if (!inode) return ERR_PTR(-ENOMEM); ni = ntfs_i(inode); err = mi_format_new(&ni->mi, sbi, rno, flag, false); if (err) goto out; inode->i_ino = rno; if (insert_inode_locked(inode) < 0) { err = -EIO; goto out; } out: if (err) { make_bad_inode(inode); iput(inode); ni = ERR_PTR(err); } return ni; } /* * O:BAG:BAD:(A;OICI;FA;;;WD) * Owner S-1-5-32-544 (Administrators) * Group S-1-5-32-544 (Administrators) * ACE: allow S-1-1-0 (Everyone) with FILE_ALL_ACCESS */ const u8 s_default_security[] __aligned(8) = { 0x01, 0x00, 0x04, 0x80, 0x30, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x14, 0x00, 0x00, 0x00, 0x02, 0x00, 0x1C, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x03, 0x14, 0x00, 0xFF, 0x01, 0x1F, 0x00, 0x01, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x01, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x05, 0x20, 0x00, 0x00, 0x00, 0x20, 0x02, 0x00, 0x00, 0x01, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x05, 0x20, 0x00, 0x00, 0x00, 0x20, 0x02, 0x00, 0x00, }; static_assert(sizeof(s_default_security) == 0x50); static inline u32 sid_length(const struct SID *sid) { return struct_size(sid, SubAuthority, sid->SubAuthorityCount); } /* * is_acl_valid * * Thanks Mark Harmstone for idea. */ static bool is_acl_valid(const struct ACL *acl, u32 len) { const struct ACE_HEADER *ace; u32 i; u16 ace_count, ace_size; if (acl->AclRevision != ACL_REVISION && acl->AclRevision != ACL_REVISION_DS) { /* * This value should be ACL_REVISION, unless the ACL contains an * object-specific ACE, in which case this value must be ACL_REVISION_DS. * All ACEs in an ACL must be at the same revision level. */ return false; } if (acl->Sbz1) return false; if (le16_to_cpu(acl->AclSize) > len) return false; if (acl->Sbz2) return false; len -= sizeof(struct ACL); ace = (struct ACE_HEADER *)&acl[1]; ace_count = le16_to_cpu(acl->AceCount); for (i = 0; i < ace_count; i++) { if (len < sizeof(struct ACE_HEADER)) return false; ace_size = le16_to_cpu(ace->AceSize); if (len < ace_size) return false; len -= ace_size; ace = Add2Ptr(ace, ace_size); } return true; } bool is_sd_valid(const struct SECURITY_DESCRIPTOR_RELATIVE *sd, u32 len) { u32 sd_owner, sd_group, sd_sacl, sd_dacl; if (len < sizeof(struct SECURITY_DESCRIPTOR_RELATIVE)) return false; if (sd->Revision != 1) return false; if (sd->Sbz1) return false; if (!(sd->Control & SE_SELF_RELATIVE)) return false; sd_owner = le32_to_cpu(sd->Owner); if (sd_owner) { const struct SID *owner = Add2Ptr(sd, sd_owner); if (sd_owner + offsetof(struct SID, SubAuthority) > len) return false; if (owner->Revision != 1) return false; if (sd_owner + sid_length(owner) > len) return false; } sd_group = le32_to_cpu(sd->Group); if (sd_group) { const struct SID *group = Add2Ptr(sd, sd_group); if (sd_group + offsetof(struct SID, SubAuthority) > len) return false; if (group->Revision != 1) return false; if (sd_group + sid_length(group) > len) return false; } sd_sacl = le32_to_cpu(sd->Sacl); if (sd_sacl) { const struct ACL *sacl = Add2Ptr(sd, sd_sacl); if (sd_sacl + sizeof(struct ACL) > len) return false; if (!is_acl_valid(sacl, len - sd_sacl)) return false; } sd_dacl = le32_to_cpu(sd->Dacl); if (sd_dacl) { const struct ACL *dacl = Add2Ptr(sd, sd_dacl); if (sd_dacl + sizeof(struct ACL) > len) return false; if (!is_acl_valid(dacl, len - sd_dacl)) return false; } return true; } /* * ntfs_security_init - Load and parse $Secure. */ int ntfs_security_init(struct ntfs_sb_info *sbi) { int err; struct super_block *sb = sbi->sb; struct inode *inode; struct ntfs_inode *ni; struct MFT_REF ref; struct ATTRIB *attr; struct ATTR_LIST_ENTRY *le; u64 sds_size; size_t off; struct NTFS_DE *ne; struct NTFS_DE_SII *sii_e; struct ntfs_fnd *fnd_sii = NULL; const struct INDEX_ROOT *root_sii; const struct INDEX_ROOT *root_sdh; struct ntfs_index *indx_sdh = &sbi->security.index_sdh; struct ntfs_index *indx_sii = &sbi->security.index_sii; ref.low = cpu_to_le32(MFT_REC_SECURE); ref.high = 0; ref.seq = cpu_to_le16(MFT_REC_SECURE); inode = ntfs_iget5(sb, &ref, &NAME_SECURE); if (IS_ERR(inode)) { err = PTR_ERR(inode); ntfs_err(sb, "Failed to load $Secure (%d).", err); inode = NULL; goto out; } ni = ntfs_i(inode); le = NULL; attr = ni_find_attr(ni, NULL, &le, ATTR_ROOT, SDH_NAME, ARRAY_SIZE(SDH_NAME), NULL, NULL); if (!attr || !(root_sdh = resident_data_ex(attr, sizeof(struct INDEX_ROOT))) || root_sdh->type != ATTR_ZERO || root_sdh->rule != NTFS_COLLATION_TYPE_SECURITY_HASH || offsetof(struct INDEX_ROOT, ihdr) + le32_to_cpu(root_sdh->ihdr.used) > le32_to_cpu(attr->res.data_size)) { ntfs_err(sb, "$Secure::$SDH is corrupted."); err = -EINVAL; goto out; } err = indx_init(indx_sdh, sbi, attr, INDEX_MUTEX_SDH); if (err) { ntfs_err(sb, "Failed to initialize $Secure::$SDH (%d).", err); goto out; } attr = ni_find_attr(ni, attr, &le, ATTR_ROOT, SII_NAME, ARRAY_SIZE(SII_NAME), NULL, NULL); if (!attr || !(root_sii = resident_data_ex(attr, sizeof(struct INDEX_ROOT))) || root_sii->type != ATTR_ZERO || root_sii->rule != NTFS_COLLATION_TYPE_UINT || offsetof(struct INDEX_ROOT, ihdr) + le32_to_cpu(root_sii->ihdr.used) > le32_to_cpu(attr->res.data_size)) { ntfs_err(sb, "$Secure::$SII is corrupted."); err = -EINVAL; goto out; } err = indx_init(indx_sii, sbi, attr, INDEX_MUTEX_SII); if (err) { ntfs_err(sb, "Failed to initialize $Secure::$SII (%d).", err); goto out; } fnd_sii = fnd_get(); if (!fnd_sii) { err = -ENOMEM; goto out; } sds_size = inode->i_size; /* Find the last valid Id. */ sbi->security.next_id = SECURITY_ID_FIRST; /* Always write new security at the end of bucket. */ sbi->security.next_off = ALIGN(sds_size - SecurityDescriptorsBlockSize, 16); off = 0; ne = NULL; for (;;) { u32 next_id; err = indx_find_raw(indx_sii, ni, root_sii, &ne, &off, fnd_sii); if (err || !ne) break; sii_e = (struct NTFS_DE_SII *)ne; if (le16_to_cpu(ne->view.data_size) < sizeof(sii_e->sec_hdr)) continue; next_id = le32_to_cpu(sii_e->sec_id) + 1; if (next_id >= sbi->security.next_id) sbi->security.next_id = next_id; } sbi->security.ni = ni; inode = NULL; out: iput(inode); fnd_put(fnd_sii); return err; } /* * ntfs_get_security_by_id - Read security descriptor by id. */ int ntfs_get_security_by_id(struct ntfs_sb_info *sbi, __le32 security_id, struct SECURITY_DESCRIPTOR_RELATIVE **sd, size_t *size) { int err; int diff; struct ntfs_inode *ni = sbi->security.ni; struct ntfs_index *indx = &sbi->security.index_sii; void *p = NULL; struct NTFS_DE_SII *sii_e; struct ntfs_fnd *fnd_sii; struct SECURITY_HDR d_security; const struct INDEX_ROOT *root_sii; u32 t32; *sd = NULL; mutex_lock_nested(&ni->ni_lock, NTFS_INODE_MUTEX_SECURITY); fnd_sii = fnd_get(); if (!fnd_sii) { err = -ENOMEM; goto out; } root_sii = indx_get_root(indx, ni, NULL, NULL); if (!root_sii) { err = -EINVAL; goto out; } /* Try to find this SECURITY descriptor in SII indexes. */ err = indx_find(indx, ni, root_sii, &security_id, sizeof(security_id), NULL, &diff, (struct NTFS_DE **)&sii_e, fnd_sii); if (err) goto out; if (diff) goto out; t32 = le32_to_cpu(sii_e->sec_hdr.size); if (t32 < sizeof(struct SECURITY_HDR)) { err = -EINVAL; goto out; } if (t32 > sizeof(struct SECURITY_HDR) + 0x10000) { /* Looks like too big security. 0x10000 - is arbitrary big number. */ err = -EFBIG; goto out; } *size = t32 - sizeof(struct SECURITY_HDR); p = kmalloc(*size, GFP_NOFS); if (!p) { err = -ENOMEM; goto out; } err = ntfs_read_run_nb(sbi, &ni->file.run, le64_to_cpu(sii_e->sec_hdr.off), &d_security, sizeof(d_security), NULL); if (err) goto out; if (memcmp(&d_security, &sii_e->sec_hdr, sizeof(d_security))) { err = -EINVAL; goto out; } err = ntfs_read_run_nb(sbi, &ni->file.run, le64_to_cpu(sii_e->sec_hdr.off) + sizeof(struct SECURITY_HDR), p, *size, NULL); if (err) goto out; *sd = p; p = NULL; out: kfree(p); fnd_put(fnd_sii); ni_unlock(ni); return err; } /* * ntfs_insert_security - Insert security descriptor into $Secure::SDS. * * SECURITY Descriptor Stream data is organized into chunks of 256K bytes * and it contains a mirror copy of each security descriptor. When writing * to a security descriptor at location X, another copy will be written at * location (X+256K). * When writing a security descriptor that will cross the 256K boundary, * the pointer will be advanced by 256K to skip * over the mirror portion. */ int ntfs_insert_security(struct ntfs_sb_info *sbi, const struct SECURITY_DESCRIPTOR_RELATIVE *sd, u32 size_sd, __le32 *security_id, bool *inserted) { int err, diff; struct ntfs_inode *ni = sbi->security.ni; struct ntfs_index *indx_sdh = &sbi->security.index_sdh; struct ntfs_index *indx_sii = &sbi->security.index_sii; struct NTFS_DE_SDH *e; struct NTFS_DE_SDH sdh_e; struct NTFS_DE_SII sii_e; struct SECURITY_HDR *d_security; u32 new_sec_size = size_sd + sizeof(struct SECURITY_HDR); u32 aligned_sec_size = ALIGN(new_sec_size, 16); struct SECURITY_KEY hash_key; struct ntfs_fnd *fnd_sdh = NULL; const struct INDEX_ROOT *root_sdh; const struct INDEX_ROOT *root_sii; u64 mirr_off, new_sds_size; u32 next, left; static_assert((1 << Log2OfSecurityDescriptorsBlockSize) == SecurityDescriptorsBlockSize); hash_key.hash = security_hash(sd, size_sd); hash_key.sec_id = SECURITY_ID_INVALID; if (inserted) *inserted = false; *security_id = SECURITY_ID_INVALID; /* Allocate a temporal buffer. */ d_security = kzalloc(aligned_sec_size, GFP_NOFS); if (!d_security) return -ENOMEM; mutex_lock_nested(&ni->ni_lock, NTFS_INODE_MUTEX_SECURITY); fnd_sdh = fnd_get(); if (!fnd_sdh) { err = -ENOMEM; goto out; } root_sdh = indx_get_root(indx_sdh, ni, NULL, NULL); if (!root_sdh) { err = -EINVAL; goto out; } root_sii = indx_get_root(indx_sii, ni, NULL, NULL); if (!root_sii) { err = -EINVAL; goto out; } /* * Check if such security already exists. * Use "SDH" and hash -> to get the offset in "SDS". */ err = indx_find(indx_sdh, ni, root_sdh, &hash_key, sizeof(hash_key), &d_security->key.sec_id, &diff, (struct NTFS_DE **)&e, fnd_sdh); if (err) goto out; while (e) { if (le32_to_cpu(e->sec_hdr.size) == new_sec_size) { err = ntfs_read_run_nb(sbi, &ni->file.run, le64_to_cpu(e->sec_hdr.off), d_security, new_sec_size, NULL); if (err) goto out; if (le32_to_cpu(d_security->size) == new_sec_size && d_security->key.hash == hash_key.hash && !memcmp(d_security + 1, sd, size_sd)) { /* Such security already exists. */ *security_id = d_security->key.sec_id; err = 0; goto out; } } err = indx_find_sort(indx_sdh, ni, root_sdh, (struct NTFS_DE **)&e, fnd_sdh); if (err) goto out; if (!e || e->key.hash != hash_key.hash) break; } /* Zero unused space. */ next = sbi->security.next_off & (SecurityDescriptorsBlockSize - 1); left = SecurityDescriptorsBlockSize - next; /* Zero gap until SecurityDescriptorsBlockSize. */ if (left < new_sec_size) { /* Zero "left" bytes from sbi->security.next_off. */ sbi->security.next_off += SecurityDescriptorsBlockSize + left; } /* Zero tail of previous security. */ //used = ni->vfs_inode.i_size & (SecurityDescriptorsBlockSize - 1); /* * Example: * 0x40438 == ni->vfs_inode.i_size * 0x00440 == sbi->security.next_off * need to zero [0x438-0x440) * if (next > used) { * u32 tozero = next - used; * zero "tozero" bytes from sbi->security.next_off - tozero */ /* Format new security descriptor. */ d_security->key.hash = hash_key.hash; d_security->key.sec_id = cpu_to_le32(sbi->security.next_id); d_security->off = cpu_to_le64(sbi->security.next_off); d_security->size = cpu_to_le32(new_sec_size); memcpy(d_security + 1, sd, size_sd); /* Write main SDS bucket. */ err = ntfs_sb_write_run(sbi, &ni->file.run, sbi->security.next_off, d_security, aligned_sec_size, 0); if (err) goto out; mirr_off = sbi->security.next_off + SecurityDescriptorsBlockSize; new_sds_size = mirr_off + aligned_sec_size; if (new_sds_size > ni->vfs_inode.i_size) { err = attr_set_size(ni, ATTR_DATA, SDS_NAME, ARRAY_SIZE(SDS_NAME), &ni->file.run, new_sds_size, &new_sds_size, false, NULL); if (err) goto out; } /* Write copy SDS bucket. */ err = ntfs_sb_write_run(sbi, &ni->file.run, mirr_off, d_security, aligned_sec_size, 0); if (err) goto out; /* Fill SII entry. */ sii_e.de.view.data_off = cpu_to_le16(offsetof(struct NTFS_DE_SII, sec_hdr)); sii_e.de.view.data_size = cpu_to_le16(sizeof(struct SECURITY_HDR)); sii_e.de.view.res = 0; sii_e.de.size = cpu_to_le16(sizeof(struct NTFS_DE_SII)); sii_e.de.key_size = cpu_to_le16(sizeof(d_security->key.sec_id)); sii_e.de.flags = 0; sii_e.de.res = 0; sii_e.sec_id = d_security->key.sec_id; memcpy(&sii_e.sec_hdr, d_security, sizeof(struct SECURITY_HDR)); err = indx_insert_entry(indx_sii, ni, &sii_e.de, NULL, NULL, 0); if (err) goto out; /* Fill SDH entry. */ sdh_e.de.view.data_off = cpu_to_le16(offsetof(struct NTFS_DE_SDH, sec_hdr)); sdh_e.de.view.data_size = cpu_to_le16(sizeof(struct SECURITY_HDR)); sdh_e.de.view.res = 0; sdh_e.de.size = cpu_to_le16(SIZEOF_SDH_DIRENTRY); sdh_e.de.key_size = cpu_to_le16(sizeof(sdh_e.key)); sdh_e.de.flags = 0; sdh_e.de.res = 0; sdh_e.key.hash = d_security->key.hash; sdh_e.key.sec_id = d_security->key.sec_id; memcpy(&sdh_e.sec_hdr, d_security, sizeof(struct SECURITY_HDR)); sdh_e.magic[0] = cpu_to_le16('I'); sdh_e.magic[1] = cpu_to_le16('I'); fnd_clear(fnd_sdh); err = indx_insert_entry(indx_sdh, ni, &sdh_e.de, (void *)(size_t)1, fnd_sdh, 0); if (err) goto out; *security_id = d_security->key.sec_id; if (inserted) *inserted = true; /* Update Id and offset for next descriptor. */ sbi->security.next_id += 1; sbi->security.next_off += aligned_sec_size; out: fnd_put(fnd_sdh); mark_inode_dirty(&ni->vfs_inode); ni_unlock(ni); kfree(d_security); return err; } /* * ntfs_reparse_init - Load and parse $Extend/$Reparse. */ int ntfs_reparse_init(struct ntfs_sb_info *sbi) { int err; struct ntfs_inode *ni = sbi->reparse.ni; struct ntfs_index *indx = &sbi->reparse.index_r; struct ATTRIB *attr; struct ATTR_LIST_ENTRY *le; const struct INDEX_ROOT *root_r; if (!ni) return 0; le = NULL; attr = ni_find_attr(ni, NULL, &le, ATTR_ROOT, SR_NAME, ARRAY_SIZE(SR_NAME), NULL, NULL); if (!attr) { err = -EINVAL; goto out; } root_r = resident_data(attr); if (root_r->type != ATTR_ZERO || root_r->rule != NTFS_COLLATION_TYPE_UINTS) { err = -EINVAL; goto out; } err = indx_init(indx, sbi, attr, INDEX_MUTEX_SR); if (err) goto out; out: return err; } /* * ntfs_objid_init - Load and parse $Extend/$ObjId. */ int ntfs_objid_init(struct ntfs_sb_info *sbi) { int err; struct ntfs_inode *ni = sbi->objid.ni; struct ntfs_index *indx = &sbi->objid.index_o; struct ATTRIB *attr; struct ATTR_LIST_ENTRY *le; const struct INDEX_ROOT *root; if (!ni) return 0; le = NULL; attr = ni_find_attr(ni, NULL, &le, ATTR_ROOT, SO_NAME, ARRAY_SIZE(SO_NAME), NULL, NULL); if (!attr) { err = -EINVAL; goto out; } root = resident_data(attr); if (root->type != ATTR_ZERO || root->rule != NTFS_COLLATION_TYPE_UINTS) { err = -EINVAL; goto out; } err = indx_init(indx, sbi, attr, INDEX_MUTEX_SO); if (err) goto out; out: return err; } int ntfs_objid_remove(struct ntfs_sb_info *sbi, struct GUID *guid) { int err; struct ntfs_inode *ni = sbi->objid.ni; struct ntfs_index *indx = &sbi->objid.index_o; if (!ni) return -EINVAL; mutex_lock_nested(&ni->ni_lock, NTFS_INODE_MUTEX_OBJID); err = indx_delete_entry(indx, ni, guid, sizeof(*guid), NULL); mark_inode_dirty(&ni->vfs_inode); ni_unlock(ni); return err; } int ntfs_insert_reparse(struct ntfs_sb_info *sbi, __le32 rtag, const struct MFT_REF *ref) { int err; struct ntfs_inode *ni = sbi->reparse.ni; struct ntfs_index *indx = &sbi->reparse.index_r; struct NTFS_DE_R re; if (!ni) return -EINVAL; memset(&re, 0, sizeof(re)); re.de.view.data_off = cpu_to_le16(offsetof(struct NTFS_DE_R, zero)); re.de.size = cpu_to_le16(sizeof(struct NTFS_DE_R)); re.de.key_size = cpu_to_le16(sizeof(re.key)); re.key.ReparseTag = rtag; memcpy(&re.key.ref, ref, sizeof(*ref)); mutex_lock_nested(&ni->ni_lock, NTFS_INODE_MUTEX_REPARSE); err = indx_insert_entry(indx, ni, &re.de, NULL, NULL, 0); mark_inode_dirty(&ni->vfs_inode); ni_unlock(ni); return err; } int ntfs_remove_reparse(struct ntfs_sb_info *sbi, __le32 rtag, const struct MFT_REF *ref) { int err, diff; struct ntfs_inode *ni = sbi->reparse.ni; struct ntfs_index *indx = &sbi->reparse.index_r; struct ntfs_fnd *fnd = NULL; struct REPARSE_KEY rkey; struct NTFS_DE_R *re; struct INDEX_ROOT *root_r; if (!ni) return -EINVAL; rkey.ReparseTag = rtag; rkey.ref = *ref; mutex_lock_nested(&ni->ni_lock, NTFS_INODE_MUTEX_REPARSE); if (rtag) { err = indx_delete_entry(indx, ni, &rkey, sizeof(rkey), NULL); goto out1; } fnd = fnd_get(); if (!fnd) { err = -ENOMEM; goto out1; } root_r = indx_get_root(indx, ni, NULL, NULL); if (!root_r) { err = -EINVAL; goto out; } /* 1 - forces to ignore rkey.ReparseTag when comparing keys. */ err = indx_find(indx, ni, root_r, &rkey, sizeof(rkey), (void *)1, &diff, (struct NTFS_DE **)&re, fnd); if (err) goto out; if (memcmp(&re->key.ref, ref, sizeof(*ref))) { /* Impossible. Looks like volume corrupt? */ goto out; } memcpy(&rkey, &re->key, sizeof(rkey)); fnd_put(fnd); fnd = NULL; err = indx_delete_entry(indx, ni, &rkey, sizeof(rkey), NULL); if (err) goto out; out: fnd_put(fnd); out1: mark_inode_dirty(&ni->vfs_inode); ni_unlock(ni); return err; } static inline void ntfs_unmap_and_discard(struct ntfs_sb_info *sbi, CLST lcn, CLST len) { ntfs_unmap_meta(sbi->sb, lcn, len); ntfs_discard(sbi, lcn, len); } void mark_as_free_ex(struct ntfs_sb_info *sbi, CLST lcn, CLST len, bool trim) { CLST end, i, zone_len, zlen; struct wnd_bitmap *wnd = &sbi->used.bitmap; bool dirty = false; down_write_nested(&wnd->rw_lock, BITMAP_MUTEX_CLUSTERS); if (!wnd_is_used(wnd, lcn, len)) { /* mark volume as dirty out of wnd->rw_lock */ dirty = true; end = lcn + len; len = 0; for (i = lcn; i < end; i++) { if (wnd_is_used(wnd, i, 1)) { if (!len) lcn = i; len += 1; continue; } if (!len) continue; if (trim) ntfs_unmap_and_discard(sbi, lcn, len); wnd_set_free(wnd, lcn, len); len = 0; } if (!len) goto out; } if (trim) ntfs_unmap_and_discard(sbi, lcn, len); wnd_set_free(wnd, lcn, len); /* append to MFT zone, if possible. */ zone_len = wnd_zone_len(wnd); zlen = min(zone_len + len, sbi->zone_max); if (zlen == zone_len) { /* MFT zone already has maximum size. */ } else if (!zone_len) { /* Create MFT zone only if 'zlen' is large enough. */ if (zlen == sbi->zone_max) wnd_zone_set(wnd, lcn, zlen); } else { CLST zone_lcn = wnd_zone_bit(wnd); if (lcn + len == zone_lcn) { /* Append into head MFT zone. */ wnd_zone_set(wnd, lcn, zlen); } else if (zone_lcn + zone_len == lcn) { /* Append into tail MFT zone. */ wnd_zone_set(wnd, zone_lcn, zlen); } } out: up_write(&wnd->rw_lock); if (dirty) ntfs_set_state(sbi, NTFS_DIRTY_ERROR); } /* * run_deallocate - Deallocate clusters. */ int run_deallocate(struct ntfs_sb_info *sbi, const struct runs_tree *run, bool trim) { CLST lcn, len; size_t idx = 0; while (run_get_entry(run, idx++, NULL, &lcn, &len)) { if (lcn == SPARSE_LCN) continue; mark_as_free_ex(sbi, lcn, len, trim); } return 0; } static inline bool name_has_forbidden_chars(const struct le_str *fname) { int i, ch; /* check for forbidden chars */ for (i = 0; i < fname->len; ++i) { ch = le16_to_cpu(fname->name[i]); /* control chars */ if (ch < 0x20) return true; switch (ch) { /* disallowed by Windows */ case '\\': case '/': case ':': case '*': case '?': case '<': case '>': case '|': case '\"': return true; default: /* allowed char */ break; } } /* file names cannot end with space or . */ if (fname->len > 0) { ch = le16_to_cpu(fname->name[fname->len - 1]); if (ch == ' ' || ch == '.') return true; } return false; } static inline bool is_reserved_name(const struct ntfs_sb_info *sbi, const struct le_str *fname) { int port_digit; const __le16 *name = fname->name; int len = fname->len; const u16 *upcase = sbi->upcase; /* check for 3 chars reserved names (device names) */ /* name by itself or with any extension is forbidden */ if (len == 3 || (len > 3 && le16_to_cpu(name[3]) == '.')) if (!ntfs_cmp_names(name, 3, CON_NAME, 3, upcase, false) || !ntfs_cmp_names(name, 3, NUL_NAME, 3, upcase, false) || !ntfs_cmp_names(name, 3, AUX_NAME, 3, upcase, false) || !ntfs_cmp_names(name, 3, PRN_NAME, 3, upcase, false)) return true; /* check for 4 chars reserved names (port name followed by 1..9) */ /* name by itself or with any extension is forbidden */ if (len == 4 || (len > 4 && le16_to_cpu(name[4]) == '.')) { port_digit = le16_to_cpu(name[3]); if (port_digit >= '1' && port_digit <= '9') if (!ntfs_cmp_names(name, 3, COM_NAME, 3, upcase, false) || !ntfs_cmp_names(name, 3, LPT_NAME, 3, upcase, false)) return true; } return false; } /* * valid_windows_name - Check if a file name is valid in Windows. */ bool valid_windows_name(struct ntfs_sb_info *sbi, const struct le_str *fname) { return !name_has_forbidden_chars(fname) && !is_reserved_name(sbi, fname); } /* * ntfs_set_label - updates current ntfs label. */ int ntfs_set_label(struct ntfs_sb_info *sbi, u8 *label, int len) { int err; struct ATTRIB *attr; u32 uni_bytes; struct ntfs_inode *ni = sbi->volume.ni; /* Allocate PATH_MAX bytes. */ struct cpu_str *uni = __getname(); if (!uni) return -ENOMEM; err = ntfs_nls_to_utf16(sbi, label, len, uni, (PATH_MAX - 2) / 2, UTF16_LITTLE_ENDIAN); if (err < 0) goto out; uni_bytes = uni->len * sizeof(u16); if (uni_bytes > NTFS_LABEL_MAX_LENGTH * sizeof(u16)) { ntfs_warn(sbi->sb, "new label is too long"); err = -EFBIG; goto out; } ni_lock(ni); /* Ignore any errors. */ ni_remove_attr(ni, ATTR_LABEL, NULL, 0, false, NULL); err = ni_insert_resident(ni, uni_bytes, ATTR_LABEL, NULL, 0, &attr, NULL, NULL); if (err < 0) goto unlock_out; /* write new label in on-disk struct. */ memcpy(resident_data(attr), uni->name, uni_bytes); /* update cached value of current label. */ if (len >= ARRAY_SIZE(sbi->volume.label)) len = ARRAY_SIZE(sbi->volume.label) - 1; memcpy(sbi->volume.label, label, len); sbi->volume.label[len] = 0; mark_inode_dirty_sync(&ni->vfs_inode); unlock_out: ni_unlock(ni); if (!err) err = _ni_write_inode(&ni->vfs_inode, 0); out: __putname(uni); return err; } |
2 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 | // SPDX-License-Identifier: GPL-2.0 /* * Copyright (c) 2006-2007 Silicon Graphics, Inc. * All Rights Reserved. */ #ifndef __XFS_FILESTREAM_H__ #define __XFS_FILESTREAM_H__ struct xfs_mount; struct xfs_inode; struct xfs_bmalloca; struct xfs_alloc_arg; int xfs_filestream_mount(struct xfs_mount *mp); void xfs_filestream_unmount(struct xfs_mount *mp); void xfs_filestream_deassociate(struct xfs_inode *ip); int xfs_filestream_select_ag(struct xfs_bmalloca *ap, struct xfs_alloc_arg *args, xfs_extlen_t *blen); static inline int xfs_inode_is_filestream( struct xfs_inode *ip) { return xfs_has_filestreams(ip->i_mount) || (ip->i_diflags & XFS_DIFLAG_FILESTREAM); } #endif /* __XFS_FILESTREAM_H__ */ |
1 11 9 11 11 8 8 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 | // SPDX-License-Identifier: GPL-2.0 /* * Copyright (C) 2014 Filipe David Borba Manana <fdmanana@gmail.com> */ #include <linux/hashtable.h> #include <linux/xattr.h> #include "messages.h" #include "props.h" #include "btrfs_inode.h" #include "transaction.h" #include "ctree.h" #include "xattr.h" #include "compression.h" #include "space-info.h" #include "fs.h" #include "accessors.h" #include "super.h" #include "dir-item.h" #define BTRFS_PROP_HANDLERS_HT_BITS 8 static DEFINE_HASHTABLE(prop_handlers_ht, BTRFS_PROP_HANDLERS_HT_BITS); struct prop_handler { struct hlist_node node; const char *xattr_name; int (*validate)(const struct btrfs_inode *inode, const char *value, size_t len); int (*apply)(struct inode *inode, const char *value, size_t len); const char *(*extract)(const struct inode *inode); bool (*ignore)(const struct btrfs_inode *inode); int inheritable; }; static const struct hlist_head *find_prop_handlers_by_hash(const u64 hash) { struct hlist_head *h; h = &prop_handlers_ht[hash_min(hash, BTRFS_PROP_HANDLERS_HT_BITS)]; if (hlist_empty(h)) return NULL; return h; } static const struct prop_handler * find_prop_handler(const char *name, const struct hlist_head *handlers) { struct prop_handler *h; if (!handlers) { u64 hash = btrfs_name_hash(name, strlen(name)); handlers = find_prop_handlers_by_hash(hash); if (!handlers) return NULL; } hlist_for_each_entry(h, handlers, node) if (!strcmp(h->xattr_name, name)) return h; return NULL; } int btrfs_validate_prop(const struct btrfs_inode *inode, const char *name, const char *value, size_t value_len) { const struct prop_handler *handler; if (strlen(name) <= XATTR_BTRFS_PREFIX_LEN) return -EINVAL; handler = find_prop_handler(name, NULL); if (!handler) return -EINVAL; if (value_len == 0) return 0; return handler->validate(inode, value, value_len); } /* * Check if a property should be ignored (not set) for an inode. * * @inode: The target inode. * @name: The property's name. * * The caller must be sure the given property name is valid, for example by * having previously called btrfs_validate_prop(). * * Returns: true if the property should be ignored for the given inode * false if the property must not be ignored for the given inode */ bool btrfs_ignore_prop(const struct btrfs_inode *inode, const char *name) { const struct prop_handler *handler; handler = find_prop_handler(name, NULL); ASSERT(handler != NULL); return handler->ignore(inode); } int btrfs_set_prop(struct btrfs_trans_handle *trans, struct btrfs_inode *inode, const char *name, const char *value, size_t value_len, int flags) { const struct prop_handler *handler; int ret; handler = find_prop_handler(name, NULL); if (!handler) return -EINVAL; if (value_len == 0) { ret = btrfs_setxattr(trans, &inode->vfs_inode, handler->xattr_name, NULL, 0, flags); if (ret) return ret; ret = handler->apply(&inode->vfs_inode, NULL, 0); ASSERT(ret == 0); return ret; } ret = btrfs_setxattr(trans, &inode->vfs_inode, handler->xattr_name, value, value_len, flags); if (ret) return ret; ret = handler->apply(&inode->vfs_inode, value, value_len); if (ret) { btrfs_setxattr(trans, &inode->vfs_inode, handler->xattr_name, NULL, 0, flags); return ret; } set_bit(BTRFS_INODE_HAS_PROPS, &inode->runtime_flags); return 0; } static int iterate_object_props(struct btrfs_root *root, struct btrfs_path *path, u64 objectid, void (*iterator)(void *, const struct prop_handler *, const char *, size_t), void *ctx) { int ret; char *name_buf = NULL; char *value_buf = NULL; int name_buf_len = 0; int value_buf_len = 0; while (1) { struct btrfs_key key; struct btrfs_dir_item *di; struct extent_buffer *leaf; u32 total_len, cur, this_len; int slot; const struct hlist_head *handlers; slot = path->slots[0]; leaf = path->nodes[0]; if (slot >= btrfs_header_nritems(leaf)) { ret = btrfs_next_leaf(root, path); if (ret < 0) goto out; else if (ret > 0) break; continue; } btrfs_item_key_to_cpu(leaf, &key, slot); if (key.objectid != objectid) break; if (key.type != BTRFS_XATTR_ITEM_KEY) break; handlers = find_prop_handlers_by_hash(key.offset); if (!handlers) goto next_slot; di = btrfs_item_ptr(leaf, slot, struct btrfs_dir_item); cur = 0; total_len = btrfs_item_size(leaf, slot); while (cur < total_len) { u32 name_len = btrfs_dir_name_len(leaf, di); u32 data_len = btrfs_dir_data_len(leaf, di); unsigned long name_ptr, data_ptr; const struct prop_handler *handler; this_len = sizeof(*di) + name_len + data_len; name_ptr = (unsigned long)(di + 1); data_ptr = name_ptr + name_len; if (name_len <= XATTR_BTRFS_PREFIX_LEN || memcmp_extent_buffer(leaf, XATTR_BTRFS_PREFIX, name_ptr, XATTR_BTRFS_PREFIX_LEN)) goto next_dir_item; if (name_len >= name_buf_len) { kfree(name_buf); name_buf_len = name_len + 1; name_buf = kmalloc(name_buf_len, GFP_NOFS); if (!name_buf) { ret = -ENOMEM; goto out; } } read_extent_buffer(leaf, name_buf, name_ptr, name_len); name_buf[name_len] = '\0'; handler = find_prop_handler(name_buf, handlers); if (!handler) goto next_dir_item; if (data_len > value_buf_len) { kfree(value_buf); value_buf_len = data_len; value_buf = kmalloc(data_len, GFP_NOFS); if (!value_buf) { ret = -ENOMEM; goto out; } } read_extent_buffer(leaf, value_buf, data_ptr, data_len); iterator(ctx, handler, value_buf, data_len); next_dir_item: cur += this_len; di = (struct btrfs_dir_item *)((char *) di + this_len); } next_slot: path->slots[0]++; } ret = 0; out: btrfs_release_path(path); kfree(name_buf); kfree(value_buf); return ret; } static void inode_prop_iterator(void *ctx, const struct prop_handler *handler, const char *value, size_t len) { struct inode *inode = ctx; struct btrfs_root *root = BTRFS_I(inode)->root; int ret; ret = handler->apply(inode, value, len); if (unlikely(ret)) btrfs_warn(root->fs_info, "error applying prop %s to ino %llu (root %llu): %d", handler->xattr_name, btrfs_ino(BTRFS_I(inode)), btrfs_root_id(root), ret); else set_bit(BTRFS_INODE_HAS_PROPS, &BTRFS_I(inode)->runtime_flags); } int btrfs_load_inode_props(struct inode *inode, struct btrfs_path *path) { struct btrfs_root *root = BTRFS_I(inode)->root; u64 ino = btrfs_ino(BTRFS_I(inode)); return iterate_object_props(root, path, ino, inode_prop_iterator, inode); } static int prop_compression_validate(const struct btrfs_inode *inode, const char *value, size_t len) { if (!btrfs_inode_can_compress(inode)) return -EINVAL; if (!value) return 0; if (btrfs_compress_is_valid_type(value, len)) return 0; if ((len == 2 && strncmp("no", value, 2) == 0) || (len == 4 && strncmp("none", value, 4) == 0)) return 0; return -EINVAL; } static int prop_compression_apply(struct inode *inode, const char *value, size_t len) { struct btrfs_fs_info *fs_info = inode_to_fs_info(inode); int type; /* Reset to defaults */ if (len == 0) { BTRFS_I(inode)->flags &= ~BTRFS_INODE_COMPRESS; BTRFS_I(inode)->flags &= ~BTRFS_INODE_NOCOMPRESS; BTRFS_I(inode)->prop_compress = BTRFS_COMPRESS_NONE; return 0; } /* Set NOCOMPRESS flag */ if ((len == 2 && strncmp("no", value, 2) == 0) || (len == 4 && strncmp("none", value, 4) == 0)) { BTRFS_I(inode)->flags |= BTRFS_INODE_NOCOMPRESS; BTRFS_I(inode)->flags &= ~BTRFS_INODE_COMPRESS; BTRFS_I(inode)->prop_compress = BTRFS_COMPRESS_NONE; return 0; } if (!strncmp("lzo", value, 3)) { type = BTRFS_COMPRESS_LZO; btrfs_set_fs_incompat(fs_info, COMPRESS_LZO); } else if (!strncmp("zlib", value, 4)) { type = BTRFS_COMPRESS_ZLIB; } else if (!strncmp("zstd", value, 4)) { type = BTRFS_COMPRESS_ZSTD; btrfs_set_fs_incompat(fs_info, COMPRESS_ZSTD); } else { return -EINVAL; } BTRFS_I(inode)->flags &= ~BTRFS_INODE_NOCOMPRESS; BTRFS_I(inode)->flags |= BTRFS_INODE_COMPRESS; BTRFS_I(inode)->prop_compress = type; return 0; } static bool prop_compression_ignore(const struct btrfs_inode *inode) { /* * Compression only has effect for regular files, and for directories * we set it just to propagate it to new files created inside them. * Everything else (symlinks, devices, sockets, fifos) is pointless as * it will do nothing, so don't waste metadata space on a compression * xattr for anything that is neither a file nor a directory. */ if (!S_ISREG(inode->vfs_inode.i_mode) && !S_ISDIR(inode->vfs_inode.i_mode)) return true; return false; } static const char *prop_compression_extract(const struct inode *inode) { switch (BTRFS_I(inode)->prop_compress) { case BTRFS_COMPRESS_ZLIB: case BTRFS_COMPRESS_LZO: case BTRFS_COMPRESS_ZSTD: return btrfs_compress_type2str(BTRFS_I(inode)->prop_compress); default: break; } return NULL; } static struct prop_handler prop_handlers[] = { { .xattr_name = XATTR_BTRFS_PREFIX "compression", .validate = prop_compression_validate, .apply = prop_compression_apply, .extract = prop_compression_extract, .ignore = prop_compression_ignore, .inheritable = 1 }, }; int btrfs_inode_inherit_props(struct btrfs_trans_handle *trans, struct inode *inode, const struct inode *parent) { struct btrfs_root *root = BTRFS_I(inode)->root; struct btrfs_fs_info *fs_info = root->fs_info; int ret; int i; bool need_reserve = false; if (!test_bit(BTRFS_INODE_HAS_PROPS, &BTRFS_I(parent)->runtime_flags)) return 0; for (i = 0; i < ARRAY_SIZE(prop_handlers); i++) { const struct prop_handler *h = &prop_handlers[i]; const char *value; u64 num_bytes = 0; if (!h->inheritable) continue; if (h->ignore(BTRFS_I(inode))) continue; value = h->extract(parent); if (!value) continue; /* * This is not strictly necessary as the property should be * valid, but in case it isn't, don't propagate it further. */ ret = h->validate(BTRFS_I(inode), value, strlen(value)); if (ret) continue; /* * Currently callers should be reserving 1 item for properties, * since we only have 1 property that we currently support. If * we add more in the future we need to try and reserve more * space for them. But we should also revisit how we do space * reservations if we do add more properties in the future. */ if (need_reserve) { num_bytes = btrfs_calc_insert_metadata_size(fs_info, 1); ret = btrfs_block_rsv_add(fs_info, trans->block_rsv, num_bytes, BTRFS_RESERVE_NO_FLUSH); if (ret) return ret; } ret = btrfs_setxattr(trans, inode, h->xattr_name, value, strlen(value), 0); if (!ret) { ret = h->apply(inode, value, strlen(value)); if (ret) btrfs_setxattr(trans, inode, h->xattr_name, NULL, 0, 0); else set_bit(BTRFS_INODE_HAS_PROPS, &BTRFS_I(inode)->runtime_flags); } if (need_reserve) { btrfs_block_rsv_release(fs_info, trans->block_rsv, num_bytes, NULL); if (ret) return ret; } need_reserve = true; } return 0; } int __init btrfs_props_init(void) { int i; for (i = 0; i < ARRAY_SIZE(prop_handlers); i++) { struct prop_handler *p = &prop_handlers[i]; u64 h = btrfs_name_hash(p->xattr_name, strlen(p->xattr_name)); hash_add(prop_handlers_ht, &p->node, h); } return 0; } |
2 2 2 2 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 | // SPDX-License-Identifier: GPL-2.0 /* * Copyright 2024 Linaro Limited * * Author: Daniel Lezcano <daniel.lezcano@linaro.org> * * Thermal thresholds */ #include <linux/list.h> #include <linux/list_sort.h> #include <linux/slab.h> #include "thermal_core.h" #include "thermal_thresholds.h" int thermal_thresholds_init(struct thermal_zone_device *tz) { INIT_LIST_HEAD(&tz->user_thresholds); return 0; } static void __thermal_thresholds_flush(struct thermal_zone_device *tz) { struct list_head *thresholds = &tz->user_thresholds; struct user_threshold *entry, *tmp; list_for_each_entry_safe(entry, tmp, thresholds, list_node) { list_del(&entry->list_node); kfree(entry); } } void thermal_thresholds_flush(struct thermal_zone_device *tz) { lockdep_assert_held(&tz->lock); __thermal_thresholds_flush(tz); thermal_notify_threshold_flush(tz); __thermal_zone_device_update(tz, THERMAL_TZ_FLUSH_THRESHOLDS); } void thermal_thresholds_exit(struct thermal_zone_device *tz) { __thermal_thresholds_flush(tz); } static int __thermal_thresholds_cmp(void *data, const struct list_head *l1, const struct list_head *l2) { struct user_threshold *t1 = container_of(l1, struct user_threshold, list_node); struct user_threshold *t2 = container_of(l2, struct user_threshold, list_node); return t1->temperature - t2->temperature; } static struct user_threshold *__thermal_thresholds_find(const struct list_head *thresholds, int temperature) { struct user_threshold *t; list_for_each_entry(t, thresholds, list_node) if (t->temperature == temperature) return t; return NULL; } static bool thermal_thresholds_handle_raising(struct list_head *thresholds, int temperature, int last_temperature) { struct user_threshold *t; list_for_each_entry(t, thresholds, list_node) { if (!(t->direction & THERMAL_THRESHOLD_WAY_UP)) continue; if (temperature >= t->temperature && last_temperature < t->temperature) return true; } return false; } static bool thermal_thresholds_handle_dropping(struct list_head *thresholds, int temperature, int last_temperature) { struct user_threshold *t; list_for_each_entry_reverse(t, thresholds, list_node) { if (!(t->direction & THERMAL_THRESHOLD_WAY_DOWN)) continue; if (temperature <= t->temperature && last_temperature > t->temperature) return true; } return false; } static void thermal_threshold_find_boundaries(struct list_head *thresholds, int temperature, int *low, int *high) { struct user_threshold *t; list_for_each_entry(t, thresholds, list_node) { if (temperature < t->temperature && (t->direction & THERMAL_THRESHOLD_WAY_UP) && *high > t->temperature) *high = t->temperature; } list_for_each_entry_reverse(t, thresholds, list_node) { if (temperature > t->temperature && (t->direction & THERMAL_THRESHOLD_WAY_DOWN) && *low < t->temperature) *low = t->temperature; } } void thermal_thresholds_handle(struct thermal_zone_device *tz, int *low, int *high) { struct list_head *thresholds = &tz->user_thresholds; int temperature = tz->temperature; int last_temperature = tz->last_temperature; lockdep_assert_held(&tz->lock); thermal_threshold_find_boundaries(thresholds, temperature, low, high); /* * We need a second update in order to detect a threshold being crossed */ if (last_temperature == THERMAL_TEMP_INVALID) return; /* * The temperature is stable, so obviously we can not have * crossed a threshold. */ if (last_temperature == temperature) return; /* * Since last update the temperature: * - increased : thresholds are crossed the way up * - decreased : thresholds are crossed the way down */ if (temperature > last_temperature) { if (thermal_thresholds_handle_raising(thresholds, temperature, last_temperature)) thermal_notify_threshold_up(tz); } else { if (thermal_thresholds_handle_dropping(thresholds, temperature, last_temperature)) thermal_notify_threshold_down(tz); } } int thermal_thresholds_add(struct thermal_zone_device *tz, int temperature, int direction) { struct list_head *thresholds = &tz->user_thresholds; struct user_threshold *t; lockdep_assert_held(&tz->lock); t = __thermal_thresholds_find(thresholds, temperature); if (t) { if (t->direction == direction) return -EEXIST; t->direction |= direction; } else { t = kmalloc(sizeof(*t), GFP_KERNEL); if (!t) return -ENOMEM; INIT_LIST_HEAD(&t->list_node); t->temperature = temperature; t->direction = direction; list_add(&t->list_node, thresholds); list_sort(NULL, thresholds, __thermal_thresholds_cmp); } thermal_notify_threshold_add(tz, temperature, direction); __thermal_zone_device_update(tz, THERMAL_TZ_ADD_THRESHOLD); return 0; } int thermal_thresholds_delete(struct thermal_zone_device *tz, int temperature, int direction) { struct list_head *thresholds = &tz->user_thresholds; struct user_threshold *t; lockdep_assert_held(&tz->lock); t = __thermal_thresholds_find(thresholds, temperature); if (!t) return -ENOENT; if (t->direction == direction) { list_del(&t->list_node); kfree(t); } else { t->direction &= ~direction; } thermal_notify_threshold_delete(tz, temperature, direction); __thermal_zone_device_update(tz, THERMAL_TZ_DEL_THRESHOLD); return 0; } int thermal_thresholds_for_each(struct thermal_zone_device *tz, int (*cb)(struct user_threshold *, void *arg), void *arg) { struct list_head *thresholds = &tz->user_thresholds; struct user_threshold *entry; int ret; guard(thermal_zone)(tz); list_for_each_entry(entry, thresholds, list_node) { ret = cb(entry, arg); if (ret) return ret; } return 0; } |
18 3 18 18 18 18 3 18 15 3 3 18 1 18 18 18 1 1 18 3 3 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 2533 2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569 2570 2571 2572 2573 2574 2575 2576 2577 2578 2579 2580 2581 2582 2583 2584 2585 2586 2587 2588 2589 2590 2591 2592 2593 2594 2595 2596 2597 2598 2599 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 2701 2702 2703 2704 2705 2706 2707 2708 2709 2710 2711 2712 2713 2714 2715 2716 2717 2718 2719 2720 2721 2722 2723 2724 2725 2726 2727 2728 2729 2730 2731 2732 2733 2734 2735 2736 2737 2738 2739 2740 2741 2742 2743 2744 2745 2746 2747 2748 2749 2750 2751 2752 2753 2754 2755 2756 2757 2758 2759 2760 2761 2762 2763 2764 2765 2766 2767 2768 2769 2770 2771 2772 2773 2774 2775 2776 2777 2778 2779 2780 2781 2782 2783 2784 2785 2786 2787 2788 2789 2790 2791 2792 2793 2794 2795 2796 2797 2798 2799 2800 2801 2802 2803 2804 2805 2806 2807 2808 2809 2810 2811 2812 2813 2814 2815 2816 2817 2818 2819 2820 2821 2822 2823 2824 2825 2826 2827 2828 2829 2830 2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843 2844 2845 2846 2847 2848 2849 2850 2851 2852 2853 2854 2855 2856 2857 2858 2859 2860 2861 2862 2863 2864 2865 2866 2867 2868 2869 2870 2871 2872 2873 2874 2875 2876 2877 2878 2879 2880 2881 2882 2883 2884 2885 2886 2887 2888 2889 2890 2891 2892 2893 2894 2895 2896 2897 2898 2899 2900 2901 2902 2903 2904 2905 2906 2907 2908 2909 2910 2911 2912 2913 2914 2915 2916 2917 2918 2919 2920 2921 2922 2923 2924 2925 2926 2927 2928 2929 2930 2931 2932 2933 2934 2935 2936 2937 2938 2939 2940 2941 2942 2943 2944 2945 2946 2947 2948 2949 2950 2951 2952 2953 2954 2955 2956 2957 2958 2959 2960 2961 2962 2963 2964 2965 2966 2967 2968 2969 2970 2971 2972 2973 2974 2975 2976 2977 2978 2979 2980 2981 2982 2983 2984 2985 2986 2987 2988 2989 2990 2991 2992 2993 2994 2995 2996 2997 2998 2999 3000 3001 3002 3003 3004 3005 3006 3007 3008 3009 3010 3011 3012 3013 3014 3015 3016 3017 3018 3019 3020 3021 3022 3023 3024 3025 3026 3027 3028 3029 3030 3031 3032 3033 3034 3035 3036 3037 3038 3039 3040 3041 3042 3043 3044 3045 3046 3047 3048 3049 3050 3051 3052 3053 3054 3055 3056 3057 3058 3059 3060 3061 3062 3063 3064 3065 3066 3067 3068 3069 3070 3071 3072 3073 3074 3075 3076 3077 3078 3079 3080 3081 3082 3083 3084 3085 3086 3087 3088 3089 3090 3091 3092 3093 3094 3095 3096 3097 3098 3099 3100 3101 3102 3103 3104 3105 3106 3107 3108 3109 3110 3111 3112 3113 3114 3115 3116 3117 3118 3119 3120 3121 3122 3123 3124 3125 3126 3127 3128 3129 3130 3131 3132 3133 3134 3135 3136 3137 3138 3139 3140 3141 3142 3143 3144 3145 3146 3147 3148 3149 3150 3151 3152 3153 3154 3155 3156 3157 3158 3159 3160 3161 3162 3163 3164 3165 3166 3167 3168 3169 3170 3171 3172 3173 3174 3175 3176 3177 3178 3179 3180 3181 3182 3183 3184 3185 3186 3187 3188 3189 3190 3191 3192 3193 3194 3195 3196 3197 3198 3199 3200 3201 3202 3203 3204 3205 3206 3207 3208 3209 3210 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220 3221 3222 3223 3224 3225 3226 3227 3228 3229 3230 3231 3232 3233 3234 3235 3236 3237 3238 3239 3240 3241 3242 3243 3244 3245 3246 3247 3248 3249 3250 3251 3252 3253 3254 3255 3256 3257 3258 3259 3260 3261 3262 3263 3264 3265 3266 3267 3268 3269 3270 3271 3272 3273 3274 3275 3276 3277 3278 3279 3280 3281 3282 3283 3284 3285 3286 3287 3288 3289 3290 3291 3292 3293 3294 3295 3296 3297 3298 3299 3300 3301 3302 3303 3304 3305 3306 3307 3308 3309 3310 3311 3312 3313 3314 3315 3316 3317 3318 3319 3320 3321 3322 3323 3324 3325 3326 3327 3328 3329 3330 3331 3332 3333 3334 3335 3336 3337 3338 3339 3340 3341 3342 3343 3344 3345 3346 3347 3348 3349 3350 3351 3352 3353 3354 3355 3356 3357 3358 3359 3360 3361 3362 3363 3364 3365 3366 3367 3368 3369 3370 3371 3372 3373 3374 3375 3376 3377 3378 3379 3380 3381 3382 3383 3384 3385 3386 3387 3388 3389 3390 3391 3392 3393 3394 3395 3396 3397 3398 3399 3400 3401 3402 3403 3404 3405 3406 3407 3408 3409 3410 3411 3412 3413 3414 3415 3416 3417 3418 3419 3420 3421 3422 3423 3424 3425 3426 3427 3428 3429 3430 3431 3432 3433 3434 3435 3436 3437 3438 3439 3440 3441 3442 3443 3444 3445 3446 3447 3448 3449 3450 3451 | // SPDX-License-Identifier: GPL-2.0+ /* * comedi/comedi_fops.c * comedi kernel module * * COMEDI - Linux Control and Measurement Device Interface * Copyright (C) 1997-2007 David A. Schleef <ds@schleef.org> * compat ioctls: * Author: Ian Abbott, MEV Ltd. <abbotti@mev.co.uk> * Copyright (C) 2007 MEV Ltd. <http://www.mev.co.uk/> */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <linux/module.h> #include <linux/errno.h> #include <linux/kernel.h> #include <linux/sched/signal.h> #include <linux/fcntl.h> #include <linux/delay.h> #include <linux/mm.h> #include <linux/slab.h> #include <linux/poll.h> #include <linux/device.h> #include <linux/fs.h> #include <linux/comedi/comedidev.h> #include <linux/cdev.h> #include <linux/io.h> #include <linux/uaccess.h> #include <linux/compat.h> #include "comedi_internal.h" /* * comedi_subdevice "runflags" * COMEDI_SRF_RT: DEPRECATED: command is running real-time * COMEDI_SRF_ERROR: indicates an COMEDI_CB_ERROR event has occurred * since the last command was started * COMEDI_SRF_RUNNING: command is running * COMEDI_SRF_FREE_SPRIV: free s->private on detach * * COMEDI_SRF_BUSY_MASK: runflags that indicate the subdevice is "busy" */ #define COMEDI_SRF_RT BIT(1) #define COMEDI_SRF_ERROR BIT(2) #define COMEDI_SRF_RUNNING BIT(27) #define COMEDI_SRF_FREE_SPRIV BIT(31) #define COMEDI_SRF_BUSY_MASK (COMEDI_SRF_ERROR | COMEDI_SRF_RUNNING) /** * struct comedi_file - Per-file private data for COMEDI device * @dev: COMEDI device. * @read_subdev: Current "read" subdevice. * @write_subdev: Current "write" subdevice. * @last_detach_count: Last known detach count. * @last_attached: Last known attached/detached state. */ struct comedi_file { struct comedi_device *dev; struct comedi_subdevice *read_subdev; struct comedi_subdevice *write_subdev; unsigned int last_detach_count; unsigned int last_attached:1; }; #define COMEDI_NUM_MINORS 0x100 #define COMEDI_NUM_SUBDEVICE_MINORS \ (COMEDI_NUM_MINORS - COMEDI_NUM_BOARD_MINORS) static unsigned short comedi_num_legacy_minors; module_param(comedi_num_legacy_minors, ushort, 0444); MODULE_PARM_DESC(comedi_num_legacy_minors, "number of comedi minor devices to reserve for non-auto-configured devices (default 0)" ); unsigned int comedi_default_buf_size_kb = CONFIG_COMEDI_DEFAULT_BUF_SIZE_KB; module_param(comedi_default_buf_size_kb, uint, 0644); MODULE_PARM_DESC(comedi_default_buf_size_kb, "default asynchronous buffer size in KiB (default " __MODULE_STRING(CONFIG_COMEDI_DEFAULT_BUF_SIZE_KB) ")"); unsigned int comedi_default_buf_maxsize_kb = CONFIG_COMEDI_DEFAULT_BUF_MAXSIZE_KB; module_param(comedi_default_buf_maxsize_kb, uint, 0644); MODULE_PARM_DESC(comedi_default_buf_maxsize_kb, "default maximum size of asynchronous buffer in KiB (default " __MODULE_STRING(CONFIG_COMEDI_DEFAULT_BUF_MAXSIZE_KB) ")"); static DEFINE_MUTEX(comedi_board_minor_table_lock); static struct comedi_device *comedi_board_minor_table[COMEDI_NUM_BOARD_MINORS]; static DEFINE_MUTEX(comedi_subdevice_minor_table_lock); /* Note: indexed by minor - COMEDI_NUM_BOARD_MINORS. */ static struct comedi_subdevice *comedi_subdevice_minor_table[COMEDI_NUM_SUBDEVICE_MINORS]; static struct cdev comedi_cdev; static void comedi_device_init(struct comedi_device *dev) { kref_init(&dev->refcount); spin_lock_init(&dev->spinlock); mutex_init(&dev->mutex); init_rwsem(&dev->attach_lock); dev->minor = -1; } static void comedi_dev_kref_release(struct kref *kref) { struct comedi_device *dev = container_of(kref, struct comedi_device, refcount); mutex_destroy(&dev->mutex); put_device(dev->class_dev); kfree(dev); } /** * comedi_dev_put() - Release a use of a COMEDI device * @dev: COMEDI device. * * Must be called when a user of a COMEDI device is finished with it. * When the last user of the COMEDI device calls this function, the * COMEDI device is destroyed. * * Return: 1 if the COMEDI device is destroyed by this call or @dev is * NULL, otherwise return 0. Callers must not assume the COMEDI * device is still valid if this function returns 0. */ int comedi_dev_put(struct comedi_device *dev) { if (dev) return kref_put(&dev->refcount, comedi_dev_kref_release); return 1; } EXPORT_SYMBOL_GPL(comedi_dev_put); static struct comedi_device *comedi_dev_get(struct comedi_device *dev) { if (dev) kref_get(&dev->refcount); return dev; } static void comedi_device_cleanup(struct comedi_device *dev) { struct module *driver_module = NULL; if (!dev) return; mutex_lock(&dev->mutex); if (dev->attached) driver_module = dev->driver->module; comedi_device_detach(dev); if (driver_module && dev->use_count) module_put(driver_module); mutex_unlock(&dev->mutex); } static bool comedi_clear_board_dev(struct comedi_device *dev) { unsigned int i = dev->minor; bool cleared = false; lockdep_assert_held(&dev->mutex); mutex_lock(&comedi_board_minor_table_lock); if (dev == comedi_board_minor_table[i]) { comedi_board_minor_table[i] = NULL; cleared = true; } mutex_unlock(&comedi_board_minor_table_lock); return cleared; } static struct comedi_device *comedi_clear_board_minor(unsigned int minor) { struct comedi_device *dev; mutex_lock(&comedi_board_minor_table_lock); dev = comedi_board_minor_table[minor]; comedi_board_minor_table[minor] = NULL; mutex_unlock(&comedi_board_minor_table_lock); return dev; } static struct comedi_subdevice * comedi_subdevice_from_minor(const struct comedi_device *dev, unsigned int minor) { struct comedi_subdevice *s; unsigned int i = minor - COMEDI_NUM_BOARD_MINORS; mutex_lock(&comedi_subdevice_minor_table_lock); s = comedi_subdevice_minor_table[i]; if (s && s->device != dev) s = NULL; mutex_unlock(&comedi_subdevice_minor_table_lock); return s; } static struct comedi_device *comedi_dev_get_from_board_minor(unsigned int minor) { struct comedi_device *dev; mutex_lock(&comedi_board_minor_table_lock); dev = comedi_dev_get(comedi_board_minor_table[minor]); mutex_unlock(&comedi_board_minor_table_lock); return dev; } static struct comedi_device * comedi_dev_get_from_subdevice_minor(unsigned int minor) { struct comedi_device *dev; struct comedi_subdevice *s; unsigned int i = minor - COMEDI_NUM_BOARD_MINORS; mutex_lock(&comedi_subdevice_minor_table_lock); s = comedi_subdevice_minor_table[i]; dev = comedi_dev_get(s ? s->device : NULL); mutex_unlock(&comedi_subdevice_minor_table_lock); return dev; } /** * comedi_dev_get_from_minor() - Get COMEDI device by minor device number * @minor: Minor device number. * * Finds the COMEDI device associated with the minor device number, if any, * and increments its reference count. The COMEDI device is prevented from * being freed until a matching call is made to comedi_dev_put(). * * Return: A pointer to the COMEDI device if it exists, with its usage * reference incremented. Return NULL if no COMEDI device exists with the * specified minor device number. */ struct comedi_device *comedi_dev_get_from_minor(unsigned int minor) { if (minor < COMEDI_NUM_BOARD_MINORS) return comedi_dev_get_from_board_minor(minor); return comedi_dev_get_from_subdevice_minor(minor); } EXPORT_SYMBOL_GPL(comedi_dev_get_from_minor); static struct comedi_subdevice * comedi_read_subdevice(const struct comedi_device *dev, unsigned int minor) { struct comedi_subdevice *s; lockdep_assert_held(&dev->mutex); if (minor >= COMEDI_NUM_BOARD_MINORS) { s = comedi_subdevice_from_minor(dev, minor); if (!s || (s->subdev_flags & SDF_CMD_READ)) return s; } return dev->read_subdev; } static struct comedi_subdevice * comedi_write_subdevice(const struct comedi_device *dev, unsigned int minor) { struct comedi_subdevice *s; lockdep_assert_held(&dev->mutex); if (minor >= COMEDI_NUM_BOARD_MINORS) { s = comedi_subdevice_from_minor(dev, minor); if (!s || (s->subdev_flags & SDF_CMD_WRITE)) return s; } return dev->write_subdev; } static void comedi_file_reset(struct file *file) { struct comedi_file *cfp = file->private_data; struct comedi_device *dev = cfp->dev; struct comedi_subdevice *s, *read_s, *write_s; unsigned int minor = iminor(file_inode(file)); read_s = dev->read_subdev; write_s = dev->write_subdev; if (minor >= COMEDI_NUM_BOARD_MINORS) { s = comedi_subdevice_from_minor(dev, minor); if (!s || s->subdev_flags & SDF_CMD_READ) read_s = s; if (!s || s->subdev_flags & SDF_CMD_WRITE) write_s = s; } cfp->last_attached = dev->attached; cfp->last_detach_count = dev->detach_count; WRITE_ONCE(cfp->read_subdev, read_s); WRITE_ONCE(cfp->write_subdev, write_s); } static void comedi_file_check(struct file *file) { struct comedi_file *cfp = file->private_data; struct comedi_device *dev = cfp->dev; if (cfp->last_attached != dev->attached || cfp->last_detach_count != dev->detach_count) comedi_file_reset(file); } static struct comedi_subdevice *comedi_file_read_subdevice(struct file *file) { struct comedi_file *cfp = file->private_data; comedi_file_check(file); return READ_ONCE(cfp->read_subdev); } static struct comedi_subdevice *comedi_file_write_subdevice(struct file *file) { struct comedi_file *cfp = file->private_data; comedi_file_check(file); return READ_ONCE(cfp->write_subdev); } static int resize_async_buffer(struct comedi_device *dev, struct comedi_subdevice *s, unsigned int new_size) { struct comedi_async *async = s->async; int retval; lockdep_assert_held(&dev->mutex); if (new_size > async->max_bufsize) return -EPERM; if (s->busy) { dev_dbg(dev->class_dev, "subdevice is busy, cannot resize buffer\n"); return -EBUSY; } if (comedi_buf_is_mmapped(s)) { dev_dbg(dev->class_dev, "subdevice is mmapped, cannot resize buffer\n"); return -EBUSY; } /* make sure buffer is an integral number of pages (we round up) */ new_size = (new_size + PAGE_SIZE - 1) & PAGE_MASK; retval = comedi_buf_alloc(dev, s, new_size); if (retval < 0) return retval; if (s->buf_change) { retval = s->buf_change(dev, s); if (retval < 0) return retval; } dev_dbg(dev->class_dev, "subd %d buffer resized to %i bytes\n", s->index, async->prealloc_bufsz); return 0; } /* sysfs attribute files */ static ssize_t max_read_buffer_kb_show(struct device *csdev, struct device_attribute *attr, char *buf) { unsigned int minor = MINOR(csdev->devt); struct comedi_device *dev; struct comedi_subdevice *s; unsigned int size = 0; dev = comedi_dev_get_from_minor(minor); if (!dev) return -ENODEV; mutex_lock(&dev->mutex); s = comedi_read_subdevice(dev, minor); if (s && (s->subdev_flags & SDF_CMD_READ) && s->async) size = s->async->max_bufsize / 1024; mutex_unlock(&dev->mutex); comedi_dev_put(dev); return sysfs_emit(buf, "%u\n", size); } static ssize_t max_read_buffer_kb_store(struct device *csdev, struct device_attribute *attr, const char *buf, size_t count) { unsigned int minor = MINOR(csdev->devt); struct comedi_device *dev; struct comedi_subdevice *s; unsigned int size; int err; err = kstrtouint(buf, 10, &size); if (err) return err; if (size > (UINT_MAX / 1024)) return -EINVAL; size *= 1024; dev = comedi_dev_get_from_minor(minor); if (!dev) return -ENODEV; mutex_lock(&dev->mutex); s = comedi_read_subdevice(dev, minor); if (s && (s->subdev_flags & SDF_CMD_READ) && s->async) s->async->max_bufsize = size; else err = -EINVAL; mutex_unlock(&dev->mutex); comedi_dev_put(dev); return err ? err : count; } static DEVICE_ATTR_RW(max_read_buffer_kb); static ssize_t read_buffer_kb_show(struct device *csdev, struct device_attribute *attr, char *buf) { unsigned int minor = MINOR(csdev->devt); struct comedi_device *dev; struct comedi_subdevice *s; unsigned int size = 0; dev = comedi_dev_get_from_minor(minor); if (!dev) return -ENODEV; mutex_lock(&dev->mutex); s = comedi_read_subdevice(dev, minor); if (s && (s->subdev_flags & SDF_CMD_READ) && s->async) size = s->async->prealloc_bufsz / 1024; mutex_unlock(&dev->mutex); comedi_dev_put(dev); return sysfs_emit(buf, "%u\n", size); } static ssize_t read_buffer_kb_store(struct device *csdev, struct device_attribute *attr, const char *buf, size_t count) { unsigned int minor = MINOR(csdev->devt); struct comedi_device *dev; struct comedi_subdevice *s; unsigned int size; int err; err = kstrtouint(buf, 10, &size); if (err) return err; if (size > (UINT_MAX / 1024)) return -EINVAL; size *= 1024; dev = comedi_dev_get_from_minor(minor); if (!dev) return -ENODEV; mutex_lock(&dev->mutex); s = comedi_read_subdevice(dev, minor); if (s && (s->subdev_flags & SDF_CMD_READ) && s->async) err = resize_async_buffer(dev, s, size); else err = -EINVAL; mutex_unlock(&dev->mutex); comedi_dev_put(dev); return err ? err : count; } static DEVICE_ATTR_RW(read_buffer_kb); static ssize_t max_write_buffer_kb_show(struct device *csdev, struct device_attribute *attr, char *buf) { unsigned int minor = MINOR(csdev->devt); struct comedi_device *dev; struct comedi_subdevice *s; unsigned int size = 0; dev = comedi_dev_get_from_minor(minor); if (!dev) return -ENODEV; mutex_lock(&dev->mutex); s = comedi_write_subdevice(dev, minor); if (s && (s->subdev_flags & SDF_CMD_WRITE) && s->async) size = s->async->max_bufsize / 1024; mutex_unlock(&dev->mutex); comedi_dev_put(dev); return sysfs_emit(buf, "%u\n", size); } static ssize_t max_write_buffer_kb_store(struct device *csdev, struct device_attribute *attr, const char *buf, size_t count) { unsigned int minor = MINOR(csdev->devt); struct comedi_device *dev; struct comedi_subdevice *s; unsigned int size; int err; err = kstrtouint(buf, 10, &size); if (err) return err; if (size > (UINT_MAX / 1024)) return -EINVAL; size *= 1024; dev = comedi_dev_get_from_minor(minor); if (!dev) return -ENODEV; mutex_lock(&dev->mutex); s = comedi_write_subdevice(dev, minor); if (s && (s->subdev_flags & SDF_CMD_WRITE) && s->async) s->async->max_bufsize = size; else err = -EINVAL; mutex_unlock(&dev->mutex); comedi_dev_put(dev); return err ? err : count; } static DEVICE_ATTR_RW(max_write_buffer_kb); static ssize_t write_buffer_kb_show(struct device *csdev, struct device_attribute *attr, char *buf) { unsigned int minor = MINOR(csdev->devt); struct comedi_device *dev; struct comedi_subdevice *s; unsigned int size = 0; dev = comedi_dev_get_from_minor(minor); if (!dev) return -ENODEV; mutex_lock(&dev->mutex); s = comedi_write_subdevice(dev, minor); if (s && (s->subdev_flags & SDF_CMD_WRITE) && s->async) size = s->async->prealloc_bufsz / 1024; mutex_unlock(&dev->mutex); comedi_dev_put(dev); return sysfs_emit(buf, "%u\n", size); } static ssize_t write_buffer_kb_store(struct device *csdev, struct device_attribute *attr, const char *buf, size_t count) { unsigned int minor = MINOR(csdev->devt); struct comedi_device *dev; struct comedi_subdevice *s; unsigned int size; int err; err = kstrtouint(buf, 10, &size); if (err) return err; if (size > (UINT_MAX / 1024)) return -EINVAL; size *= 1024; dev = comedi_dev_get_from_minor(minor); if (!dev) return -ENODEV; mutex_lock(&dev->mutex); s = comedi_write_subdevice(dev, minor); if (s && (s->subdev_flags & SDF_CMD_WRITE) && s->async) err = resize_async_buffer(dev, s, size); else err = -EINVAL; mutex_unlock(&dev->mutex); comedi_dev_put(dev); return err ? err : count; } static DEVICE_ATTR_RW(write_buffer_kb); static struct attribute *comedi_dev_attrs[] = { &dev_attr_max_read_buffer_kb.attr, &dev_attr_read_buffer_kb.attr, &dev_attr_max_write_buffer_kb.attr, &dev_attr_write_buffer_kb.attr, NULL, }; ATTRIBUTE_GROUPS(comedi_dev); static const struct class comedi_class = { .name = "comedi", .dev_groups = comedi_dev_groups, }; static void comedi_free_board_dev(struct comedi_device *dev) { if (dev) { comedi_device_cleanup(dev); if (dev->class_dev) { device_destroy(&comedi_class, MKDEV(COMEDI_MAJOR, dev->minor)); } comedi_dev_put(dev); } } static void __comedi_clear_subdevice_runflags(struct comedi_subdevice *s, unsigned int bits) { s->runflags &= ~bits; } static void __comedi_set_subdevice_runflags(struct comedi_subdevice *s, unsigned int bits) { s->runflags |= bits; } static void comedi_update_subdevice_runflags(struct comedi_subdevice *s, unsigned int mask, unsigned int bits) { unsigned long flags; spin_lock_irqsave(&s->spin_lock, flags); __comedi_clear_subdevice_runflags(s, mask); __comedi_set_subdevice_runflags(s, bits & mask); spin_unlock_irqrestore(&s->spin_lock, flags); } static unsigned int __comedi_get_subdevice_runflags(struct comedi_subdevice *s) { return s->runflags; } static unsigned int comedi_get_subdevice_runflags(struct comedi_subdevice *s) { unsigned long flags; unsigned int runflags; spin_lock_irqsave(&s->spin_lock, flags); runflags = __comedi_get_subdevice_runflags(s); spin_unlock_irqrestore(&s->spin_lock, flags); return runflags; } static bool comedi_is_runflags_running(unsigned int runflags) { return runflags & COMEDI_SRF_RUNNING; } static bool comedi_is_runflags_in_error(unsigned int runflags) { return runflags & COMEDI_SRF_ERROR; } /** * comedi_is_subdevice_running() - Check if async command running on subdevice * @s: COMEDI subdevice. * * Return: %true if an asynchronous COMEDI command is active on the * subdevice, else %false. */ bool comedi_is_subdevice_running(struct comedi_subdevice *s) { unsigned int runflags = comedi_get_subdevice_runflags(s); return comedi_is_runflags_running(runflags); } EXPORT_SYMBOL_GPL(comedi_is_subdevice_running); static bool __comedi_is_subdevice_running(struct comedi_subdevice *s) { unsigned int runflags = __comedi_get_subdevice_runflags(s); return comedi_is_runflags_running(runflags); } bool comedi_can_auto_free_spriv(struct comedi_subdevice *s) { unsigned int runflags = __comedi_get_subdevice_runflags(s); return runflags & COMEDI_SRF_FREE_SPRIV; } /** * comedi_set_spriv_auto_free() - Mark subdevice private data as freeable * @s: COMEDI subdevice. * * Mark the subdevice as having a pointer to private data that can be * automatically freed when the COMEDI device is detached from the low-level * driver. */ void comedi_set_spriv_auto_free(struct comedi_subdevice *s) { __comedi_set_subdevice_runflags(s, COMEDI_SRF_FREE_SPRIV); } EXPORT_SYMBOL_GPL(comedi_set_spriv_auto_free); /** * comedi_alloc_spriv - Allocate memory for the subdevice private data * @s: COMEDI subdevice. * @size: Size of the memory to allocate. * * Allocate memory for the subdevice private data and point @s->private * to it. The memory will be freed automatically when the COMEDI device * is detached from the low-level driver. * * Return: A pointer to the allocated memory @s->private on success. * Return NULL on failure. */ void *comedi_alloc_spriv(struct comedi_subdevice *s, size_t size) { s->private = kzalloc(size, GFP_KERNEL); if (s->private) comedi_set_spriv_auto_free(s); return s->private; } EXPORT_SYMBOL_GPL(comedi_alloc_spriv); /* * This function restores a subdevice to an idle state. */ static void do_become_nonbusy(struct comedi_device *dev, struct comedi_subdevice *s) { struct comedi_async *async = s->async; lockdep_assert_held(&dev->mutex); comedi_update_subdevice_runflags(s, COMEDI_SRF_RUNNING, 0); if (async) { comedi_buf_reset(s); async->inttrig = NULL; kfree(async->cmd.chanlist); async->cmd.chanlist = NULL; s->busy = NULL; wake_up_interruptible_all(&async->wait_head); } else { dev_err(dev->class_dev, "BUG: (?) %s called with async=NULL\n", __func__); s->busy = NULL; } } static int do_cancel(struct comedi_device *dev, struct comedi_subdevice *s) { int ret = 0; lockdep_assert_held(&dev->mutex); if (comedi_is_subdevice_running(s) && s->cancel) ret = s->cancel(dev, s); do_become_nonbusy(dev, s); return ret; } void comedi_device_cancel_all(struct comedi_device *dev) { struct comedi_subdevice *s; int i; lockdep_assert_held(&dev->mutex); if (!dev->attached) return; for (i = 0; i < dev->n_subdevices; i++) { s = &dev->subdevices[i]; if (s->async) do_cancel(dev, s); } } static int is_device_busy(struct comedi_device *dev) { struct comedi_subdevice *s; int i; lockdep_assert_held(&dev->mutex); if (!dev->attached) return 0; for (i = 0; i < dev->n_subdevices; i++) { s = &dev->subdevices[i]; if (s->busy) return 1; if (s->async && comedi_buf_is_mmapped(s)) return 1; } return 0; } /* * COMEDI_DEVCONFIG ioctl * attaches (and configures) or detaches a legacy device * * arg: * pointer to comedi_devconfig structure (NULL if detaching) * * reads: * comedi_devconfig structure (if attaching) * * writes: * nothing */ static int do_devconfig_ioctl(struct comedi_device *dev, struct comedi_devconfig __user *arg) { struct comedi_devconfig it; lockdep_assert_held(&dev->mutex); if (!capable(CAP_SYS_ADMIN)) return -EPERM; if (!arg) { if (is_device_busy(dev)) return -EBUSY; if (dev->attached) { struct module *driver_module = dev->driver->module; comedi_device_detach(dev); module_put(driver_module); } return 0; } if (copy_from_user(&it, arg, sizeof(it))) return -EFAULT; it.board_name[COMEDI_NAMELEN - 1] = 0; if (it.options[COMEDI_DEVCONF_AUX_DATA_LENGTH]) { dev_warn(dev->class_dev, "comedi_config --init_data is deprecated\n"); return -EINVAL; } if (dev->minor >= comedi_num_legacy_minors) /* don't re-use dynamically allocated comedi devices */ return -EBUSY; /* This increments the driver module count on success. */ return comedi_device_attach(dev, &it); } /* * COMEDI_BUFCONFIG ioctl * buffer configuration * * arg: * pointer to comedi_bufconfig structure * * reads: * comedi_bufconfig structure * * writes: * modified comedi_bufconfig structure */ static int do_bufconfig_ioctl(struct comedi_device *dev, struct comedi_bufconfig __user *arg) { struct comedi_bufconfig bc; struct comedi_async *async; struct comedi_subdevice *s; int retval = 0; lockdep_assert_held(&dev->mutex); if (copy_from_user(&bc, arg, sizeof(bc))) return -EFAULT; if (bc.subdevice >= dev->n_subdevices) return -EINVAL; s = &dev->subdevices[bc.subdevice]; async = s->async; if (!async) { dev_dbg(dev->class_dev, "subdevice does not have async capability\n"); bc.size = 0; bc.maximum_size = 0; goto copyback; } if (bc.maximum_size) { if (!capable(CAP_SYS_ADMIN)) return -EPERM; async->max_bufsize = bc.maximum_size; } if (bc.size) { retval = resize_async_buffer(dev, s, bc.size); if (retval < 0) return retval; } bc.size = async->prealloc_bufsz; bc.maximum_size = async->max_bufsize; copyback: if (copy_to_user(arg, &bc, sizeof(bc))) return -EFAULT; return 0; } /* * COMEDI_DEVINFO ioctl * device info * * arg: * pointer to comedi_devinfo structure * * reads: * nothing * * writes: * comedi_devinfo structure */ static int do_devinfo_ioctl(struct comedi_device *dev, struct comedi_devinfo __user *arg, struct file *file) { struct comedi_subdevice *s; struct comedi_devinfo devinfo; lockdep_assert_held(&dev->mutex); memset(&devinfo, 0, sizeof(devinfo)); /* fill devinfo structure */ devinfo.version_code = COMEDI_VERSION_CODE; devinfo.n_subdevs = dev->n_subdevices; strscpy(devinfo.driver_name, dev->driver->driver_name, COMEDI_NAMELEN); strscpy(devinfo.board_name, dev->board_name, COMEDI_NAMELEN); s = comedi_file_read_subdevice(file); if (s) devinfo.read_subdevice = s->index; else devinfo.read_subdevice = -1; s = comedi_file_write_subdevice(file); if (s) devinfo.write_subdevice = s->index; else devinfo.write_subdevice = -1; if (copy_to_user(arg, &devinfo, sizeof(devinfo))) return -EFAULT; return 0; } /* * COMEDI_SUBDINFO ioctl * subdevices info * * arg: * pointer to array of comedi_subdinfo structures * * reads: * nothing * * writes: * array of comedi_subdinfo structures */ static int do_subdinfo_ioctl(struct comedi_device *dev, struct comedi_subdinfo __user *arg, void *file) { int ret, i; struct comedi_subdinfo *tmp, *us; struct comedi_subdevice *s; lockdep_assert_held(&dev->mutex); tmp = kcalloc(dev->n_subdevices, sizeof(*tmp), GFP_KERNEL); if (!tmp) return -ENOMEM; /* fill subdinfo structs */ for (i = 0; i < dev->n_subdevices; i++) { s = &dev->subdevices[i]; us = tmp + i; us->type = s->type; us->n_chan = s->n_chan; us->subd_flags = s->subdev_flags; if (comedi_is_subdevice_running(s)) us->subd_flags |= SDF_RUNNING; #define TIMER_nanosec 5 /* backwards compatibility */ us->timer_type = TIMER_nanosec; us->len_chanlist = s->len_chanlist; us->maxdata = s->maxdata; if (s->range_table) { us->range_type = (i << 24) | (0 << 16) | (s->range_table->length); } else { us->range_type = 0; /* XXX */ } if (s->busy) us->subd_flags |= SDF_BUSY; if (s->busy == file) us->subd_flags |= SDF_BUSY_OWNER; if (s->lock) us->subd_flags |= SDF_LOCKED; if (s->lock == file) us->subd_flags |= SDF_LOCK_OWNER; if (!s->maxdata && s->maxdata_list) us->subd_flags |= SDF_MAXDATA; if (s->range_table_list) us->subd_flags |= SDF_RANGETYPE; if (s->do_cmd) us->subd_flags |= SDF_CMD; if (s->insn_bits != &insn_inval) us->insn_bits_support = COMEDI_SUPPORTED; else us->insn_bits_support = COMEDI_UNSUPPORTED; } ret = copy_to_user(arg, tmp, dev->n_subdevices * sizeof(*tmp)); kfree(tmp); return ret ? -EFAULT : 0; } /* * COMEDI_CHANINFO ioctl * subdevice channel info * * arg: * pointer to comedi_chaninfo structure * * reads: * comedi_chaninfo structure * * writes: * array of maxdata values to chaninfo->maxdata_list if requested * array of range table lengths to chaninfo->range_table_list if requested */ static int do_chaninfo_ioctl(struct comedi_device *dev, struct comedi_chaninfo *it) { struct comedi_subdevice *s; lockdep_assert_held(&dev->mutex); if (it->subdev >= dev->n_subdevices) return -EINVAL; s = &dev->subdevices[it->subdev]; if (it->maxdata_list) { if (s->maxdata || !s->maxdata_list) return -EINVAL; if (copy_to_user(it->maxdata_list, s->maxdata_list, s->n_chan * sizeof(unsigned int))) return -EFAULT; } if (it->flaglist) return -EINVAL; /* flaglist not supported */ if (it->rangelist) { int i; if (!s->range_table_list) return -EINVAL; for (i = 0; i < s->n_chan; i++) { int x; x = (dev->minor << 28) | (it->subdev << 24) | (i << 16) | (s->range_table_list[i]->length); if (put_user(x, it->rangelist + i)) return -EFAULT; } } return 0; } /* * COMEDI_BUFINFO ioctl * buffer information * * arg: * pointer to comedi_bufinfo structure * * reads: * comedi_bufinfo structure * * writes: * modified comedi_bufinfo structure */ static int do_bufinfo_ioctl(struct comedi_device *dev, struct comedi_bufinfo __user *arg, void *file) { struct comedi_bufinfo bi; struct comedi_subdevice *s; struct comedi_async *async; unsigned int runflags; int retval = 0; bool become_nonbusy = false; lockdep_assert_held(&dev->mutex); if (copy_from_user(&bi, arg, sizeof(bi))) return -EFAULT; if (bi.subdevice >= dev->n_subdevices) return -EINVAL; s = &dev->subdevices[bi.subdevice]; async = s->async; if (!async || s->busy != file) return -EINVAL; runflags = comedi_get_subdevice_runflags(s); if (!(async->cmd.flags & CMDF_WRITE)) { /* command was set up in "read" direction */ if (bi.bytes_read) { comedi_buf_read_alloc(s, bi.bytes_read); bi.bytes_read = comedi_buf_read_free(s, bi.bytes_read); } /* * If nothing left to read, and command has stopped, and * {"read" position not updated or command stopped normally}, * then become non-busy. */ if (comedi_buf_read_n_available(s) == 0 && !comedi_is_runflags_running(runflags) && (bi.bytes_read == 0 || !comedi_is_runflags_in_error(runflags))) { become_nonbusy = true; if (comedi_is_runflags_in_error(runflags)) retval = -EPIPE; } bi.bytes_written = 0; } else { /* command was set up in "write" direction */ if (!comedi_is_runflags_running(runflags)) { bi.bytes_written = 0; become_nonbusy = true; if (comedi_is_runflags_in_error(runflags)) retval = -EPIPE; } else if (bi.bytes_written) { comedi_buf_write_alloc(s, bi.bytes_written); bi.bytes_written = comedi_buf_write_free(s, bi.bytes_written); } bi.bytes_read = 0; } bi.buf_write_count = async->buf_write_count; bi.buf_write_ptr = async->buf_write_ptr; bi.buf_read_count = async->buf_read_count; bi.buf_read_ptr = async->buf_read_ptr; if (become_nonbusy) do_become_nonbusy(dev, s); if (retval) return retval; if (copy_to_user(arg, &bi, sizeof(bi))) return -EFAULT; return 0; } static int check_insn_config_length(struct comedi_insn *insn, unsigned int *data) { if (insn->n < 1) return -EINVAL; switch (data[0]) { case INSN_CONFIG_DIO_OUTPUT: case INSN_CONFIG_DIO_INPUT: case INSN_CONFIG_DISARM: case INSN_CONFIG_RESET: if (insn->n == 1) return 0; break; case INSN_CONFIG_ARM: case INSN_CONFIG_DIO_QUERY: case INSN_CONFIG_BLOCK_SIZE: case INSN_CONFIG_FILTER: case INSN_CONFIG_SERIAL_CLOCK: case INSN_CONFIG_BIDIRECTIONAL_DATA: case INSN_CONFIG_ALT_SOURCE: case INSN_CONFIG_SET_COUNTER_MODE: case INSN_CONFIG_8254_READ_STATUS: case INSN_CONFIG_SET_ROUTING: case INSN_CONFIG_GET_ROUTING: case INSN_CONFIG_GET_PWM_STATUS: case INSN_CONFIG_PWM_SET_PERIOD: case INSN_CONFIG_PWM_GET_PERIOD: if (insn->n == 2) return 0; break; case INSN_CONFIG_SET_GATE_SRC: case INSN_CONFIG_GET_GATE_SRC: case INSN_CONFIG_SET_CLOCK_SRC: case INSN_CONFIG_GET_CLOCK_SRC: case INSN_CONFIG_SET_OTHER_SRC: case INSN_CONFIG_GET_COUNTER_STATUS: case INSN_CONFIG_GET_PWM_OUTPUT: case INSN_CONFIG_PWM_SET_H_BRIDGE: case INSN_CONFIG_PWM_GET_H_BRIDGE: case INSN_CONFIG_GET_HARDWARE_BUFFER_SIZE: if (insn->n == 3) return 0; break; case INSN_CONFIG_PWM_OUTPUT: case INSN_CONFIG_ANALOG_TRIG: case INSN_CONFIG_TIMER_1: if (insn->n == 5) return 0; break; case INSN_CONFIG_DIGITAL_TRIG: if (insn->n == 6) return 0; break; case INSN_CONFIG_GET_CMD_TIMING_CONSTRAINTS: if (insn->n >= 4) return 0; break; /* * by default we allow the insn since we don't have checks for * all possible cases yet */ default: pr_warn("No check for data length of config insn id %i is implemented\n", data[0]); pr_warn("Add a check to %s in %s\n", __func__, __FILE__); pr_warn("Assuming n=%i is correct\n", insn->n); return 0; } return -EINVAL; } static int check_insn_device_config_length(struct comedi_insn *insn, unsigned int *data) { if (insn->n < 1) return -EINVAL; switch (data[0]) { case INSN_DEVICE_CONFIG_TEST_ROUTE: case INSN_DEVICE_CONFIG_CONNECT_ROUTE: case INSN_DEVICE_CONFIG_DISCONNECT_ROUTE: if (insn->n == 3) return 0; break; case INSN_DEVICE_CONFIG_GET_ROUTES: /* * Big enough for config_id and the length of the userland * memory buffer. Additional length should be in factors of 2 * to communicate any returned route pairs (source,destination). */ if (insn->n >= 2) return 0; break; } return -EINVAL; } /** * get_valid_routes() - Calls low-level driver get_valid_routes function to * either return a count of valid routes to user, or copy * of list of all valid device routes to buffer in * userspace. * @dev: comedi device pointer * @data: data from user insn call. The length of the data must be >= 2. * data[0] must contain the INSN_DEVICE_CONFIG config_id. * data[1](input) contains the number of _pairs_ for which memory is * allotted from the user. If the user specifies '0', then only * the number of pairs available is returned. * data[1](output) returns either the number of pairs available (if none * where requested) or the number of _pairs_ that are copied back * to the user. * data[2::2] returns each (source, destination) pair. * * Return: -EINVAL if low-level driver does not allocate and return routes as * expected. Returns 0 otherwise. */ static int get_valid_routes(struct comedi_device *dev, unsigned int *data) { lockdep_assert_held(&dev->mutex); data[1] = dev->get_valid_routes(dev, data[1], data + 2); return 0; } static int parse_insn(struct comedi_device *dev, struct comedi_insn *insn, unsigned int *data, void *file) { struct comedi_subdevice *s; int ret = 0; int i; lockdep_assert_held(&dev->mutex); if (insn->insn & INSN_MASK_SPECIAL) { /* a non-subdevice instruction */ switch (insn->insn) { case INSN_GTOD: { struct timespec64 tv; if (insn->n != 2) { ret = -EINVAL; break; } ktime_get_real_ts64(&tv); /* unsigned data safe until 2106 */ data[0] = (unsigned int)tv.tv_sec; data[1] = tv.tv_nsec / NSEC_PER_USEC; ret = 2; break; } case INSN_WAIT: if (insn->n != 1 || data[0] >= 100000) { ret = -EINVAL; break; } udelay(data[0] / 1000); ret = 1; break; case INSN_INTTRIG: if (insn->n != 1) { ret = -EINVAL; break; } if (insn->subdev >= dev->n_subdevices) { dev_dbg(dev->class_dev, "%d not usable subdevice\n", insn->subdev); ret = -EINVAL; break; } s = &dev->subdevices[insn->subdev]; if (!s->async) { dev_dbg(dev->class_dev, "no async\n"); ret = -EINVAL; break; } if (!s->async->inttrig) { dev_dbg(dev->class_dev, "no inttrig\n"); ret = -EAGAIN; break; } ret = s->async->inttrig(dev, s, data[0]); if (ret >= 0) ret = 1; break; case INSN_DEVICE_CONFIG: ret = check_insn_device_config_length(insn, data); if (ret) break; if (data[0] == INSN_DEVICE_CONFIG_GET_ROUTES) { /* * data[1] should be the number of _pairs_ that * the memory can hold. */ data[1] = (insn->n - 2) / 2; ret = get_valid_routes(dev, data); break; } /* other global device config instructions. */ ret = dev->insn_device_config(dev, insn, data); break; default: dev_dbg(dev->class_dev, "invalid insn\n"); ret = -EINVAL; break; } } else { /* a subdevice instruction */ unsigned int maxdata; if (insn->subdev >= dev->n_subdevices) { dev_dbg(dev->class_dev, "subdevice %d out of range\n", insn->subdev); ret = -EINVAL; goto out; } s = &dev->subdevices[insn->subdev]; if (s->type == COMEDI_SUBD_UNUSED) { dev_dbg(dev->class_dev, "%d not usable subdevice\n", insn->subdev); ret = -EIO; goto out; } /* are we locked? (ioctl lock) */ if (s->lock && s->lock != file) { dev_dbg(dev->class_dev, "device locked\n"); ret = -EACCES; goto out; } ret = comedi_check_chanlist(s, 1, &insn->chanspec); if (ret < 0) { ret = -EINVAL; dev_dbg(dev->class_dev, "bad chanspec\n"); goto out; } if (s->busy) { ret = -EBUSY; goto out; } /* This looks arbitrary. It is. */ s->busy = parse_insn; switch (insn->insn) { case INSN_READ: ret = s->insn_read(dev, s, insn, data); if (ret == -ETIMEDOUT) { dev_dbg(dev->class_dev, "subdevice %d read instruction timed out\n", s->index); } break; case INSN_WRITE: maxdata = s->maxdata_list ? s->maxdata_list[CR_CHAN(insn->chanspec)] : s->maxdata; for (i = 0; i < insn->n; ++i) { if (data[i] > maxdata) { ret = -EINVAL; dev_dbg(dev->class_dev, "bad data value(s)\n"); break; } } if (ret == 0) { ret = s->insn_write(dev, s, insn, data); if (ret == -ETIMEDOUT) { dev_dbg(dev->class_dev, "subdevice %d write instruction timed out\n", s->index); } } break; case INSN_BITS: if (insn->n != 2) { ret = -EINVAL; } else { /* * Most drivers ignore the base channel in * insn->chanspec. Fix this here if * the subdevice has <= 32 channels. */ unsigned int orig_mask = data[0]; unsigned int shift = 0; if (s->n_chan <= 32) { shift = CR_CHAN(insn->chanspec); if (shift > 0) { insn->chanspec = 0; data[0] <<= shift; data[1] <<= shift; } } ret = s->insn_bits(dev, s, insn, data); data[0] = orig_mask; if (shift > 0) data[1] >>= shift; } break; case INSN_CONFIG: ret = check_insn_config_length(insn, data); if (ret) break; ret = s->insn_config(dev, s, insn, data); break; default: ret = -EINVAL; break; } s->busy = NULL; } out: return ret; } /* * COMEDI_INSNLIST ioctl * synchronous instruction list * * arg: * pointer to comedi_insnlist structure * * reads: * comedi_insnlist structure * array of comedi_insn structures from insnlist->insns pointer * data (for writes) from insns[].data pointers * * writes: * data (for reads) to insns[].data pointers */ /* arbitrary limits */ #define MIN_SAMPLES 16 #define MAX_SAMPLES 65536 static int do_insnlist_ioctl(struct comedi_device *dev, struct comedi_insn *insns, unsigned int n_insns, void *file) { unsigned int *data = NULL; unsigned int max_n_data_required = MIN_SAMPLES; int i = 0; int ret = 0; lockdep_assert_held(&dev->mutex); /* Determine maximum memory needed for all instructions. */ for (i = 0; i < n_insns; ++i) { if (insns[i].n > MAX_SAMPLES) { dev_dbg(dev->class_dev, "number of samples too large\n"); ret = -EINVAL; goto error; } max_n_data_required = max(max_n_data_required, insns[i].n); } /* Allocate scratch space for all instruction data. */ data = kmalloc_array(max_n_data_required, sizeof(unsigned int), GFP_KERNEL); if (!data) { ret = -ENOMEM; goto error; } for (i = 0; i < n_insns; ++i) { if (insns[i].insn & INSN_MASK_WRITE) { if (copy_from_user(data, insns[i].data, insns[i].n * sizeof(unsigned int))) { dev_dbg(dev->class_dev, "copy_from_user failed\n"); ret = -EFAULT; goto error; } } ret = parse_insn(dev, insns + i, data, file); if (ret < 0) goto error; if (insns[i].insn & INSN_MASK_READ) { if (copy_to_user(insns[i].data, data, insns[i].n * sizeof(unsigned int))) { dev_dbg(dev->class_dev, "copy_to_user failed\n"); ret = -EFAULT; goto error; } } if (need_resched()) schedule(); } error: kfree(data); if (ret < 0) return ret; return i; } /* * COMEDI_INSN ioctl * synchronous instruction * * arg: * pointer to comedi_insn structure * * reads: * comedi_insn structure * data (for writes) from insn->data pointer * * writes: * data (for reads) to insn->data pointer */ static int do_insn_ioctl(struct comedi_device *dev, struct comedi_insn *insn, void *file) { unsigned int *data = NULL; unsigned int n_data = MIN_SAMPLES; int ret = 0; lockdep_assert_held(&dev->mutex); n_data = max(n_data, insn->n); /* This is where the behavior of insn and insnlist deviate. */ if (insn->n > MAX_SAMPLES) { insn->n = MAX_SAMPLES; n_data = MAX_SAMPLES; } data = kmalloc_array(n_data, sizeof(unsigned int), GFP_KERNEL); if (!data) { ret = -ENOMEM; goto error; } if (insn->insn & INSN_MASK_WRITE) { if (copy_from_user(data, insn->data, insn->n * sizeof(unsigned int))) { ret = -EFAULT; goto error; } } ret = parse_insn(dev, insn, data, file); if (ret < 0) goto error; if (insn->insn & INSN_MASK_READ) { if (copy_to_user(insn->data, data, insn->n * sizeof(unsigned int))) { ret = -EFAULT; goto error; } } ret = insn->n; error: kfree(data); return ret; } static int __comedi_get_user_cmd(struct comedi_device *dev, struct comedi_cmd *cmd) { struct comedi_subdevice *s; lockdep_assert_held(&dev->mutex); if (cmd->subdev >= dev->n_subdevices) { dev_dbg(dev->class_dev, "%d no such subdevice\n", cmd->subdev); return -ENODEV; } s = &dev->subdevices[cmd->subdev]; if (s->type == COMEDI_SUBD_UNUSED) { dev_dbg(dev->class_dev, "%d not valid subdevice\n", cmd->subdev); return -EIO; } if (!s->do_cmd || !s->do_cmdtest || !s->async) { dev_dbg(dev->class_dev, "subdevice %d does not support commands\n", cmd->subdev); return -EIO; } /* make sure channel/gain list isn't too long */ if (cmd->chanlist_len > s->len_chanlist) { dev_dbg(dev->class_dev, "channel/gain list too long %d > %d\n", cmd->chanlist_len, s->len_chanlist); return -EINVAL; } /* * Set the CMDF_WRITE flag to the correct state if the subdevice * supports only "read" commands or only "write" commands. */ switch (s->subdev_flags & (SDF_CMD_READ | SDF_CMD_WRITE)) { case SDF_CMD_READ: cmd->flags &= ~CMDF_WRITE; break; case SDF_CMD_WRITE: cmd->flags |= CMDF_WRITE; break; default: break; } return 0; } static int __comedi_get_user_chanlist(struct comedi_device *dev, struct comedi_subdevice *s, unsigned int __user *user_chanlist, struct comedi_cmd *cmd) { unsigned int *chanlist; int ret; lockdep_assert_held(&dev->mutex); cmd->chanlist = NULL; chanlist = memdup_array_user(user_chanlist, cmd->chanlist_len, sizeof(unsigned int)); if (IS_ERR(chanlist)) return PTR_ERR(chanlist); /* make sure each element in channel/gain list is valid */ ret = comedi_check_chanlist(s, cmd->chanlist_len, chanlist); if (ret < 0) { kfree(chanlist); return ret; } cmd->chanlist = chanlist; return 0; } /* * COMEDI_CMD ioctl * asynchronous acquisition command set-up * * arg: * pointer to comedi_cmd structure * * reads: * comedi_cmd structure * channel/range list from cmd->chanlist pointer * * writes: * possibly modified comedi_cmd structure (when -EAGAIN returned) */ static int do_cmd_ioctl(struct comedi_device *dev, struct comedi_cmd *cmd, bool *copy, void *file) { struct comedi_subdevice *s; struct comedi_async *async; unsigned int __user *user_chanlist; int ret; lockdep_assert_held(&dev->mutex); /* do some simple cmd validation */ ret = __comedi_get_user_cmd(dev, cmd); if (ret) return ret; /* save user's chanlist pointer so it can be restored later */ user_chanlist = (unsigned int __user *)cmd->chanlist; s = &dev->subdevices[cmd->subdev]; async = s->async; /* are we locked? (ioctl lock) */ if (s->lock && s->lock != file) { dev_dbg(dev->class_dev, "subdevice locked\n"); return -EACCES; } /* are we busy? */ if (s->busy) { dev_dbg(dev->class_dev, "subdevice busy\n"); return -EBUSY; } /* make sure channel/gain list isn't too short */ if (cmd->chanlist_len < 1) { dev_dbg(dev->class_dev, "channel/gain list too short %u < 1\n", cmd->chanlist_len); return -EINVAL; } async->cmd = *cmd; async->cmd.data = NULL; /* load channel/gain list */ ret = __comedi_get_user_chanlist(dev, s, user_chanlist, &async->cmd); if (ret) goto cleanup; ret = s->do_cmdtest(dev, s, &async->cmd); if (async->cmd.flags & CMDF_BOGUS || ret) { dev_dbg(dev->class_dev, "test returned %d\n", ret); *cmd = async->cmd; /* restore chanlist pointer before copying back */ cmd->chanlist = (unsigned int __force *)user_chanlist; cmd->data = NULL; *copy = true; ret = -EAGAIN; goto cleanup; } if (!async->prealloc_bufsz) { ret = -ENOMEM; dev_dbg(dev->class_dev, "no buffer (?)\n"); goto cleanup; } comedi_buf_reset(s); async->cb_mask = COMEDI_CB_BLOCK | COMEDI_CB_CANCEL_MASK; if (async->cmd.flags & CMDF_WAKE_EOS) async->cb_mask |= COMEDI_CB_EOS; comedi_update_subdevice_runflags(s, COMEDI_SRF_BUSY_MASK, COMEDI_SRF_RUNNING); /* * Set s->busy _after_ setting COMEDI_SRF_RUNNING flag to avoid * race with comedi_read() or comedi_write(). */ s->busy = file; ret = s->do_cmd(dev, s); if (ret == 0) return 0; cleanup: do_become_nonbusy(dev, s); return ret; } /* * COMEDI_CMDTEST ioctl * asynchronous acquisition command testing * * arg: * pointer to comedi_cmd structure * * reads: * comedi_cmd structure * channel/range list from cmd->chanlist pointer * * writes: * possibly modified comedi_cmd structure */ static int do_cmdtest_ioctl(struct comedi_device *dev, struct comedi_cmd *cmd, bool *copy, void *file) { struct comedi_subdevice *s; unsigned int __user *user_chanlist; int ret; lockdep_assert_held(&dev->mutex); /* do some simple cmd validation */ ret = __comedi_get_user_cmd(dev, cmd); if (ret) return ret; /* save user's chanlist pointer so it can be restored later */ user_chanlist = (unsigned int __user *)cmd->chanlist; s = &dev->subdevices[cmd->subdev]; /* user_chanlist can be NULL for COMEDI_CMDTEST ioctl */ if (user_chanlist) { /* load channel/gain list */ ret = __comedi_get_user_chanlist(dev, s, user_chanlist, cmd); if (ret) return ret; } ret = s->do_cmdtest(dev, s, cmd); kfree(cmd->chanlist); /* free kernel copy of user chanlist */ /* restore chanlist pointer before copying back */ cmd->chanlist = (unsigned int __force *)user_chanlist; *copy = true; return ret; } /* * COMEDI_LOCK ioctl * lock subdevice * * arg: * subdevice number * * reads: * nothing * * writes: * nothing */ static int do_lock_ioctl(struct comedi_device *dev, unsigned long arg, void *file) { int ret = 0; unsigned long flags; struct comedi_subdevice *s; lockdep_assert_held(&dev->mutex); if (arg >= dev->n_subdevices) return -EINVAL; s = &dev->subdevices[arg]; spin_lock_irqsave(&s->spin_lock, flags); if (s->busy || s->lock) ret = -EBUSY; else s->lock = file; spin_unlock_irqrestore(&s->spin_lock, flags); return ret; } /* * COMEDI_UNLOCK ioctl * unlock subdevice * * arg: * subdevice number * * reads: * nothing * * writes: * nothing */ static int do_unlock_ioctl(struct comedi_device *dev, unsigned long arg, void *file) { struct comedi_subdevice *s; lockdep_assert_held(&dev->mutex); if (arg >= dev->n_subdevices) return -EINVAL; s = &dev->subdevices[arg]; if (s->busy) return -EBUSY; if (s->lock && s->lock != file) return -EACCES; if (s->lock == file) s->lock = NULL; return 0; } /* * COMEDI_CANCEL ioctl * cancel asynchronous acquisition * * arg: * subdevice number * * reads: * nothing * * writes: * nothing */ static int do_cancel_ioctl(struct comedi_device *dev, unsigned long arg, void *file) { struct comedi_subdevice *s; lockdep_assert_held(&dev->mutex); if (arg >= dev->n_subdevices) return -EINVAL; s = &dev->subdevices[arg]; if (!s->async) return -EINVAL; if (!s->busy) return 0; if (s->busy != file) return -EBUSY; return do_cancel(dev, s); } /* * COMEDI_POLL ioctl * instructs driver to synchronize buffers * * arg: * subdevice number * * reads: * nothing * * writes: * nothing */ static int do_poll_ioctl(struct comedi_device *dev, unsigned long arg, void *file) { struct comedi_subdevice *s; lockdep_assert_held(&dev->mutex); if (arg >= dev->n_subdevices) return -EINVAL; s = &dev->subdevices[arg]; if (!s->busy) return 0; if (s->busy != file) return -EBUSY; if (s->poll) return s->poll(dev, s); return -EINVAL; } /* * COMEDI_SETRSUBD ioctl * sets the current "read" subdevice on a per-file basis * * arg: * subdevice number * * reads: * nothing * * writes: * nothing */ static int do_setrsubd_ioctl(struct comedi_device *dev, unsigned long arg, struct file *file) { struct comedi_file *cfp = file->private_data; struct comedi_subdevice *s_old, *s_new; lockdep_assert_held(&dev->mutex); if (arg >= dev->n_subdevices) return -EINVAL; s_new = &dev->subdevices[arg]; s_old = comedi_file_read_subdevice(file); if (s_old == s_new) return 0; /* no change */ if (!(s_new->subdev_flags & SDF_CMD_READ)) return -EINVAL; /* * Check the file isn't still busy handling a "read" command on the * old subdevice (if any). */ if (s_old && s_old->busy == file && s_old->async && !(s_old->async->cmd.flags & CMDF_WRITE)) return -EBUSY; WRITE_ONCE(cfp->read_subdev, s_new); return 0; } /* * COMEDI_SETWSUBD ioctl * sets the current "write" subdevice on a per-file basis * * arg: * subdevice number * * reads: * nothing * * writes: * nothing */ static int do_setwsubd_ioctl(struct comedi_device *dev, unsigned long arg, struct file *file) { struct comedi_file *cfp = file->private_data; struct comedi_subdevice *s_old, *s_new; lockdep_assert_held(&dev->mutex); if (arg >= dev->n_subdevices) return -EINVAL; s_new = &dev->subdevices[arg]; s_old = comedi_file_write_subdevice(file); if (s_old == s_new) return 0; /* no change */ if (!(s_new->subdev_flags & SDF_CMD_WRITE)) return -EINVAL; /* * Check the file isn't still busy handling a "write" command on the * old subdevice (if any). */ if (s_old && s_old->busy == file && s_old->async && (s_old->async->cmd.flags & CMDF_WRITE)) return -EBUSY; WRITE_ONCE(cfp->write_subdev, s_new); return 0; } static long comedi_unlocked_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { unsigned int minor = iminor(file_inode(file)); struct comedi_file *cfp = file->private_data; struct comedi_device *dev = cfp->dev; int rc; mutex_lock(&dev->mutex); /* * Device config is special, because it must work on * an unconfigured device. */ if (cmd == COMEDI_DEVCONFIG) { if (minor >= COMEDI_NUM_BOARD_MINORS) { /* Device config not appropriate on non-board minors. */ rc = -ENOTTY; goto done; } rc = do_devconfig_ioctl(dev, (struct comedi_devconfig __user *)arg); if (rc == 0) { if (arg == 0 && dev->minor >= comedi_num_legacy_minors) { /* * Successfully unconfigured a dynamically * allocated device. Try and remove it. */ if (comedi_clear_board_dev(dev)) { mutex_unlock(&dev->mutex); comedi_free_board_dev(dev); return rc; } } } goto done; } if (!dev->attached) { dev_dbg(dev->class_dev, "no driver attached\n"); rc = -ENODEV; goto done; } switch (cmd) { case COMEDI_BUFCONFIG: rc = do_bufconfig_ioctl(dev, (struct comedi_bufconfig __user *)arg); break; case COMEDI_DEVINFO: rc = do_devinfo_ioctl(dev, (struct comedi_devinfo __user *)arg, file); break; case COMEDI_SUBDINFO: rc = do_subdinfo_ioctl(dev, (struct comedi_subdinfo __user *)arg, file); break; case COMEDI_CHANINFO: { struct comedi_chaninfo it; if (copy_from_user(&it, (void __user *)arg, sizeof(it))) rc = -EFAULT; else rc = do_chaninfo_ioctl(dev, &it); break; } case COMEDI_RANGEINFO: { struct comedi_rangeinfo it; if (copy_from_user(&it, (void __user *)arg, sizeof(it))) rc = -EFAULT; else rc = do_rangeinfo_ioctl(dev, &it); break; } case COMEDI_BUFINFO: rc = do_bufinfo_ioctl(dev, (struct comedi_bufinfo __user *)arg, file); break; case COMEDI_LOCK: rc = do_lock_ioctl(dev, arg, file); break; case COMEDI_UNLOCK: rc = do_unlock_ioctl(dev, arg, file); break; case COMEDI_CANCEL: rc = do_cancel_ioctl(dev, arg, file); break; case COMEDI_CMD: { struct comedi_cmd cmd; bool copy = false; if (copy_from_user(&cmd, (void __user *)arg, sizeof(cmd))) { rc = -EFAULT; break; } rc = do_cmd_ioctl(dev, &cmd, ©, file); if (copy && copy_to_user((void __user *)arg, &cmd, sizeof(cmd))) rc = -EFAULT; break; } case COMEDI_CMDTEST: { struct comedi_cmd cmd; bool copy = false; if (copy_from_user(&cmd, (void __user *)arg, sizeof(cmd))) { rc = -EFAULT; break; } rc = do_cmdtest_ioctl(dev, &cmd, ©, file); if (copy && copy_to_user((void __user *)arg, &cmd, sizeof(cmd))) rc = -EFAULT; break; } case COMEDI_INSNLIST: { struct comedi_insnlist insnlist; struct comedi_insn *insns = NULL; if (copy_from_user(&insnlist, (void __user *)arg, sizeof(insnlist))) { rc = -EFAULT; break; } insns = kcalloc(insnlist.n_insns, sizeof(*insns), GFP_KERNEL); if (!insns) { rc = -ENOMEM; break; } if (copy_from_user(insns, insnlist.insns, sizeof(*insns) * insnlist.n_insns)) { rc = -EFAULT; kfree(insns); break; } rc = do_insnlist_ioctl(dev, insns, insnlist.n_insns, file); kfree(insns); break; } case COMEDI_INSN: { struct comedi_insn insn; if (copy_from_user(&insn, (void __user *)arg, sizeof(insn))) rc = -EFAULT; else rc = do_insn_ioctl(dev, &insn, file); break; } case COMEDI_POLL: rc = do_poll_ioctl(dev, arg, file); break; case COMEDI_SETRSUBD: rc = do_setrsubd_ioctl(dev, arg, file); break; case COMEDI_SETWSUBD: rc = do_setwsubd_ioctl(dev, arg, file); break; default: rc = -ENOTTY; break; } done: mutex_unlock(&dev->mutex); return rc; } static void comedi_vm_open(struct vm_area_struct *area) { struct comedi_buf_map *bm; bm = area->vm_private_data; comedi_buf_map_get(bm); } static void comedi_vm_close(struct vm_area_struct *area) { struct comedi_buf_map *bm; bm = area->vm_private_data; comedi_buf_map_put(bm); } static int comedi_vm_access(struct vm_area_struct *vma, unsigned long addr, void *buf, int len, int write) { struct comedi_buf_map *bm = vma->vm_private_data; unsigned long offset = addr - vma->vm_start + (vma->vm_pgoff << PAGE_SHIFT); if (len < 0) return -EINVAL; if (len > vma->vm_end - addr) len = vma->vm_end - addr; return comedi_buf_map_access(bm, offset, buf, len, write); } static const struct vm_operations_struct comedi_vm_ops = { .open = comedi_vm_open, .close = comedi_vm_close, .access = comedi_vm_access, }; static int comedi_mmap(struct file *file, struct vm_area_struct *vma) { struct comedi_file *cfp = file->private_data; struct comedi_device *dev = cfp->dev; struct comedi_subdevice *s; struct comedi_async *async; struct comedi_buf_map *bm = NULL; struct comedi_buf_page *buf; unsigned long start = vma->vm_start; unsigned long size; int n_pages; int i; int retval = 0; /* * 'trylock' avoids circular dependency with current->mm->mmap_lock * and down-reading &dev->attach_lock should normally succeed without * contention unless the device is in the process of being attached * or detached. */ if (!down_read_trylock(&dev->attach_lock)) return -EAGAIN; if (!dev->attached) { dev_dbg(dev->class_dev, "no driver attached\n"); retval = -ENODEV; goto done; } if (vma->vm_flags & VM_WRITE) s = comedi_file_write_subdevice(file); else s = comedi_file_read_subdevice(file); if (!s) { retval = -EINVAL; goto done; } async = s->async; if (!async) { retval = -EINVAL; goto done; } if (vma->vm_pgoff != 0) { dev_dbg(dev->class_dev, "mmap() offset must be 0.\n"); retval = -EINVAL; goto done; } size = vma->vm_end - vma->vm_start; if (size > async->prealloc_bufsz) { retval = -EFAULT; goto done; } if (offset_in_page(size)) { retval = -EFAULT; goto done; } n_pages = vma_pages(vma); /* get reference to current buf map (if any) */ bm = comedi_buf_map_from_subdev_get(s); if (!bm || n_pages > bm->n_pages) { retval = -EINVAL; goto done; } if (bm->dma_dir != DMA_NONE) { /* * DMA buffer was allocated as a single block. * Address is in page_list[0]. */ buf = &bm->page_list[0]; retval = dma_mmap_coherent(bm->dma_hw_dev, vma, buf->virt_addr, buf->dma_addr, n_pages * PAGE_SIZE); } else { for (i = 0; i < n_pages; ++i) { unsigned long pfn; buf = &bm->page_list[i]; pfn = page_to_pfn(virt_to_page(buf->virt_addr)); retval = remap_pfn_range(vma, start, pfn, PAGE_SIZE, PAGE_SHARED); if (retval) break; start += PAGE_SIZE; } #ifdef CONFIG_MMU /* * Leaving behind a partial mapping of a buffer we're about to * drop is unsafe, see remap_pfn_range_notrack(). * We need to zap the range here ourselves instead of relying * on the automatic zapping in remap_pfn_range() because we call * remap_pfn_range() in a loop. */ if (retval) zap_vma_ptes(vma, vma->vm_start, size); #endif } if (retval == 0) { vma->vm_ops = &comedi_vm_ops; vma->vm_private_data = bm; vma->vm_ops->open(vma); } done: up_read(&dev->attach_lock); comedi_buf_map_put(bm); /* put reference to buf map - okay if NULL */ return retval; } static __poll_t comedi_poll(struct file *file, poll_table *wait) { __poll_t mask = 0; struct comedi_file *cfp = file->private_data; struct comedi_device *dev = cfp->dev; struct comedi_subdevice *s, *s_read; down_read(&dev->attach_lock); if (!dev->attached) { dev_dbg(dev->class_dev, "no driver attached\n"); goto done; } s = comedi_file_read_subdevice(file); s_read = s; if (s && s->async) { poll_wait(file, &s->async->wait_head, wait); if (s->busy != file || !comedi_is_subdevice_running(s) || (s->async->cmd.flags & CMDF_WRITE) || comedi_buf_read_n_available(s) > 0) mask |= EPOLLIN | EPOLLRDNORM; } s = comedi_file_write_subdevice(file); if (s && s->async) { unsigned int bps = comedi_bytes_per_sample(s); if (s != s_read) poll_wait(file, &s->async->wait_head, wait); if (s->busy != file || !comedi_is_subdevice_running(s) || !(s->async->cmd.flags & CMDF_WRITE) || comedi_buf_write_n_available(s) >= bps) mask |= EPOLLOUT | EPOLLWRNORM; } done: up_read(&dev->attach_lock); return mask; } static ssize_t comedi_write(struct file *file, const char __user *buf, size_t nbytes, loff_t *offset) { struct comedi_subdevice *s; struct comedi_async *async; unsigned int n, m; ssize_t count = 0; int retval = 0; DECLARE_WAITQUEUE(wait, current); struct comedi_file *cfp = file->private_data; struct comedi_device *dev = cfp->dev; bool become_nonbusy = false; bool attach_locked; unsigned int old_detach_count; /* Protect against device detachment during operation. */ down_read(&dev->attach_lock); attach_locked = true; old_detach_count = dev->detach_count; if (!dev->attached) { dev_dbg(dev->class_dev, "no driver attached\n"); retval = -ENODEV; goto out; } s = comedi_file_write_subdevice(file); if (!s || !s->async) { retval = -EIO; goto out; } async = s->async; if (s->busy != file || !(async->cmd.flags & CMDF_WRITE)) { retval = -EINVAL; goto out; } add_wait_queue(&async->wait_head, &wait); while (count == 0 && !retval) { unsigned int runflags; unsigned int wp, n1, n2; set_current_state(TASK_INTERRUPTIBLE); runflags = comedi_get_subdevice_runflags(s); if (!comedi_is_runflags_running(runflags)) { if (comedi_is_runflags_in_error(runflags)) retval = -EPIPE; if (retval || nbytes) become_nonbusy = true; break; } if (nbytes == 0) break; /* Allocate all free buffer space. */ comedi_buf_write_alloc(s, async->prealloc_bufsz); m = comedi_buf_write_n_allocated(s); n = min_t(size_t, m, nbytes); if (n == 0) { if (file->f_flags & O_NONBLOCK) { retval = -EAGAIN; break; } schedule(); if (signal_pending(current)) { retval = -ERESTARTSYS; break; } if (s->busy != file || !(async->cmd.flags & CMDF_WRITE)) { retval = -EINVAL; break; } continue; } set_current_state(TASK_RUNNING); wp = async->buf_write_ptr; n1 = min(n, async->prealloc_bufsz - wp); n2 = n - n1; m = copy_from_user(async->prealloc_buf + wp, buf, n1); if (m) m += n2; else if (n2) m = copy_from_user(async->prealloc_buf, buf + n1, n2); if (m) { n -= m; retval = -EFAULT; } comedi_buf_write_free(s, n); count += n; nbytes -= n; buf += n; } remove_wait_queue(&async->wait_head, &wait); set_current_state(TASK_RUNNING); if (become_nonbusy && count == 0) { struct comedi_subdevice *new_s; /* * To avoid deadlock, cannot acquire dev->mutex * while dev->attach_lock is held. */ up_read(&dev->attach_lock); attach_locked = false; mutex_lock(&dev->mutex); /* * Check device hasn't become detached behind our back. * Checking dev->detach_count is unchanged ought to be * sufficient (unless there have been 2**32 detaches in the * meantime!), but check the subdevice pointer as well just in * case. * * Also check the subdevice is still in a suitable state to * become non-busy in case it changed behind our back. */ new_s = comedi_file_write_subdevice(file); if (dev->attached && old_detach_count == dev->detach_count && s == new_s && new_s->async == async && s->busy == file && (async->cmd.flags & CMDF_WRITE) && !comedi_is_subdevice_running(s)) do_become_nonbusy(dev, s); mutex_unlock(&dev->mutex); } out: if (attach_locked) up_read(&dev->attach_lock); return count ? count : retval; } static ssize_t comedi_read(struct file *file, char __user *buf, size_t nbytes, loff_t *offset) { struct comedi_subdevice *s; struct comedi_async *async; unsigned int n, m; ssize_t count = 0; int retval = 0; DECLARE_WAITQUEUE(wait, current); struct comedi_file *cfp = file->private_data; struct comedi_device *dev = cfp->dev; unsigned int old_detach_count; bool become_nonbusy = false; bool attach_locked; /* Protect against device detachment during operation. */ down_read(&dev->attach_lock); attach_locked = true; old_detach_count = dev->detach_count; if (!dev->attached) { dev_dbg(dev->class_dev, "no driver attached\n"); retval = -ENODEV; goto out; } s = comedi_file_read_subdevice(file); if (!s || !s->async) { retval = -EIO; goto out; } async = s->async; if (s->busy != file || (async->cmd.flags & CMDF_WRITE)) { retval = -EINVAL; goto out; } add_wait_queue(&async->wait_head, &wait); while (count == 0 && !retval) { unsigned int rp, n1, n2; set_current_state(TASK_INTERRUPTIBLE); m = comedi_buf_read_n_available(s); n = min_t(size_t, m, nbytes); if (n == 0) { unsigned int runflags = comedi_get_subdevice_runflags(s); if (!comedi_is_runflags_running(runflags)) { if (comedi_is_runflags_in_error(runflags)) retval = -EPIPE; if (retval || nbytes) become_nonbusy = true; break; } if (nbytes == 0) break; if (file->f_flags & O_NONBLOCK) { retval = -EAGAIN; break; } schedule(); if (signal_pending(current)) { retval = -ERESTARTSYS; break; } if (s->busy != file || (async->cmd.flags & CMDF_WRITE)) { retval = -EINVAL; break; } continue; } set_current_state(TASK_RUNNING); rp = async->buf_read_ptr; n1 = min(n, async->prealloc_bufsz - rp); n2 = n - n1; m = copy_to_user(buf, async->prealloc_buf + rp, n1); if (m) m += n2; else if (n2) m = copy_to_user(buf + n1, async->prealloc_buf, n2); if (m) { n -= m; retval = -EFAULT; } comedi_buf_read_alloc(s, n); comedi_buf_read_free(s, n); count += n; nbytes -= n; buf += n; } remove_wait_queue(&async->wait_head, &wait); set_current_state(TASK_RUNNING); if (become_nonbusy && count == 0) { struct comedi_subdevice *new_s; /* * To avoid deadlock, cannot acquire dev->mutex * while dev->attach_lock is held. */ up_read(&dev->attach_lock); attach_locked = false; mutex_lock(&dev->mutex); /* * Check device hasn't become detached behind our back. * Checking dev->detach_count is unchanged ought to be * sufficient (unless there have been 2**32 detaches in the * meantime!), but check the subdevice pointer as well just in * case. * * Also check the subdevice is still in a suitable state to * become non-busy in case it changed behind our back. */ new_s = comedi_file_read_subdevice(file); if (dev->attached && old_detach_count == dev->detach_count && s == new_s && new_s->async == async && s->busy == file && !(async->cmd.flags & CMDF_WRITE) && !comedi_is_subdevice_running(s) && comedi_buf_read_n_available(s) == 0) do_become_nonbusy(dev, s); mutex_unlock(&dev->mutex); } out: if (attach_locked) up_read(&dev->attach_lock); return count ? count : retval; } static int comedi_open(struct inode *inode, struct file *file) { const unsigned int minor = iminor(inode); struct comedi_file *cfp; struct comedi_device *dev = comedi_dev_get_from_minor(minor); int rc; if (!dev) { pr_debug("invalid minor number\n"); return -ENODEV; } cfp = kzalloc(sizeof(*cfp), GFP_KERNEL); if (!cfp) { comedi_dev_put(dev); return -ENOMEM; } cfp->dev = dev; mutex_lock(&dev->mutex); if (!dev->attached && !capable(CAP_SYS_ADMIN)) { dev_dbg(dev->class_dev, "not attached and not CAP_SYS_ADMIN\n"); rc = -ENODEV; goto out; } if (dev->attached && dev->use_count == 0) { if (!try_module_get(dev->driver->module)) { rc = -ENXIO; goto out; } if (dev->open) { rc = dev->open(dev); if (rc < 0) { module_put(dev->driver->module); goto out; } } } dev->use_count++; file->private_data = cfp; comedi_file_reset(file); rc = 0; out: mutex_unlock(&dev->mutex); if (rc) { comedi_dev_put(dev); kfree(cfp); } return rc; } static int comedi_fasync(int fd, struct file *file, int on) { struct comedi_file *cfp = file->private_data; struct comedi_device *dev = cfp->dev; return fasync_helper(fd, file, on, &dev->async_queue); } static int comedi_close(struct inode *inode, struct file *file) { struct comedi_file *cfp = file->private_data; struct comedi_device *dev = cfp->dev; struct comedi_subdevice *s = NULL; int i; mutex_lock(&dev->mutex); if (dev->subdevices) { for (i = 0; i < dev->n_subdevices; i++) { s = &dev->subdevices[i]; if (s->busy == file) do_cancel(dev, s); if (s->lock == file) s->lock = NULL; } } if (dev->attached && dev->use_count == 1) { if (dev->close) dev->close(dev); module_put(dev->driver->module); } dev->use_count--; mutex_unlock(&dev->mutex); comedi_dev_put(dev); kfree(cfp); return 0; } #ifdef CONFIG_COMPAT #define COMEDI32_CHANINFO _IOR(CIO, 3, struct comedi32_chaninfo_struct) #define COMEDI32_RANGEINFO _IOR(CIO, 8, struct comedi32_rangeinfo_struct) /* * N.B. COMEDI32_CMD and COMEDI_CMD ought to use _IOWR, not _IOR. * It's too late to change it now, but it only affects the command number. */ #define COMEDI32_CMD _IOR(CIO, 9, struct comedi32_cmd_struct) /* * N.B. COMEDI32_CMDTEST and COMEDI_CMDTEST ought to use _IOWR, not _IOR. * It's too late to change it now, but it only affects the command number. */ #define COMEDI32_CMDTEST _IOR(CIO, 10, struct comedi32_cmd_struct) #define COMEDI32_INSNLIST _IOR(CIO, 11, struct comedi32_insnlist_struct) #define COMEDI32_INSN _IOR(CIO, 12, struct comedi32_insn_struct) struct comedi32_chaninfo_struct { unsigned int subdev; compat_uptr_t maxdata_list; /* 32-bit 'unsigned int *' */ compat_uptr_t flaglist; /* 32-bit 'unsigned int *' */ compat_uptr_t rangelist; /* 32-bit 'unsigned int *' */ unsigned int unused[4]; }; struct comedi32_rangeinfo_struct { unsigned int range_type; compat_uptr_t range_ptr; /* 32-bit 'void *' */ }; struct comedi32_cmd_struct { unsigned int subdev; unsigned int flags; unsigned int start_src; unsigned int start_arg; unsigned int scan_begin_src; unsigned int scan_begin_arg; unsigned int convert_src; unsigned int convert_arg; unsigned int scan_end_src; unsigned int scan_end_arg; unsigned int stop_src; unsigned int stop_arg; compat_uptr_t chanlist; /* 32-bit 'unsigned int *' */ unsigned int chanlist_len; compat_uptr_t data; /* 32-bit 'short *' */ unsigned int data_len; }; struct comedi32_insn_struct { unsigned int insn; unsigned int n; compat_uptr_t data; /* 32-bit 'unsigned int *' */ unsigned int subdev; unsigned int chanspec; unsigned int unused[3]; }; struct comedi32_insnlist_struct { unsigned int n_insns; compat_uptr_t insns; /* 32-bit 'struct comedi_insn *' */ }; /* Handle 32-bit COMEDI_CHANINFO ioctl. */ static int compat_chaninfo(struct file *file, unsigned long arg) { struct comedi_file *cfp = file->private_data; struct comedi_device *dev = cfp->dev; struct comedi32_chaninfo_struct chaninfo32; struct comedi_chaninfo chaninfo; int err; if (copy_from_user(&chaninfo32, compat_ptr(arg), sizeof(chaninfo32))) return -EFAULT; memset(&chaninfo, 0, sizeof(chaninfo)); chaninfo.subdev = chaninfo32.subdev; chaninfo.maxdata_list = compat_ptr(chaninfo32.maxdata_list); chaninfo.flaglist = compat_ptr(chaninfo32.flaglist); chaninfo.rangelist = compat_ptr(chaninfo32.rangelist); mutex_lock(&dev->mutex); err = do_chaninfo_ioctl(dev, &chaninfo); mutex_unlock(&dev->mutex); return err; } /* Handle 32-bit COMEDI_RANGEINFO ioctl. */ static int compat_rangeinfo(struct file *file, unsigned long arg) { struct comedi_file *cfp = file->private_data; struct comedi_device *dev = cfp->dev; struct comedi32_rangeinfo_struct rangeinfo32; struct comedi_rangeinfo rangeinfo; int err; if (copy_from_user(&rangeinfo32, compat_ptr(arg), sizeof(rangeinfo32))) return -EFAULT; memset(&rangeinfo, 0, sizeof(rangeinfo)); rangeinfo.range_type = rangeinfo32.range_type; rangeinfo.range_ptr = compat_ptr(rangeinfo32.range_ptr); mutex_lock(&dev->mutex); err = do_rangeinfo_ioctl(dev, &rangeinfo); mutex_unlock(&dev->mutex); return err; } /* Copy 32-bit cmd structure to native cmd structure. */ static int get_compat_cmd(struct comedi_cmd *cmd, struct comedi32_cmd_struct __user *cmd32) { struct comedi32_cmd_struct v32; if (copy_from_user(&v32, cmd32, sizeof(v32))) return -EFAULT; cmd->subdev = v32.subdev; cmd->flags = v32.flags; cmd->start_src = v32.start_src; cmd->start_arg = v32.start_arg; cmd->scan_begin_src = v32.scan_begin_src; cmd->scan_begin_arg = v32.scan_begin_arg; cmd->convert_src = v32.convert_src; cmd->convert_arg = v32.convert_arg; cmd->scan_end_src = v32.scan_end_src; cmd->scan_end_arg = v32.scan_end_arg; cmd->stop_src = v32.stop_src; cmd->stop_arg = v32.stop_arg; cmd->chanlist = (unsigned int __force *)compat_ptr(v32.chanlist); cmd->chanlist_len = v32.chanlist_len; cmd->data = compat_ptr(v32.data); cmd->data_len = v32.data_len; return 0; } /* Copy native cmd structure to 32-bit cmd structure. */ static int put_compat_cmd(struct comedi32_cmd_struct __user *cmd32, struct comedi_cmd *cmd) { struct comedi32_cmd_struct v32; memset(&v32, 0, sizeof(v32)); v32.subdev = cmd->subdev; v32.flags = cmd->flags; v32.start_src = cmd->start_src; v32.start_arg = cmd->start_arg; v32.scan_begin_src = cmd->scan_begin_src; v32.scan_begin_arg = cmd->scan_begin_arg; v32.convert_src = cmd->convert_src; v32.convert_arg = cmd->convert_arg; v32.scan_end_src = cmd->scan_end_src; v32.scan_end_arg = cmd->scan_end_arg; v32.stop_src = cmd->stop_src; v32.stop_arg = cmd->stop_arg; /* Assume chanlist pointer is unchanged. */ v32.chanlist = ptr_to_compat((unsigned int __user *)cmd->chanlist); v32.chanlist_len = cmd->chanlist_len; v32.data = ptr_to_compat(cmd->data); v32.data_len = cmd->data_len; if (copy_to_user(cmd32, &v32, sizeof(v32))) return -EFAULT; return 0; } /* Handle 32-bit COMEDI_CMD ioctl. */ static int compat_cmd(struct file *file, unsigned long arg) { struct comedi_file *cfp = file->private_data; struct comedi_device *dev = cfp->dev; struct comedi_cmd cmd; bool copy = false; int rc, err; rc = get_compat_cmd(&cmd, compat_ptr(arg)); if (rc) return rc; mutex_lock(&dev->mutex); rc = do_cmd_ioctl(dev, &cmd, ©, file); mutex_unlock(&dev->mutex); if (copy) { /* Special case: copy cmd back to user. */ err = put_compat_cmd(compat_ptr(arg), &cmd); if (err) rc = err; } return rc; } /* Handle 32-bit COMEDI_CMDTEST ioctl. */ static int compat_cmdtest(struct file *file, unsigned long arg) { struct comedi_file *cfp = file->private_data; struct comedi_device *dev = cfp->dev; struct comedi_cmd cmd; bool copy = false; int rc, err; rc = get_compat_cmd(&cmd, compat_ptr(arg)); if (rc) return rc; mutex_lock(&dev->mutex); rc = do_cmdtest_ioctl(dev, &cmd, ©, file); mutex_unlock(&dev->mutex); if (copy) { err = put_compat_cmd(compat_ptr(arg), &cmd); if (err) rc = err; } return rc; } /* Copy 32-bit insn structure to native insn structure. */ static int get_compat_insn(struct comedi_insn *insn, struct comedi32_insn_struct __user *insn32) { struct comedi32_insn_struct v32; /* Copy insn structure. Ignore the unused members. */ if (copy_from_user(&v32, insn32, sizeof(v32))) return -EFAULT; memset(insn, 0, sizeof(*insn)); insn->insn = v32.insn; insn->n = v32.n; insn->data = compat_ptr(v32.data); insn->subdev = v32.subdev; insn->chanspec = v32.chanspec; return 0; } /* Handle 32-bit COMEDI_INSNLIST ioctl. */ static int compat_insnlist(struct file *file, unsigned long arg) { struct comedi_file *cfp = file->private_data; struct comedi_device *dev = cfp->dev; struct comedi32_insnlist_struct insnlist32; struct comedi32_insn_struct __user *insn32; struct comedi_insn *insns; unsigned int n; in |