| 1 4 679 679 679 87 91 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 | /* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */ #ifndef _LINUX_RSEQ_H #define _LINUX_RSEQ_H #ifdef CONFIG_RSEQ #include <linux/preempt.h> #include <linux/sched.h> /* * Map the event mask on the user-space ABI enum rseq_cs_flags * for direct mask checks. */ enum rseq_event_mask_bits { RSEQ_EVENT_PREEMPT_BIT = RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT_BIT, RSEQ_EVENT_SIGNAL_BIT = RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL_BIT, RSEQ_EVENT_MIGRATE_BIT = RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT, }; enum rseq_event_mask { RSEQ_EVENT_PREEMPT = (1U << RSEQ_EVENT_PREEMPT_BIT), RSEQ_EVENT_SIGNAL = (1U << RSEQ_EVENT_SIGNAL_BIT), RSEQ_EVENT_MIGRATE = (1U << RSEQ_EVENT_MIGRATE_BIT), }; static inline void rseq_set_notify_resume(struct task_struct *t) { if (t->rseq) set_tsk_thread_flag(t, TIF_NOTIFY_RESUME); } void __rseq_handle_notify_resume(struct ksignal *sig, struct pt_regs *regs); static inline void rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs) { if (current->rseq) __rseq_handle_notify_resume(ksig, regs); } static inline void rseq_signal_deliver(struct ksignal *ksig, struct pt_regs *regs) { preempt_disable(); __set_bit(RSEQ_EVENT_SIGNAL_BIT, ¤t->rseq_event_mask); preempt_enable(); rseq_handle_notify_resume(ksig, regs); } /* rseq_preempt() requires preemption to be disabled. */ static inline void rseq_preempt(struct task_struct *t) { __set_bit(RSEQ_EVENT_PREEMPT_BIT, &t->rseq_event_mask); rseq_set_notify_resume(t); } /* rseq_migrate() requires preemption to be disabled. */ static inline void rseq_migrate(struct task_struct *t) { __set_bit(RSEQ_EVENT_MIGRATE_BIT, &t->rseq_event_mask); rseq_set_notify_resume(t); } /* * If parent process has a registered restartable sequences area, the * child inherits. Unregister rseq for a clone with CLONE_VM set. */ static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags) { if (clone_flags & CLONE_VM) { t->rseq = NULL; t->rseq_len = 0; t->rseq_sig = 0; t->rseq_event_mask = 0; } else { t->rseq = current->rseq; t->rseq_len = current->rseq_len; t->rseq_sig = current->rseq_sig; t->rseq_event_mask = current->rseq_event_mask; } } static inline void rseq_execve(struct task_struct *t) { t->rseq = NULL; t->rseq_len = 0; t->rseq_sig = 0; t->rseq_event_mask = 0; } #else static inline void rseq_set_notify_resume(struct task_struct *t) { } static inline void rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs) { } static inline void rseq_signal_deliver(struct ksignal *ksig, struct pt_regs *regs) { } static inline void rseq_preempt(struct task_struct *t) { } static inline void rseq_migrate(struct task_struct *t) { } static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags) { } static inline void rseq_execve(struct task_struct *t) { } #endif #ifdef CONFIG_DEBUG_RSEQ void rseq_syscall(struct pt_regs *regs); #else static inline void rseq_syscall(struct pt_regs *regs) { } #endif #endif /* _LINUX_RSEQ_H */ |
| 2 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 | // SPDX-License-Identifier: GPL-2.0-only /* * Shared Memory Communications over RDMA (SMC-R) and RoCE * * Definitions for the IPPROTO_SMC (socket related) * * Copyright IBM Corp. 2016, 2018 * Copyright (c) 2024, Alibaba Inc. * * Author: D. Wythe <alibuda@linux.alibaba.com> */ #include <net/protocol.h> #include <net/sock.h> #include "smc_inet.h" #include "smc.h" static int smc_inet_init_sock(struct sock *sk); static struct proto smc_inet_prot = { .name = "INET_SMC", .owner = THIS_MODULE, .init = smc_inet_init_sock, .hash = smc_hash_sk, .unhash = smc_unhash_sk, .release_cb = smc_release_cb, .obj_size = sizeof(struct smc_sock), .h.smc_hash = &smc_v4_hashinfo, .slab_flags = SLAB_TYPESAFE_BY_RCU, }; static const struct proto_ops smc_inet_stream_ops = { .family = PF_INET, .owner = THIS_MODULE, .release = smc_release, .bind = smc_bind, .connect = smc_connect, .socketpair = sock_no_socketpair, .accept = smc_accept, .getname = smc_getname, .poll = smc_poll, .ioctl = smc_ioctl, .listen = smc_listen, .shutdown = smc_shutdown, .setsockopt = smc_setsockopt, .getsockopt = smc_getsockopt, .sendmsg = smc_sendmsg, .recvmsg = smc_recvmsg, .mmap = sock_no_mmap, .splice_read = smc_splice_read, }; static struct inet_protosw smc_inet_protosw = { .type = SOCK_STREAM, .protocol = IPPROTO_SMC, .prot = &smc_inet_prot, .ops = &smc_inet_stream_ops, .flags = INET_PROTOSW_ICSK, }; #if IS_ENABLED(CONFIG_IPV6) struct smc6_sock { struct smc_sock smc; struct ipv6_pinfo inet6; }; static struct proto smc_inet6_prot = { .name = "INET6_SMC", .owner = THIS_MODULE, .init = smc_inet_init_sock, .hash = smc_hash_sk, .unhash = smc_unhash_sk, .release_cb = smc_release_cb, .obj_size = sizeof(struct smc6_sock), .h.smc_hash = &smc_v6_hashinfo, .slab_flags = SLAB_TYPESAFE_BY_RCU, .ipv6_pinfo_offset = offsetof(struct smc6_sock, inet6), }; static const struct proto_ops smc_inet6_stream_ops = { .family = PF_INET6, .owner = THIS_MODULE, .release = smc_release, .bind = smc_bind, .connect = smc_connect, .socketpair = sock_no_socketpair, .accept = smc_accept, .getname = smc_getname, .poll = smc_poll, .ioctl = smc_ioctl, .listen = smc_listen, .shutdown = smc_shutdown, .setsockopt = smc_setsockopt, .getsockopt = smc_getsockopt, .sendmsg = smc_sendmsg, .recvmsg = smc_recvmsg, .mmap = sock_no_mmap, .splice_read = smc_splice_read, }; static struct inet_protosw smc_inet6_protosw = { .type = SOCK_STREAM, .protocol = IPPROTO_SMC, .prot = &smc_inet6_prot, .ops = &smc_inet6_stream_ops, .flags = INET_PROTOSW_ICSK, }; #endif /* CONFIG_IPV6 */ static int smc_inet_init_sock(struct sock *sk) { struct net *net = sock_net(sk); /* init common smc sock */ smc_sk_init(net, sk, IPPROTO_SMC); /* create clcsock */ return smc_create_clcsk(net, sk, sk->sk_family); } int __init smc_inet_init(void) { int rc; rc = proto_register(&smc_inet_prot, 1); if (rc) { pr_err("%s: proto_register smc_inet_prot fails with %d\n", __func__, rc); return rc; } /* no return value */ inet_register_protosw(&smc_inet_protosw); #if IS_ENABLED(CONFIG_IPV6) rc = proto_register(&smc_inet6_prot, 1); if (rc) { pr_err("%s: proto_register smc_inet6_prot fails with %d\n", __func__, rc); goto out_inet6_prot; } rc = inet6_register_protosw(&smc_inet6_protosw); if (rc) { pr_err("%s: inet6_register_protosw smc_inet6_protosw fails with %d\n", __func__, rc); goto out_inet6_protosw; } return rc; out_inet6_protosw: proto_unregister(&smc_inet6_prot); out_inet6_prot: inet_unregister_protosw(&smc_inet_protosw); proto_unregister(&smc_inet_prot); #endif /* CONFIG_IPV6 */ return rc; } void smc_inet_exit(void) { #if IS_ENABLED(CONFIG_IPV6) inet6_unregister_protosw(&smc_inet6_protosw); proto_unregister(&smc_inet6_prot); #endif /* CONFIG_IPV6 */ inet_unregister_protosw(&smc_inet_protosw); proto_unregister(&smc_inet_prot); } |
| 36 2 31 7 2 24 24 5 23 21 1 1 41 16 49 50 50 36 7 7 5 16 2 1 2 14 2 1 1 1 13 13 13 15 34 2 32 13 3 19 7 13 4 14 14 13 13 13 13 13 12 13 13 23 23 19 19 19 19 19 5 3 3 5 1 4 27 26 26 27 16 20 26 104 74 24 29 26 1 1 1 2 1 2 2 1 16 1 14 1 15 16 16 16 12 16 10 10 9 36 31 5 33 3 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 | // SPDX-License-Identifier: GPL-2.0-or-later /* * Userspace interface * Linux ethernet bridge * * Authors: * Lennert Buytenhek <buytenh@gnu.org> */ #include <linux/kernel.h> #include <linux/netdevice.h> #include <linux/etherdevice.h> #include <linux/netpoll.h> #include <linux/ethtool.h> #include <linux/if_arp.h> #include <linux/module.h> #include <linux/init.h> #include <linux/rtnetlink.h> #include <linux/if_ether.h> #include <linux/slab.h> #include <net/dsa.h> #include <net/sock.h> #include <linux/if_vlan.h> #include <net/switchdev.h> #include <net/net_namespace.h> #include "br_private.h" /* * Determine initial path cost based on speed. * using recommendations from 802.1d standard * * Since driver might sleep need to not be holding any locks. */ static int port_cost(struct net_device *dev) { struct ethtool_link_ksettings ecmd; if (!__ethtool_get_link_ksettings(dev, &ecmd)) { switch (ecmd.base.speed) { case SPEED_10000: return 2; case SPEED_5000: return 3; case SPEED_2500: return 4; case SPEED_1000: return 5; case SPEED_100: return 19; case SPEED_10: return 100; case SPEED_UNKNOWN: return 100; default: if (ecmd.base.speed > SPEED_10000) return 1; } } /* Old silly heuristics based on name */ if (!strncmp(dev->name, "lec", 3)) return 7; if (!strncmp(dev->name, "plip", 4)) return 2500; return 100; /* assume old 10Mbps */ } /* Check for port carrier transitions. */ void br_port_carrier_check(struct net_bridge_port *p, bool *notified) { struct net_device *dev = p->dev; struct net_bridge *br = p->br; if (!(p->flags & BR_ADMIN_COST) && netif_running(dev) && netif_oper_up(dev)) p->path_cost = port_cost(dev); *notified = false; if (!netif_running(br->dev)) return; spin_lock_bh(&br->lock); if (netif_running(dev) && netif_oper_up(dev)) { if (p->state == BR_STATE_DISABLED) { br_stp_enable_port(p); *notified = true; } } else { if (p->state != BR_STATE_DISABLED) { br_stp_disable_port(p); *notified = true; } } spin_unlock_bh(&br->lock); } static void br_port_set_promisc(struct net_bridge_port *p) { int err = 0; if (br_promisc_port(p)) return; err = dev_set_promiscuity(p->dev, 1); if (err) return; br_fdb_unsync_static(p->br, p); p->flags |= BR_PROMISC; } static void br_port_clear_promisc(struct net_bridge_port *p) { int err; /* Check if the port is already non-promisc or if it doesn't * support UNICAST filtering. Without unicast filtering support * we'll end up re-enabling promisc mode anyway, so just check for * it here. */ if (!br_promisc_port(p) || !(p->dev->priv_flags & IFF_UNICAST_FLT)) return; /* Since we'll be clearing the promisc mode, program the port * first so that we don't have interruption in traffic. */ err = br_fdb_sync_static(p->br, p); if (err) return; dev_set_promiscuity(p->dev, -1); p->flags &= ~BR_PROMISC; } /* When a port is added or removed or when certain port flags * change, this function is called to automatically manage * promiscuity setting of all the bridge ports. We are always called * under RTNL so can skip using rcu primitives. */ void br_manage_promisc(struct net_bridge *br) { struct net_bridge_port *p; bool set_all = false; /* If vlan filtering is disabled or bridge interface is placed * into promiscuous mode, place all ports in promiscuous mode. */ if ((br->dev->flags & IFF_PROMISC) || !br_vlan_enabled(br->dev)) set_all = true; list_for_each_entry(p, &br->port_list, list) { if (set_all) { br_port_set_promisc(p); } else { /* If the number of auto-ports is <= 1, then all other * ports will have their output configuration * statically specified through fdbs. Since ingress * on the auto-port becomes forwarding/egress to other * ports and egress configuration is statically known, * we can say that ingress configuration of the * auto-port is also statically known. * This lets us disable promiscuous mode and write * this config to hw. */ if ((p->dev->priv_flags & IFF_UNICAST_FLT) && (br->auto_cnt == 0 || (br->auto_cnt == 1 && br_auto_port(p)))) br_port_clear_promisc(p); else br_port_set_promisc(p); } } } int nbp_backup_change(struct net_bridge_port *p, struct net_device *backup_dev) { struct net_bridge_port *old_backup = rtnl_dereference(p->backup_port); struct net_bridge_port *backup_p = NULL; ASSERT_RTNL(); if (backup_dev) { if (!netif_is_bridge_port(backup_dev)) return -ENOENT; backup_p = br_port_get_rtnl(backup_dev); if (backup_p->br != p->br) return -EINVAL; } if (p == backup_p) return -EINVAL; if (old_backup == backup_p) return 0; /* if the backup link is already set, clear it */ if (old_backup) old_backup->backup_redirected_cnt--; if (backup_p) backup_p->backup_redirected_cnt++; rcu_assign_pointer(p->backup_port, backup_p); return 0; } static void nbp_backup_clear(struct net_bridge_port *p) { nbp_backup_change(p, NULL); if (p->backup_redirected_cnt) { struct net_bridge_port *cur_p; list_for_each_entry(cur_p, &p->br->port_list, list) { struct net_bridge_port *backup_p; backup_p = rtnl_dereference(cur_p->backup_port); if (backup_p == p) nbp_backup_change(cur_p, NULL); } } WARN_ON(rcu_access_pointer(p->backup_port) || p->backup_redirected_cnt); } static void nbp_update_port_count(struct net_bridge *br) { struct net_bridge_port *p; u32 cnt = 0; list_for_each_entry(p, &br->port_list, list) { if (br_auto_port(p)) cnt++; } if (br->auto_cnt != cnt) { br->auto_cnt = cnt; br_manage_promisc(br); } } static void nbp_delete_promisc(struct net_bridge_port *p) { /* If port is currently promiscuous, unset promiscuity. * Otherwise, it is a static port so remove all addresses * from it. */ dev_set_allmulti(p->dev, -1); if (br_promisc_port(p)) dev_set_promiscuity(p->dev, -1); else br_fdb_unsync_static(p->br, p); } static void release_nbp(struct kobject *kobj) { struct net_bridge_port *p = container_of(kobj, struct net_bridge_port, kobj); kfree(p); } static void brport_get_ownership(const struct kobject *kobj, kuid_t *uid, kgid_t *gid) { struct net_bridge_port *p = kobj_to_brport(kobj); net_ns_get_ownership(dev_net(p->dev), uid, gid); } static const struct kobj_type brport_ktype = { #ifdef CONFIG_SYSFS .sysfs_ops = &brport_sysfs_ops, #endif .release = release_nbp, .get_ownership = brport_get_ownership, }; static void destroy_nbp(struct net_bridge_port *p) { struct net_device *dev = p->dev; p->br = NULL; p->dev = NULL; netdev_put(dev, &p->dev_tracker); kobject_put(&p->kobj); } static void destroy_nbp_rcu(struct rcu_head *head) { struct net_bridge_port *p = container_of(head, struct net_bridge_port, rcu); destroy_nbp(p); } static unsigned get_max_headroom(struct net_bridge *br) { unsigned max_headroom = 0; struct net_bridge_port *p; list_for_each_entry(p, &br->port_list, list) { unsigned dev_headroom = netdev_get_fwd_headroom(p->dev); if (dev_headroom > max_headroom) max_headroom = dev_headroom; } return max_headroom; } static void update_headroom(struct net_bridge *br, int new_hr) { struct net_bridge_port *p; list_for_each_entry(p, &br->port_list, list) netdev_set_rx_headroom(p->dev, new_hr); br->dev->needed_headroom = new_hr; } /* Delete port(interface) from bridge is done in two steps. * via RCU. First step, marks device as down. That deletes * all the timers and stops new packets from flowing through. * * Final cleanup doesn't occur until after all CPU's finished * processing packets. * * Protected from multiple admin operations by RTNL mutex */ static void del_nbp(struct net_bridge_port *p) { struct net_bridge *br = p->br; struct net_device *dev = p->dev; sysfs_remove_link(br->ifobj, p->dev->name); nbp_delete_promisc(p); spin_lock_bh(&br->lock); br_stp_disable_port(p); spin_unlock_bh(&br->lock); br_mrp_port_del(br, p); br_cfm_port_del(br, p); br_ifinfo_notify(RTM_DELLINK, NULL, p); list_del_rcu(&p->list); if (netdev_get_fwd_headroom(dev) == br->dev->needed_headroom) update_headroom(br, get_max_headroom(br)); netdev_reset_rx_headroom(dev); nbp_vlan_flush(p); br_fdb_delete_by_port(br, p, 0, 1); switchdev_deferred_process(); nbp_backup_clear(p); nbp_update_port_count(br); netdev_upper_dev_unlink(dev, br->dev); dev->priv_flags &= ~IFF_BRIDGE_PORT; netdev_rx_handler_unregister(dev); br_multicast_del_port(p); kobject_uevent(&p->kobj, KOBJ_REMOVE); kobject_del(&p->kobj); br_netpoll_disable(p); call_rcu(&p->rcu, destroy_nbp_rcu); } /* Delete bridge device */ void br_dev_delete(struct net_device *dev, struct list_head *head) { struct net_bridge *br = netdev_priv(dev); struct net_bridge_port *p, *n; list_for_each_entry_safe(p, n, &br->port_list, list) { del_nbp(p); } br_recalculate_neigh_suppress_enabled(br); br_fdb_delete_by_port(br, NULL, 0, 1); cancel_delayed_work_sync(&br->gc_work); br_sysfs_delbr(br->dev); unregister_netdevice_queue(br->dev, head); } /* find an available port number */ static int find_portno(struct net_bridge *br) { int index; struct net_bridge_port *p; unsigned long *inuse; inuse = bitmap_zalloc(BR_MAX_PORTS, GFP_KERNEL); if (!inuse) return -ENOMEM; __set_bit(0, inuse); /* zero is reserved */ list_for_each_entry(p, &br->port_list, list) __set_bit(p->port_no, inuse); index = find_first_zero_bit(inuse, BR_MAX_PORTS); bitmap_free(inuse); return (index >= BR_MAX_PORTS) ? -EXFULL : index; } /* called with RTNL but without bridge lock */ static struct net_bridge_port *new_nbp(struct net_bridge *br, struct net_device *dev) { struct net_bridge_port *p; int index, err; index = find_portno(br); if (index < 0) return ERR_PTR(index); p = kzalloc(sizeof(*p), GFP_KERNEL); if (p == NULL) return ERR_PTR(-ENOMEM); p->br = br; netdev_hold(dev, &p->dev_tracker, GFP_KERNEL); p->dev = dev; p->path_cost = port_cost(dev); p->priority = 0x8000 >> BR_PORT_BITS; p->port_no = index; p->flags = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD; br_init_port(p); br_set_state(p, BR_STATE_DISABLED); br_stp_port_timer_init(p); err = br_multicast_add_port(p); if (err) { netdev_put(dev, &p->dev_tracker); kfree(p); p = ERR_PTR(err); } return p; } int br_add_bridge(struct net *net, const char *name) { struct net_device *dev; int res; dev = alloc_netdev(sizeof(struct net_bridge), name, NET_NAME_UNKNOWN, br_dev_setup); if (!dev) return -ENOMEM; dev_net_set(dev, net); dev->rtnl_link_ops = &br_link_ops; res = register_netdevice(dev); if (res) free_netdev(dev); return res; } int br_del_bridge(struct net *net, const char *name) { struct net_device *dev; int ret = 0; dev = __dev_get_by_name(net, name); if (dev == NULL) ret = -ENXIO; /* Could not find device */ else if (!netif_is_bridge_master(dev)) { /* Attempt to delete non bridge device! */ ret = -EPERM; } else if (dev->flags & IFF_UP) { /* Not shutdown yet. */ ret = -EBUSY; } else br_dev_delete(dev, NULL); return ret; } /* MTU of the bridge pseudo-device: ETH_DATA_LEN or the minimum of the ports */ static int br_mtu_min(const struct net_bridge *br) { const struct net_bridge_port *p; int ret_mtu = 0; list_for_each_entry(p, &br->port_list, list) if (!ret_mtu || ret_mtu > p->dev->mtu) ret_mtu = p->dev->mtu; return ret_mtu ? ret_mtu : ETH_DATA_LEN; } void br_mtu_auto_adjust(struct net_bridge *br) { ASSERT_RTNL(); /* if the bridge MTU was manually configured don't mess with it */ if (br_opt_get(br, BROPT_MTU_SET_BY_USER)) return; /* change to the minimum MTU and clear the flag which was set by * the bridge ndo_change_mtu callback */ dev_set_mtu(br->dev, br_mtu_min(br)); br_opt_toggle(br, BROPT_MTU_SET_BY_USER, false); } static void br_set_gso_limits(struct net_bridge *br) { unsigned int tso_max_size = TSO_MAX_SIZE; const struct net_bridge_port *p; u16 tso_max_segs = TSO_MAX_SEGS; list_for_each_entry(p, &br->port_list, list) { tso_max_size = min(tso_max_size, p->dev->tso_max_size); tso_max_segs = min(tso_max_segs, p->dev->tso_max_segs); } netif_set_tso_max_size(br->dev, tso_max_size); netif_set_tso_max_segs(br->dev, tso_max_segs); } /* * Recomputes features using slave's features */ netdev_features_t br_features_recompute(struct net_bridge *br, netdev_features_t features) { struct net_bridge_port *p; netdev_features_t mask; if (list_empty(&br->port_list)) return features; mask = features; features &= ~NETIF_F_ONE_FOR_ALL; list_for_each_entry(p, &br->port_list, list) { features = netdev_increment_features(features, p->dev->features, mask); } features = netdev_add_tso_features(features, mask); return features; } /* called with RTNL */ int br_add_if(struct net_bridge *br, struct net_device *dev, struct netlink_ext_ack *extack) { struct net_bridge_port *p; int err = 0; unsigned br_hr, dev_hr; bool changed_addr, fdb_synced = false; /* Don't allow bridging non-ethernet like devices. */ if ((dev->flags & IFF_LOOPBACK) || dev->type != ARPHRD_ETHER || dev->addr_len != ETH_ALEN || !is_valid_ether_addr(dev->dev_addr)) return -EINVAL; /* No bridging of bridges */ if (dev->netdev_ops->ndo_start_xmit == br_dev_xmit) { NL_SET_ERR_MSG(extack, "Can not enslave a bridge to a bridge"); return -ELOOP; } /* Device has master upper dev */ if (netdev_master_upper_dev_get(dev)) return -EBUSY; /* No bridging devices that dislike that (e.g. wireless) */ if (dev->priv_flags & IFF_DONT_BRIDGE) { NL_SET_ERR_MSG(extack, "Device does not allow enslaving to a bridge"); return -EOPNOTSUPP; } p = new_nbp(br, dev); if (IS_ERR(p)) return PTR_ERR(p); call_netdevice_notifiers(NETDEV_JOIN, dev); err = dev_set_allmulti(dev, 1); if (err) { br_multicast_del_port(p); netdev_put(dev, &p->dev_tracker); kfree(p); /* kobject not yet init'd, manually free */ goto err1; } err = kobject_init_and_add(&p->kobj, &brport_ktype, &(dev->dev.kobj), SYSFS_BRIDGE_PORT_ATTR); if (err) goto err2; err = br_sysfs_addif(p); if (err) goto err2; err = br_netpoll_enable(p); if (err) goto err3; err = netdev_rx_handler_register(dev, br_get_rx_handler(dev), p); if (err) goto err4; dev->priv_flags |= IFF_BRIDGE_PORT; err = netdev_master_upper_dev_link(dev, br->dev, NULL, NULL, extack); if (err) goto err5; dev_disable_lro(dev); list_add_rcu(&p->list, &br->port_list); nbp_update_port_count(br); if (!br_promisc_port(p) && (p->dev->priv_flags & IFF_UNICAST_FLT)) { /* When updating the port count we also update all ports' * promiscuous mode. * A port leaving promiscuous mode normally gets the bridge's * fdb synced to the unicast filter (if supported), however, * `br_port_clear_promisc` does not distinguish between * non-promiscuous ports and *new* ports, so we need to * sync explicitly here. */ fdb_synced = br_fdb_sync_static(br, p) == 0; if (!fdb_synced) netdev_err(dev, "failed to sync bridge static fdb addresses to this port\n"); } netdev_update_features(br->dev); br_hr = br->dev->needed_headroom; dev_hr = netdev_get_fwd_headroom(dev); if (br_hr < dev_hr) update_headroom(br, dev_hr); else netdev_set_rx_headroom(dev, br_hr); if (br_fdb_add_local(br, p, dev->dev_addr, 0)) netdev_err(dev, "failed insert local address bridge forwarding table\n"); if (br->dev->addr_assign_type != NET_ADDR_SET) { /* Ask for permission to use this MAC address now, even if we * don't end up choosing it below. */ err = dev_pre_changeaddr_notify(br->dev, dev->dev_addr, extack); if (err) goto err6; } err = nbp_vlan_init(p, extack); if (err) { netdev_err(dev, "failed to initialize vlan filtering on this port\n"); goto err6; } spin_lock_bh(&br->lock); changed_addr = br_stp_recalculate_bridge_id(br); if (netif_running(dev) && netif_oper_up(dev) && (br->dev->flags & IFF_UP)) br_stp_enable_port(p); spin_unlock_bh(&br->lock); br_ifinfo_notify(RTM_NEWLINK, NULL, p); if (changed_addr) call_netdevice_notifiers(NETDEV_CHANGEADDR, br->dev); br_mtu_auto_adjust(br); br_set_gso_limits(br); kobject_uevent(&p->kobj, KOBJ_ADD); return 0; err6: if (fdb_synced) br_fdb_unsync_static(br, p); list_del_rcu(&p->list); br_fdb_delete_by_port(br, p, 0, 1); nbp_update_port_count(br); netdev_upper_dev_unlink(dev, br->dev); err5: dev->priv_flags &= ~IFF_BRIDGE_PORT; netdev_rx_handler_unregister(dev); err4: br_netpoll_disable(p); err3: sysfs_remove_link(br->ifobj, p->dev->name); err2: br_multicast_del_port(p); netdev_put(dev, &p->dev_tracker); kobject_put(&p->kobj); dev_set_allmulti(dev, -1); err1: return err; } /* called with RTNL */ int br_del_if(struct net_bridge *br, struct net_device *dev) { struct net_bridge_port *p; bool changed_addr; p = br_port_get_rtnl(dev); if (!p || p->br != br) return -EINVAL; /* Since more than one interface can be attached to a bridge, * there still maybe an alternate path for netconsole to use; * therefore there is no reason for a NETDEV_RELEASE event. */ del_nbp(p); br_mtu_auto_adjust(br); br_set_gso_limits(br); spin_lock_bh(&br->lock); changed_addr = br_stp_recalculate_bridge_id(br); spin_unlock_bh(&br->lock); if (changed_addr) call_netdevice_notifiers(NETDEV_CHANGEADDR, br->dev); netdev_update_features(br->dev); return 0; } void br_port_flags_change(struct net_bridge_port *p, unsigned long mask) { struct net_bridge *br = p->br; if (mask & BR_AUTO_MASK) nbp_update_port_count(br); if (mask & (BR_NEIGH_SUPPRESS | BR_NEIGH_VLAN_SUPPRESS)) br_recalculate_neigh_suppress_enabled(br); } bool br_port_flag_is_set(const struct net_device *dev, unsigned long flag) { struct net_bridge_port *p; p = br_port_get_rtnl_rcu(dev); if (!p) return false; return p->flags & flag; } EXPORT_SYMBOL_GPL(br_port_flag_is_set); |
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 | /* * Header file for reservations for dma-buf and ttm * * Copyright(C) 2011 Linaro Limited. All rights reserved. * Copyright (C) 2012-2013 Canonical Ltd * Copyright (C) 2012 Texas Instruments * * Authors: * Rob Clark <robdclark@gmail.com> * Maarten Lankhorst <maarten.lankhorst@canonical.com> * Thomas Hellstrom <thellstrom-at-vmware-dot-com> * * Based on bo.c which bears the following copyright notice, * but is dual licensed: * * Copyright (c) 2006-2009 VMware, Inc., Palo Alto, CA., USA * All Rights Reserved. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the * "Software"), to deal in the Software without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sub license, and/or sell copies of the Software, and to * permit persons to whom the Software is furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice (including the * next paragraph) shall be included in all copies or substantial portions * of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL * THE COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE * USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _LINUX_RESERVATION_H #define _LINUX_RESERVATION_H #include <linux/ww_mutex.h> #include <linux/dma-fence.h> #include <linux/slab.h> #include <linux/seqlock.h> #include <linux/rcupdate.h> extern struct ww_class reservation_ww_class; struct dma_resv_list; /** * enum dma_resv_usage - how the fences from a dma_resv obj are used * * This enum describes the different use cases for a dma_resv object and * controls which fences are returned when queried. * * An important fact is that there is the order KERNEL<WRITE<READ<BOOKKEEP and * when the dma_resv object is asked for fences for one use case the fences * for the lower use case are returned as well. * * For example when asking for WRITE fences then the KERNEL fences are returned * as well. Similar when asked for READ fences then both WRITE and KERNEL * fences are returned as well. * * Already used fences can be promoted in the sense that a fence with * DMA_RESV_USAGE_BOOKKEEP could become DMA_RESV_USAGE_READ by adding it again * with this usage. But fences can never be degraded in the sense that a fence * with DMA_RESV_USAGE_WRITE could become DMA_RESV_USAGE_READ. */ enum dma_resv_usage { /** * @DMA_RESV_USAGE_KERNEL: For in kernel memory management only. * * This should only be used for things like copying or clearing memory * with a DMA hardware engine for the purpose of kernel memory * management. * * Drivers *always* must wait for those fences before accessing the * resource protected by the dma_resv object. The only exception for * that is when the resource is known to be locked down in place by * pinning it previously. */ DMA_RESV_USAGE_KERNEL, /** * @DMA_RESV_USAGE_WRITE: Implicit write synchronization. * * This should only be used for userspace command submissions which add * an implicit write dependency. */ DMA_RESV_USAGE_WRITE, /** * @DMA_RESV_USAGE_READ: Implicit read synchronization. * * This should only be used for userspace command submissions which add * an implicit read dependency. */ DMA_RESV_USAGE_READ, /** * @DMA_RESV_USAGE_BOOKKEEP: No implicit sync. * * This should be used by submissions which don't want to participate in * any implicit synchronization. * * The most common case are preemption fences, page table updates, TLB * flushes as well as explicit synced user submissions. * * Explicit synced user user submissions can be promoted to * DMA_RESV_USAGE_READ or DMA_RESV_USAGE_WRITE as needed using * dma_buf_import_sync_file() when implicit synchronization should * become necessary after initial adding of the fence. */ DMA_RESV_USAGE_BOOKKEEP }; /** * dma_resv_usage_rw - helper for implicit sync * @write: true if we create a new implicit sync write * * This returns the implicit synchronization usage for write or read accesses, * see enum dma_resv_usage and &dma_buf.resv. */ static inline enum dma_resv_usage dma_resv_usage_rw(bool write) { /* This looks confusing at first sight, but is indeed correct. * * The rational is that new write operations needs to wait for the * existing read and write operations to finish. * But a new read operation only needs to wait for the existing write * operations to finish. */ return write ? DMA_RESV_USAGE_READ : DMA_RESV_USAGE_WRITE; } /** * struct dma_resv - a reservation object manages fences for a buffer * * This is a container for dma_fence objects which needs to handle multiple use * cases. * * One use is to synchronize cross-driver access to a struct dma_buf, either for * dynamic buffer management or just to handle implicit synchronization between * different users of the buffer in userspace. See &dma_buf.resv for a more * in-depth discussion. * * The other major use is to manage access and locking within a driver in a * buffer based memory manager. struct ttm_buffer_object is the canonical * example here, since this is where reservation objects originated from. But * use in drivers is spreading and some drivers also manage struct * drm_gem_object with the same scheme. */ struct dma_resv { /** * @lock: * * Update side lock. Don't use directly, instead use the wrapper * functions like dma_resv_lock() and dma_resv_unlock(). * * Drivers which use the reservation object to manage memory dynamically * also use this lock to protect buffer object state like placement, * allocation policies or throughout command submission. */ struct ww_mutex lock; /** * @fences: * * Array of fences which where added to the dma_resv object * * A new fence is added by calling dma_resv_add_fence(). Since this * often needs to be done past the point of no return in command * submission it cannot fail, and therefore sufficient slots need to be * reserved by calling dma_resv_reserve_fences(). */ struct dma_resv_list __rcu *fences; }; /** * struct dma_resv_iter - current position into the dma_resv fences * * Don't touch this directly in the driver, use the accessor function instead. * * IMPORTANT * * When using the lockless iterators like dma_resv_iter_next_unlocked() or * dma_resv_for_each_fence_unlocked() beware that the iterator can be restarted. * Code which accumulates statistics or similar needs to check for this with * dma_resv_iter_is_restarted(). */ struct dma_resv_iter { /** @obj: The dma_resv object we iterate over */ struct dma_resv *obj; /** @usage: Return fences with this usage or lower. */ enum dma_resv_usage usage; /** @fence: the currently handled fence */ struct dma_fence *fence; /** @fence_usage: the usage of the current fence */ enum dma_resv_usage fence_usage; /** @index: index into the shared fences */ unsigned int index; /** @fences: the shared fences; private, *MUST* not dereference */ struct dma_resv_list *fences; /** @num_fences: number of fences */ unsigned int num_fences; /** @is_restarted: true if this is the first returned fence */ bool is_restarted; }; struct dma_fence *dma_resv_iter_first_unlocked(struct dma_resv_iter *cursor); struct dma_fence *dma_resv_iter_next_unlocked(struct dma_resv_iter *cursor); struct dma_fence *dma_resv_iter_first(struct dma_resv_iter *cursor); struct dma_fence *dma_resv_iter_next(struct dma_resv_iter *cursor); /** * dma_resv_iter_begin - initialize a dma_resv_iter object * @cursor: The dma_resv_iter object to initialize * @obj: The dma_resv object which we want to iterate over * @usage: controls which fences to include, see enum dma_resv_usage. */ static inline void dma_resv_iter_begin(struct dma_resv_iter *cursor, struct dma_resv *obj, enum dma_resv_usage usage) { cursor->obj = obj; cursor->usage = usage; cursor->fence = NULL; } /** * dma_resv_iter_end - cleanup a dma_resv_iter object * @cursor: the dma_resv_iter object which should be cleaned up * * Make sure that the reference to the fence in the cursor is properly * dropped. */ static inline void dma_resv_iter_end(struct dma_resv_iter *cursor) { dma_fence_put(cursor->fence); } /** * dma_resv_iter_usage - Return the usage of the current fence * @cursor: the cursor of the current position * * Returns the usage of the currently processed fence. */ static inline enum dma_resv_usage dma_resv_iter_usage(struct dma_resv_iter *cursor) { return cursor->fence_usage; } /** * dma_resv_iter_is_restarted - test if this is the first fence after a restart * @cursor: the cursor with the current position * * Return true if this is the first fence in an iteration after a restart. */ static inline bool dma_resv_iter_is_restarted(struct dma_resv_iter *cursor) { return cursor->is_restarted; } /** * dma_resv_for_each_fence_unlocked - unlocked fence iterator * @cursor: a struct dma_resv_iter pointer * @fence: the current fence * * Iterate over the fences in a struct dma_resv object without holding the * &dma_resv.lock and using RCU instead. The cursor needs to be initialized * with dma_resv_iter_begin() and cleaned up with dma_resv_iter_end(). Inside * the iterator a reference to the dma_fence is held and the RCU lock dropped. * * Beware that the iterator can be restarted when the struct dma_resv for * @cursor is modified. Code which accumulates statistics or similar needs to * check for this with dma_resv_iter_is_restarted(). For this reason prefer the * lock iterator dma_resv_for_each_fence() whenever possible. */ #define dma_resv_for_each_fence_unlocked(cursor, fence) \ for (fence = dma_resv_iter_first_unlocked(cursor); \ fence; fence = dma_resv_iter_next_unlocked(cursor)) /** * dma_resv_for_each_fence - fence iterator * @cursor: a struct dma_resv_iter pointer * @obj: a dma_resv object pointer * @usage: controls which fences to return * @fence: the current fence * * Iterate over the fences in a struct dma_resv object while holding the * &dma_resv.lock. @all_fences controls if the shared fences are returned as * well. The cursor initialisation is part of the iterator and the fence stays * valid as long as the lock is held and so no extra reference to the fence is * taken. */ #define dma_resv_for_each_fence(cursor, obj, usage, fence) \ for (dma_resv_iter_begin(cursor, obj, usage), \ fence = dma_resv_iter_first(cursor); fence; \ fence = dma_resv_iter_next(cursor)) #define dma_resv_held(obj) lockdep_is_held(&(obj)->lock.base) #define dma_resv_assert_held(obj) lockdep_assert_held(&(obj)->lock.base) #ifdef CONFIG_DEBUG_MUTEXES void dma_resv_reset_max_fences(struct dma_resv *obj); #else static inline void dma_resv_reset_max_fences(struct dma_resv *obj) {} #endif /** * dma_resv_lock - lock the reservation object * @obj: the reservation object * @ctx: the locking context * * Locks the reservation object for exclusive access and modification. Note, * that the lock is only against other writers, readers will run concurrently * with a writer under RCU. The seqlock is used to notify readers if they * overlap with a writer. * * As the reservation object may be locked by multiple parties in an * undefined order, a #ww_acquire_ctx is passed to unwind if a cycle * is detected. See ww_mutex_lock() and ww_acquire_init(). A reservation * object may be locked by itself by passing NULL as @ctx. * * When a die situation is indicated by returning -EDEADLK all locks held by * @ctx must be unlocked and then dma_resv_lock_slow() called on @obj. * * Unlocked by calling dma_resv_unlock(). * * See also dma_resv_lock_interruptible() for the interruptible variant. */ static inline int dma_resv_lock(struct dma_resv *obj, struct ww_acquire_ctx *ctx) { return ww_mutex_lock(&obj->lock, ctx); } /** * dma_resv_lock_interruptible - lock the reservation object * @obj: the reservation object * @ctx: the locking context * * Locks the reservation object interruptible for exclusive access and * modification. Note, that the lock is only against other writers, readers * will run concurrently with a writer under RCU. The seqlock is used to * notify readers if they overlap with a writer. * * As the reservation object may be locked by multiple parties in an * undefined order, a #ww_acquire_ctx is passed to unwind if a cycle * is detected. See ww_mutex_lock() and ww_acquire_init(). A reservation * object may be locked by itself by passing NULL as @ctx. * * When a die situation is indicated by returning -EDEADLK all locks held by * @ctx must be unlocked and then dma_resv_lock_slow_interruptible() called on * @obj. * * Unlocked by calling dma_resv_unlock(). */ static inline int dma_resv_lock_interruptible(struct dma_resv *obj, struct ww_acquire_ctx *ctx) { return ww_mutex_lock_interruptible(&obj->lock, ctx); } /** * dma_resv_lock_slow - slowpath lock the reservation object * @obj: the reservation object * @ctx: the locking context * * Acquires the reservation object after a die case. This function * will sleep until the lock becomes available. See dma_resv_lock() as * well. * * See also dma_resv_lock_slow_interruptible() for the interruptible variant. */ static inline void dma_resv_lock_slow(struct dma_resv *obj, struct ww_acquire_ctx *ctx) { ww_mutex_lock_slow(&obj->lock, ctx); } /** * dma_resv_lock_slow_interruptible - slowpath lock the reservation * object, interruptible * @obj: the reservation object * @ctx: the locking context * * Acquires the reservation object interruptible after a die case. This function * will sleep until the lock becomes available. See * dma_resv_lock_interruptible() as well. */ static inline int dma_resv_lock_slow_interruptible(struct dma_resv *obj, struct ww_acquire_ctx *ctx) { return ww_mutex_lock_slow_interruptible(&obj->lock, ctx); } /** * dma_resv_trylock - trylock the reservation object * @obj: the reservation object * * Tries to lock the reservation object for exclusive access and modification. * Note, that the lock is only against other writers, readers will run * concurrently with a writer under RCU. The seqlock is used to notify readers * if they overlap with a writer. * * Also note that since no context is provided, no deadlock protection is * possible, which is also not needed for a trylock. * * Returns true if the lock was acquired, false otherwise. */ static inline bool __must_check dma_resv_trylock(struct dma_resv *obj) { return ww_mutex_trylock(&obj->lock, NULL); } /** * dma_resv_is_locked - is the reservation object locked * @obj: the reservation object * * Returns true if the mutex is locked, false if unlocked. */ static inline bool dma_resv_is_locked(struct dma_resv *obj) { return ww_mutex_is_locked(&obj->lock); } /** * dma_resv_locking_ctx - returns the context used to lock the object * @obj: the reservation object * * Returns the context used to lock a reservation object or NULL if no context * was used or the object is not locked at all. * * WARNING: This interface is pretty horrible, but TTM needs it because it * doesn't pass the struct ww_acquire_ctx around in some very long callchains. * Everyone else just uses it to check whether they're holding a reservation or * not. */ static inline struct ww_acquire_ctx *dma_resv_locking_ctx(struct dma_resv *obj) { return READ_ONCE(obj->lock.ctx); } /** * dma_resv_unlock - unlock the reservation object * @obj: the reservation object * * Unlocks the reservation object following exclusive access. */ static inline void dma_resv_unlock(struct dma_resv *obj) { dma_resv_reset_max_fences(obj); ww_mutex_unlock(&obj->lock); } void dma_resv_init(struct dma_resv *obj); void dma_resv_fini(struct dma_resv *obj); int dma_resv_reserve_fences(struct dma_resv *obj, unsigned int num_fences); void dma_resv_add_fence(struct dma_resv *obj, struct dma_fence *fence, enum dma_resv_usage usage); void dma_resv_replace_fences(struct dma_resv *obj, uint64_t context, struct dma_fence *fence, enum dma_resv_usage usage); int dma_resv_get_fences(struct dma_resv *obj, enum dma_resv_usage usage, unsigned int *num_fences, struct dma_fence ***fences); int dma_resv_get_singleton(struct dma_resv *obj, enum dma_resv_usage usage, struct dma_fence **fence); int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src); long dma_resv_wait_timeout(struct dma_resv *obj, enum dma_resv_usage usage, bool intr, unsigned long timeout); void dma_resv_set_deadline(struct dma_resv *obj, enum dma_resv_usage usage, ktime_t deadline); bool dma_resv_test_signaled(struct dma_resv *obj, enum dma_resv_usage usage); void dma_resv_describe(struct dma_resv *obj, struct seq_file *seq); #endif /* _LINUX_RESERVATION_H */ |
| 1279 799 2 1274 52 1275 770 1278 649 1149 1149 1151 641 1 590 1 3 1145 1146 1147 1146 1147 1141 1427 185 1276 1324 184 1282 1148 1149 1146 1148 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 | // SPDX-License-Identifier: GPL-2.0-only /* * Copyright (C) 2008 IBM Corporation * Author: Mimi Zohar <zohar@us.ibm.com> * * ima_policy.c * - initialize default measure policy rules */ #include <linux/init.h> #include <linux/list.h> #include <linux/kernel_read_file.h> #include <linux/fs.h> #include <linux/security.h> #include <linux/magic.h> #include <linux/parser.h> #include <linux/slab.h> #include <linux/rculist.h> #include <linux/seq_file.h> #include <linux/ima.h> #include "ima.h" /* flags definitions */ #define IMA_FUNC 0x0001 #define IMA_MASK 0x0002 #define IMA_FSMAGIC 0x0004 #define IMA_UID 0x0008 #define IMA_FOWNER 0x0010 #define IMA_FSUUID 0x0020 #define IMA_INMASK 0x0040 #define IMA_EUID 0x0080 #define IMA_PCR 0x0100 #define IMA_FSNAME 0x0200 #define IMA_KEYRINGS 0x0400 #define IMA_LABEL 0x0800 #define IMA_VALIDATE_ALGOS 0x1000 #define IMA_GID 0x2000 #define IMA_EGID 0x4000 #define IMA_FGROUP 0x8000 #define UNKNOWN 0 #define MEASURE 0x0001 /* same as IMA_MEASURE */ #define DONT_MEASURE 0x0002 #define APPRAISE 0x0004 /* same as IMA_APPRAISE */ #define DONT_APPRAISE 0x0008 #define AUDIT 0x0040 #define HASH 0x0100 #define DONT_HASH 0x0200 #define INVALID_PCR(a) (((a) < 0) || \ (a) >= (sizeof_field(struct ima_iint_cache, measured_pcrs) * 8)) int ima_policy_flag; static int temp_ima_appraise; static int build_ima_appraise __ro_after_init; atomic_t ima_setxattr_allowed_hash_algorithms; #define MAX_LSM_RULES 6 enum lsm_rule_types { LSM_OBJ_USER, LSM_OBJ_ROLE, LSM_OBJ_TYPE, LSM_SUBJ_USER, LSM_SUBJ_ROLE, LSM_SUBJ_TYPE }; enum policy_types { ORIGINAL_TCB = 1, DEFAULT_TCB }; enum policy_rule_list { IMA_DEFAULT_POLICY = 1, IMA_CUSTOM_POLICY }; struct ima_rule_opt_list { size_t count; char *items[] __counted_by(count); }; /* * These comparators are needed nowhere outside of ima so just define them here. * This pattern should hopefully never be needed outside of ima. */ static inline bool vfsuid_gt_kuid(vfsuid_t vfsuid, kuid_t kuid) { return __vfsuid_val(vfsuid) > __kuid_val(kuid); } static inline bool vfsgid_gt_kgid(vfsgid_t vfsgid, kgid_t kgid) { return __vfsgid_val(vfsgid) > __kgid_val(kgid); } static inline bool vfsuid_lt_kuid(vfsuid_t vfsuid, kuid_t kuid) { return __vfsuid_val(vfsuid) < __kuid_val(kuid); } static inline bool vfsgid_lt_kgid(vfsgid_t vfsgid, kgid_t kgid) { return __vfsgid_val(vfsgid) < __kgid_val(kgid); } struct ima_rule_entry { struct list_head list; int action; unsigned int flags; enum ima_hooks func; int mask; unsigned long fsmagic; uuid_t fsuuid; kuid_t uid; kgid_t gid; kuid_t fowner; kgid_t fgroup; bool (*uid_op)(kuid_t cred_uid, kuid_t rule_uid); /* Handlers for operators */ bool (*gid_op)(kgid_t cred_gid, kgid_t rule_gid); bool (*fowner_op)(vfsuid_t vfsuid, kuid_t rule_uid); /* vfsuid_eq_kuid(), vfsuid_gt_kuid(), vfsuid_lt_kuid() */ bool (*fgroup_op)(vfsgid_t vfsgid, kgid_t rule_gid); /* vfsgid_eq_kgid(), vfsgid_gt_kgid(), vfsgid_lt_kgid() */ int pcr; unsigned int allowed_algos; /* bitfield of allowed hash algorithms */ struct { void *rule; /* LSM file metadata specific */ char *args_p; /* audit value */ int type; /* audit type */ } lsm[MAX_LSM_RULES]; char *fsname; struct ima_rule_opt_list *keyrings; /* Measure keys added to these keyrings */ struct ima_rule_opt_list *label; /* Measure data grouped under this label */ struct ima_template_desc *template; }; /* * sanity check in case the kernels gains more hash algorithms that can * fit in an unsigned int */ static_assert( 8 * sizeof(unsigned int) >= HASH_ALGO__LAST, "The bitfield allowed_algos in ima_rule_entry is too small to contain all the supported hash algorithms, consider using a bigger type"); /* * Without LSM specific knowledge, the default policy can only be * written in terms of .action, .func, .mask, .fsmagic, .uid, .gid, * .fowner, and .fgroup */ /* * The minimum rule set to allow for full TCB coverage. Measures all files * opened or mmap for exec and everything read by root. Dangerous because * normal users can easily run the machine out of memory simply building * and running executables. */ static struct ima_rule_entry dont_measure_rules[] __ro_after_init = { {.action = DONT_MEASURE, .fsmagic = PROC_SUPER_MAGIC, .flags = IMA_FSMAGIC}, {.action = DONT_MEASURE, .fsmagic = SYSFS_MAGIC, .flags = IMA_FSMAGIC}, {.action = DONT_MEASURE, .fsmagic = DEBUGFS_MAGIC, .flags = IMA_FSMAGIC}, {.action = DONT_MEASURE, .fsmagic = TMPFS_MAGIC, .flags = IMA_FSMAGIC}, {.action = DONT_MEASURE, .fsmagic = DEVPTS_SUPER_MAGIC, .flags = IMA_FSMAGIC}, {.action = DONT_MEASURE, .fsmagic = BINFMTFS_MAGIC, .flags = IMA_FSMAGIC}, {.action = DONT_MEASURE, .fsmagic = SECURITYFS_MAGIC, .flags = IMA_FSMAGIC}, {.action = DONT_MEASURE, .fsmagic = SELINUX_MAGIC, .flags = IMA_FSMAGIC}, {.action = DONT_MEASURE, .fsmagic = SMACK_MAGIC, .flags = IMA_FSMAGIC}, {.action = DONT_MEASURE, .fsmagic = CGROUP_SUPER_MAGIC, .flags = IMA_FSMAGIC}, {.action = DONT_MEASURE, .fsmagic = CGROUP2_SUPER_MAGIC, .flags = IMA_FSMAGIC}, {.action = DONT_MEASURE, .fsmagic = NSFS_MAGIC, .flags = IMA_FSMAGIC}, {.action = DONT_MEASURE, .fsmagic = EFIVARFS_MAGIC, .flags = IMA_FSMAGIC} }; static struct ima_rule_entry original_measurement_rules[] __ro_after_init = { {.action = MEASURE, .func = MMAP_CHECK, .mask = MAY_EXEC, .flags = IMA_FUNC | IMA_MASK}, {.action = MEASURE, .func = BPRM_CHECK, .mask = MAY_EXEC, .flags = IMA_FUNC | IMA_MASK}, {.action = MEASURE, .func = FILE_CHECK, .mask = MAY_READ, .uid = GLOBAL_ROOT_UID, .uid_op = &uid_eq, .flags = IMA_FUNC | IMA_MASK | IMA_UID}, {.action = MEASURE, .func = MODULE_CHECK, .flags = IMA_FUNC}, {.action = MEASURE, .func = FIRMWARE_CHECK, .flags = IMA_FUNC}, }; static struct ima_rule_entry default_measurement_rules[] __ro_after_init = { {.action = MEASURE, .func = MMAP_CHECK, .mask = MAY_EXEC, .flags = IMA_FUNC | IMA_MASK}, {.action = MEASURE, .func = BPRM_CHECK, .mask = MAY_EXEC, .flags = IMA_FUNC | IMA_MASK}, {.action = MEASURE, .func = FILE_CHECK, .mask = MAY_READ, .uid = GLOBAL_ROOT_UID, .uid_op = &uid_eq, .flags = IMA_FUNC | IMA_INMASK | IMA_EUID}, {.action = MEASURE, .func = FILE_CHECK, .mask = MAY_READ, .uid = GLOBAL_ROOT_UID, .uid_op = &uid_eq, .flags = IMA_FUNC | IMA_INMASK | IMA_UID}, {.action = MEASURE, .func = MODULE_CHECK, .flags = IMA_FUNC}, {.action = MEASURE, .func = FIRMWARE_CHECK, .flags = IMA_FUNC}, {.action = MEASURE, .func = POLICY_CHECK, .flags = IMA_FUNC}, }; static struct ima_rule_entry default_appraise_rules[] __ro_after_init = { {.action = DONT_APPRAISE, .fsmagic = PROC_SUPER_MAGIC, .flags = IMA_FSMAGIC}, {.action = DONT_APPRAISE, .fsmagic = SYSFS_MAGIC, .flags = IMA_FSMAGIC}, {.action = DONT_APPRAISE, .fsmagic = DEBUGFS_MAGIC, .flags = IMA_FSMAGIC}, {.action = DONT_APPRAISE, .fsmagic = TMPFS_MAGIC, .flags = IMA_FSMAGIC}, {.action = DONT_APPRAISE, .fsmagic = RAMFS_MAGIC, .flags = IMA_FSMAGIC}, {.action = DONT_APPRAISE, .fsmagic = DEVPTS_SUPER_MAGIC, .flags = IMA_FSMAGIC}, {.action = DONT_APPRAISE, .fsmagic = BINFMTFS_MAGIC, .flags = IMA_FSMAGIC}, {.action = DONT_APPRAISE, .fsmagic = SECURITYFS_MAGIC, .flags = IMA_FSMAGIC}, {.action = DONT_APPRAISE, .fsmagic = SELINUX_MAGIC, .flags = IMA_FSMAGIC}, {.action = DONT_APPRAISE, .fsmagic = SMACK_MAGIC, .flags = IMA_FSMAGIC}, {.action = DONT_APPRAISE, .fsmagic = NSFS_MAGIC, .flags = IMA_FSMAGIC}, {.action = DONT_APPRAISE, .fsmagic = EFIVARFS_MAGIC, .flags = IMA_FSMAGIC}, {.action = DONT_APPRAISE, .fsmagic = CGROUP_SUPER_MAGIC, .flags = IMA_FSMAGIC}, {.action = DONT_APPRAISE, .fsmagic = CGROUP2_SUPER_MAGIC, .flags = IMA_FSMAGIC}, #ifdef CONFIG_IMA_WRITE_POLICY {.action = APPRAISE, .func = POLICY_CHECK, .flags = IMA_FUNC | IMA_DIGSIG_REQUIRED}, #endif #ifndef CONFIG_IMA_APPRAISE_SIGNED_INIT {.action = APPRAISE, .fowner = GLOBAL_ROOT_UID, .fowner_op = &vfsuid_eq_kuid, .flags = IMA_FOWNER}, #else /* force signature */ {.action = APPRAISE, .fowner = GLOBAL_ROOT_UID, .fowner_op = &vfsuid_eq_kuid, .flags = IMA_FOWNER | IMA_DIGSIG_REQUIRED}, #endif }; static struct ima_rule_entry build_appraise_rules[] __ro_after_init = { #ifdef CONFIG_IMA_APPRAISE_REQUIRE_MODULE_SIGS {.action = APPRAISE, .func = MODULE_CHECK, .flags = IMA_FUNC | IMA_DIGSIG_REQUIRED}, #endif #ifdef CONFIG_IMA_APPRAISE_REQUIRE_FIRMWARE_SIGS {.action = APPRAISE, .func = FIRMWARE_CHECK, .flags = IMA_FUNC | IMA_DIGSIG_REQUIRED}, #endif #ifdef CONFIG_IMA_APPRAISE_REQUIRE_KEXEC_SIGS {.action = APPRAISE, .func = KEXEC_KERNEL_CHECK, .flags = IMA_FUNC | IMA_DIGSIG_REQUIRED}, #endif #ifdef CONFIG_IMA_APPRAISE_REQUIRE_POLICY_SIGS {.action = APPRAISE, .func = POLICY_CHECK, .flags = IMA_FUNC | IMA_DIGSIG_REQUIRED}, #endif }; static struct ima_rule_entry secure_boot_rules[] __ro_after_init = { {.action = APPRAISE, .func = MODULE_CHECK, .flags = IMA_FUNC | IMA_DIGSIG_REQUIRED}, {.action = APPRAISE, .func = FIRMWARE_CHECK, .flags = IMA_FUNC | IMA_DIGSIG_REQUIRED}, {.action = APPRAISE, .func = KEXEC_KERNEL_CHECK, .flags = IMA_FUNC | IMA_DIGSIG_REQUIRED}, {.action = APPRAISE, .func = POLICY_CHECK, .flags = IMA_FUNC | IMA_DIGSIG_REQUIRED}, }; static struct ima_rule_entry critical_data_rules[] __ro_after_init = { {.action = MEASURE, .func = CRITICAL_DATA, .flags = IMA_FUNC}, }; /* An array of architecture specific rules */ static struct ima_rule_entry *arch_policy_entry __ro_after_init; static LIST_HEAD(ima_default_rules); static LIST_HEAD(ima_policy_rules); static LIST_HEAD(ima_temp_rules); static struct list_head __rcu *ima_rules = (struct list_head __rcu *)(&ima_default_rules); static int ima_policy __initdata; static int __init default_measure_policy_setup(char *str) { if (ima_policy) return 1; ima_policy = ORIGINAL_TCB; return 1; } __setup("ima_tcb", default_measure_policy_setup); static bool ima_use_appraise_tcb __initdata; static bool ima_use_secure_boot __initdata; static bool ima_use_critical_data __initdata; static bool ima_fail_unverifiable_sigs __ro_after_init; static int __init policy_setup(char *str) { char *p; while ((p = strsep(&str, " |\n")) != NULL) { if (*p == ' ') continue; if ((strcmp(p, "tcb") == 0) && !ima_policy) ima_policy = DEFAULT_TCB; else if (strcmp(p, "appraise_tcb") == 0) ima_use_appraise_tcb = true; else if (strcmp(p, "secure_boot") == 0) ima_use_secure_boot = true; else if (strcmp(p, "critical_data") == 0) ima_use_critical_data = true; else if (strcmp(p, "fail_securely") == 0) ima_fail_unverifiable_sigs = true; else pr_err("policy \"%s\" not found", p); } return 1; } __setup("ima_policy=", policy_setup); static int __init default_appraise_policy_setup(char *str) { ima_use_appraise_tcb = true; return 1; } __setup("ima_appraise_tcb", default_appraise_policy_setup); static struct ima_rule_opt_list *ima_alloc_rule_opt_list(const substring_t *src) { struct ima_rule_opt_list *opt_list; size_t count = 0; char *src_copy; char *cur, *next; size_t i; src_copy = match_strdup(src); if (!src_copy) return ERR_PTR(-ENOMEM); next = src_copy; while ((cur = strsep(&next, "|"))) { /* Don't accept an empty list item */ if (!(*cur)) { kfree(src_copy); return ERR_PTR(-EINVAL); } count++; } /* Don't accept an empty list */ if (!count) { kfree(src_copy); return ERR_PTR(-EINVAL); } opt_list = kzalloc(struct_size(opt_list, items, count), GFP_KERNEL); if (!opt_list) { kfree(src_copy); return ERR_PTR(-ENOMEM); } opt_list->count = count; /* * strsep() has already replaced all instances of '|' with '\0', * leaving a byte sequence of NUL-terminated strings. Reference each * string with the array of items. * * IMPORTANT: Ownership of the allocated buffer is transferred from * src_copy to the first element in the items array. To free the * buffer, kfree() must only be called on the first element of the * array. */ for (i = 0, cur = src_copy; i < count; i++) { opt_list->items[i] = cur; cur = strchr(cur, '\0') + 1; } return opt_list; } static void ima_free_rule_opt_list(struct ima_rule_opt_list *opt_list) { if (!opt_list) return; if (opt_list->count) { kfree(opt_list->items[0]); opt_list->count = 0; } kfree(opt_list); } static void ima_lsm_free_rule(struct ima_rule_entry *entry) { int i; for (i = 0; i < MAX_LSM_RULES; i++) { ima_filter_rule_free(entry->lsm[i].rule); kfree(entry->lsm[i].args_p); } } static void ima_free_rule(struct ima_rule_entry *entry) { if (!entry) return; /* * entry->template->fields may be allocated in ima_parse_rule() but that * reference is owned by the corresponding ima_template_desc element in * the defined_templates list and cannot be freed here */ kfree(entry->fsname); ima_free_rule_opt_list(entry->keyrings); ima_lsm_free_rule(entry); kfree(entry); } static struct ima_rule_entry *ima_lsm_copy_rule(struct ima_rule_entry *entry, gfp_t gfp) { struct ima_rule_entry *nentry; int i; /* * Immutable elements are copied over as pointers and data; only * lsm rules can change */ nentry = kmemdup(entry, sizeof(*nentry), gfp); if (!nentry) return NULL; memset(nentry->lsm, 0, sizeof_field(struct ima_rule_entry, lsm)); for (i = 0; i < MAX_LSM_RULES; i++) { if (!entry->lsm[i].args_p) continue; nentry->lsm[i].type = entry->lsm[i].type; nentry->lsm[i].args_p = entry->lsm[i].args_p; ima_filter_rule_init(nentry->lsm[i].type, Audit_equal, nentry->lsm[i].args_p, &nentry->lsm[i].rule, gfp); if (!nentry->lsm[i].rule) pr_warn("rule for LSM \'%s\' is undefined\n", nentry->lsm[i].args_p); } return nentry; } static int ima_lsm_update_rule(struct ima_rule_entry *entry) { int i; struct ima_rule_entry *nentry; nentry = ima_lsm_copy_rule(entry, GFP_KERNEL); if (!nentry) return -ENOMEM; list_replace_rcu(&entry->list, &nentry->list); synchronize_rcu(); /* * ima_lsm_copy_rule() shallow copied all references, except for the * LSM references, from entry to nentry so we only want to free the LSM * references and the entry itself. All other memory references will now * be owned by nentry. */ for (i = 0; i < MAX_LSM_RULES; i++) ima_filter_rule_free(entry->lsm[i].rule); kfree(entry); return 0; } static bool ima_rule_contains_lsm_cond(struct ima_rule_entry *entry) { int i; for (i = 0; i < MAX_LSM_RULES; i++) if (entry->lsm[i].args_p) return true; return false; } /* * The LSM policy can be reloaded, leaving the IMA LSM based rules referring * to the old, stale LSM policy. Update the IMA LSM based rules to reflect * the reloaded LSM policy. */ static void ima_lsm_update_rules(void) { struct ima_rule_entry *entry, *e; int result; list_for_each_entry_safe(entry, e, &ima_policy_rules, list) { if (!ima_rule_contains_lsm_cond(entry)) continue; result = ima_lsm_update_rule(entry); if (result) { pr_err("lsm rule update error %d\n", result); return; } } } int ima_lsm_policy_change(struct notifier_block *nb, unsigned long event, void *lsm_data) { if (event != LSM_POLICY_CHANGE) return NOTIFY_DONE; ima_lsm_update_rules(); return NOTIFY_OK; } /** * ima_match_rule_data - determine whether func_data matches the policy rule * @rule: a pointer to a rule * @func_data: data to match against the measure rule data * @cred: a pointer to a credentials structure for user validation * * Returns true if func_data matches one in the rule, false otherwise. */ static bool ima_match_rule_data(struct ima_rule_entry *rule, const char *func_data, const struct cred *cred) { const struct ima_rule_opt_list *opt_list = NULL; bool matched = false; size_t i; if ((rule->flags & IMA_UID) && !rule->uid_op(cred->uid, rule->uid)) return false; switch (rule->func) { case KEY_CHECK: if (!rule->keyrings) return true; opt_list = rule->keyrings; break; case CRITICAL_DATA: if (!rule->label) return true; opt_list = rule->label; break; default: return false; } if (!func_data) return false; for (i = 0; i < opt_list->count; i++) { if (!strcmp(opt_list->items[i], func_data)) { matched = true; break; } } return matched; } /** * ima_match_rules - determine whether an inode matches the policy rule. * @rule: a pointer to a rule * @idmap: idmap of the mount the inode was found from * @inode: a pointer to an inode * @cred: a pointer to a credentials structure for user validation * @secid: the secid of the task to be validated * @func: LIM hook identifier * @mask: requested action (MAY_READ | MAY_WRITE | MAY_APPEND | MAY_EXEC) * @func_data: func specific data, may be NULL * * Returns true on rule match, false on failure. */ static bool ima_match_rules(struct ima_rule_entry *rule, struct mnt_idmap *idmap, struct inode *inode, const struct cred *cred, u32 secid, enum ima_hooks func, int mask, const char *func_data) { int i; bool result = false; struct ima_rule_entry *lsm_rule = rule; bool rule_reinitialized = false; if ((rule->flags & IMA_FUNC) && (rule->func != func && func != POST_SETATTR)) return false; switch (func) { case KEY_CHECK: case CRITICAL_DATA: return ((rule->func == func) && ima_match_rule_data(rule, func_data, cred)); default: break; } if ((rule->flags & IMA_MASK) && (rule->mask != mask && func != POST_SETATTR)) return false; if ((rule->flags & IMA_INMASK) && (!(rule->mask & mask) && func != POST_SETATTR)) return false; if ((rule->flags & IMA_FSMAGIC) && rule->fsmagic != inode->i_sb->s_magic) return false; if ((rule->flags & IMA_FSNAME) && strcmp(rule->fsname, inode->i_sb->s_type->name)) return false; if ((rule->flags & IMA_FSUUID) && !uuid_equal(&rule->fsuuid, &inode->i_sb->s_uuid)) return false; if ((rule->flags & IMA_UID) && !rule->uid_op(cred->uid, rule->uid)) return false; if (rule->flags & IMA_EUID) { if (has_capability_noaudit(current, CAP_SETUID)) { if (!rule->uid_op(cred->euid, rule->uid) && !rule->uid_op(cred->suid, rule->uid) && !rule->uid_op(cred->uid, rule->uid)) return false; } else if (!rule->uid_op(cred->euid, rule->uid)) return false; } if ((rule->flags & IMA_GID) && !rule->gid_op(cred->gid, rule->gid)) return false; if (rule->flags & IMA_EGID) { if (has_capability_noaudit(current, CAP_SETGID)) { if (!rule->gid_op(cred->egid, rule->gid) && !rule->gid_op(cred->sgid, rule->gid) && !rule->gid_op(cred->gid, rule->gid)) return false; } else if (!rule->gid_op(cred->egid, rule->gid)) return false; } if ((rule->flags & IMA_FOWNER) && !rule->fowner_op(i_uid_into_vfsuid(idmap, inode), rule->fowner)) return false; if ((rule->flags & IMA_FGROUP) && !rule->fgroup_op(i_gid_into_vfsgid(idmap, inode), rule->fgroup)) return false; for (i = 0; i < MAX_LSM_RULES; i++) { int rc = 0; u32 osid; if (!lsm_rule->lsm[i].rule) { if (!lsm_rule->lsm[i].args_p) continue; else return false; } retry: switch (i) { case LSM_OBJ_USER: case LSM_OBJ_ROLE: case LSM_OBJ_TYPE: security_inode_getsecid(inode, &osid); rc = ima_filter_rule_match(osid, lsm_rule->lsm[i].type, Audit_equal, lsm_rule->lsm[i].rule); break; case LSM_SUBJ_USER: case LSM_SUBJ_ROLE: case LSM_SUBJ_TYPE: rc = ima_filter_rule_match(secid, lsm_rule->lsm[i].type, Audit_equal, lsm_rule->lsm[i].rule); break; default: break; } if (rc == -ESTALE && !rule_reinitialized) { lsm_rule = ima_lsm_copy_rule(rule, GFP_ATOMIC); if (lsm_rule) { rule_reinitialized = true; goto retry; } } if (!rc) { result = false; goto out; } } result = true; out: if (rule_reinitialized) { for (i = 0; i < MAX_LSM_RULES; i++) ima_filter_rule_free(lsm_rule->lsm[i].rule); kfree(lsm_rule); } return result; } /* * In addition to knowing that we need to appraise the file in general, * we need to differentiate between calling hooks, for hook specific rules. */ static int get_subaction(struct ima_rule_entry *rule, enum ima_hooks func) { if (!(rule->flags & IMA_FUNC)) return IMA_FILE_APPRAISE; switch (func) { case MMAP_CHECK: case MMAP_CHECK_REQPROT: return IMA_MMAP_APPRAISE; case BPRM_CHECK: return IMA_BPRM_APPRAISE; case CREDS_CHECK: return IMA_CREDS_APPRAISE; case FILE_CHECK: case POST_SETATTR: return IMA_FILE_APPRAISE; case MODULE_CHECK ... MAX_CHECK - 1: default: return IMA_READ_APPRAISE; } } /** * ima_match_policy - decision based on LSM and other conditions * @idmap: idmap of the mount the inode was found from * @inode: pointer to an inode for which the policy decision is being made * @cred: pointer to a credentials structure for which the policy decision is * being made * @secid: LSM secid of the task to be validated * @func: IMA hook identifier * @mask: requested action (MAY_READ | MAY_WRITE | MAY_APPEND | MAY_EXEC) * @flags: IMA actions to consider (e.g. IMA_MEASURE | IMA_APPRAISE) * @pcr: set the pcr to extend * @template_desc: the template that should be used for this rule * @func_data: func specific data, may be NULL * @allowed_algos: allowlist of hash algorithms for the IMA xattr * * Measure decision based on func/mask/fsmagic and LSM(subj/obj/type) * conditions. * * Since the IMA policy may be updated multiple times we need to lock the * list when walking it. Reads are many orders of magnitude more numerous * than writes so ima_match_policy() is classical RCU candidate. */ int ima_match_policy(struct mnt_idmap *idmap, struct inode *inode, const struct cred *cred, u32 secid, enum ima_hooks func, int mask, int flags, int *pcr, struct ima_template_desc **template_desc, const char *func_data, unsigned int *allowed_algos) { struct ima_rule_entry *entry; int action = 0, actmask = flags | (flags << 1); struct list_head *ima_rules_tmp; if (template_desc && !*template_desc) *template_desc = ima_template_desc_current(); rcu_read_lock(); ima_rules_tmp = rcu_dereference(ima_rules); list_for_each_entry_rcu(entry, ima_rules_tmp, list) { if (!(entry->action & actmask)) continue; if (!ima_match_rules(entry, idmap, inode, cred, secid, func, mask, func_data)) continue; action |= entry->flags & IMA_NONACTION_FLAGS; action |= entry->action & IMA_DO_MASK; if (entry->action & IMA_APPRAISE) { action |= get_subaction(entry, func); action &= ~IMA_HASH; if (ima_fail_unverifiable_sigs) action |= IMA_FAIL_UNVERIFIABLE_SIGS; if (allowed_algos && entry->flags & IMA_VALIDATE_ALGOS) *allowed_algos = entry->allowed_algos; } if (entry->action & IMA_DO_MASK) actmask &= ~(entry->action | entry->action << 1); else actmask &= ~(entry->action | entry->action >> 1); if ((pcr) && (entry->flags & IMA_PCR)) *pcr = entry->pcr; if (template_desc && entry->template) *template_desc = entry->template; if (!actmask) break; } rcu_read_unlock(); return action; } /** * ima_update_policy_flags() - Update global IMA variables * * Update ima_policy_flag and ima_setxattr_allowed_hash_algorithms * based on the currently loaded policy. * * With ima_policy_flag, the decision to short circuit out of a function * or not call the function in the first place can be made earlier. * * With ima_setxattr_allowed_hash_algorithms, the policy can restrict the * set of hash algorithms accepted when updating the security.ima xattr of * a file. * * Context: called after a policy update and at system initialization. */ void ima_update_policy_flags(void) { struct ima_rule_entry *entry; int new_policy_flag = 0; struct list_head *ima_rules_tmp; rcu_read_lock(); ima_rules_tmp = rcu_dereference(ima_rules); list_for_each_entry_rcu(entry, ima_rules_tmp, list) { /* * SETXATTR_CHECK rules do not implement a full policy check * because rule checking would probably have an important * performance impact on setxattr(). As a consequence, only one * SETXATTR_CHECK can be active at a given time. * Because we want to preserve that property, we set out to use * atomic_cmpxchg. Either: * - the atomic was non-zero: a setxattr hash policy is * already enforced, we do nothing * - the atomic was zero: no setxattr policy was set, enable * the setxattr hash policy */ if (entry->func == SETXATTR_CHECK) { atomic_cmpxchg(&ima_setxattr_allowed_hash_algorithms, 0, entry->allowed_algos); /* SETXATTR_CHECK doesn't impact ima_policy_flag */ continue; } if (entry->action & IMA_DO_MASK) new_policy_flag |= entry->action; } rcu_read_unlock(); ima_appraise |= (build_ima_appraise | temp_ima_appraise); if (!ima_appraise) new_policy_flag &= ~IMA_APPRAISE; ima_policy_flag = new_policy_flag; } static int ima_appraise_flag(enum ima_hooks func) { if (func == MODULE_CHECK) return IMA_APPRAISE_MODULES; else if (func == FIRMWARE_CHECK) return IMA_APPRAISE_FIRMWARE; else if (func == POLICY_CHECK) return IMA_APPRAISE_POLICY; else if (func == KEXEC_KERNEL_CHECK) return IMA_APPRAISE_KEXEC; return 0; } static void add_rules(struct ima_rule_entry *entries, int count, enum policy_rule_list policy_rule) { int i = 0; for (i = 0; i < count; i++) { struct ima_rule_entry *entry; if (policy_rule & IMA_DEFAULT_POLICY) list_add_tail(&entries[i].list, &ima_default_rules); if (policy_rule & IMA_CUSTOM_POLICY) { entry = kmemdup(&entries[i], sizeof(*entry), GFP_KERNEL); if (!entry) continue; list_add_tail(&entry->list, &ima_policy_rules); } if (entries[i].action == APPRAISE) { if (entries != build_appraise_rules) temp_ima_appraise |= ima_appraise_flag(entries[i].func); else build_ima_appraise |= ima_appraise_flag(entries[i].func); } } } static int ima_parse_rule(char *rule, struct ima_rule_entry *entry); static int __init ima_init_arch_policy(void) { const char * const *arch_rules; const char * const *rules; int arch_entries = 0; int i = 0; arch_rules = arch_get_ima_policy(); if (!arch_rules) return arch_entries; /* Get number of rules */ for (rules = arch_rules; *rules != NULL; rules++) arch_entries++; arch_policy_entry = kcalloc(arch_entries + 1, sizeof(*arch_policy_entry), GFP_KERNEL); if (!arch_policy_entry) return 0; /* Convert each policy string rules to struct ima_rule_entry format */ for (rules = arch_rules, i = 0; *rules != NULL; rules++) { char rule[255]; int result; result = strscpy(rule, *rules, sizeof(rule)); INIT_LIST_HEAD(&arch_policy_entry[i].list); result = ima_parse_rule(rule, &arch_policy_entry[i]); if (result) { pr_warn("Skipping unknown architecture policy rule: %s\n", rule); memset(&arch_policy_entry[i], 0, sizeof(*arch_policy_entry)); continue; } i++; } return i; } /** * ima_init_policy - initialize the default measure rules. * * ima_rules points to either the ima_default_rules or the new ima_policy_rules. */ void __init ima_init_policy(void) { int build_appraise_entries, arch_entries; /* if !ima_policy, we load NO default rules */ if (ima_policy) add_rules(dont_measure_rules, ARRAY_SIZE(dont_measure_rules), IMA_DEFAULT_POLICY); switch (ima_policy) { case ORIGINAL_TCB: add_rules(original_measurement_rules, ARRAY_SIZE(original_measurement_rules), IMA_DEFAULT_POLICY); break; case DEFAULT_TCB: add_rules(default_measurement_rules, ARRAY_SIZE(default_measurement_rules), IMA_DEFAULT_POLICY); break; default: break; } /* * Based on runtime secure boot flags, insert arch specific measurement * and appraise rules requiring file signatures for both the initial * and custom policies, prior to other appraise rules. * (Highest priority) */ arch_entries = ima_init_arch_policy(); if (!arch_entries) pr_info("No architecture policies found\n"); else add_rules(arch_policy_entry, arch_entries, IMA_DEFAULT_POLICY | IMA_CUSTOM_POLICY); /* * Insert the builtin "secure_boot" policy rules requiring file * signatures, prior to other appraise rules. */ if (ima_use_secure_boot) add_rules(secure_boot_rules, ARRAY_SIZE(secure_boot_rules), IMA_DEFAULT_POLICY); /* * Insert the build time appraise rules requiring file signatures * for both the initial and custom policies, prior to other appraise * rules. As the secure boot rules includes all of the build time * rules, include either one or the other set of rules, but not both. */ build_appraise_entries = ARRAY_SIZE(build_appraise_rules); if (build_appraise_entries) { if (ima_use_secure_boot) add_rules(build_appraise_rules, build_appraise_entries, IMA_CUSTOM_POLICY); else add_rules(build_appraise_rules, build_appraise_entries, IMA_DEFAULT_POLICY | IMA_CUSTOM_POLICY); } if (ima_use_appraise_tcb) add_rules(default_appraise_rules, ARRAY_SIZE(default_appraise_rules), IMA_DEFAULT_POLICY); if (ima_use_critical_data) add_rules(critical_data_rules, ARRAY_SIZE(critical_data_rules), IMA_DEFAULT_POLICY); atomic_set(&ima_setxattr_allowed_hash_algorithms, 0); ima_update_policy_flags(); } /* Make sure we have a valid policy, at least containing some rules. */ int ima_check_policy(void) { if (list_empty(&ima_temp_rules)) return -EINVAL; return 0; } /** * ima_update_policy - update default_rules with new measure rules * * Called on file .release to update the default rules with a complete new * policy. What we do here is to splice ima_policy_rules and ima_temp_rules so * they make a queue. The policy may be updated multiple times and this is the * RCU updater. * * Policy rules are never deleted so ima_policy_flag gets zeroed only once when * we switch from the default policy to user defined. */ void ima_update_policy(void) { struct list_head *policy = &ima_policy_rules; list_splice_tail_init_rcu(&ima_temp_rules, policy, synchronize_rcu); if (ima_rules != (struct list_head __rcu *)policy) { ima_policy_flag = 0; rcu_assign_pointer(ima_rules, policy); /* * IMA architecture specific policy rules are specified * as strings and converted to an array of ima_entry_rules * on boot. After loading a custom policy, free the * architecture specific rules stored as an array. */ kfree(arch_policy_entry); } ima_update_policy_flags(); /* Custom IMA policy has been loaded */ ima_process_queued_keys(); } /* Keep the enumeration in sync with the policy_tokens! */ enum policy_opt { Opt_measure, Opt_dont_measure, Opt_appraise, Opt_dont_appraise, Opt_audit, Opt_hash, Opt_dont_hash, Opt_obj_user, Opt_obj_role, Opt_obj_type, Opt_subj_user, Opt_subj_role, Opt_subj_type, Opt_func, Opt_mask, Opt_fsmagic, Opt_fsname, Opt_fsuuid, Opt_uid_eq, Opt_euid_eq, Opt_gid_eq, Opt_egid_eq, Opt_fowner_eq, Opt_fgroup_eq, Opt_uid_gt, Opt_euid_gt, Opt_gid_gt, Opt_egid_gt, Opt_fowner_gt, Opt_fgroup_gt, Opt_uid_lt, Opt_euid_lt, Opt_gid_lt, Opt_egid_lt, Opt_fowner_lt, Opt_fgroup_lt, Opt_digest_type, Opt_appraise_type, Opt_appraise_flag, Opt_appraise_algos, Opt_permit_directio, Opt_pcr, Opt_template, Opt_keyrings, Opt_label, Opt_err }; static const match_table_t policy_tokens = { {Opt_measure, "measure"}, {Opt_dont_measure, "dont_measure"}, {Opt_appraise, "appraise"}, {Opt_dont_appraise, "dont_appraise"}, {Opt_audit, "audit"}, {Opt_hash, "hash"}, {Opt_dont_hash, "dont_hash"}, {Opt_obj_user, "obj_user=%s"}, {Opt_obj_role, "obj_role=%s"}, {Opt_obj_type, "obj_type=%s"}, {Opt_subj_user, "subj_user=%s"}, {Opt_subj_role, "subj_role=%s"}, {Opt_subj_type, "subj_type=%s"}, {Opt_func, "func=%s"}, {Opt_mask, "mask=%s"}, {Opt_fsmagic, "fsmagic=%s"}, {Opt_fsname, "fsname=%s"}, {Opt_fsuuid, "fsuuid=%s"}, {Opt_uid_eq, "uid=%s"}, {Opt_euid_eq, "euid=%s"}, {Opt_gid_eq, "gid=%s"}, {Opt_egid_eq, "egid=%s"}, {Opt_fowner_eq, "fowner=%s"}, {Opt_fgroup_eq, "fgroup=%s"}, {Opt_uid_gt, "uid>%s"}, {Opt_euid_gt, "euid>%s"}, {Opt_gid_gt, "gid>%s"}, {Opt_egid_gt, "egid>%s"}, {Opt_fowner_gt, "fowner>%s"}, {Opt_fgroup_gt, "fgroup>%s"}, {Opt_uid_lt, "uid<%s"}, {Opt_euid_lt, "euid<%s"}, {Opt_gid_lt, "gid<%s"}, {Opt_egid_lt, "egid<%s"}, {Opt_fowner_lt, "fowner<%s"}, {Opt_fgroup_lt, "fgroup<%s"}, {Opt_digest_type, "digest_type=%s"}, {Opt_appraise_type, "appraise_type=%s"}, {Opt_appraise_flag, "appraise_flag=%s"}, {Opt_appraise_algos, "appraise_algos=%s"}, {Opt_permit_directio, "permit_directio"}, {Opt_pcr, "pcr=%s"}, {Opt_template, "template=%s"}, {Opt_keyrings, "keyrings=%s"}, {Opt_label, "label=%s"}, {Opt_err, NULL} }; static int ima_lsm_rule_init(struct ima_rule_entry *entry, substring_t *args, int lsm_rule, int audit_type) { int result; if (entry->lsm[lsm_rule].rule) return -EINVAL; entry->lsm[lsm_rule].args_p = match_strdup(args); if (!entry->lsm[lsm_rule].args_p) return -ENOMEM; entry->lsm[lsm_rule].type = audit_type; result = ima_filter_rule_init(entry->lsm[lsm_rule].type, Audit_equal, entry->lsm[lsm_rule].args_p, &entry->lsm[lsm_rule].rule, GFP_KERNEL); if (!entry->lsm[lsm_rule].rule) { pr_warn("rule for LSM \'%s\' is undefined\n", entry->lsm[lsm_rule].args_p); if (ima_rules == (struct list_head __rcu *)(&ima_default_rules)) { kfree(entry->lsm[lsm_rule].args_p); entry->lsm[lsm_rule].args_p = NULL; result = -EINVAL; } else result = 0; } return result; } static void ima_log_string_op(struct audit_buffer *ab, char *key, char *value, enum policy_opt rule_operator) { if (!ab) return; switch (rule_operator) { case Opt_uid_gt: case Opt_euid_gt: case Opt_gid_gt: case Opt_egid_gt: case Opt_fowner_gt: case Opt_fgroup_gt: audit_log_format(ab, "%s>", key); break; case Opt_uid_lt: case Opt_euid_lt: case Opt_gid_lt: case Opt_egid_lt: case Opt_fowner_lt: case Opt_fgroup_lt: audit_log_format(ab, "%s<", key); break; default: audit_log_format(ab, "%s=", key); } audit_log_format(ab, "%s ", value); } static void ima_log_string(struct audit_buffer *ab, char *key, char *value) { ima_log_string_op(ab, key, value, Opt_err); } /* * Validating the appended signature included in the measurement list requires * the file hash calculated without the appended signature (i.e., the 'd-modsig' * field). Therefore, notify the user if they have the 'modsig' field but not * the 'd-modsig' field in the template. */ static void check_template_modsig(const struct ima_template_desc *template) { #define MSG "template with 'modsig' field also needs 'd-modsig' field\n" bool has_modsig, has_dmodsig; static bool checked; int i; /* We only need to notify the user once. */ if (checked) return; has_modsig = has_dmodsig = false; for (i = 0; i < template->num_fields; i++) { if (!strcmp(template->fields[i]->field_id, "modsig")) has_modsig = true; else if (!strcmp(template->fields[i]->field_id, "d-modsig")) has_dmodsig = true; } if (has_modsig && !has_dmodsig) pr_notice(MSG); checked = true; #undef MSG } /* * Warn if the template does not contain the given field. */ static void check_template_field(const struct ima_template_desc *template, const char *field, const char *msg) { int i; for (i = 0; i < template->num_fields; i++) if (!strcmp(template->fields[i]->field_id, field)) return; pr_notice_once("%s", msg); } static bool ima_validate_rule(struct ima_rule_entry *entry) { /* Ensure that the action is set and is compatible with the flags */ if (entry->action == UNKNOWN) return false; if (entry->action != MEASURE && entry->flags & IMA_PCR) return false; if (entry->action != APPRAISE && entry->flags & (IMA_DIGSIG_REQUIRED | IMA_MODSIG_ALLOWED | IMA_CHECK_BLACKLIST | IMA_VALIDATE_ALGOS)) return false; /* * The IMA_FUNC bit must be set if and only if there's a valid hook * function specified, and vice versa. Enforcing this property allows * for the NONE case below to validate a rule without an explicit hook * function. */ if (((entry->flags & IMA_FUNC) && entry->func == NONE) || (!(entry->flags & IMA_FUNC) && entry->func != NONE)) return false; /* * Ensure that the hook function is compatible with the other * components of the rule */ switch (entry->func) { case NONE: case FILE_CHECK: case MMAP_CHECK: case MMAP_CHECK_REQPROT: case BPRM_CHECK: case CREDS_CHECK: case POST_SETATTR: case FIRMWARE_CHECK: case POLICY_CHECK: if (entry->flags & ~(IMA_FUNC | IMA_MASK | IMA_FSMAGIC | IMA_UID | IMA_FOWNER | IMA_FSUUID | IMA_INMASK | IMA_EUID | IMA_PCR | IMA_FSNAME | IMA_GID | IMA_EGID | IMA_FGROUP | IMA_DIGSIG_REQUIRED | IMA_PERMIT_DIRECTIO | IMA_VALIDATE_ALGOS | IMA_CHECK_BLACKLIST | IMA_VERITY_REQUIRED)) return false; break; case MODULE_CHECK: case KEXEC_KERNEL_CHECK: case KEXEC_INITRAMFS_CHECK: if (entry->flags & ~(IMA_FUNC | IMA_MASK | IMA_FSMAGIC | IMA_UID | IMA_FOWNER | IMA_FSUUID | IMA_INMASK | IMA_EUID | IMA_PCR | IMA_FSNAME | IMA_GID | IMA_EGID | IMA_FGROUP | IMA_DIGSIG_REQUIRED | IMA_PERMIT_DIRECTIO | IMA_MODSIG_ALLOWED | IMA_CHECK_BLACKLIST | IMA_VALIDATE_ALGOS)) return false; break; case KEXEC_CMDLINE: if (entry->action & ~(MEASURE | DONT_MEASURE)) return false; if (entry->flags & ~(IMA_FUNC | IMA_FSMAGIC | IMA_UID | IMA_FOWNER | IMA_FSUUID | IMA_EUID | IMA_PCR | IMA_FSNAME | IMA_GID | IMA_EGID | IMA_FGROUP)) return false; break; case KEY_CHECK: if (entry->action & ~(MEASURE | DONT_MEASURE)) return false; if (entry->flags & ~(IMA_FUNC | IMA_UID | IMA_GID | IMA_PCR | IMA_KEYRINGS)) return false; if (ima_rule_contains_lsm_cond(entry)) return false; break; case CRITICAL_DATA: if (entry->action & ~(MEASURE | DONT_MEASURE)) return false; if (entry->flags & ~(IMA_FUNC | IMA_UID | IMA_GID | IMA_PCR | IMA_LABEL)) return false; if (ima_rule_contains_lsm_cond(entry)) return false; break; case SETXATTR_CHECK: /* any action other than APPRAISE is unsupported */ if (entry->action != APPRAISE) return false; /* SETXATTR_CHECK requires an appraise_algos parameter */ if (!(entry->flags & IMA_VALIDATE_ALGOS)) return false; /* * full policies are not supported, they would have too * much of a performance impact */ if (entry->flags & ~(IMA_FUNC | IMA_VALIDATE_ALGOS)) return false; break; default: return false; } /* Ensure that combinations of flags are compatible with each other */ if (entry->flags & IMA_CHECK_BLACKLIST && !(entry->flags & IMA_DIGSIG_REQUIRED)) return false; /* * Unlike for regular IMA 'appraise' policy rules where security.ima * xattr may contain either a file hash or signature, the security.ima * xattr for fsverity must contain a file signature (sigv3). Ensure * that 'appraise' rules for fsverity require file signatures by * checking the IMA_DIGSIG_REQUIRED flag is set. */ if (entry->action == APPRAISE && (entry->flags & IMA_VERITY_REQUIRED) && !(entry->flags & IMA_DIGSIG_REQUIRED)) return false; return true; } static unsigned int ima_parse_appraise_algos(char *arg) { unsigned int res = 0; int idx; char *token; while ((token = strsep(&arg, ",")) != NULL) { idx = match_string(hash_algo_name, HASH_ALGO__LAST, token); if (idx < 0) { pr_err("unknown hash algorithm \"%s\"", token); return 0; } if (!crypto_has_alg(hash_algo_name[idx], 0, 0)) { pr_err("unavailable hash algorithm \"%s\", check your kernel configuration", token); return 0; } /* Add the hash algorithm to the 'allowed' bitfield */ res |= (1U << idx); } return res; } static int ima_parse_rule(char *rule, struct ima_rule_entry *entry) { struct audit_buffer *ab; char *from; char *p; bool eid_token; /* either euid or egid */ struct ima_template_desc *template_desc; int result = 0; ab = integrity_audit_log_start(audit_context(), GFP_KERNEL, AUDIT_INTEGRITY_POLICY_RULE); entry->uid = INVALID_UID; entry->gid = INVALID_GID; entry->fowner = INVALID_UID; entry->fgroup = INVALID_GID; entry->uid_op = &uid_eq; entry->gid_op = &gid_eq; entry->fowner_op = &vfsuid_eq_kuid; entry->fgroup_op = &vfsgid_eq_kgid; entry->action = UNKNOWN; while ((p = strsep(&rule, " \t")) != NULL) { substring_t args[MAX_OPT_ARGS]; int token; unsigned long lnum; if (result < 0) break; if ((*p == '\0') || (*p == ' ') || (*p == '\t')) continue; token = match_token(p, policy_tokens, args); switch (token) { case Opt_measure: ima_log_string(ab, "action", "measure"); if (entry->action != UNKNOWN) result = -EINVAL; entry->action = MEASURE; break; case Opt_dont_measure: ima_log_string(ab, "action", "dont_measure"); if (entry->action != UNKNOWN) result = -EINVAL; entry->action = DONT_MEASURE; break; case Opt_appraise: ima_log_string(ab, "action", "appraise"); if (entry->action != UNKNOWN) result = -EINVAL; entry->action = APPRAISE; break; case Opt_dont_appraise: ima_log_string(ab, "action", "dont_appraise"); if (entry->action != UNKNOWN) result = -EINVAL; entry->action = DONT_APPRAISE; break; case Opt_audit: ima_log_string(ab, "action", "audit"); if (entry->action != UNKNOWN) result = -EINVAL; entry->action = AUDIT; break; case Opt_hash: ima_log_string(ab, "action", "hash"); if (entry->action != UNKNOWN) result = -EINVAL; entry->action = HASH; break; case Opt_dont_hash: ima_log_string(ab, "action", "dont_hash"); if (entry->action != UNKNOWN) result = -EINVAL; entry->action = DONT_HASH; break; case Opt_func: ima_log_string(ab, "func", args[0].from); if (entry->func) result = -EINVAL; if (strcmp(args[0].from, "FILE_CHECK") == 0) entry->func = FILE_CHECK; /* PATH_CHECK is for backwards compat */ else if (strcmp(args[0].from, "PATH_CHECK") == 0) entry->func = FILE_CHECK; else if (strcmp(args[0].from, "MODULE_CHECK") == 0) entry->func = MODULE_CHECK; else if (strcmp(args[0].from, "FIRMWARE_CHECK") == 0) entry->func = FIRMWARE_CHECK; else if ((strcmp(args[0].from, "FILE_MMAP") == 0) || (strcmp(args[0].from, "MMAP_CHECK") == 0)) entry->func = MMAP_CHECK; else if ((strcmp(args[0].from, "MMAP_CHECK_REQPROT") == 0)) entry->func = MMAP_CHECK_REQPROT; else if (strcmp(args[0].from, "BPRM_CHECK") == 0) entry->func = BPRM_CHECK; else if (strcmp(args[0].from, "CREDS_CHECK") == 0) entry->func = CREDS_CHECK; else if (strcmp(args[0].from, "KEXEC_KERNEL_CHECK") == 0) entry->func = KEXEC_KERNEL_CHECK; else if (strcmp(args[0].from, "KEXEC_INITRAMFS_CHECK") == 0) entry->func = KEXEC_INITRAMFS_CHECK; else if (strcmp(args[0].from, "POLICY_CHECK") == 0) entry->func = POLICY_CHECK; else if (strcmp(args[0].from, "KEXEC_CMDLINE") == 0) entry->func = KEXEC_CMDLINE; else if (IS_ENABLED(CONFIG_IMA_MEASURE_ASYMMETRIC_KEYS) && strcmp(args[0].from, "KEY_CHECK") == 0) entry->func = KEY_CHECK; else if (strcmp(args[0].from, "CRITICAL_DATA") == 0) entry->func = CRITICAL_DATA; else if (strcmp(args[0].from, "SETXATTR_CHECK") == 0) entry->func = SETXATTR_CHECK; else result = -EINVAL; if (!result) entry->flags |= IMA_FUNC; break; case Opt_mask: ima_log_string(ab, "mask", args[0].from); if (entry->mask) result = -EINVAL; from = args[0].from; if (*from == '^') from++; if ((strcmp(from, "MAY_EXEC")) == 0) entry->mask = MAY_EXEC; else if (strcmp(from, "MAY_WRITE") == 0) entry->mask = MAY_WRITE; else if (strcmp(from, "MAY_READ") == 0) entry->mask = MAY_READ; else if (strcmp(from, "MAY_APPEND") == 0) entry->mask = MAY_APPEND; else result = -EINVAL; if (!result) entry->flags |= (*args[0].from == '^') ? IMA_INMASK : IMA_MASK; break; case Opt_fsmagic: ima_log_string(ab, "fsmagic", args[0].from); if (entry->fsmagic) { result = -EINVAL; break; } result = kstrtoul(args[0].from, 16, &entry->fsmagic); if (!result) entry->flags |= IMA_FSMAGIC; break; case Opt_fsname: ima_log_string(ab, "fsname", args[0].from); entry->fsname = kstrdup(args[0].from, GFP_KERNEL); if (!entry->fsname) { result = -ENOMEM; break; } result = 0; entry->flags |= IMA_FSNAME; break; case Opt_keyrings: ima_log_string(ab, "keyrings", args[0].from); if (!IS_ENABLED(CONFIG_IMA_MEASURE_ASYMMETRIC_KEYS) || entry->keyrings) { result = -EINVAL; break; } entry->keyrings = ima_alloc_rule_opt_list(args); if (IS_ERR(entry->keyrings)) { result = PTR_ERR(entry->keyrings); entry->keyrings = NULL; break; } entry->flags |= IMA_KEYRINGS; break; case Opt_label: ima_log_string(ab, "label", args[0].from); if (entry->label) { result = -EINVAL; break; } entry->label = ima_alloc_rule_opt_list(args); if (IS_ERR(entry->label)) { result = PTR_ERR(entry->label); entry->label = NULL; break; } entry->flags |= IMA_LABEL; break; case Opt_fsuuid: ima_log_string(ab, "fsuuid", args[0].from); if (!uuid_is_null(&entry->fsuuid)) { result = -EINVAL; break; } result = uuid_parse(args[0].from, &entry->fsuuid); if (!result) entry->flags |= IMA_FSUUID; break; case Opt_uid_gt: case Opt_euid_gt: entry->uid_op = &uid_gt; fallthrough; case Opt_uid_lt: case Opt_euid_lt: if ((token == Opt_uid_lt) || (token == Opt_euid_lt)) entry->uid_op = &uid_lt; fallthrough; case Opt_uid_eq: case Opt_euid_eq: eid_token = (token == Opt_euid_eq) || (token == Opt_euid_gt) || (token == Opt_euid_lt); ima_log_string_op(ab, eid_token ? "euid" : "uid", args[0].from, token); if (uid_valid(entry->uid)) { result = -EINVAL; break; } result = kstrtoul(args[0].from, 10, &lnum); if (!result) { entry->uid = make_kuid(current_user_ns(), (uid_t) lnum); if (!uid_valid(entry->uid) || (uid_t)lnum != lnum) result = -EINVAL; else entry->flags |= eid_token ? IMA_EUID : IMA_UID; } break; case Opt_gid_gt: case Opt_egid_gt: entry->gid_op = &gid_gt; fallthrough; case Opt_gid_lt: case Opt_egid_lt: if ((token == Opt_gid_lt) || (token == Opt_egid_lt)) entry->gid_op = &gid_lt; fallthrough; case Opt_gid_eq: case Opt_egid_eq: eid_token = (token == Opt_egid_eq) || (token == Opt_egid_gt) || (token == Opt_egid_lt); ima_log_string_op(ab, eid_token ? "egid" : "gid", args[0].from, token); if (gid_valid(entry->gid)) { result = -EINVAL; break; } result = kstrtoul(args[0].from, 10, &lnum); if (!result) { entry->gid = make_kgid(current_user_ns(), (gid_t)lnum); if (!gid_valid(entry->gid) || (((gid_t)lnum) != lnum)) result = -EINVAL; else entry->flags |= eid_token ? IMA_EGID : IMA_GID; } break; case Opt_fowner_gt: entry->fowner_op = &vfsuid_gt_kuid; fallthrough; case Opt_fowner_lt: if (token == Opt_fowner_lt) entry->fowner_op = &vfsuid_lt_kuid; fallthrough; case Opt_fowner_eq: ima_log_string_op(ab, "fowner", args[0].from, token); if (uid_valid(entry->fowner)) { result = -EINVAL; break; } result = kstrtoul(args[0].from, 10, &lnum); if (!result) { entry->fowner = make_kuid(current_user_ns(), (uid_t)lnum); if (!uid_valid(entry->fowner) || (((uid_t)lnum) != lnum)) result = -EINVAL; else entry->flags |= IMA_FOWNER; } break; case Opt_fgroup_gt: entry->fgroup_op = &vfsgid_gt_kgid; fallthrough; case Opt_fgroup_lt: if (token == Opt_fgroup_lt) entry->fgroup_op = &vfsgid_lt_kgid; fallthrough; case Opt_fgroup_eq: ima_log_string_op(ab, "fgroup", args[0].from, token); if (gid_valid(entry->fgroup)) { result = -EINVAL; break; } result = kstrtoul(args[0].from, 10, &lnum); if (!result) { entry->fgroup = make_kgid(current_user_ns(), (gid_t)lnum); if (!gid_valid(entry->fgroup) || (((gid_t)lnum) != lnum)) result = -EINVAL; else entry->flags |= IMA_FGROUP; } break; case Opt_obj_user: ima_log_string(ab, "obj_user", args[0].from); result = ima_lsm_rule_init(entry, args, LSM_OBJ_USER, AUDIT_OBJ_USER); break; case Opt_obj_role: ima_log_string(ab, "obj_role", args[0].from); result = ima_lsm_rule_init(entry, args, LSM_OBJ_ROLE, AUDIT_OBJ_ROLE); break; case Opt_obj_type: ima_log_string(ab, "obj_type", args[0].from); result = ima_lsm_rule_init(entry, args, LSM_OBJ_TYPE, AUDIT_OBJ_TYPE); break; case Opt_subj_user: ima_log_string(ab, "subj_user", args[0].from); result = ima_lsm_rule_init(entry, args, LSM_SUBJ_USER, AUDIT_SUBJ_USER); break; case Opt_subj_role: ima_log_string(ab, "subj_role", args[0].from); result = ima_lsm_rule_init(entry, args, LSM_SUBJ_ROLE, AUDIT_SUBJ_ROLE); break; case Opt_subj_type: ima_log_string(ab, "subj_type", args[0].from); result = ima_lsm_rule_init(entry, args, LSM_SUBJ_TYPE, AUDIT_SUBJ_TYPE); break; case Opt_digest_type: ima_log_string(ab, "digest_type", args[0].from); if (entry->flags & IMA_DIGSIG_REQUIRED) result = -EINVAL; else if ((strcmp(args[0].from, "verity")) == 0) entry->flags |= IMA_VERITY_REQUIRED; else result = -EINVAL; break; case Opt_appraise_type: ima_log_string(ab, "appraise_type", args[0].from); if ((strcmp(args[0].from, "imasig")) == 0) { if (entry->flags & IMA_VERITY_REQUIRED) result = -EINVAL; else entry->flags |= IMA_DIGSIG_REQUIRED | IMA_CHECK_BLACKLIST; } else if (strcmp(args[0].from, "sigv3") == 0) { /* Only fsverity supports sigv3 for now */ if (entry->flags & IMA_VERITY_REQUIRED) entry->flags |= IMA_DIGSIG_REQUIRED | IMA_CHECK_BLACKLIST; else result = -EINVAL; } else if (IS_ENABLED(CONFIG_IMA_APPRAISE_MODSIG) && strcmp(args[0].from, "imasig|modsig") == 0) { if (entry->flags & IMA_VERITY_REQUIRED) result = -EINVAL; else entry->flags |= IMA_DIGSIG_REQUIRED | IMA_MODSIG_ALLOWED | IMA_CHECK_BLACKLIST; } else { result = -EINVAL; } break; case Opt_appraise_flag: ima_log_string(ab, "appraise_flag", args[0].from); break; case Opt_appraise_algos: ima_log_string(ab, "appraise_algos", args[0].from); if (entry->allowed_algos) { result = -EINVAL; break; } entry->allowed_algos = ima_parse_appraise_algos(args[0].from); /* invalid or empty list of algorithms */ if (!entry->allowed_algos) { result = -EINVAL; break; } entry->flags |= IMA_VALIDATE_ALGOS; break; case Opt_permit_directio: entry->flags |= IMA_PERMIT_DIRECTIO; break; case Opt_pcr: ima_log_string(ab, "pcr", args[0].from); result = kstrtoint(args[0].from, 10, &entry->pcr); if (result || INVALID_PCR(entry->pcr)) result = -EINVAL; else entry->flags |= IMA_PCR; break; case Opt_template: ima_log_string(ab, "template", args[0].from); if (entry->action != MEASURE) { result = -EINVAL; break; } template_desc = lookup_template_desc(args[0].from); if (!template_desc || entry->template) { result = -EINVAL; break; } /* * template_desc_init_fields() does nothing if * the template is already initialised, so * it's safe to do this unconditionally */ template_desc_init_fields(template_desc->fmt, &(template_desc->fields), &(template_desc->num_fields)); entry->template = template_desc; break; case Opt_err: ima_log_string(ab, "UNKNOWN", p); result = -EINVAL; break; } } if (!result && !ima_validate_rule(entry)) result = -EINVAL; else if (entry->action == APPRAISE) temp_ima_appraise |= ima_appraise_flag(entry->func); if (!result && entry->flags & IMA_MODSIG_ALLOWED) { template_desc = entry->template ? entry->template : ima_template_desc_current(); check_template_modsig(template_desc); } /* d-ngv2 template field recommended for unsigned fs-verity digests */ if (!result && entry->action == MEASURE && entry->flags & IMA_VERITY_REQUIRED) { template_desc = entry->template ? entry->template : ima_template_desc_current(); check_template_field(template_desc, "d-ngv2", "verity rules should include d-ngv2"); } audit_log_format(ab, "res=%d", !result); audit_log_end(ab); return result; } /** * ima_parse_add_rule - add a rule to ima_policy_rules * @rule: ima measurement policy rule * * Avoid locking by allowing just one writer at a time in ima_write_policy() * Returns the length of the rule parsed, an error code on failure */ ssize_t ima_parse_add_rule(char *rule) { static const char op[] = "update_policy"; char *p; struct ima_rule_entry *entry; ssize_t result, len; int audit_info = 0; p = strsep(&rule, "\n"); len = strlen(p) + 1; p += strspn(p, " \t"); if (*p == '#' || *p == '\0') return len; entry = kzalloc(sizeof(*entry), GFP_KERNEL); if (!entry) { integrity_audit_msg(AUDIT_INTEGRITY_STATUS, NULL, NULL, op, "-ENOMEM", -ENOMEM, audit_info); return -ENOMEM; } INIT_LIST_HEAD(&entry->list); result = ima_parse_rule(p, entry); if (result) { ima_free_rule(entry); integrity_audit_msg(AUDIT_INTEGRITY_STATUS, NULL, NULL, op, "invalid-policy", result, audit_info); return result; } list_add_tail(&entry->list, &ima_temp_rules); return len; } /** * ima_delete_rules() - called to cleanup invalid in-flight policy. * * We don't need locking as we operate on the temp list, which is * different from the active one. There is also only one user of * ima_delete_rules() at a time. */ void ima_delete_rules(void) { struct ima_rule_entry *entry, *tmp; temp_ima_appraise = 0; list_for_each_entry_safe(entry, tmp, &ima_temp_rules, list) { list_del(&entry->list); ima_free_rule(entry); } } #define __ima_hook_stringify(func, str) (#func), const char *const func_tokens[] = { __ima_hooks(__ima_hook_stringify) }; #ifdef CONFIG_IMA_READ_POLICY enum { mask_exec = 0, mask_write, mask_read, mask_append }; static const char *const mask_tokens[] = { "^MAY_EXEC", "^MAY_WRITE", "^MAY_READ", "^MAY_APPEND" }; void *ima_policy_start(struct seq_file *m, loff_t *pos) { loff_t l = *pos; struct ima_rule_entry *entry; struct list_head *ima_rules_tmp; rcu_read_lock(); ima_rules_tmp = rcu_dereference(ima_rules); list_for_each_entry_rcu(entry, ima_rules_tmp, list) { if (!l--) { rcu_read_unlock(); return entry; } } rcu_read_unlock(); return NULL; } void *ima_policy_next(struct seq_file *m, void *v, loff_t *pos) { struct ima_rule_entry *entry = v; rcu_read_lock(); entry = list_entry_rcu(entry->list.next, struct ima_rule_entry, list); rcu_read_unlock(); (*pos)++; return (&entry->list == &ima_default_rules || &entry->list == &ima_policy_rules) ? NULL : entry; } void ima_policy_stop(struct seq_file *m, void *v) { } #define pt(token) policy_tokens[token].pattern #define mt(token) mask_tokens[token] /* * policy_func_show - display the ima_hooks policy rule */ static void policy_func_show(struct seq_file *m, enum ima_hooks func) { if (func > 0 && func < MAX_CHECK) seq_printf(m, "func=%s ", func_tokens[func]); else seq_printf(m, "func=%d ", func); } static void ima_show_rule_opt_list(struct seq_file *m, const struct ima_rule_opt_list *opt_list) { size_t i; for (i = 0; i < opt_list->count; i++) seq_printf(m, "%s%s", i ? "|" : "", opt_list->items[i]); } static void ima_policy_show_appraise_algos(struct seq_file *m, unsigned int allowed_hashes) { int idx, list_size = 0; for (idx = 0; idx < HASH_ALGO__LAST; idx++) { if (!(allowed_hashes & (1U << idx))) continue; /* only add commas if the list contains multiple entries */ if (list_size++) seq_puts(m, ","); seq_puts(m, hash_algo_name[idx]); } } int ima_policy_show(struct seq_file *m, void *v) { struct ima_rule_entry *entry = v; int i; char tbuf[64] = {0,}; int offset = 0; rcu_read_lock(); /* Do not print rules with inactive LSM labels */ for (i = 0; i < MAX_LSM_RULES; i++) { if (entry->lsm[i].args_p && !entry->lsm[i].rule) { rcu_read_unlock(); return 0; } } if (entry->action & MEASURE) seq_puts(m, pt(Opt_measure)); if (entry->action & DONT_MEASURE) seq_puts(m, pt(Opt_dont_measure)); if (entry->action & APPRAISE) seq_puts(m, pt(Opt_appraise)); if (entry->action & DONT_APPRAISE) seq_puts(m, pt(Opt_dont_appraise)); if (entry->action & AUDIT) seq_puts(m, pt(Opt_audit)); if (entry->action & HASH) seq_puts(m, pt(Opt_hash)); if (entry->action & DONT_HASH) seq_puts(m, pt(Opt_dont_hash)); seq_puts(m, " "); if (entry->flags & IMA_FUNC) policy_func_show(m, entry->func); if ((entry->flags & IMA_MASK) || (entry->flags & IMA_INMASK)) { if (entry->flags & IMA_MASK) offset = 1; if (entry->mask & MAY_EXEC) seq_printf(m, pt(Opt_mask), mt(mask_exec) + offset); if (entry->mask & MAY_WRITE) seq_printf(m, pt(Opt_mask), mt(mask_write) + offset); if (entry->mask & MAY_READ) seq_printf(m, pt(Opt_mask), mt(mask_read) + offset); if (entry->mask & MAY_APPEND) seq_printf(m, pt(Opt_mask), mt(mask_append) + offset); seq_puts(m, " "); } if (entry->flags & IMA_FSMAGIC) { snprintf(tbuf, sizeof(tbuf), "0x%lx", entry->fsmagic); seq_printf(m, pt(Opt_fsmagic), tbuf); seq_puts(m, " "); } if (entry->flags & IMA_FSNAME) { snprintf(tbuf, sizeof(tbuf), "%s", entry->fsname); seq_printf(m, pt(Opt_fsname), tbuf); seq_puts(m, " "); } if (entry->flags & IMA_KEYRINGS) { seq_puts(m, "keyrings="); ima_show_rule_opt_list(m, entry->keyrings); seq_puts(m, " "); } if (entry->flags & IMA_LABEL) { seq_puts(m, "label="); ima_show_rule_opt_list(m, entry->label); seq_puts(m, " "); } if (entry->flags & IMA_PCR) { snprintf(tbuf, sizeof(tbuf), "%d", entry->pcr); seq_printf(m, pt(Opt_pcr), tbuf); seq_puts(m, " "); } if (entry->flags & IMA_FSUUID) { seq_printf(m, "fsuuid=%pU", &entry->fsuuid); seq_puts(m, " "); } if (entry->flags & IMA_UID) { snprintf(tbuf, sizeof(tbuf), "%d", __kuid_val(entry->uid)); if (entry->uid_op == &uid_gt) seq_printf(m, pt(Opt_uid_gt), tbuf); else if (entry->uid_op == &uid_lt) seq_printf(m, pt(Opt_uid_lt), tbuf); else seq_printf(m, pt(Opt_uid_eq), tbuf); seq_puts(m, " "); } if (entry->flags & IMA_EUID) { snprintf(tbuf, sizeof(tbuf), "%d", __kuid_val(entry->uid)); if (entry->uid_op == &uid_gt) seq_printf(m, pt(Opt_euid_gt), tbuf); else if (entry->uid_op == &uid_lt) seq_printf(m, pt(Opt_euid_lt), tbuf); else seq_printf(m, pt(Opt_euid_eq), tbuf); seq_puts(m, " "); } if (entry->flags & IMA_GID) { snprintf(tbuf, sizeof(tbuf), "%d", __kgid_val(entry->gid)); if (entry->gid_op == &gid_gt) seq_printf(m, pt(Opt_gid_gt), tbuf); else if (entry->gid_op == &gid_lt) seq_printf(m, pt(Opt_gid_lt), tbuf); else seq_printf(m, pt(Opt_gid_eq), tbuf); seq_puts(m, " "); } if (entry->flags & IMA_EGID) { snprintf(tbuf, sizeof(tbuf), "%d", __kgid_val(entry->gid)); if (entry->gid_op == &gid_gt) seq_printf(m, pt(Opt_egid_gt), tbuf); else if (entry->gid_op == &gid_lt) seq_printf(m, pt(Opt_egid_lt), tbuf); else seq_printf(m, pt(Opt_egid_eq), tbuf); seq_puts(m, " "); } if (entry->flags & IMA_FOWNER) { snprintf(tbuf, sizeof(tbuf), "%d", __kuid_val(entry->fowner)); if (entry->fowner_op == &vfsuid_gt_kuid) seq_printf(m, pt(Opt_fowner_gt), tbuf); else if (entry->fowner_op == &vfsuid_lt_kuid) seq_printf(m, pt(Opt_fowner_lt), tbuf); else seq_printf(m, pt(Opt_fowner_eq), tbuf); seq_puts(m, " "); } if (entry->flags & IMA_FGROUP) { snprintf(tbuf, sizeof(tbuf), "%d", __kgid_val(entry->fgroup)); if (entry->fgroup_op == &vfsgid_gt_kgid) seq_printf(m, pt(Opt_fgroup_gt), tbuf); else if (entry->fgroup_op == &vfsgid_lt_kgid) seq_printf(m, pt(Opt_fgroup_lt), tbuf); else seq_printf(m, pt(Opt_fgroup_eq), tbuf); seq_puts(m, " "); } if (entry->flags & IMA_VALIDATE_ALGOS) { seq_puts(m, "appraise_algos="); ima_policy_show_appraise_algos(m, entry->allowed_algos); seq_puts(m, " "); } for (i = 0; i < MAX_LSM_RULES; i++) { if (entry->lsm[i].rule) { switch (i) { case LSM_OBJ_USER: seq_printf(m, pt(Opt_obj_user), entry->lsm[i].args_p); break; case LSM_OBJ_ROLE: seq_printf(m, pt(Opt_obj_role), entry->lsm[i].args_p); break; case LSM_OBJ_TYPE: seq_printf(m, pt(Opt_obj_type), entry->lsm[i].args_p); break; case LSM_SUBJ_USER: seq_printf(m, pt(Opt_subj_user), entry->lsm[i].args_p); break; case LSM_SUBJ_ROLE: seq_printf(m, pt(Opt_subj_role), entry->lsm[i].args_p); break; case LSM_SUBJ_TYPE: seq_printf(m, pt(Opt_subj_type), entry->lsm[i].args_p); break; } seq_puts(m, " "); } } if (entry->template) seq_printf(m, "template=%s ", entry->template->name); if (entry->flags & IMA_DIGSIG_REQUIRED) { if (entry->flags & IMA_VERITY_REQUIRED) seq_puts(m, "appraise_type=sigv3 "); else if (entry->flags & IMA_MODSIG_ALLOWED) seq_puts(m, "appraise_type=imasig|modsig "); else seq_puts(m, "appraise_type=imasig "); } if (entry->flags & IMA_VERITY_REQUIRED) seq_puts(m, "digest_type=verity "); if (entry->flags & IMA_PERMIT_DIRECTIO) seq_puts(m, "permit_directio "); rcu_read_unlock(); seq_puts(m, "\n"); return 0; } #endif /* CONFIG_IMA_READ_POLICY */ #if defined(CONFIG_IMA_APPRAISE) && defined(CONFIG_INTEGRITY_TRUSTED_KEYRING) /* * ima_appraise_signature: whether IMA will appraise a given function using * an IMA digital signature. This is restricted to cases where the kernel * has a set of built-in trusted keys in order to avoid an attacker simply * loading additional keys. */ bool ima_appraise_signature(enum kernel_read_file_id id) { struct ima_rule_entry *entry; bool found = false; enum ima_hooks func; struct list_head *ima_rules_tmp; if (id >= READING_MAX_ID) return false; if (id == READING_KEXEC_IMAGE && !(ima_appraise & IMA_APPRAISE_ENFORCE) && security_locked_down(LOCKDOWN_KEXEC)) return false; func = read_idmap[id] ?: FILE_CHECK; rcu_read_lock(); ima_rules_tmp = rcu_dereference(ima_rules); list_for_each_entry_rcu(entry, ima_rules_tmp, list) { if (entry->action != APPRAISE) continue; /* * A generic entry will match, but otherwise require that it * match the func we're looking for */ if (entry->func && entry->func != func) continue; /* * We require this to be a digital signature, not a raw IMA * hash. */ if (entry->flags & IMA_DIGSIG_REQUIRED) found = true; /* * We've found a rule that matches, so break now even if it * didn't require a digital signature - a later rule that does * won't override it, so would be a false positive. */ break; } rcu_read_unlock(); return found; } #endif /* CONFIG_IMA_APPRAISE && CONFIG_INTEGRITY_TRUSTED_KEYRING */ |
| 2 49 15 15 51 3 51 3 51 51 51 3 51 51 51 51 74 73 81 81 21 74 21 54 49 7 2 5 1 4 1 5 29 19 10 24 5 5 5 29 25 1 3 20 9 19 10 27 27 27 7 7 7 7 4 17 10 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 | // SPDX-License-Identifier: GPL-2.0-or-later /* * inet fragments management * * Authors: Pavel Emelyanov <xemul@openvz.org> * Started as consolidation of ipv4/ip_fragment.c, * ipv6/reassembly. and ipv6 nf conntrack reassembly */ #include <linux/list.h> #include <linux/spinlock.h> #include <linux/module.h> #include <linux/timer.h> #include <linux/mm.h> #include <linux/random.h> #include <linux/skbuff.h> #include <linux/rtnetlink.h> #include <linux/slab.h> #include <linux/rhashtable.h> #include <net/sock.h> #include <net/inet_frag.h> #include <net/inet_ecn.h> #include <net/ip.h> #include <net/ipv6.h> #include "../core/sock_destructor.h" /* Use skb->cb to track consecutive/adjacent fragments coming at * the end of the queue. Nodes in the rb-tree queue will * contain "runs" of one or more adjacent fragments. * * Invariants: * - next_frag is NULL at the tail of a "run"; * - the head of a "run" has the sum of all fragment lengths in frag_run_len. */ struct ipfrag_skb_cb { union { struct inet_skb_parm h4; struct inet6_skb_parm h6; }; struct sk_buff *next_frag; int frag_run_len; int ip_defrag_offset; }; #define FRAG_CB(skb) ((struct ipfrag_skb_cb *)((skb)->cb)) static void fragcb_clear(struct sk_buff *skb) { RB_CLEAR_NODE(&skb->rbnode); FRAG_CB(skb)->next_frag = NULL; FRAG_CB(skb)->frag_run_len = skb->len; } /* Append skb to the last "run". */ static void fragrun_append_to_last(struct inet_frag_queue *q, struct sk_buff *skb) { fragcb_clear(skb); FRAG_CB(q->last_run_head)->frag_run_len += skb->len; FRAG_CB(q->fragments_tail)->next_frag = skb; q->fragments_tail = skb; } /* Create a new "run" with the skb. */ static void fragrun_create(struct inet_frag_queue *q, struct sk_buff *skb) { BUILD_BUG_ON(sizeof(struct ipfrag_skb_cb) > sizeof(skb->cb)); fragcb_clear(skb); if (q->last_run_head) rb_link_node(&skb->rbnode, &q->last_run_head->rbnode, &q->last_run_head->rbnode.rb_right); else rb_link_node(&skb->rbnode, NULL, &q->rb_fragments.rb_node); rb_insert_color(&skb->rbnode, &q->rb_fragments); q->fragments_tail = skb; q->last_run_head = skb; } /* Given the OR values of all fragments, apply RFC 3168 5.3 requirements * Value : 0xff if frame should be dropped. * 0 or INET_ECN_CE value, to be ORed in to final iph->tos field */ const u8 ip_frag_ecn_table[16] = { /* at least one fragment had CE, and others ECT_0 or ECT_1 */ [IPFRAG_ECN_CE | IPFRAG_ECN_ECT_0] = INET_ECN_CE, [IPFRAG_ECN_CE | IPFRAG_ECN_ECT_1] = INET_ECN_CE, [IPFRAG_ECN_CE | IPFRAG_ECN_ECT_0 | IPFRAG_ECN_ECT_1] = INET_ECN_CE, /* invalid combinations : drop frame */ [IPFRAG_ECN_NOT_ECT | IPFRAG_ECN_CE] = 0xff, [IPFRAG_ECN_NOT_ECT | IPFRAG_ECN_ECT_0] = 0xff, [IPFRAG_ECN_NOT_ECT | IPFRAG_ECN_ECT_1] = 0xff, [IPFRAG_ECN_NOT_ECT | IPFRAG_ECN_ECT_0 | IPFRAG_ECN_ECT_1] = 0xff, [IPFRAG_ECN_NOT_ECT | IPFRAG_ECN_CE | IPFRAG_ECN_ECT_0] = 0xff, [IPFRAG_ECN_NOT_ECT | IPFRAG_ECN_CE | IPFRAG_ECN_ECT_1] = 0xff, [IPFRAG_ECN_NOT_ECT | IPFRAG_ECN_CE | IPFRAG_ECN_ECT_0 | IPFRAG_ECN_ECT_1] = 0xff, }; EXPORT_SYMBOL(ip_frag_ecn_table); int inet_frags_init(struct inet_frags *f) { f->frags_cachep = kmem_cache_create(f->frags_cache_name, f->qsize, 0, 0, NULL); if (!f->frags_cachep) return -ENOMEM; refcount_set(&f->refcnt, 1); init_completion(&f->completion); return 0; } EXPORT_SYMBOL(inet_frags_init); void inet_frags_fini(struct inet_frags *f) { if (refcount_dec_and_test(&f->refcnt)) complete(&f->completion); wait_for_completion(&f->completion); kmem_cache_destroy(f->frags_cachep); f->frags_cachep = NULL; } EXPORT_SYMBOL(inet_frags_fini); /* called from rhashtable_free_and_destroy() at netns_frags dismantle */ static void inet_frags_free_cb(void *ptr, void *arg) { struct inet_frag_queue *fq = ptr; int count; count = del_timer_sync(&fq->timer) ? 1 : 0; spin_lock_bh(&fq->lock); fq->flags |= INET_FRAG_DROP; if (!(fq->flags & INET_FRAG_COMPLETE)) { fq->flags |= INET_FRAG_COMPLETE; count++; } else if (fq->flags & INET_FRAG_HASH_DEAD) { count++; } spin_unlock_bh(&fq->lock); if (refcount_sub_and_test(count, &fq->refcnt)) inet_frag_destroy(fq); } static LLIST_HEAD(fqdir_free_list); static void fqdir_free_fn(struct work_struct *work) { struct llist_node *kill_list; struct fqdir *fqdir, *tmp; struct inet_frags *f; /* Atomically snapshot the list of fqdirs to free */ kill_list = llist_del_all(&fqdir_free_list); /* We need to make sure all ongoing call_rcu(..., inet_frag_destroy_rcu) * have completed, since they need to dereference fqdir. * Would it not be nice to have kfree_rcu_barrier() ? :) */ rcu_barrier(); llist_for_each_entry_safe(fqdir, tmp, kill_list, free_list) { f = fqdir->f; if (refcount_dec_and_test(&f->refcnt)) complete(&f->completion); kfree(fqdir); } } static DECLARE_DELAYED_WORK(fqdir_free_work, fqdir_free_fn); static void fqdir_work_fn(struct work_struct *work) { struct fqdir *fqdir = container_of(work, struct fqdir, destroy_work); rhashtable_free_and_destroy(&fqdir->rhashtable, inet_frags_free_cb, NULL); if (llist_add(&fqdir->free_list, &fqdir_free_list)) queue_delayed_work(system_wq, &fqdir_free_work, HZ); } int fqdir_init(struct fqdir **fqdirp, struct inet_frags *f, struct net *net) { struct fqdir *fqdir = kzalloc(sizeof(*fqdir), GFP_KERNEL); int res; if (!fqdir) return -ENOMEM; fqdir->f = f; fqdir->net = net; res = rhashtable_init(&fqdir->rhashtable, &fqdir->f->rhash_params); if (res < 0) { kfree(fqdir); return res; } refcount_inc(&f->refcnt); *fqdirp = fqdir; return 0; } EXPORT_SYMBOL(fqdir_init); static struct workqueue_struct *inet_frag_wq; static int __init inet_frag_wq_init(void) { inet_frag_wq = create_workqueue("inet_frag_wq"); if (!inet_frag_wq) panic("Could not create inet frag workq"); return 0; } pure_initcall(inet_frag_wq_init); void fqdir_exit(struct fqdir *fqdir) { INIT_WORK(&fqdir->destroy_work, fqdir_work_fn); queue_work(inet_frag_wq, &fqdir->destroy_work); } EXPORT_SYMBOL(fqdir_exit); void inet_frag_kill(struct inet_frag_queue *fq) { if (del_timer(&fq->timer)) refcount_dec(&fq->refcnt); if (!(fq->flags & INET_FRAG_COMPLETE)) { struct fqdir *fqdir = fq->fqdir; fq->flags |= INET_FRAG_COMPLETE; rcu_read_lock(); /* The RCU read lock provides a memory barrier * guaranteeing that if fqdir->dead is false then * the hash table destruction will not start until * after we unlock. Paired with fqdir_pre_exit(). */ if (!READ_ONCE(fqdir->dead)) { rhashtable_remove_fast(&fqdir->rhashtable, &fq->node, fqdir->f->rhash_params); refcount_dec(&fq->refcnt); } else { fq->flags |= INET_FRAG_HASH_DEAD; } rcu_read_unlock(); } } EXPORT_SYMBOL(inet_frag_kill); static void inet_frag_destroy_rcu(struct rcu_head *head) { struct inet_frag_queue *q = container_of(head, struct inet_frag_queue, rcu); struct inet_frags *f = q->fqdir->f; if (f->destructor) f->destructor(q); kmem_cache_free(f->frags_cachep, q); } unsigned int inet_frag_rbtree_purge(struct rb_root *root, enum skb_drop_reason reason) { struct rb_node *p = rb_first(root); unsigned int sum = 0; while (p) { struct sk_buff *skb = rb_entry(p, struct sk_buff, rbnode); p = rb_next(p); rb_erase(&skb->rbnode, root); while (skb) { struct sk_buff *next = FRAG_CB(skb)->next_frag; sum += skb->truesize; kfree_skb_reason(skb, reason); skb = next; } } return sum; } EXPORT_SYMBOL(inet_frag_rbtree_purge); void inet_frag_destroy(struct inet_frag_queue *q) { unsigned int sum, sum_truesize = 0; enum skb_drop_reason reason; struct inet_frags *f; struct fqdir *fqdir; WARN_ON(!(q->flags & INET_FRAG_COMPLETE)); reason = (q->flags & INET_FRAG_DROP) ? SKB_DROP_REASON_FRAG_REASM_TIMEOUT : SKB_CONSUMED; WARN_ON(del_timer(&q->timer) != 0); /* Release all fragment data. */ fqdir = q->fqdir; f = fqdir->f; sum_truesize = inet_frag_rbtree_purge(&q->rb_fragments, reason); sum = sum_truesize + f->qsize; call_rcu(&q->rcu, inet_frag_destroy_rcu); sub_frag_mem_limit(fqdir, sum); } EXPORT_SYMBOL(inet_frag_destroy); static struct inet_frag_queue *inet_frag_alloc(struct fqdir *fqdir, struct inet_frags *f, void *arg) { struct inet_frag_queue *q; q = kmem_cache_zalloc(f->frags_cachep, GFP_ATOMIC); if (!q) return NULL; q->fqdir = fqdir; f->constructor(q, arg); add_frag_mem_limit(fqdir, f->qsize); timer_setup(&q->timer, f->frag_expire, 0); spin_lock_init(&q->lock); refcount_set(&q->refcnt, 3); return q; } static struct inet_frag_queue *inet_frag_create(struct fqdir *fqdir, void *arg, struct inet_frag_queue **prev) { struct inet_frags *f = fqdir->f; struct inet_frag_queue *q; q = inet_frag_alloc(fqdir, f, arg); if (!q) { *prev = ERR_PTR(-ENOMEM); return NULL; } mod_timer(&q->timer, jiffies + fqdir->timeout); *prev = rhashtable_lookup_get_insert_key(&fqdir->rhashtable, &q->key, &q->node, f->rhash_params); if (*prev) { q->flags |= INET_FRAG_COMPLETE; inet_frag_kill(q); inet_frag_destroy(q); return NULL; } return q; } /* TODO : call from rcu_read_lock() and no longer use refcount_inc_not_zero() */ struct inet_frag_queue *inet_frag_find(struct fqdir *fqdir, void *key) { /* This pairs with WRITE_ONCE() in fqdir_pre_exit(). */ long high_thresh = READ_ONCE(fqdir->high_thresh); struct inet_frag_queue *fq = NULL, *prev; if (!high_thresh || frag_mem_limit(fqdir) > high_thresh) return NULL; rcu_read_lock(); prev = rhashtable_lookup(&fqdir->rhashtable, key, fqdir->f->rhash_params); if (!prev) fq = inet_frag_create(fqdir, key, &prev); if (!IS_ERR_OR_NULL(prev)) { fq = prev; if (!refcount_inc_not_zero(&fq->refcnt)) fq = NULL; } rcu_read_unlock(); return fq; } EXPORT_SYMBOL(inet_frag_find); int inet_frag_queue_insert(struct inet_frag_queue *q, struct sk_buff *skb, int offset, int end) { struct sk_buff *last = q->fragments_tail; /* RFC5722, Section 4, amended by Errata ID : 3089 * When reassembling an IPv6 datagram, if * one or more its constituent fragments is determined to be an * overlapping fragment, the entire datagram (and any constituent * fragments) MUST be silently discarded. * * Duplicates, however, should be ignored (i.e. skb dropped, but the * queue/fragments kept for later reassembly). */ if (!last) fragrun_create(q, skb); /* First fragment. */ else if (FRAG_CB(last)->ip_defrag_offset + last->len < end) { /* This is the common case: skb goes to the end. */ /* Detect and discard overlaps. */ if (offset < FRAG_CB(last)->ip_defrag_offset + last->len) return IPFRAG_OVERLAP; if (offset == FRAG_CB(last)->ip_defrag_offset + last->len) fragrun_append_to_last(q, skb); else fragrun_create(q, skb); } else { /* Binary search. Note that skb can become the first fragment, * but not the last (covered above). */ struct rb_node **rbn, *parent; rbn = &q->rb_fragments.rb_node; do { struct sk_buff *curr; int curr_run_end; parent = *rbn; curr = rb_to_skb(parent); curr_run_end = FRAG_CB(curr)->ip_defrag_offset + FRAG_CB(curr)->frag_run_len; if (end <= FRAG_CB(curr)->ip_defrag_offset) rbn = &parent->rb_left; else if (offset >= curr_run_end) rbn = &parent->rb_right; else if (offset >= FRAG_CB(curr)->ip_defrag_offset && end <= curr_run_end) return IPFRAG_DUP; else return IPFRAG_OVERLAP; } while (*rbn); /* Here we have parent properly set, and rbn pointing to * one of its NULL left/right children. Insert skb. */ fragcb_clear(skb); rb_link_node(&skb->rbnode, parent, rbn); rb_insert_color(&skb->rbnode, &q->rb_fragments); } FRAG_CB(skb)->ip_defrag_offset = offset; return IPFRAG_OK; } EXPORT_SYMBOL(inet_frag_queue_insert); void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb, struct sk_buff *parent) { struct sk_buff *fp, *head = skb_rb_first(&q->rb_fragments); void (*destructor)(struct sk_buff *); unsigned int orig_truesize = 0; struct sk_buff **nextp = NULL; struct sock *sk = skb->sk; int delta; if (sk && is_skb_wmem(skb)) { /* TX: skb->sk might have been passed as argument to * dst->output and must remain valid until tx completes. * * Move sk to reassembled skb and fix up wmem accounting. */ orig_truesize = skb->truesize; destructor = skb->destructor; } if (head != skb) { fp = skb_clone(skb, GFP_ATOMIC); if (!fp) { head = skb; goto out_restore_sk; } FRAG_CB(fp)->next_frag = FRAG_CB(skb)->next_frag; if (RB_EMPTY_NODE(&skb->rbnode)) FRAG_CB(parent)->next_frag = fp; else rb_replace_node(&skb->rbnode, &fp->rbnode, &q->rb_fragments); if (q->fragments_tail == skb) q->fragments_tail = fp; if (orig_truesize) { /* prevent skb_morph from releasing sk */ skb->sk = NULL; skb->destructor = NULL; } skb_morph(skb, head); FRAG_CB(skb)->next_frag = FRAG_CB(head)->next_frag; rb_replace_node(&head->rbnode, &skb->rbnode, &q->rb_fragments); consume_skb(head); head = skb; } WARN_ON(FRAG_CB(head)->ip_defrag_offset != 0); delta = -head->truesize; /* Head of list must not be cloned. */ if (skb_unclone(head, GFP_ATOMIC)) goto out_restore_sk; delta += head->truesize; if (delta) add_frag_mem_limit(q->fqdir, delta); /* If the first fragment is fragmented itself, we split * it to two chunks: the first with data and paged part * and the second, holding only fragments. */ if (skb_has_frag_list(head)) { struct sk_buff *clone; int i, plen = 0; clone = alloc_skb(0, GFP_ATOMIC); if (!clone) goto out_restore_sk; skb_shinfo(clone)->frag_list = skb_shinfo(head)->frag_list; skb_frag_list_init(head); for (i = 0; i < skb_shinfo(head)->nr_frags; i++) plen += skb_frag_size(&skb_shinfo(head)->frags[i]); clone->data_len = head->data_len - plen; clone->len = clone->data_len; head->truesize += clone->truesize; clone->csum = 0; clone->ip_summed = head->ip_summed; add_frag_mem_limit(q->fqdir, clone->truesize); skb_shinfo(head)->frag_list = clone; nextp = &clone->next; } else { nextp = &skb_shinfo(head)->frag_list; } out_restore_sk: if (orig_truesize) { int ts_delta = head->truesize - orig_truesize; /* if this reassembled skb is fragmented later, * fraglist skbs will get skb->sk assigned from head->sk, * and each frag skb will be released via sock_wfree. * * Update sk_wmem_alloc. */ head->sk = sk; head->destructor = destructor; refcount_add(ts_delta, &sk->sk_wmem_alloc); } return nextp; } EXPORT_SYMBOL(inet_frag_reasm_prepare); void inet_frag_reasm_finish(struct inet_frag_queue *q, struct sk_buff *head, void *reasm_data, bool try_coalesce) { struct sock *sk = is_skb_wmem(head) ? head->sk : NULL; const unsigned int head_truesize = head->truesize; struct sk_buff **nextp = reasm_data; struct rb_node *rbn; struct sk_buff *fp; int sum_truesize; skb_push(head, head->data - skb_network_header(head)); /* Traverse the tree in order, to build frag_list. */ fp = FRAG_CB(head)->next_frag; rbn = rb_next(&head->rbnode); rb_erase(&head->rbnode, &q->rb_fragments); sum_truesize = head->truesize; while (rbn || fp) { /* fp points to the next sk_buff in the current run; * rbn points to the next run. */ /* Go through the current run. */ while (fp) { struct sk_buff *next_frag = FRAG_CB(fp)->next_frag; bool stolen; int delta; sum_truesize += fp->truesize; if (head->ip_summed != fp->ip_summed) head->ip_summed = CHECKSUM_NONE; else if (head->ip_summed == CHECKSUM_COMPLETE) head->csum = csum_add(head->csum, fp->csum); if (try_coalesce && skb_try_coalesce(head, fp, &stolen, &delta)) { kfree_skb_partial(fp, stolen); } else { fp->prev = NULL; memset(&fp->rbnode, 0, sizeof(fp->rbnode)); fp->sk = NULL; head->data_len += fp->len; head->len += fp->len; head->truesize += fp->truesize; *nextp = fp; nextp = &fp->next; } fp = next_frag; } /* Move to the next run. */ if (rbn) { struct rb_node *rbnext = rb_next(rbn); fp = rb_to_skb(rbn); rb_erase(rbn, &q->rb_fragments); rbn = rbnext; } } sub_frag_mem_limit(q->fqdir, sum_truesize); *nextp = NULL; skb_mark_not_on_list(head); head->prev = NULL; head->tstamp = q->stamp; head->tstamp_type = q->tstamp_type; if (sk) refcount_add(sum_truesize - head_truesize, &sk->sk_wmem_alloc); } EXPORT_SYMBOL(inet_frag_reasm_finish); struct sk_buff *inet_frag_pull_head(struct inet_frag_queue *q) { struct sk_buff *head, *skb; head = skb_rb_first(&q->rb_fragments); if (!head) return NULL; skb = FRAG_CB(head)->next_frag; if (skb) rb_replace_node(&head->rbnode, &skb->rbnode, &q->rb_fragments); else rb_erase(&head->rbnode, &q->rb_fragments); memset(&head->rbnode, 0, sizeof(head->rbnode)); barrier(); if (head == q->fragments_tail) q->fragments_tail = NULL; sub_frag_mem_limit(q->fqdir, head->truesize); return head; } EXPORT_SYMBOL(inet_frag_pull_head); |
| 9 111 71 114 8 22 3 357 22 357 15455 886 24 372 354 356 8 297 4 4 4 197 198 197 465 457 7 455 16 17 440 22 17 17 329 757 27 44 18333 47 83 80 31 2 83 83 720 719 719 42 78 516 281 50 21 26 257 78 35 198 99 65 198 19 3 4 14 261 231 324 605 63 600 692 217 238 14077 218 10055 601 17 149 15737 114 14351 133 318 90 1 4 90 322 25 9 11 28 3 195 2 2 2 2 193 268 1430 97 269 364 175 175 170 114 222 368 2956 630 4 1202 1203 47 47 16 17 17 2327 39 249 25 330 22 154 325 16 14131 56 1 649 182 1178 12358 212 9 681 4 2 68 18 65 152 43 1 77 202 867 29 10 8 906 15 591 1 349 11 722 7 20 10934 10940 6 7 74 67 9 189 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 2533 2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569 2570 2571 2572 2573 2574 2575 2576 2577 2578 2579 2580 2581 2582 2583 2584 2585 2586 2587 2588 2589 2590 2591 2592 2593 2594 2595 2596 2597 2598 2599 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 2701 2702 2703 2704 2705 2706 2707 2708 2709 2710 2711 2712 2713 2714 2715 2716 2717 2718 2719 2720 2721 2722 2723 2724 2725 2726 2727 2728 2729 2730 2731 2732 2733 2734 2735 2736 2737 2738 2739 2740 2741 2742 2743 2744 2745 2746 2747 2748 2749 2750 2751 2752 2753 2754 2755 2756 2757 2758 2759 2760 2761 2762 2763 2764 2765 2766 2767 2768 2769 2770 2771 2772 2773 2774 2775 2776 2777 2778 2779 2780 2781 2782 2783 2784 2785 2786 2787 2788 2789 2790 2791 2792 2793 2794 2795 2796 2797 2798 2799 2800 2801 2802 2803 2804 2805 2806 2807 2808 2809 2810 2811 2812 2813 2814 2815 2816 2817 2818 2819 2820 2821 2822 2823 2824 2825 2826 2827 2828 2829 2830 2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843 2844 2845 2846 2847 2848 2849 2850 2851 2852 2853 2854 2855 2856 2857 2858 2859 2860 2861 2862 2863 2864 2865 2866 2867 2868 2869 2870 2871 2872 2873 2874 2875 2876 2877 2878 2879 2880 2881 2882 2883 2884 2885 2886 2887 2888 2889 2890 2891 2892 2893 2894 2895 2896 2897 2898 | /* SPDX-License-Identifier: GPL-2.0-or-later */ /* * INET An implementation of the TCP/IP protocol suite for the LINUX * operating system. INET is implemented using the BSD Socket * interface as the means of communication with the user level. * * Definitions for the AF_INET socket handler. * * Version: @(#)sock.h 1.0.4 05/13/93 * * Authors: Ross Biro * Fred N. van Kempen, <waltje@uWalt.NL.Mugnet.ORG> * Corey Minyard <wf-rch!minyard@relay.EU.net> * Florian La Roche <flla@stud.uni-sb.de> * * Fixes: * Alan Cox : Volatiles in skbuff pointers. See * skbuff comments. May be overdone, * better to prove they can be removed * than the reverse. * Alan Cox : Added a zapped field for tcp to note * a socket is reset and must stay shut up * Alan Cox : New fields for options * Pauline Middelink : identd support * Alan Cox : Eliminate low level recv/recvfrom * David S. Miller : New socket lookup architecture. * Steve Whitehouse: Default routines for sock_ops * Arnaldo C. Melo : removed net_pinfo, tp_pinfo and made * protinfo be just a void pointer, as the * protocol specific parts were moved to * respective headers and ipv4/v6, etc now * use private slabcaches for its socks * Pedro Hortas : New flags field for socket options */ #ifndef _SOCK_H #define _SOCK_H #include <linux/hardirq.h> #include <linux/kernel.h> #include <linux/list.h> #include <linux/list_nulls.h> #include <linux/timer.h> #include <linux/cache.h> #include <linux/bitops.h> #include <linux/lockdep.h> #include <linux/netdevice.h> #include <linux/skbuff.h> /* struct sk_buff */ #include <linux/mm.h> #include <linux/security.h> #include <linux/slab.h> #include <linux/uaccess.h> #include <linux/page_counter.h> #include <linux/memcontrol.h> #include <linux/static_key.h> #include <linux/sched.h> #include <linux/wait.h> #include <linux/cgroup-defs.h> #include <linux/rbtree.h> #include <linux/rculist_nulls.h> #include <linux/poll.h> #include <linux/sockptr.h> #include <linux/indirect_call_wrapper.h> #include <linux/atomic.h> #include <linux/refcount.h> #include <linux/llist.h> #include <net/dst.h> #include <net/checksum.h> #include <net/tcp_states.h> #include <linux/net_tstamp.h> #include <net/l3mdev.h> #include <uapi/linux/socket.h> /* * This structure really needs to be cleaned up. * Most of it is for TCP, and not used by any of * the other protocols. */ /* This is the per-socket lock. The spinlock provides a synchronization * between user contexts and software interrupt processing, whereas the * mini-semaphore synchronizes multiple users amongst themselves. */ typedef struct { spinlock_t slock; int owned; wait_queue_head_t wq; /* * We express the mutex-alike socket_lock semantics * to the lock validator by explicitly managing * the slock as a lock variant (in addition to * the slock itself): */ #ifdef CONFIG_DEBUG_LOCK_ALLOC struct lockdep_map dep_map; #endif } socket_lock_t; struct sock; struct proto; struct net; typedef __u32 __bitwise __portpair; typedef __u64 __bitwise __addrpair; /** * struct sock_common - minimal network layer representation of sockets * @skc_daddr: Foreign IPv4 addr * @skc_rcv_saddr: Bound local IPv4 addr * @skc_addrpair: 8-byte-aligned __u64 union of @skc_daddr & @skc_rcv_saddr * @skc_hash: hash value used with various protocol lookup tables * @skc_u16hashes: two u16 hash values used by UDP lookup tables * @skc_dport: placeholder for inet_dport/tw_dport * @skc_num: placeholder for inet_num/tw_num * @skc_portpair: __u32 union of @skc_dport & @skc_num * @skc_family: network address family * @skc_state: Connection state * @skc_reuse: %SO_REUSEADDR setting * @skc_reuseport: %SO_REUSEPORT setting * @skc_ipv6only: socket is IPV6 only * @skc_net_refcnt: socket is using net ref counting * @skc_bound_dev_if: bound device index if != 0 * @skc_bind_node: bind hash linkage for various protocol lookup tables * @skc_portaddr_node: second hash linkage for UDP/UDP-Lite protocol * @skc_prot: protocol handlers inside a network family * @skc_net: reference to the network namespace of this socket * @skc_v6_daddr: IPV6 destination address * @skc_v6_rcv_saddr: IPV6 source address * @skc_cookie: socket's cookie value * @skc_node: main hash linkage for various protocol lookup tables * @skc_nulls_node: main hash linkage for TCP/UDP/UDP-Lite protocol * @skc_tx_queue_mapping: tx queue number for this connection * @skc_rx_queue_mapping: rx queue number for this connection * @skc_flags: place holder for sk_flags * %SO_LINGER (l_onoff), %SO_BROADCAST, %SO_KEEPALIVE, * %SO_OOBINLINE settings, %SO_TIMESTAMPING settings * @skc_listener: connection request listener socket (aka rsk_listener) * [union with @skc_flags] * @skc_tw_dr: (aka tw_dr) ptr to &struct inet_timewait_death_row * [union with @skc_flags] * @skc_incoming_cpu: record/match cpu processing incoming packets * @skc_rcv_wnd: (aka rsk_rcv_wnd) TCP receive window size (possibly scaled) * [union with @skc_incoming_cpu] * @skc_tw_rcv_nxt: (aka tw_rcv_nxt) TCP window next expected seq number * [union with @skc_incoming_cpu] * @skc_refcnt: reference count * * This is the minimal network layer representation of sockets, the header * for struct sock and struct inet_timewait_sock. */ struct sock_common { union { __addrpair skc_addrpair; struct { __be32 skc_daddr; __be32 skc_rcv_saddr; }; }; union { unsigned int skc_hash; __u16 skc_u16hashes[2]; }; /* skc_dport && skc_num must be grouped as well */ union { __portpair skc_portpair; struct { __be16 skc_dport; __u16 skc_num; }; }; unsigned short skc_family; volatile unsigned char skc_state; unsigned char skc_reuse:4; unsigned char skc_reuseport:1; unsigned char skc_ipv6only:1; unsigned char skc_net_refcnt:1; int skc_bound_dev_if; union { struct hlist_node skc_bind_node; struct hlist_node skc_portaddr_node; }; struct proto *skc_prot; possible_net_t skc_net; #if IS_ENABLED(CONFIG_IPV6) struct in6_addr skc_v6_daddr; struct in6_addr skc_v6_rcv_saddr; #endif atomic64_t skc_cookie; /* following fields are padding to force * offset(struct sock, sk_refcnt) == 128 on 64bit arches * assuming IPV6 is enabled. We use this padding differently * for different kind of 'sockets' */ union { unsigned long skc_flags; struct sock *skc_listener; /* request_sock */ struct inet_timewait_death_row *skc_tw_dr; /* inet_timewait_sock */ }; /* * fields between dontcopy_begin/dontcopy_end * are not copied in sock_copy() */ /* private: */ int skc_dontcopy_begin[0]; /* public: */ union { struct hlist_node skc_node; struct hlist_nulls_node skc_nulls_node; }; unsigned short skc_tx_queue_mapping; #ifdef CONFIG_SOCK_RX_QUEUE_MAPPING unsigned short skc_rx_queue_mapping; #endif union { int skc_incoming_cpu; u32 skc_rcv_wnd; u32 skc_tw_rcv_nxt; /* struct tcp_timewait_sock */ }; refcount_t skc_refcnt; /* private: */ int skc_dontcopy_end[0]; union { u32 skc_rxhash; u32 skc_window_clamp; u32 skc_tw_snd_nxt; /* struct tcp_timewait_sock */ }; /* public: */ }; struct bpf_local_storage; struct sk_filter; /** * struct sock - network layer representation of sockets * @__sk_common: shared layout with inet_timewait_sock * @sk_shutdown: mask of %SEND_SHUTDOWN and/or %RCV_SHUTDOWN * @sk_userlocks: %SO_SNDBUF and %SO_RCVBUF settings * @sk_lock: synchronizer * @sk_kern_sock: True if sock is using kernel lock classes * @sk_rcvbuf: size of receive buffer in bytes * @sk_wq: sock wait queue and async head * @sk_rx_dst: receive input route used by early demux * @sk_rx_dst_ifindex: ifindex for @sk_rx_dst * @sk_rx_dst_cookie: cookie for @sk_rx_dst * @sk_dst_cache: destination cache * @sk_dst_pending_confirm: need to confirm neighbour * @sk_policy: flow policy * @sk_receive_queue: incoming packets * @sk_wmem_alloc: transmit queue bytes committed * @sk_tsq_flags: TCP Small Queues flags * @sk_write_queue: Packet sending queue * @sk_omem_alloc: "o" is "option" or "other" * @sk_wmem_queued: persistent queue size * @sk_forward_alloc: space allocated forward * @sk_reserved_mem: space reserved and non-reclaimable for the socket * @sk_napi_id: id of the last napi context to receive data for sk * @sk_ll_usec: usecs to busypoll when there is no data * @sk_allocation: allocation mode * @sk_pacing_rate: Pacing rate (if supported by transport/packet scheduler) * @sk_pacing_status: Pacing status (requested, handled by sch_fq) * @sk_max_pacing_rate: Maximum pacing rate (%SO_MAX_PACING_RATE) * @sk_sndbuf: size of send buffer in bytes * @sk_no_check_tx: %SO_NO_CHECK setting, set checksum in TX packets * @sk_no_check_rx: allow zero checksum in RX packets * @sk_route_caps: route capabilities (e.g. %NETIF_F_TSO) * @sk_gso_disabled: if set, NETIF_F_GSO_MASK is forbidden. * @sk_gso_type: GSO type (e.g. %SKB_GSO_TCPV4) * @sk_gso_max_size: Maximum GSO segment size to build * @sk_gso_max_segs: Maximum number of GSO segments * @sk_pacing_shift: scaling factor for TCP Small Queues * @sk_lingertime: %SO_LINGER l_linger setting * @sk_backlog: always used with the per-socket spinlock held * @sk_callback_lock: used with the callbacks in the end of this struct * @sk_error_queue: rarely used * @sk_prot_creator: sk_prot of original sock creator (see ipv6_setsockopt, * IPV6_ADDRFORM for instance) * @sk_err: last error * @sk_err_soft: errors that don't cause failure but are the cause of a * persistent failure not just 'timed out' * @sk_drops: raw/udp drops counter * @sk_ack_backlog: current listen backlog * @sk_max_ack_backlog: listen backlog set in listen() * @sk_uid: user id of owner * @sk_prefer_busy_poll: prefer busypolling over softirq processing * @sk_busy_poll_budget: napi processing budget when busypolling * @sk_priority: %SO_PRIORITY setting * @sk_type: socket type (%SOCK_STREAM, etc) * @sk_protocol: which protocol this socket belongs in this network family * @sk_peer_lock: lock protecting @sk_peer_pid and @sk_peer_cred * @sk_peer_pid: &struct pid for this socket's peer * @sk_peer_cred: %SO_PEERCRED setting * @sk_rcvlowat: %SO_RCVLOWAT setting * @sk_rcvtimeo: %SO_RCVTIMEO setting * @sk_sndtimeo: %SO_SNDTIMEO setting * @sk_txhash: computed flow hash for use on transmit * @sk_txrehash: enable TX hash rethink * @sk_filter: socket filtering instructions * @sk_timer: sock cleanup timer * @sk_stamp: time stamp of last packet received * @sk_stamp_seq: lock for accessing sk_stamp on 32 bit architectures only * @sk_tsflags: SO_TIMESTAMPING flags * @sk_use_task_frag: allow sk_page_frag() to use current->task_frag. * Sockets that can be used under memory reclaim should * set this to false. * @sk_bind_phc: SO_TIMESTAMPING bind PHC index of PTP virtual clock * for timestamping * @sk_tskey: counter to disambiguate concurrent tstamp requests * @sk_zckey: counter to order MSG_ZEROCOPY notifications * @sk_socket: Identd and reporting IO signals * @sk_user_data: RPC layer private data. Write-protected by @sk_callback_lock. * @sk_frag: cached page frag * @sk_peek_off: current peek_offset value * @sk_send_head: front of stuff to transmit * @tcp_rtx_queue: TCP re-transmit queue [union with @sk_send_head] * @sk_security: used by security modules * @sk_mark: generic packet mark * @sk_cgrp_data: cgroup data for this cgroup * @sk_memcg: this socket's memory cgroup association * @sk_write_pending: a write to stream socket waits to start * @sk_disconnects: number of disconnect operations performed on this sock * @sk_state_change: callback to indicate change in the state of the sock * @sk_data_ready: callback to indicate there is data to be processed * @sk_write_space: callback to indicate there is bf sending space available * @sk_error_report: callback to indicate errors (e.g. %MSG_ERRQUEUE) * @sk_backlog_rcv: callback to process the backlog * @sk_validate_xmit_skb: ptr to an optional validate function * @sk_destruct: called at sock freeing time, i.e. when all refcnt == 0 * @sk_reuseport_cb: reuseport group container * @sk_bpf_storage: ptr to cache and control for bpf_sk_storage * @sk_rcu: used during RCU grace period * @sk_clockid: clockid used by time-based scheduling (SO_TXTIME) * @sk_txtime_deadline_mode: set deadline mode for SO_TXTIME * @sk_txtime_report_errors: set report errors mode for SO_TXTIME * @sk_txtime_unused: unused txtime flags * @ns_tracker: tracker for netns reference */ struct sock { /* * Now struct inet_timewait_sock also uses sock_common, so please just * don't add nothing before this first member (__sk_common) --acme */ struct sock_common __sk_common; #define sk_node __sk_common.skc_node #define sk_nulls_node __sk_common.skc_nulls_node #define sk_refcnt __sk_common.skc_refcnt #define sk_tx_queue_mapping __sk_common.skc_tx_queue_mapping #ifdef CONFIG_SOCK_RX_QUEUE_MAPPING #define sk_rx_queue_mapping __sk_common.skc_rx_queue_mapping #endif #define sk_dontcopy_begin __sk_common.skc_dontcopy_begin #define sk_dontcopy_end __sk_common.skc_dontcopy_end #define sk_hash __sk_common.skc_hash #define sk_portpair __sk_common.skc_portpair #define sk_num __sk_common.skc_num #define sk_dport __sk_common.skc_dport #define sk_addrpair __sk_common.skc_addrpair #define sk_daddr __sk_common.skc_daddr #define sk_rcv_saddr __sk_common.skc_rcv_saddr #define sk_family __sk_common.skc_family #define sk_state __sk_common.skc_state #define sk_reuse __sk_common.skc_reuse #define sk_reuseport __sk_common.skc_reuseport #define sk_ipv6only __sk_common.skc_ipv6only #define sk_net_refcnt __sk_common.skc_net_refcnt #define sk_bound_dev_if __sk_common.skc_bound_dev_if #define sk_bind_node __sk_common.skc_bind_node #define sk_prot __sk_common.skc_prot #define sk_net __sk_common.skc_net #define sk_v6_daddr __sk_common.skc_v6_daddr #define sk_v6_rcv_saddr __sk_common.skc_v6_rcv_saddr #define sk_cookie __sk_common.skc_cookie #define sk_incoming_cpu __sk_common.skc_incoming_cpu #define sk_flags __sk_common.skc_flags #define sk_rxhash __sk_common.skc_rxhash __cacheline_group_begin(sock_write_rx); atomic_t sk_drops; __s32 sk_peek_off; struct sk_buff_head sk_error_queue; struct sk_buff_head sk_receive_queue; /* * The backlog queue is special, it is always used with * the per-socket spinlock held and requires low latency * access. Therefore we special case it's implementation. * Note : rmem_alloc is in this structure to fill a hole * on 64bit arches, not because its logically part of * backlog. */ struct { atomic_t rmem_alloc; int len; struct sk_buff *head; struct sk_buff *tail; } sk_backlog; #define sk_rmem_alloc sk_backlog.rmem_alloc __cacheline_group_end(sock_write_rx); __cacheline_group_begin(sock_read_rx); /* early demux fields */ struct dst_entry __rcu *sk_rx_dst; int sk_rx_dst_ifindex; u32 sk_rx_dst_cookie; #ifdef CONFIG_NET_RX_BUSY_POLL unsigned int sk_ll_usec; unsigned int sk_napi_id; u16 sk_busy_poll_budget; u8 sk_prefer_busy_poll; #endif u8 sk_userlocks; int sk_rcvbuf; struct sk_filter __rcu *sk_filter; union { struct socket_wq __rcu *sk_wq; /* private: */ struct socket_wq *sk_wq_raw; /* public: */ }; void (*sk_data_ready)(struct sock *sk); long sk_rcvtimeo; int sk_rcvlowat; __cacheline_group_end(sock_read_rx); __cacheline_group_begin(sock_read_rxtx); int sk_err; struct socket *sk_socket; struct mem_cgroup *sk_memcg; #ifdef CONFIG_XFRM struct xfrm_policy __rcu *sk_policy[2]; #endif __cacheline_group_end(sock_read_rxtx); __cacheline_group_begin(sock_write_rxtx); socket_lock_t sk_lock; u32 sk_reserved_mem; int sk_forward_alloc; u32 sk_tsflags; __cacheline_group_end(sock_write_rxtx); __cacheline_group_begin(sock_write_tx); int sk_write_pending; atomic_t sk_omem_alloc; int sk_sndbuf; int sk_wmem_queued; refcount_t sk_wmem_alloc; unsigned long sk_tsq_flags; union { struct sk_buff *sk_send_head; struct rb_root tcp_rtx_queue; }; struct sk_buff_head sk_write_queue; u32 sk_dst_pending_confirm; u32 sk_pacing_status; /* see enum sk_pacing */ struct page_frag sk_frag; struct timer_list sk_timer; unsigned long sk_pacing_rate; /* bytes per second */ atomic_t sk_zckey; atomic_t sk_tskey; __cacheline_group_end(sock_write_tx); __cacheline_group_begin(sock_read_tx); unsigned long sk_max_pacing_rate; long sk_sndtimeo; u32 sk_priority; u32 sk_mark; struct dst_entry __rcu *sk_dst_cache; netdev_features_t sk_route_caps; #ifdef CONFIG_SOCK_VALIDATE_XMIT struct sk_buff* (*sk_validate_xmit_skb)(struct sock *sk, struct net_device *dev, struct sk_buff *skb); #endif u16 sk_gso_type; u16 sk_gso_max_segs; unsigned int sk_gso_max_size; gfp_t sk_allocation; u32 sk_txhash; u8 sk_pacing_shift; bool sk_use_task_frag; __cacheline_group_end(sock_read_tx); /* * Because of non atomicity rules, all * changes are protected by socket lock. */ u8 sk_gso_disabled : 1, sk_kern_sock : 1, sk_no_check_tx : 1, sk_no_check_rx : 1; u8 sk_shutdown; u16 sk_type; u16 sk_protocol; unsigned long sk_lingertime; struct proto *sk_prot_creator; rwlock_t sk_callback_lock; int sk_err_soft; u32 sk_ack_backlog; u32 sk_max_ack_backlog; kuid_t sk_uid; spinlock_t sk_peer_lock; int sk_bind_phc; struct pid *sk_peer_pid; const struct cred *sk_peer_cred; ktime_t sk_stamp; #if BITS_PER_LONG==32 seqlock_t sk_stamp_seq; #endif int sk_disconnects; u8 sk_txrehash; u8 sk_clockid; u8 sk_txtime_deadline_mode : 1, sk_txtime_report_errors : 1, sk_txtime_unused : 6; void *sk_user_data; #ifdef CONFIG_SECURITY void *sk_security; #endif struct sock_cgroup_data sk_cgrp_data; void (*sk_state_change)(struct sock *sk); void (*sk_write_space)(struct sock *sk); void (*sk_error_report)(struct sock *sk); int (*sk_backlog_rcv)(struct sock *sk, struct sk_buff *skb); void (*sk_destruct)(struct sock *sk); struct sock_reuseport __rcu *sk_reuseport_cb; #ifdef CONFIG_BPF_SYSCALL struct bpf_local_storage __rcu *sk_bpf_storage; #endif struct rcu_head sk_rcu; netns_tracker ns_tracker; }; struct sock_bh_locked { struct sock *sock; local_lock_t bh_lock; }; enum sk_pacing { SK_PACING_NONE = 0, SK_PACING_NEEDED = 1, SK_PACING_FQ = 2, }; /* flag bits in sk_user_data * * - SK_USER_DATA_NOCOPY: Pointer stored in sk_user_data might * not be suitable for copying when cloning the socket. For instance, * it can point to a reference counted object. sk_user_data bottom * bit is set if pointer must not be copied. * * - SK_USER_DATA_BPF: Mark whether sk_user_data field is * managed/owned by a BPF reuseport array. This bit should be set * when sk_user_data's sk is added to the bpf's reuseport_array. * * - SK_USER_DATA_PSOCK: Mark whether pointer stored in * sk_user_data points to psock type. This bit should be set * when sk_user_data is assigned to a psock object. */ #define SK_USER_DATA_NOCOPY 1UL #define SK_USER_DATA_BPF 2UL #define SK_USER_DATA_PSOCK 4UL #define SK_USER_DATA_PTRMASK ~(SK_USER_DATA_NOCOPY | SK_USER_DATA_BPF |\ SK_USER_DATA_PSOCK) /** * sk_user_data_is_nocopy - Test if sk_user_data pointer must not be copied * @sk: socket */ static inline bool sk_user_data_is_nocopy(const struct sock *sk) { return ((uintptr_t)sk->sk_user_data & SK_USER_DATA_NOCOPY); } #define __sk_user_data(sk) ((*((void __rcu **)&(sk)->sk_user_data))) /** * __locked_read_sk_user_data_with_flags - return the pointer * only if argument flags all has been set in sk_user_data. Otherwise * return NULL * * @sk: socket * @flags: flag bits * * The caller must be holding sk->sk_callback_lock. */ static inline void * __locked_read_sk_user_data_with_flags(const struct sock *sk, uintptr_t flags) { uintptr_t sk_user_data = (uintptr_t)rcu_dereference_check(__sk_user_data(sk), lockdep_is_held(&sk->sk_callback_lock)); WARN_ON_ONCE(flags & SK_USER_DATA_PTRMASK); if ((sk_user_data & flags) == flags) return (void *)(sk_user_data & SK_USER_DATA_PTRMASK); return NULL; } /** * __rcu_dereference_sk_user_data_with_flags - return the pointer * only if argument flags all has been set in sk_user_data. Otherwise * return NULL * * @sk: socket * @flags: flag bits */ static inline void * __rcu_dereference_sk_user_data_with_flags(const struct sock *sk, uintptr_t flags) { uintptr_t sk_user_data = (uintptr_t)rcu_dereference(__sk_user_data(sk)); WARN_ON_ONCE(flags & SK_USER_DATA_PTRMASK); if ((sk_user_data & flags) == flags) return (void *)(sk_user_data & SK_USER_DATA_PTRMASK); return NULL; } #define rcu_dereference_sk_user_data(sk) \ __rcu_dereference_sk_user_data_with_flags(sk, 0) #define __rcu_assign_sk_user_data_with_flags(sk, ptr, flags) \ ({ \ uintptr_t __tmp1 = (uintptr_t)(ptr), \ __tmp2 = (uintptr_t)(flags); \ WARN_ON_ONCE(__tmp1 & ~SK_USER_DATA_PTRMASK); \ WARN_ON_ONCE(__tmp2 & SK_USER_DATA_PTRMASK); \ rcu_assign_pointer(__sk_user_data((sk)), \ __tmp1 | __tmp2); \ }) #define rcu_assign_sk_user_data(sk, ptr) \ __rcu_assign_sk_user_data_with_flags(sk, ptr, 0) static inline struct net *sock_net(const struct sock *sk) { return read_pnet(&sk->sk_net); } static inline void sock_net_set(struct sock *sk, struct net *net) { write_pnet(&sk->sk_net, net); } /* * SK_CAN_REUSE and SK_NO_REUSE on a socket mean that the socket is OK * or not whether his port will be reused by someone else. SK_FORCE_REUSE * on a socket means that the socket will reuse everybody else's port * without looking at the other's sk_reuse value. */ #define SK_NO_REUSE 0 #define SK_CAN_REUSE 1 #define SK_FORCE_REUSE 2 int sk_set_peek_off(struct sock *sk, int val); static inline int sk_peek_offset(const struct sock *sk, int flags) { if (unlikely(flags & MSG_PEEK)) { return READ_ONCE(sk->sk_peek_off); } return 0; } static inline void sk_peek_offset_bwd(struct sock *sk, int val) { s32 off = READ_ONCE(sk->sk_peek_off); if (unlikely(off >= 0)) { off = max_t(s32, off - val, 0); WRITE_ONCE(sk->sk_peek_off, off); } } static inline void sk_peek_offset_fwd(struct sock *sk, int val) { sk_peek_offset_bwd(sk, -val); } /* * Hashed lists helper routines */ static inline struct sock *sk_entry(const struct hlist_node *node) { return hlist_entry(node, struct sock, sk_node); } static inline struct sock *__sk_head(const struct hlist_head *head) { return hlist_entry(head->first, struct sock, sk_node); } static inline struct sock *sk_head(const struct hlist_head *head) { return hlist_empty(head) ? NULL : __sk_head(head); } static inline struct sock *__sk_nulls_head(const struct hlist_nulls_head *head) { return hlist_nulls_entry(head->first, struct sock, sk_nulls_node); } static inline struct sock *sk_nulls_head(const struct hlist_nulls_head *head) { return hlist_nulls_empty(head) ? NULL : __sk_nulls_head(head); } static inline struct sock *sk_next(const struct sock *sk) { return hlist_entry_safe(sk->sk_node.next, struct sock, sk_node); } static inline struct sock *sk_nulls_next(const struct sock *sk) { return (!is_a_nulls(sk->sk_nulls_node.next)) ? hlist_nulls_entry(sk->sk_nulls_node.next, struct sock, sk_nulls_node) : NULL; } static inline bool sk_unhashed(const struct sock *sk) { return hlist_unhashed(&sk->sk_node); } static inline bool sk_hashed(const struct sock *sk) { return !sk_unhashed(sk); } static inline void sk_node_init(struct hlist_node *node) { node->pprev = NULL; } static inline void __sk_del_node(struct sock *sk) { __hlist_del(&sk->sk_node); } /* NB: equivalent to hlist_del_init_rcu */ static inline bool __sk_del_node_init(struct sock *sk) { if (sk_hashed(sk)) { __sk_del_node(sk); sk_node_init(&sk->sk_node); return true; } return false; } /* Grab socket reference count. This operation is valid only when sk is ALREADY grabbed f.e. it is found in hash table or a list and the lookup is made under lock preventing hash table modifications. */ static __always_inline void sock_hold(struct sock *sk) { refcount_inc(&sk->sk_refcnt); } /* Ungrab socket in the context, which assumes that socket refcnt cannot hit zero, f.e. it is true in context of any socketcall. */ static __always_inline void __sock_put(struct sock *sk) { refcount_dec(&sk->sk_refcnt); } static inline bool sk_del_node_init(struct sock *sk) { bool rc = __sk_del_node_init(sk); if (rc) { /* paranoid for a while -acme */ WARN_ON(refcount_read(&sk->sk_refcnt) == 1); __sock_put(sk); } return rc; } #define sk_del_node_init_rcu(sk) sk_del_node_init(sk) static inline bool __sk_nulls_del_node_init_rcu(struct sock *sk) { if (sk_hashed(sk)) { hlist_nulls_del_init_rcu(&sk->sk_nulls_node); return true; } return false; } static inline bool sk_nulls_del_node_init_rcu(struct sock *sk) { bool rc = __sk_nulls_del_node_init_rcu(sk); if (rc) { /* paranoid for a while -acme */ WARN_ON(refcount_read(&sk->sk_refcnt) == 1); __sock_put(sk); } return rc; } static inline void __sk_add_node(struct sock *sk, struct hlist_head *list) { hlist_add_head(&sk->sk_node, list); } static inline void sk_add_node(struct sock *sk, struct hlist_head *list) { sock_hold(sk); __sk_add_node(sk, list); } static inline void sk_add_node_rcu(struct sock *sk, struct hlist_head *list) { sock_hold(sk); if (IS_ENABLED(CONFIG_IPV6) && sk->sk_reuseport && sk->sk_family == AF_INET6) hlist_add_tail_rcu(&sk->sk_node, list); else hlist_add_head_rcu(&sk->sk_node, list); } static inline void sk_add_node_tail_rcu(struct sock *sk, struct hlist_head *list) { sock_hold(sk); hlist_add_tail_rcu(&sk->sk_node, list); } static inline void __sk_nulls_add_node_rcu(struct sock *sk, struct hlist_nulls_head *list) { hlist_nulls_add_head_rcu(&sk->sk_nulls_node, list); } static inline void __sk_nulls_add_node_tail_rcu(struct sock *sk, struct hlist_nulls_head *list) { hlist_nulls_add_tail_rcu(&sk->sk_nulls_node, list); } static inline void sk_nulls_add_node_rcu(struct sock *sk, struct hlist_nulls_head *list) { sock_hold(sk); __sk_nulls_add_node_rcu(sk, list); } static inline void __sk_del_bind_node(struct sock *sk) { __hlist_del(&sk->sk_bind_node); } static inline void sk_add_bind_node(struct sock *sk, struct hlist_head *list) { hlist_add_head(&sk->sk_bind_node, list); } #define sk_for_each(__sk, list) \ hlist_for_each_entry(__sk, list, sk_node) #define sk_for_each_rcu(__sk, list) \ hlist_for_each_entry_rcu(__sk, list, sk_node) #define sk_nulls_for_each(__sk, node, list) \ hlist_nulls_for_each_entry(__sk, node, list, sk_nulls_node) #define sk_nulls_for_each_rcu(__sk, node, list) \ hlist_nulls_for_each_entry_rcu(__sk, node, list, sk_nulls_node) #define sk_for_each_from(__sk) \ hlist_for_each_entry_from(__sk, sk_node) #define sk_nulls_for_each_from(__sk, node) \ if (__sk && ({ node = &(__sk)->sk_nulls_node; 1; })) \ hlist_nulls_for_each_entry_from(__sk, node, sk_nulls_node) #define sk_for_each_safe(__sk, tmp, list) \ hlist_for_each_entry_safe(__sk, tmp, list, sk_node) #define sk_for_each_bound(__sk, list) \ hlist_for_each_entry(__sk, list, sk_bind_node) /** * sk_for_each_entry_offset_rcu - iterate over a list at a given struct offset * @tpos: the type * to use as a loop cursor. * @pos: the &struct hlist_node to use as a loop cursor. * @head: the head for your list. * @offset: offset of hlist_node within the struct. * */ #define sk_for_each_entry_offset_rcu(tpos, pos, head, offset) \ for (pos = rcu_dereference(hlist_first_rcu(head)); \ pos != NULL && \ ({ tpos = (typeof(*tpos) *)((void *)pos - offset); 1;}); \ pos = rcu_dereference(hlist_next_rcu(pos))) static inline struct user_namespace *sk_user_ns(const struct sock *sk) { /* Careful only use this in a context where these parameters * can not change and must all be valid, such as recvmsg from * userspace. */ return sk->sk_socket->file->f_cred->user_ns; } /* Sock flags */ enum sock_flags { SOCK_DEAD, SOCK_DONE, SOCK_URGINLINE, SOCK_KEEPOPEN, SOCK_LINGER, SOCK_DESTROY, SOCK_BROADCAST, SOCK_TIMESTAMP, SOCK_ZAPPED, SOCK_USE_WRITE_QUEUE, /* whether to call sk->sk_write_space in sock_wfree */ SOCK_DBG, /* %SO_DEBUG setting */ SOCK_RCVTSTAMP, /* %SO_TIMESTAMP setting */ SOCK_RCVTSTAMPNS, /* %SO_TIMESTAMPNS setting */ SOCK_LOCALROUTE, /* route locally only, %SO_DONTROUTE setting */ SOCK_MEMALLOC, /* VM depends on this socket for swapping */ SOCK_TIMESTAMPING_RX_SOFTWARE, /* %SOF_TIMESTAMPING_RX_SOFTWARE */ SOCK_FASYNC, /* fasync() active */ SOCK_RXQ_OVFL, SOCK_ZEROCOPY, /* buffers from userspace */ SOCK_WIFI_STATUS, /* push wifi status to userspace */ SOCK_NOFCS, /* Tell NIC not to do the Ethernet FCS. * Will use last 4 bytes of packet sent from * user-space instead. */ SOCK_FILTER_LOCKED, /* Filter cannot be changed anymore */ SOCK_SELECT_ERR_QUEUE, /* Wake select on error queue */ SOCK_RCU_FREE, /* wait rcu grace period in sk_destruct() */ SOCK_TXTIME, SOCK_XDP, /* XDP is attached */ SOCK_TSTAMP_NEW, /* Indicates 64 bit timestamps always */ SOCK_RCVMARK, /* Receive SO_MARK ancillary data with packet */ }; #define SK_FLAGS_TIMESTAMP ((1UL << SOCK_TIMESTAMP) | (1UL << SOCK_TIMESTAMPING_RX_SOFTWARE)) static inline void sock_copy_flags(struct sock *nsk, const struct sock *osk) { nsk->sk_flags = osk->sk_flags; } static inline void sock_set_flag(struct sock *sk, enum sock_flags flag) { __set_bit(flag, &sk->sk_flags); } static inline void sock_reset_flag(struct sock *sk, enum sock_flags flag) { __clear_bit(flag, &sk->sk_flags); } static inline void sock_valbool_flag(struct sock *sk, enum sock_flags bit, int valbool) { if (valbool) sock_set_flag(sk, bit); else sock_reset_flag(sk, bit); } static inline bool sock_flag(const struct sock *sk, enum sock_flags flag) { return test_bit(flag, &sk->sk_flags); } #ifdef CONFIG_NET DECLARE_STATIC_KEY_FALSE(memalloc_socks_key); static inline int sk_memalloc_socks(void) { return static_branch_unlikely(&memalloc_socks_key); } void __receive_sock(struct file *file); #else static inline int sk_memalloc_socks(void) { return 0; } static inline void __receive_sock(struct file *file) { } #endif static inline gfp_t sk_gfp_mask(const struct sock *sk, gfp_t gfp_mask) { return gfp_mask | (sk->sk_allocation & __GFP_MEMALLOC); } static inline void sk_acceptq_removed(struct sock *sk) { WRITE_ONCE(sk->sk_ack_backlog, sk->sk_ack_backlog - 1); } static inline void sk_acceptq_added(struct sock *sk) { WRITE_ONCE(sk->sk_ack_backlog, sk->sk_ack_backlog + 1); } /* Note: If you think the test should be: * return READ_ONCE(sk->sk_ack_backlog) >= READ_ONCE(sk->sk_max_ack_backlog); * Then please take a look at commit 64a146513f8f ("[NET]: Revert incorrect accept queue backlog changes.") */ static inline bool sk_acceptq_is_full(const struct sock *sk) { return READ_ONCE(sk->sk_ack_backlog) > READ_ONCE(sk->sk_max_ack_backlog); } /* * Compute minimal free write space needed to queue new packets. */ static inline int sk_stream_min_wspace(const struct sock *sk) { return READ_ONCE(sk->sk_wmem_queued) >> 1; } static inline int sk_stream_wspace(const struct sock *sk) { return READ_ONCE(sk->sk_sndbuf) - READ_ONCE(sk->sk_wmem_queued); } static inline void sk_wmem_queued_add(struct sock *sk, int val) { WRITE_ONCE(sk->sk_wmem_queued, sk->sk_wmem_queued + val); } static inline void sk_forward_alloc_add(struct sock *sk, int val) { /* Paired with lockless reads of sk->sk_forward_alloc */ WRITE_ONCE(sk->sk_forward_alloc, sk->sk_forward_alloc + val); } void sk_stream_write_space(struct sock *sk); /* OOB backlog add */ static inline void __sk_add_backlog(struct sock *sk, struct sk_buff *skb) { /* dont let skb dst not refcounted, we are going to leave rcu lock */ skb_dst_force(skb); if (!sk->sk_backlog.tail) WRITE_ONCE(sk->sk_backlog.head, skb); else sk->sk_backlog.tail->next = skb; WRITE_ONCE(sk->sk_backlog.tail, skb); skb->next = NULL; } /* * Take into account size of receive queue and backlog queue * Do not take into account this skb truesize, * to allow even a single big packet to come. */ static inline bool sk_rcvqueues_full(const struct sock *sk, unsigned int limit) { unsigned int qsize = sk->sk_backlog.len + atomic_read(&sk->sk_rmem_alloc); return qsize > limit; } /* The per-socket spinlock must be held here. */ static inline __must_check int sk_add_backlog(struct sock *sk, struct sk_buff *skb, unsigned int limit) { if (sk_rcvqueues_full(sk, limit)) return -ENOBUFS; /* * If the skb was allocated from pfmemalloc reserves, only * allow SOCK_MEMALLOC sockets to use it as this socket is * helping free memory */ if (skb_pfmemalloc(skb) && !sock_flag(sk, SOCK_MEMALLOC)) return -ENOMEM; __sk_add_backlog(sk, skb); sk->sk_backlog.len += skb->truesize; return 0; } int __sk_backlog_rcv(struct sock *sk, struct sk_buff *skb); INDIRECT_CALLABLE_DECLARE(int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)); INDIRECT_CALLABLE_DECLARE(int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)); static inline int sk_backlog_rcv(struct sock *sk, struct sk_buff *skb) { if (sk_memalloc_socks() && skb_pfmemalloc(skb)) return __sk_backlog_rcv(sk, skb); return INDIRECT_CALL_INET(sk->sk_backlog_rcv, tcp_v6_do_rcv, tcp_v4_do_rcv, sk, skb); } static inline void sk_incoming_cpu_update(struct sock *sk) { int cpu = raw_smp_processor_id(); if (unlikely(READ_ONCE(sk->sk_incoming_cpu) != cpu)) WRITE_ONCE(sk->sk_incoming_cpu, cpu); } static inline void sock_rps_save_rxhash(struct sock *sk, const struct sk_buff *skb) { #ifdef CONFIG_RPS /* The following WRITE_ONCE() is paired with the READ_ONCE() * here, and another one in sock_rps_record_flow(). */ if (unlikely(READ_ONCE(sk->sk_rxhash) != skb->hash)) WRITE_ONCE(sk->sk_rxhash, skb->hash); #endif } static inline void sock_rps_reset_rxhash(struct sock *sk) { #ifdef CONFIG_RPS /* Paired with READ_ONCE() in sock_rps_record_flow() */ WRITE_ONCE(sk->sk_rxhash, 0); #endif } #define sk_wait_event(__sk, __timeo, __condition, __wait) \ ({ int __rc, __dis = __sk->sk_disconnects; \ release_sock(__sk); \ __rc = __condition; \ if (!__rc) { \ *(__timeo) = wait_woken(__wait, \ TASK_INTERRUPTIBLE, \ *(__timeo)); \ } \ sched_annotate_sleep(); \ lock_sock(__sk); \ __rc = __dis == __sk->sk_disconnects ? __condition : -EPIPE; \ __rc; \ }) int sk_stream_wait_connect(struct sock *sk, long *timeo_p); int sk_stream_wait_memory(struct sock *sk, long *timeo_p); void sk_stream_wait_close(struct sock *sk, long timeo_p); int sk_stream_error(struct sock *sk, int flags, int err); void sk_stream_kill_queues(struct sock *sk); void sk_set_memalloc(struct sock *sk); void sk_clear_memalloc(struct sock *sk); void __sk_flush_backlog(struct sock *sk); static inline bool sk_flush_backlog(struct sock *sk) { if (unlikely(READ_ONCE(sk->sk_backlog.tail))) { __sk_flush_backlog(sk); return true; } return false; } int sk_wait_data(struct sock *sk, long *timeo, const struct sk_buff *skb); struct request_sock_ops; struct timewait_sock_ops; struct inet_hashinfo; struct raw_hashinfo; struct smc_hashinfo; struct module; struct sk_psock; /* * caches using SLAB_TYPESAFE_BY_RCU should let .next pointer from nulls nodes * un-modified. Special care is taken when initializing object to zero. */ static inline void sk_prot_clear_nulls(struct sock *sk, int size) { if (offsetof(struct sock, sk_node.next) != 0) memset(sk, 0, offsetof(struct sock, sk_node.next)); memset(&sk->sk_node.pprev, 0, size - offsetof(struct sock, sk_node.pprev)); } struct proto_accept_arg { int flags; int err; int is_empty; bool kern; }; /* Networking protocol blocks we attach to sockets. * socket layer -> transport layer interface */ struct proto { void (*close)(struct sock *sk, long timeout); int (*pre_connect)(struct sock *sk, struct sockaddr *uaddr, int addr_len); int (*connect)(struct sock *sk, struct sockaddr *uaddr, int addr_len); int (*disconnect)(struct sock *sk, int flags); struct sock * (*accept)(struct sock *sk, struct proto_accept_arg *arg); int (*ioctl)(struct sock *sk, int cmd, int *karg); int (*init)(struct sock *sk); void (*destroy)(struct sock *sk); void (*shutdown)(struct sock *sk, int how); int (*setsockopt)(struct sock *sk, int level, int optname, sockptr_t optval, unsigned int optlen); int (*getsockopt)(struct sock *sk, int level, int optname, char __user *optval, int __user *option); void (*keepalive)(struct sock *sk, int valbool); #ifdef CONFIG_COMPAT int (*compat_ioctl)(struct sock *sk, unsigned int cmd, unsigned long arg); #endif int (*sendmsg)(struct sock *sk, struct msghdr *msg, size_t len); int (*recvmsg)(struct sock *sk, struct msghdr *msg, size_t len, int flags, int *addr_len); void (*splice_eof)(struct socket *sock); int (*bind)(struct sock *sk, struct sockaddr *addr, int addr_len); int (*bind_add)(struct sock *sk, struct sockaddr *addr, int addr_len); int (*backlog_rcv) (struct sock *sk, struct sk_buff *skb); bool (*bpf_bypass_getsockopt)(int level, int optname); void (*release_cb)(struct sock *sk); /* Keeping track of sk's, looking them up, and port selection methods. */ int (*hash)(struct sock *sk); void (*unhash)(struct sock *sk); void (*rehash)(struct sock *sk); int (*get_port)(struct sock *sk, unsigned short snum); void (*put_port)(struct sock *sk); #ifdef CONFIG_BPF_SYSCALL int (*psock_update_sk_prot)(struct sock *sk, struct sk_psock *psock, bool restore); #endif /* Keeping track of sockets in use */ #ifdef CONFIG_PROC_FS unsigned int inuse_idx; #endif #if IS_ENABLED(CONFIG_MPTCP) int (*forward_alloc_get)(const struct sock *sk); #endif bool (*stream_memory_free)(const struct sock *sk, int wake); bool (*sock_is_readable)(struct sock *sk); /* Memory pressure */ void (*enter_memory_pressure)(struct sock *sk); void (*leave_memory_pressure)(struct sock *sk); atomic_long_t *memory_allocated; /* Current allocated memory. */ int __percpu *per_cpu_fw_alloc; struct percpu_counter *sockets_allocated; /* Current number of sockets. */ /* * Pressure flag: try to collapse. * Technical note: it is used by multiple contexts non atomically. * Make sure to use READ_ONCE()/WRITE_ONCE() for all reads/writes. * All the __sk_mem_schedule() is of this nature: accounting * is strict, actions are advisory and have some latency. */ unsigned long *memory_pressure; long *sysctl_mem; int *sysctl_wmem; int *sysctl_rmem; u32 sysctl_wmem_offset; u32 sysctl_rmem_offset; int max_header; bool no_autobind; struct kmem_cache *slab; unsigned int obj_size; unsigned int ipv6_pinfo_offset; slab_flags_t slab_flags; unsigned int useroffset; /* Usercopy region offset */ unsigned int usersize; /* Usercopy region size */ unsigned int __percpu *orphan_count; struct request_sock_ops *rsk_prot; struct timewait_sock_ops *twsk_prot; union { struct inet_hashinfo *hashinfo; struct udp_table *udp_table; struct raw_hashinfo *raw_hash; struct smc_hashinfo *smc_hash; } h; struct module *owner; char name[32]; struct list_head node; int (*diag_destroy)(struct sock *sk, int err); } __randomize_layout; int proto_register(struct proto *prot, int alloc_slab); void proto_unregister(struct proto *prot); int sock_load_diag_module(int family, int protocol); INDIRECT_CALLABLE_DECLARE(bool tcp_stream_memory_free(const struct sock *sk, int wake)); static inline int sk_forward_alloc_get(const struct sock *sk) { #if IS_ENABLED(CONFIG_MPTCP) if (sk->sk_prot->forward_alloc_get) return sk->sk_prot->forward_alloc_get(sk); #endif return READ_ONCE(sk->sk_forward_alloc); } static inline bool __sk_stream_memory_free(const struct sock *sk, int wake) { if (READ_ONCE(sk->sk_wmem_queued) >= READ_ONCE(sk->sk_sndbuf)) return false; return sk->sk_prot->stream_memory_free ? INDIRECT_CALL_INET_1(sk->sk_prot->stream_memory_free, tcp_stream_memory_free, sk, wake) : true; } static inline bool sk_stream_memory_free(const struct sock *sk) { return __sk_stream_memory_free(sk, 0); } static inline bool __sk_stream_is_writeable(const struct sock *sk, int wake) { return sk_stream_wspace(sk) >= sk_stream_min_wspace(sk) && __sk_stream_memory_free(sk, wake); } static inline bool sk_stream_is_writeable(const struct sock *sk) { return __sk_stream_is_writeable(sk, 0); } static inline int sk_under_cgroup_hierarchy(struct sock *sk, struct cgroup *ancestor) { #ifdef CONFIG_SOCK_CGROUP_DATA return cgroup_is_descendant(sock_cgroup_ptr(&sk->sk_cgrp_data), ancestor); #else return -ENOTSUPP; #endif } #define SK_ALLOC_PERCPU_COUNTER_BATCH 16 static inline void sk_sockets_allocated_dec(struct sock *sk) { percpu_counter_add_batch(sk->sk_prot->sockets_allocated, -1, SK_ALLOC_PERCPU_COUNTER_BATCH); } static inline void sk_sockets_allocated_inc(struct sock *sk) { percpu_counter_add_batch(sk->sk_prot->sockets_allocated, 1, SK_ALLOC_PERCPU_COUNTER_BATCH); } static inline u64 sk_sockets_allocated_read_positive(struct sock *sk) { return percpu_counter_read_positive(sk->sk_prot->sockets_allocated); } static inline int proto_sockets_allocated_sum_positive(struct proto *prot) { return percpu_counter_sum_positive(prot->sockets_allocated); } #ifdef CONFIG_PROC_FS #define PROTO_INUSE_NR 64 /* should be enough for the first time */ struct prot_inuse { int all; int val[PROTO_INUSE_NR]; }; static inline void sock_prot_inuse_add(const struct net *net, const struct proto *prot, int val) { this_cpu_add(net->core.prot_inuse->val[prot->inuse_idx], val); } static inline void sock_inuse_add(const struct net *net, int val) { this_cpu_add(net->core.prot_inuse->all, val); } int sock_prot_inuse_get(struct net *net, struct proto *proto); int sock_inuse_get(struct net *net); #else static inline void sock_prot_inuse_add(const struct net *net, const struct proto *prot, int val) { } static inline void sock_inuse_add(const struct net *net, int val) { } #endif /* With per-bucket locks this operation is not-atomic, so that * this version is not worse. */ static inline int __sk_prot_rehash(struct sock *sk) { sk->sk_prot->unhash(sk); return sk->sk_prot->hash(sk); } /* About 10 seconds */ #define SOCK_DESTROY_TIME (10*HZ) /* Sockets 0-1023 can't be bound to unless you are superuser */ #define PROT_SOCK 1024 #define SHUTDOWN_MASK 3 #define RCV_SHUTDOWN 1 #define SEND_SHUTDOWN 2 #define SOCK_BINDADDR_LOCK 4 #define SOCK_BINDPORT_LOCK 8 struct socket_alloc { struct socket socket; struct inode vfs_inode; }; static inline struct socket *SOCKET_I(struct inode *inode) { return &container_of(inode, struct socket_alloc, vfs_inode)->socket; } static inline struct inode *SOCK_INODE(struct socket *socket) { return &container_of(socket, struct socket_alloc, socket)->vfs_inode; } /* * Functions for memory accounting */ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind); int __sk_mem_schedule(struct sock *sk, int size, int kind); void __sk_mem_reduce_allocated(struct sock *sk, int amount); void __sk_mem_reclaim(struct sock *sk, int amount); #define SK_MEM_SEND 0 #define SK_MEM_RECV 1 /* sysctl_mem values are in pages */ static inline long sk_prot_mem_limits(const struct sock *sk, int index) { return READ_ONCE(sk->sk_prot->sysctl_mem[index]); } static inline int sk_mem_pages(int amt) { return (amt + PAGE_SIZE - 1) >> PAGE_SHIFT; } static inline bool sk_has_account(struct sock *sk) { /* return true if protocol supports memory accounting */ return !!sk->sk_prot->memory_allocated; } static inline bool sk_wmem_schedule(struct sock *sk, int size) { int delta; if (!sk_has_account(sk)) return true; delta = size - sk->sk_forward_alloc; return delta <= 0 || __sk_mem_schedule(sk, delta, SK_MEM_SEND); } static inline bool sk_rmem_schedule(struct sock *sk, struct sk_buff *skb, int size) { int delta; if (!sk_has_account(sk)) return true; delta = size - sk->sk_forward_alloc; return delta <= 0 || __sk_mem_schedule(sk, delta, SK_MEM_RECV) || skb_pfmemalloc(skb); } static inline int sk_unused_reserved_mem(const struct sock *sk) { int unused_mem; if (likely(!sk->sk_reserved_mem)) return 0; unused_mem = sk->sk_reserved_mem - sk->sk_wmem_queued - atomic_read(&sk->sk_rmem_alloc); return unused_mem > 0 ? unused_mem : 0; } static inline void sk_mem_reclaim(struct sock *sk) { int reclaimable; if (!sk_has_account(sk)) return; reclaimable = sk->sk_forward_alloc - sk_unused_reserved_mem(sk); if (reclaimable >= (int)PAGE_SIZE) __sk_mem_reclaim(sk, reclaimable); } static inline void sk_mem_reclaim_final(struct sock *sk) { sk->sk_reserved_mem = 0; sk_mem_reclaim(sk); } static inline void sk_mem_charge(struct sock *sk, int size) { if (!sk_has_account(sk)) return; sk_forward_alloc_add(sk, -size); } static inline void sk_mem_uncharge(struct sock *sk, int size) { if (!sk_has_account(sk)) return; sk_forward_alloc_add(sk, size); sk_mem_reclaim(sk); } /* * Macro so as to not evaluate some arguments when * lockdep is not enabled. * * Mark both the sk_lock and the sk_lock.slock as a * per-address-family lock class. */ #define sock_lock_init_class_and_name(sk, sname, skey, name, key) \ do { \ sk->sk_lock.owned = 0; \ init_waitqueue_head(&sk->sk_lock.wq); \ spin_lock_init(&(sk)->sk_lock.slock); \ debug_check_no_locks_freed((void *)&(sk)->sk_lock, \ sizeof((sk)->sk_lock)); \ lockdep_set_class_and_name(&(sk)->sk_lock.slock, \ (skey), (sname)); \ lockdep_init_map(&(sk)->sk_lock.dep_map, (name), (key), 0); \ } while (0) static inline bool lockdep_sock_is_held(const struct sock *sk) { return lockdep_is_held(&sk->sk_lock) || lockdep_is_held(&sk->sk_lock.slock); } void lock_sock_nested(struct sock *sk, int subclass); static inline void lock_sock(struct sock *sk) { lock_sock_nested(sk, 0); } void __lock_sock(struct sock *sk); void __release_sock(struct sock *sk); void release_sock(struct sock *sk); /* BH context may only use the following locking interface. */ #define bh_lock_sock(__sk) spin_lock(&((__sk)->sk_lock.slock)) #define bh_lock_sock_nested(__sk) \ spin_lock_nested(&((__sk)->sk_lock.slock), \ SINGLE_DEPTH_NESTING) #define bh_unlock_sock(__sk) spin_unlock(&((__sk)->sk_lock.slock)) bool __lock_sock_fast(struct sock *sk) __acquires(&sk->sk_lock.slock); /** * lock_sock_fast - fast version of lock_sock * @sk: socket * * This version should be used for very small section, where process wont block * return false if fast path is taken: * * sk_lock.slock locked, owned = 0, BH disabled * * return true if slow path is taken: * * sk_lock.slock unlocked, owned = 1, BH enabled */ static inline bool lock_sock_fast(struct sock *sk) { /* The sk_lock has mutex_lock() semantics here. */ mutex_acquire(&sk->sk_lock.dep_map, 0, 0, _RET_IP_); return __lock_sock_fast(sk); } /* fast socket lock variant for caller already holding a [different] socket lock */ static inline bool lock_sock_fast_nested(struct sock *sk) { mutex_acquire(&sk->sk_lock.dep_map, SINGLE_DEPTH_NESTING, 0, _RET_IP_); return __lock_sock_fast(sk); } /** * unlock_sock_fast - complement of lock_sock_fast * @sk: socket * @slow: slow mode * * fast unlock socket for user context. * If slow mode is on, we call regular release_sock() */ static inline void unlock_sock_fast(struct sock *sk, bool slow) __releases(&sk->sk_lock.slock) { if (slow) { release_sock(sk); __release(&sk->sk_lock.slock); } else { mutex_release(&sk->sk_lock.dep_map, _RET_IP_); spin_unlock_bh(&sk->sk_lock.slock); } } void sockopt_lock_sock(struct sock *sk); void sockopt_release_sock(struct sock *sk); bool sockopt_ns_capable(struct user_namespace *ns, int cap); bool sockopt_capable(int cap); /* Used by processes to "lock" a socket state, so that * interrupts and bottom half handlers won't change it * from under us. It essentially blocks any incoming * packets, so that we won't get any new data or any * packets that change the state of the socket. * * While locked, BH processing will add new packets to * the backlog queue. This queue is processed by the * owner of the socket lock right before it is released. * * Since ~2.3.5 it is also exclusive sleep lock serializing * accesses from user process context. */ static inline void sock_owned_by_me(const struct sock *sk) { #ifdef CONFIG_LOCKDEP WARN_ON_ONCE(!lockdep_sock_is_held(sk) && debug_locks); #endif } static inline void sock_not_owned_by_me(const struct sock *sk) { #ifdef CONFIG_LOCKDEP WARN_ON_ONCE(lockdep_sock_is_held(sk) && debug_locks); #endif } static inline bool sock_owned_by_user(const struct sock *sk) { sock_owned_by_me(sk); return sk->sk_lock.owned; } static inline bool sock_owned_by_user_nocheck(const struct sock *sk) { return sk->sk_lock.owned; } static inline void sock_release_ownership(struct sock *sk) { DEBUG_NET_WARN_ON_ONCE(!sock_owned_by_user_nocheck(sk)); sk->sk_lock.owned = 0; /* The sk_lock has mutex_unlock() semantics: */ mutex_release(&sk->sk_lock.dep_map, _RET_IP_); } /* no reclassification while locks are held */ static inline bool sock_allow_reclassification(const struct sock *csk) { struct sock *sk = (struct sock *)csk; return !sock_owned_by_user_nocheck(sk) && !spin_is_locked(&sk->sk_lock.slock); } struct sock *sk_alloc(struct net *net, int family, gfp_t priority, struct proto *prot, int kern); void sk_free(struct sock *sk); void sk_destruct(struct sock *sk); struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority); void sk_free_unlock_clone(struct sock *sk); struct sk_buff *sock_wmalloc(struct sock *sk, unsigned long size, int force, gfp_t priority); void __sock_wfree(struct sk_buff *skb); void sock_wfree(struct sk_buff *skb); struct sk_buff *sock_omalloc(struct sock *sk, unsigned long size, gfp_t priority); void skb_orphan_partial(struct sk_buff *skb); void sock_rfree(struct sk_buff *skb); void sock_efree(struct sk_buff *skb); #ifdef CONFIG_INET void sock_edemux(struct sk_buff *skb); void sock_pfree(struct sk_buff *skb); #else #define sock_edemux sock_efree #endif int sk_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, unsigned int optlen); int sock_setsockopt(struct socket *sock, int level, int op, sockptr_t optval, unsigned int optlen); int do_sock_setsockopt(struct socket *sock, bool compat, int level, int optname, sockptr_t optval, int optlen); int do_sock_getsockopt(struct socket *sock, bool compat, int level, int optname, sockptr_t optval, sockptr_t optlen); int sk_getsockopt(struct sock *sk, int level, int optname, sockptr_t optval, sockptr_t optlen); int sock_gettstamp(struct socket *sock, void __user *userstamp, bool timeval, bool time32); struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len, unsigned long data_len, int noblock, int *errcode, int max_page_order); static inline struct sk_buff *sock_alloc_send_skb(struct sock *sk, unsigned long size, int noblock, int *errcode) { return sock_alloc_send_pskb(sk, size, 0, noblock, errcode, 0); } void *sock_kmalloc(struct sock *sk, int size, gfp_t priority); void sock_kfree_s(struct sock *sk, void *mem, int size); void sock_kzfree_s(struct sock *sk, void *mem, int size); void sk_send_sigurg(struct sock *sk); static inline void sock_replace_proto(struct sock *sk, struct proto *proto) { if (sk->sk_socket) clear_bit(SOCK_SUPPORT_ZC, &sk->sk_socket->flags); WRITE_ONCE(sk->sk_prot, proto); } struct sockcm_cookie { u64 transmit_time; u32 mark; u32 tsflags; }; static inline void sockcm_init(struct sockcm_cookie *sockc, const struct sock *sk) { *sockc = (struct sockcm_cookie) { .tsflags = READ_ONCE(sk->sk_tsflags) }; } int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg, struct sockcm_cookie *sockc); int sock_cmsg_send(struct sock *sk, struct msghdr *msg, struct sockcm_cookie *sockc); /* * Functions to fill in entries in struct proto_ops when a protocol * does not implement a particular function. */ int sock_no_bind(struct socket *, struct sockaddr *, int); int sock_no_connect(struct socket *, struct sockaddr *, int, int); int sock_no_socketpair(struct socket *, struct socket *); int sock_no_accept(struct socket *, struct socket *, struct proto_accept_arg *); int sock_no_getname(struct socket *, struct sockaddr *, int); int sock_no_ioctl(struct socket *, unsigned int, unsigned long); int sock_no_listen(struct socket *, int); int sock_no_shutdown(struct socket *, int); int sock_no_sendmsg(struct socket *, struct msghdr *, size_t); int sock_no_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t len); int sock_no_recvmsg(struct socket *, struct msghdr *, size_t, int); int sock_no_mmap(struct file *file, struct socket *sock, struct vm_area_struct *vma); /* * Functions to fill in entries in struct proto_ops when a protocol * uses the inet style. */ int sock_common_getsockopt(struct socket *sock, int level, int optname, char __user *optval, int __user *optlen); int sock_common_recvmsg(struct socket *sock, struct msghdr *msg, size_t size, int flags); int sock_common_setsockopt(struct socket *sock, int level, int optname, sockptr_t optval, unsigned int optlen); void sk_common_release(struct sock *sk); /* * Default socket callbacks and setup code */ /* Initialise core socket variables using an explicit uid. */ void sock_init_data_uid(struct socket *sock, struct sock *sk, kuid_t uid); /* Initialise core socket variables. * Assumes struct socket *sock is embedded in a struct socket_alloc. */ void sock_init_data(struct socket *sock, struct sock *sk); /* * Socket reference counting postulates. * * * Each user of socket SHOULD hold a reference count. * * Each access point to socket (an hash table bucket, reference from a list, * running timer, skb in flight MUST hold a reference count. * * When reference count hits 0, it means it will never increase back. * * When reference count hits 0, it means that no references from * outside exist to this socket and current process on current CPU * is last user and may/should destroy this socket. * * sk_free is called from any context: process, BH, IRQ. When * it is called, socket has no references from outside -> sk_free * may release descendant resources allocated by the socket, but * to the time when it is called, socket is NOT referenced by any * hash tables, lists etc. * * Packets, delivered from outside (from network or from another process) * and enqueued on receive/error queues SHOULD NOT grab reference count, * when they sit in queue. Otherwise, packets will leak to hole, when * socket is looked up by one cpu and unhasing is made by another CPU. * It is true for udp/raw, netlink (leak to receive and error queues), tcp * (leak to backlog). Packet socket does all the processing inside * BR_NETPROTO_LOCK, so that it has not this race condition. UNIX sockets * use separate SMP lock, so that they are prone too. */ /* Ungrab socket and destroy it, if it was the last reference. */ static inline void sock_put(struct sock *sk) { if (refcount_dec_and_test(&sk->sk_refcnt)) sk_free(sk); } /* Generic version of sock_put(), dealing with all sockets * (TCP_TIMEWAIT, TCP_NEW_SYN_RECV, ESTABLISHED...) */ void sock_gen_put(struct sock *sk); int __sk_receive_skb(struct sock *sk, struct sk_buff *skb, const int nested, unsigned int trim_cap, bool refcounted); static inline int sk_receive_skb(struct sock *sk, struct sk_buff *skb, const int nested) { return __sk_receive_skb(sk, skb, nested, 1, true); } static inline void sk_tx_queue_set(struct sock *sk, int tx_queue) { /* sk_tx_queue_mapping accept only upto a 16-bit value */ if (WARN_ON_ONCE((unsigned short)tx_queue >= USHRT_MAX)) return; /* Paired with READ_ONCE() in sk_tx_queue_get() and * other WRITE_ONCE() because socket lock might be not held. */ WRITE_ONCE(sk->sk_tx_queue_mapping, tx_queue); } #define NO_QUEUE_MAPPING USHRT_MAX static inline void sk_tx_queue_clear(struct sock *sk) { /* Paired with READ_ONCE() in sk_tx_queue_get() and * other WRITE_ONCE() because socket lock might be not held. */ WRITE_ONCE(sk->sk_tx_queue_mapping, NO_QUEUE_MAPPING); } static inline int sk_tx_queue_get(const struct sock *sk) { if (sk) { /* Paired with WRITE_ONCE() in sk_tx_queue_clear() * and sk_tx_queue_set(). */ int val = READ_ONCE(sk->sk_tx_queue_mapping); if (val != NO_QUEUE_MAPPING) return val; } return -1; } static inline void __sk_rx_queue_set(struct sock *sk, const struct sk_buff *skb, bool force_set) { #ifdef CONFIG_SOCK_RX_QUEUE_MAPPING if (skb_rx_queue_recorded(skb)) { u16 rx_queue = skb_get_rx_queue(skb); if (force_set || unlikely(READ_ONCE(sk->sk_rx_queue_mapping) != rx_queue)) WRITE_ONCE(sk->sk_rx_queue_mapping, rx_queue); } #endif } static inline void sk_rx_queue_set(struct sock *sk, const struct sk_buff *skb) { __sk_rx_queue_set(sk, skb, true); } static inline void sk_rx_queue_update(struct sock *sk, const struct sk_buff *skb) { __sk_rx_queue_set(sk, skb, false); } static inline void sk_rx_queue_clear(struct sock *sk) { #ifdef CONFIG_SOCK_RX_QUEUE_MAPPING WRITE_ONCE(sk->sk_rx_queue_mapping, NO_QUEUE_MAPPING); #endif } static inline int sk_rx_queue_get(const struct sock *sk) { #ifdef CONFIG_SOCK_RX_QUEUE_MAPPING if (sk) { int res = READ_ONCE(sk->sk_rx_queue_mapping); if (res != NO_QUEUE_MAPPING) return res; } #endif return -1; } static inline void sk_set_socket(struct sock *sk, struct socket *sock) { sk->sk_socket = sock; } static inline wait_queue_head_t *sk_sleep(struct sock *sk) { BUILD_BUG_ON(offsetof(struct socket_wq, wait) != 0); return &rcu_dereference_raw(sk->sk_wq)->wait; } /* Detach socket from process context. * Announce socket dead, detach it from wait queue and inode. * Note that parent inode held reference count on this struct sock, * we do not release it in this function, because protocol * probably wants some additional cleanups or even continuing * to work with this socket (TCP). */ static inline void sock_orphan(struct sock *sk) { write_lock_bh(&sk->sk_callback_lock); sock_set_flag(sk, SOCK_DEAD); sk_set_socket(sk, NULL); sk->sk_wq = NULL; write_unlock_bh(&sk->sk_callback_lock); } static inline void sock_graft(struct sock *sk, struct socket *parent) { WARN_ON(parent->sk); write_lock_bh(&sk->sk_callback_lock); rcu_assign_pointer(sk->sk_wq, &parent->wq); parent->sk = sk; sk_set_socket(sk, parent); sk->sk_uid = SOCK_INODE(parent)->i_uid; security_sock_graft(sk, parent); write_unlock_bh(&sk->sk_callback_lock); } kuid_t sock_i_uid(struct sock *sk); unsigned long __sock_i_ino(struct sock *sk); unsigned long sock_i_ino(struct sock *sk); static inline kuid_t sock_net_uid(const struct net *net, const struct sock *sk) { return sk ? sk->sk_uid : make_kuid(net->user_ns, 0); } static inline u32 net_tx_rndhash(void) { u32 v = get_random_u32(); return v ?: 1; } static inline void sk_set_txhash(struct sock *sk) { /* This pairs with READ_ONCE() in skb_set_hash_from_sk() */ WRITE_ONCE(sk->sk_txhash, net_tx_rndhash()); } static inline bool sk_rethink_txhash(struct sock *sk) { if (sk->sk_txhash && sk->sk_txrehash == SOCK_TXREHASH_ENABLED) { sk_set_txhash(sk); return true; } return false; } static inline struct dst_entry * __sk_dst_get(const struct sock *sk) { return rcu_dereference_check(sk->sk_dst_cache, lockdep_sock_is_held(sk)); } static inline struct dst_entry * sk_dst_get(const struct sock *sk) { struct dst_entry *dst; rcu_read_lock(); dst = rcu_dereference(sk->sk_dst_cache); if (dst && !rcuref_get(&dst->__rcuref)) dst = NULL; rcu_read_unlock(); return dst; } static inline void __dst_negative_advice(struct sock *sk) { struct dst_entry *dst = __sk_dst_get(sk); if (dst && dst->ops->negative_advice) dst->ops->negative_advice(sk, dst); } static inline void dst_negative_advice(struct sock *sk) { sk_rethink_txhash(sk); __dst_negative_advice(sk); } static inline void __sk_dst_set(struct sock *sk, struct dst_entry *dst) { struct dst_entry *old_dst; sk_tx_queue_clear(sk); WRITE_ONCE(sk->sk_dst_pending_confirm, 0); old_dst = rcu_dereference_protected(sk->sk_dst_cache, lockdep_sock_is_held(sk)); rcu_assign_pointer(sk->sk_dst_cache, dst); dst_release(old_dst); } static inline void sk_dst_set(struct sock *sk, struct dst_entry *dst) { struct dst_entry *old_dst; sk_tx_queue_clear(sk); WRITE_ONCE(sk->sk_dst_pending_confirm, 0); old_dst = unrcu_pointer(xchg(&sk->sk_dst_cache, RCU_INITIALIZER(dst))); dst_release(old_dst); } static inline void __sk_dst_reset(struct sock *sk) { __sk_dst_set(sk, NULL); } static inline void sk_dst_reset(struct sock *sk) { sk_dst_set(sk, NULL); } struct dst_entry *__sk_dst_check(struct sock *sk, u32 cookie); struct dst_entry *sk_dst_check(struct sock *sk, u32 cookie); static inline void sk_dst_confirm(struct sock *sk) { if (!READ_ONCE(sk->sk_dst_pending_confirm)) WRITE_ONCE(sk->sk_dst_pending_confirm, 1); } static inline void sock_confirm_neigh(struct sk_buff *skb, struct neighbour *n) { if (skb_get_dst_pending_confirm(skb)) { struct sock *sk = skb->sk; if (sk && READ_ONCE(sk->sk_dst_pending_confirm)) WRITE_ONCE(sk->sk_dst_pending_confirm, 0); neigh_confirm(n); } } bool sk_mc_loop(const struct sock *sk); static inline bool sk_can_gso(const struct sock *sk) { return net_gso_ok(sk->sk_route_caps, sk->sk_gso_type); } void sk_setup_caps(struct sock *sk, struct dst_entry *dst); static inline void sk_gso_disable(struct sock *sk) { sk->sk_gso_disabled = 1; sk->sk_route_caps &= ~NETIF_F_GSO_MASK; } static inline int skb_do_copy_data_nocache(struct sock *sk, struct sk_buff *skb, struct iov_iter *from, char *to, int copy, int offset) { if (skb->ip_summed == CHECKSUM_NONE) { __wsum csum = 0; if (!csum_and_copy_from_iter_full(to, copy, &csum, from)) return -EFAULT; skb->csum = csum_block_add(skb->csum, csum, offset); } else if (sk->sk_route_caps & NETIF_F_NOCACHE_COPY) { if (!copy_from_iter_full_nocache(to, copy, from)) return -EFAULT; } else if (!copy_from_iter_full(to, copy, from)) return -EFAULT; return 0; } static inline int skb_add_data_nocache(struct sock *sk, struct sk_buff *skb, struct iov_iter *from, int copy) { int err, offset = skb->len; err = skb_do_copy_data_nocache(sk, skb, from, skb_put(skb, copy), copy, offset); if (err) __skb_trim(skb, offset); return err; } static inline int skb_copy_to_page_nocache(struct sock *sk, struct iov_iter *from, struct sk_buff *skb, struct page *page, int off, int copy) { int err; err = skb_do_copy_data_nocache(sk, skb, from, page_address(page) + off, copy, skb->len); if (err) return err; skb_len_add(skb, copy); sk_wmem_queued_add(sk, copy); sk_mem_charge(sk, copy); return 0; } /** * sk_wmem_alloc_get - returns write allocations * @sk: socket * * Return: sk_wmem_alloc minus initial offset of one */ static inline int sk_wmem_alloc_get(const struct sock *sk) { return refcount_read(&sk->sk_wmem_alloc) - 1; } /** * sk_rmem_alloc_get - returns read allocations * @sk: socket * * Return: sk_rmem_alloc */ static inline int sk_rmem_alloc_get(const struct sock *sk) { return atomic_read(&sk->sk_rmem_alloc); } /** * sk_has_allocations - check if allocations are outstanding * @sk: socket * * Return: true if socket has write or read allocations */ static inline bool sk_has_allocations(const struct sock *sk) { return sk_wmem_alloc_get(sk) || sk_rmem_alloc_get(sk); } /** * skwq_has_sleeper - check if there are any waiting processes * @wq: struct socket_wq * * Return: true if socket_wq has waiting processes * * The purpose of the skwq_has_sleeper and sock_poll_wait is to wrap the memory * barrier call. They were added due to the race found within the tcp code. * * Consider following tcp code paths:: * * CPU1 CPU2 * sys_select receive packet * ... ... * __add_wait_queue update tp->rcv_nxt * ... ... * tp->rcv_nxt check sock_def_readable * ... { * schedule rcu_read_lock(); * wq = rcu_dereference(sk->sk_wq); * if (wq && waitqueue_active(&wq->wait)) * wake_up_interruptible(&wq->wait) * ... * } * * The race for tcp fires when the __add_wait_queue changes done by CPU1 stay * in its cache, and so does the tp->rcv_nxt update on CPU2 side. The CPU1 * could then endup calling schedule and sleep forever if there are no more * data on the socket. * */ static inline bool skwq_has_sleeper(struct socket_wq *wq) { return wq && wq_has_sleeper(&wq->wait); } /** * sock_poll_wait - place memory barrier behind the poll_wait call. * @filp: file * @sock: socket to wait on * @p: poll_table * * See the comments in the wq_has_sleeper function. */ static inline void sock_poll_wait(struct file *filp, struct socket *sock, poll_table *p) { if (!poll_does_not_wait(p)) { poll_wait(filp, &sock->wq.wait, p); /* We need to be sure we are in sync with the * socket flags modification. * * This memory barrier is paired in the wq_has_sleeper. */ smp_mb(); } } static inline void skb_set_hash_from_sk(struct sk_buff *skb, struct sock *sk) { /* This pairs with WRITE_ONCE() in sk_set_txhash() */ u32 txhash = READ_ONCE(sk->sk_txhash); if (txhash) { skb->l4_hash = 1; skb->hash = txhash; } } void skb_set_owner_w(struct sk_buff *skb, struct sock *sk); /* * Queue a received datagram if it will fit. Stream and sequenced * protocols can't normally use this as they need to fit buffers in * and play with them. * * Inlined as it's very short and called for pretty much every * packet ever received. */ static inline void skb_set_owner_r(struct sk_buff *skb, struct sock *sk) { skb_orphan(skb); skb->sk = sk; skb->destructor = sock_rfree; atomic_add(skb->truesize, &sk->sk_rmem_alloc); sk_mem_charge(sk, skb->truesize); } static inline __must_check bool skb_set_owner_sk_safe(struct sk_buff *skb, struct sock *sk) { if (sk && refcount_inc_not_zero(&sk->sk_refcnt)) { skb_orphan(skb); skb->destructor = sock_efree; skb->sk = sk; return true; } return false; } static inline struct sk_buff *skb_clone_and_charge_r(struct sk_buff *skb, struct sock *sk) { skb = skb_clone(skb, sk_gfp_mask(sk, GFP_ATOMIC)); if (skb) { if (sk_rmem_schedule(sk, skb, skb->truesize)) { skb_set_owner_r(skb, sk); return skb; } __kfree_skb(skb); } return NULL; } static inline void skb_prepare_for_gro(struct sk_buff *skb) { if (skb->destructor != sock_wfree) { skb_orphan(skb); return; } skb->slow_gro = 1; } void sk_reset_timer(struct sock *sk, struct timer_list *timer, unsigned long expires); void sk_stop_timer(struct sock *sk, struct timer_list *timer); void sk_stop_timer_sync(struct sock *sk, struct timer_list *timer); int __sk_queue_drop_skb(struct sock *sk, struct sk_buff_head *sk_queue, struct sk_buff *skb, unsigned int flags, void (*destructor)(struct sock *sk, struct sk_buff *skb)); int __sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb); int sock_queue_rcv_skb_reason(struct sock *sk, struct sk_buff *skb, enum skb_drop_reason *reason); static inline int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb) { return sock_queue_rcv_skb_reason(sk, skb, NULL); } int sock_queue_err_skb(struct sock *sk, struct sk_buff *skb); struct sk_buff *sock_dequeue_err_skb(struct sock *sk); /* * Recover an error report and clear atomically */ static inline int sock_error(struct sock *sk) { int err; /* Avoid an atomic operation for the common case. * This is racy since another cpu/thread can change sk_err under us. */ if (likely(data_race(!sk->sk_err))) return 0; err = xchg(&sk->sk_err, 0); return -err; } void sk_error_report(struct sock *sk); static inline unsigned long sock_wspace(struct sock *sk) { int amt = 0; if (!(sk->sk_shutdown & SEND_SHUTDOWN)) { amt = sk->sk_sndbuf - refcount_read(&sk->sk_wmem_alloc); if (amt < 0) amt = 0; } return amt; } /* Note: * We use sk->sk_wq_raw, from contexts knowing this * pointer is not NULL and cannot disappear/change. */ static inline void sk_set_bit(int nr, struct sock *sk) { if ((nr == SOCKWQ_ASYNC_NOSPACE || nr == SOCKWQ_ASYNC_WAITDATA) && !sock_flag(sk, SOCK_FASYNC)) return; set_bit(nr, &sk->sk_wq_raw->flags); } static inline void sk_clear_bit(int nr, struct sock *sk) { if ((nr == SOCKWQ_ASYNC_NOSPACE || nr == SOCKWQ_ASYNC_WAITDATA) && !sock_flag(sk, SOCK_FASYNC)) return; clear_bit(nr, &sk->sk_wq_raw->flags); } static inline void sk_wake_async(const struct sock *sk, int how, int band) { if (sock_flag(sk, SOCK_FASYNC)) { rcu_read_lock(); sock_wake_async(rcu_dereference(sk->sk_wq), how, band); rcu_read_unlock(); } } static inline void sk_wake_async_rcu(const struct sock *sk, int how, int band) { if (unlikely(sock_flag(sk, SOCK_FASYNC))) sock_wake_async(rcu_dereference(sk->sk_wq), how, band); } /* Since sk_{r,w}mem_alloc sums skb->truesize, even a small frame might * need sizeof(sk_buff) + MTU + padding, unless net driver perform copybreak. * Note: for send buffers, TCP works better if we can build two skbs at * minimum. */ #define TCP_SKB_MIN_TRUESIZE (2048 + SKB_DATA_ALIGN(sizeof(struct sk_buff))) #define SOCK_MIN_SNDBUF (TCP_SKB_MIN_TRUESIZE * 2) #define SOCK_MIN_RCVBUF TCP_SKB_MIN_TRUESIZE static inline void sk_stream_moderate_sndbuf(struct sock *sk) { u32 val; if (sk->sk_userlocks & SOCK_SNDBUF_LOCK) return; val = min(sk->sk_sndbuf, sk->sk_wmem_queued >> 1); val = max_t(u32, val, sk_unused_reserved_mem(sk)); WRITE_ONCE(sk->sk_sndbuf, max_t(u32, val, SOCK_MIN_SNDBUF)); } /** * sk_page_frag - return an appropriate page_frag * @sk: socket * * Use the per task page_frag instead of the per socket one for * optimization when we know that we're in process context and own * everything that's associated with %current. * * Both direct reclaim and page faults can nest inside other * socket operations and end up recursing into sk_page_frag() * while it's already in use: explicitly avoid task page_frag * when users disable sk_use_task_frag. * * Return: a per task page_frag if context allows that, * otherwise a per socket one. */ static inline struct page_frag *sk_page_frag(struct sock *sk) { if (sk->sk_use_task_frag) return ¤t->task_frag; return &sk->sk_frag; } bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag); /* * Default write policy as shown to user space via poll/select/SIGIO */ static inline bool sock_writeable(const struct sock *sk) { return refcount_read(&sk->sk_wmem_alloc) < (READ_ONCE(sk->sk_sndbuf) >> 1); } static inline gfp_t gfp_any(void) { return in_softirq() ? GFP_ATOMIC : GFP_KERNEL; } static inline gfp_t gfp_memcg_charge(void) { return in_softirq() ? GFP_ATOMIC : GFP_KERNEL; } static inline long sock_rcvtimeo(const struct sock *sk, bool noblock) { return noblock ? 0 : sk->sk_rcvtimeo; } static inline long sock_sndtimeo(const struct sock *sk, bool noblock) { return noblock ? 0 : sk->sk_sndtimeo; } static inline int sock_rcvlowat(const struct sock *sk, int waitall, int len) { int v = waitall ? len : min_t(int, READ_ONCE(sk->sk_rcvlowat), len); return v ?: 1; } /* Alas, with timeout socket operations are not restartable. * Compare this to poll(). */ static inline int sock_intr_errno(long timeo) { return timeo == MAX_SCHEDULE_TIMEOUT ? -ERESTARTSYS : -EINTR; } struct sock_skb_cb { u32 dropcount; }; /* Store sock_skb_cb at the end of skb->cb[] so protocol families * using skb->cb[] would keep using it directly and utilize its * alignement guarantee. */ #define SOCK_SKB_CB_OFFSET ((sizeof_field(struct sk_buff, cb) - \ sizeof(struct sock_skb_cb))) #define SOCK_SKB_CB(__skb) ((struct sock_skb_cb *)((__skb)->cb + \ SOCK_SKB_CB_OFFSET)) #define sock_skb_cb_check_size(size) \ BUILD_BUG_ON((size) > SOCK_SKB_CB_OFFSET) static inline void sock_skb_set_dropcount(const struct sock *sk, struct sk_buff *skb) { SOCK_SKB_CB(skb)->dropcount = sock_flag(sk, SOCK_RXQ_OVFL) ? atomic_read(&sk->sk_drops) : 0; } static inline void sk_drops_add(struct sock *sk, const struct sk_buff *skb) { int segs = max_t(u16, 1, skb_shinfo(skb)->gso_segs); atomic_add(segs, &sk->sk_drops); } static inline ktime_t sock_read_timestamp(struct sock *sk) { #if BITS_PER_LONG==32 unsigned int seq; ktime_t kt; do { seq = read_seqbegin(&sk->sk_stamp_seq); kt = sk->sk_stamp; } while (read_seqretry(&sk->sk_stamp_seq, seq)); return kt; #else return READ_ONCE(sk->sk_stamp); #endif } static inline void sock_write_timestamp(struct sock *sk, ktime_t kt) { #if BITS_PER_LONG==32 write_seqlock(&sk->sk_stamp_seq); sk->sk_stamp = kt; write_sequnlock(&sk->sk_stamp_seq); #else WRITE_ONCE(sk->sk_stamp, kt); #endif } void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb); void __sock_recv_wifi_status(struct msghdr *msg, struct sock *sk, struct sk_buff *skb); static inline void sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb) { struct skb_shared_hwtstamps *hwtstamps = skb_hwtstamps(skb); u32 tsflags = READ_ONCE(sk->sk_tsflags); ktime_t kt = skb->tstamp; /* * generate control messages if * - receive time stamping in software requested * - software time stamp available and wanted * - hardware time stamps available and wanted */ if (sock_flag(sk, SOCK_RCVTSTAMP) || (tsflags & SOF_TIMESTAMPING_RX_SOFTWARE) || (kt && tsflags & SOF_TIMESTAMPING_SOFTWARE) || (hwtstamps->hwtstamp && (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE))) __sock_recv_timestamp(msg, sk, skb); else sock_write_timestamp(sk, kt); if (sock_flag(sk, SOCK_WIFI_STATUS) && skb_wifi_acked_valid(skb)) __sock_recv_wifi_status(msg, sk, skb); } void __sock_recv_cmsgs(struct msghdr *msg, struct sock *sk, struct sk_buff *skb); #define SK_DEFAULT_STAMP (-1L * NSEC_PER_SEC) static inline void sock_recv_cmsgs(struct msghdr *msg, struct sock *sk, struct sk_buff *skb) { #define FLAGS_RECV_CMSGS ((1UL << SOCK_RXQ_OVFL) | \ (1UL << SOCK_RCVTSTAMP) | \ (1UL << SOCK_RCVMARK)) #define TSFLAGS_ANY (SOF_TIMESTAMPING_SOFTWARE | \ SOF_TIMESTAMPING_RAW_HARDWARE) if (sk->sk_flags & FLAGS_RECV_CMSGS || READ_ONCE(sk->sk_tsflags) & TSFLAGS_ANY) __sock_recv_cmsgs(msg, sk, skb); else if (unlikely(sock_flag(sk, SOCK_TIMESTAMP))) sock_write_timestamp(sk, skb->tstamp); else if (unlikely(sock_read_timestamp(sk) == SK_DEFAULT_STAMP)) sock_write_timestamp(sk, 0); } void __sock_tx_timestamp(__u16 tsflags, __u8 *tx_flags); /** * _sock_tx_timestamp - checks whether the outgoing packet is to be time stamped * @sk: socket sending this packet * @tsflags: timestamping flags to use * @tx_flags: completed with instructions for time stamping * @tskey: filled in with next sk_tskey (not for TCP, which uses seqno) * * Note: callers should take care of initial ``*tx_flags`` value (usually 0) */ static inline void _sock_tx_timestamp(struct sock *sk, __u16 tsflags, __u8 *tx_flags, __u32 *tskey) { if (unlikely(tsflags)) { __sock_tx_timestamp(tsflags, tx_flags); if (tsflags & SOF_TIMESTAMPING_OPT_ID && tskey && tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) *tskey = atomic_inc_return(&sk->sk_tskey) - 1; } if (unlikely(sock_flag(sk, SOCK_WIFI_STATUS))) *tx_flags |= SKBTX_WIFI_STATUS; } static inline void sock_tx_timestamp(struct sock *sk, __u16 tsflags, __u8 *tx_flags) { _sock_tx_timestamp(sk, tsflags, tx_flags, NULL); } static inline void skb_setup_tx_timestamp(struct sk_buff *skb, __u16 tsflags) { _sock_tx_timestamp(skb->sk, tsflags, &skb_shinfo(skb)->tx_flags, &skb_shinfo(skb)->tskey); } static inline bool sk_is_inet(const struct sock *sk) { int family = READ_ONCE(sk->sk_family); return family == AF_INET || family == AF_INET6; } static inline bool sk_is_tcp(const struct sock *sk) { return sk_is_inet(sk) && sk->sk_type == SOCK_STREAM && sk->sk_protocol == IPPROTO_TCP; } static inline bool sk_is_udp(const struct sock *sk) { return sk_is_inet(sk) && sk->sk_type == SOCK_DGRAM && sk->sk_protocol == IPPROTO_UDP; } static inline bool sk_is_stream_unix(const struct sock *sk) { return sk->sk_family == AF_UNIX && sk->sk_type == SOCK_STREAM; } /** * sk_eat_skb - Release a skb if it is no longer needed * @sk: socket to eat this skb from * @skb: socket buffer to eat * * This routine must be called with interrupts disabled or with the socket * locked so that the sk_buff queue operation is ok. */ static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb) { __skb_unlink(skb, &sk->sk_receive_queue); __kfree_skb(skb); } static inline bool skb_sk_is_prefetched(struct sk_buff *skb) { #ifdef CONFIG_INET return skb->destructor == sock_pfree; #else return false; #endif /* CONFIG_INET */ } /* This helper checks if a socket is a full socket, * ie _not_ a timewait or request socket. */ static inline bool sk_fullsock(const struct sock *sk) { return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV); } static inline bool sk_is_refcounted(struct sock *sk) { /* Only full sockets have sk->sk_flags. */ return !sk_fullsock(sk) || !sock_flag(sk, SOCK_RCU_FREE); } /* Checks if this SKB belongs to an HW offloaded socket * and whether any SW fallbacks are required based on dev. * Check decrypted mark in case skb_orphan() cleared socket. */ static inline struct sk_buff *sk_validate_xmit_skb(struct sk_buff *skb, struct net_device *dev) { #ifdef CONFIG_SOCK_VALIDATE_XMIT struct sock *sk = skb->sk; if (sk && sk_fullsock(sk) && sk->sk_validate_xmit_skb) { skb = sk->sk_validate_xmit_skb(sk, dev, skb); } else if (unlikely(skb_is_decrypted(skb))) { pr_warn_ratelimited("unencrypted skb with no associated socket - dropping\n"); kfree_skb(skb); skb = NULL; } #endif return skb; } /* This helper checks if a socket is a LISTEN or NEW_SYN_RECV * SYNACK messages can be attached to either ones (depending on SYNCOOKIE) */ static inline bool sk_listener(const struct sock *sk) { return (1 << sk->sk_state) & (TCPF_LISTEN | TCPF_NEW_SYN_RECV); } void sock_enable_timestamp(struct sock *sk, enum sock_flags flag); int sock_recv_errqueue(struct sock *sk, struct msghdr *msg, int len, int level, int type); bool sk_ns_capable(const struct sock *sk, struct user_namespace *user_ns, int cap); bool sk_capable(const struct sock *sk, int cap); bool sk_net_capable(const struct sock *sk, int cap); void sk_get_meminfo(const struct sock *sk, u32 *meminfo); /* Take into consideration the size of the struct sk_buff overhead in the * determination of these values, since that is non-constant across * platforms. This makes socket queueing behavior and performance * not depend upon such differences. */ #define _SK_MEM_PACKETS 256 #define _SK_MEM_OVERHEAD SKB_TRUESIZE(256) #define SK_WMEM_MAX (_SK_MEM_OVERHEAD * _SK_MEM_PACKETS) #define SK_RMEM_MAX (_SK_MEM_OVERHEAD * _SK_MEM_PACKETS) extern __u32 sysctl_wmem_max; extern __u32 sysctl_rmem_max; extern int sysctl_tstamp_allow_data; extern __u32 sysctl_wmem_default; extern __u32 sysctl_rmem_default; #define SKB_FRAG_PAGE_ORDER get_order(32768) DECLARE_STATIC_KEY_FALSE(net_high_order_alloc_disable_key); static inline int sk_get_wmem0(const struct sock *sk, const struct proto *proto) { /* Does this proto have per netns sysctl_wmem ? */ if (proto->sysctl_wmem_offset) return READ_ONCE(*(int *)((void *)sock_net(sk) + proto->sysctl_wmem_offset)); return READ_ONCE(*proto->sysctl_wmem); } static inline int sk_get_rmem0(const struct sock *sk, const struct proto *proto) { /* Does this proto have per netns sysctl_rmem ? */ if (proto->sysctl_rmem_offset) return READ_ONCE(*(int *)((void *)sock_net(sk) + proto->sysctl_rmem_offset)); return READ_ONCE(*proto->sysctl_rmem); } /* Default TCP Small queue budget is ~1 ms of data (1sec >> 10) * Some wifi drivers need to tweak it to get more chunks. * They can use this helper from their ndo_start_xmit() */ static inline void sk_pacing_shift_update(struct sock *sk, int val) { if (!sk || !sk_fullsock(sk) || READ_ONCE(sk->sk_pacing_shift) == val) return; WRITE_ONCE(sk->sk_pacing_shift, val); } /* if a socket is bound to a device, check that the given device * index is either the same or that the socket is bound to an L3 * master device and the given device index is also enslaved to * that L3 master */ static inline bool sk_dev_equal_l3scope(struct sock *sk, int dif) { int bound_dev_if = READ_ONCE(sk->sk_bound_dev_if); int mdif; if (!bound_dev_if || bound_dev_if == dif) return true; mdif = l3mdev_master_ifindex_by_index(sock_net(sk), dif); if (mdif && mdif == bound_dev_if) return true; return false; } void sock_def_readable(struct sock *sk); int sock_bindtoindex(struct sock *sk, int ifindex, bool lock_sk); void sock_set_timestamp(struct sock *sk, int optname, bool valbool); int sock_set_timestamping(struct sock *sk, int optname, struct so_timestamping timestamping); void sock_enable_timestamps(struct sock *sk); void sock_no_linger(struct sock *sk); void sock_set_keepalive(struct sock *sk); void sock_set_priority(struct sock *sk, u32 priority); void sock_set_rcvbuf(struct sock *sk, int val); void sock_set_mark(struct sock *sk, u32 val); void sock_set_reuseaddr(struct sock *sk); void sock_set_reuseport(struct sock *sk); void sock_set_sndtimeo(struct sock *sk, s64 secs); int sock_bind_add(struct sock *sk, struct sockaddr *addr, int addr_len); int sock_get_timeout(long timeo, void *optval, bool old_timeval); int sock_copy_user_timeval(struct __kernel_sock_timeval *tv, sockptr_t optval, int optlen, bool old_timeval); int sock_ioctl_inout(struct sock *sk, unsigned int cmd, void __user *arg, void *karg, size_t size); int sk_ioctl(struct sock *sk, unsigned int cmd, void __user *arg); static inline bool sk_is_readable(struct sock *sk) { if (sk->sk_prot->sock_is_readable) return sk->sk_prot->sock_is_readable(sk); return false; } #endif /* _SOCK_H */ |
| 71 39 131 54 133 33 27 28 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 | /* SPDX-License-Identifier: GPL-2.0 */ #undef TRACE_SYSTEM #define TRACE_SYSTEM power #if !defined(_TRACE_POWER_H) || defined(TRACE_HEADER_MULTI_READ) #define _TRACE_POWER_H #include <linux/cpufreq.h> #include <linux/ktime.h> #include <linux/pm_qos.h> #include <linux/tracepoint.h> #include <linux/trace_events.h> #define TPS(x) tracepoint_string(x) DECLARE_EVENT_CLASS(cpu, TP_PROTO(unsigned int state, unsigned int cpu_id), TP_ARGS(state, cpu_id), TP_STRUCT__entry( __field( u32, state ) __field( u32, cpu_id ) ), TP_fast_assign( __entry->state = state; __entry->cpu_id = cpu_id; ), TP_printk("state=%lu cpu_id=%lu", (unsigned long)__entry->state, (unsigned long)__entry->cpu_id) ); DEFINE_EVENT(cpu, cpu_idle, TP_PROTO(unsigned int state, unsigned int cpu_id), TP_ARGS(state, cpu_id) ); TRACE_EVENT(cpu_idle_miss, TP_PROTO(unsigned int cpu_id, unsigned int state, bool below), TP_ARGS(cpu_id, state, below), TP_STRUCT__entry( __field(u32, cpu_id) __field(u32, state) __field(bool, below) ), TP_fast_assign( __entry->cpu_id = cpu_id; __entry->state = state; __entry->below = below; ), TP_printk("cpu_id=%lu state=%lu type=%s", (unsigned long)__entry->cpu_id, (unsigned long)__entry->state, (__entry->below)?"below":"above") ); TRACE_EVENT(powernv_throttle, TP_PROTO(int chip_id, const char *reason, int pmax), TP_ARGS(chip_id, reason, pmax), TP_STRUCT__entry( __field(int, chip_id) __string(reason, reason) __field(int, pmax) ), TP_fast_assign( __entry->chip_id = chip_id; __assign_str(reason); __entry->pmax = pmax; ), TP_printk("Chip %d Pmax %d %s", __entry->chip_id, __entry->pmax, __get_str(reason)) ); TRACE_EVENT(pstate_sample, TP_PROTO(u32 core_busy, u32 scaled_busy, u32 from, u32 to, u64 mperf, u64 aperf, u64 tsc, u32 freq, u32 io_boost ), TP_ARGS(core_busy, scaled_busy, from, to, mperf, aperf, tsc, freq, io_boost ), TP_STRUCT__entry( __field(u32, core_busy) __field(u32, scaled_busy) __field(u32, from) __field(u32, to) __field(u64, mperf) __field(u64, aperf) __field(u64, tsc) __field(u32, freq) __field(u32, io_boost) ), TP_fast_assign( __entry->core_busy = core_busy; __entry->scaled_busy = scaled_busy; __entry->from = from; __entry->to = to; __entry->mperf = mperf; __entry->aperf = aperf; __entry->tsc = tsc; __entry->freq = freq; __entry->io_boost = io_boost; ), TP_printk("core_busy=%lu scaled=%lu from=%lu to=%lu mperf=%llu aperf=%llu tsc=%llu freq=%lu io_boost=%lu", (unsigned long)__entry->core_busy, (unsigned long)__entry->scaled_busy, (unsigned long)__entry->from, (unsigned long)__entry->to, (unsigned long long)__entry->mperf, (unsigned long long)__entry->aperf, (unsigned long long)__entry->tsc, (unsigned long)__entry->freq, (unsigned long)__entry->io_boost ) ); /* This file can get included multiple times, TRACE_HEADER_MULTI_READ at top */ #ifndef _PWR_EVENT_AVOID_DOUBLE_DEFINING #define _PWR_EVENT_AVOID_DOUBLE_DEFINING #define PWR_EVENT_EXIT -1 #endif #define pm_verb_symbolic(event) \ __print_symbolic(event, \ { PM_EVENT_SUSPEND, "suspend" }, \ { PM_EVENT_RESUME, "resume" }, \ { PM_EVENT_FREEZE, "freeze" }, \ { PM_EVENT_QUIESCE, "quiesce" }, \ { PM_EVENT_HIBERNATE, "hibernate" }, \ { PM_EVENT_THAW, "thaw" }, \ { PM_EVENT_RESTORE, "restore" }, \ { PM_EVENT_RECOVER, "recover" }) DEFINE_EVENT(cpu, cpu_frequency, TP_PROTO(unsigned int frequency, unsigned int cpu_id), TP_ARGS(frequency, cpu_id) ); TRACE_EVENT(cpu_frequency_limits, TP_PROTO(struct cpufreq_policy *policy), TP_ARGS(policy), TP_STRUCT__entry( __field(u32, min_freq) __field(u32, max_freq) __field(u32, cpu_id) ), TP_fast_assign( __entry->min_freq = policy->min; __entry->max_freq = policy->max; __entry->cpu_id = policy->cpu; ), TP_printk("min=%lu max=%lu cpu_id=%lu", (unsigned long)__entry->min_freq, (unsigned long)__entry->max_freq, (unsigned long)__entry->cpu_id) ); TRACE_EVENT(device_pm_callback_start, TP_PROTO(struct device *dev, const char *pm_ops, int event), TP_ARGS(dev, pm_ops, event), TP_STRUCT__entry( __string(device, dev_name(dev)) __string(driver, dev_driver_string(dev)) __string(parent, dev->parent ? dev_name(dev->parent) : "none") __string(pm_ops, pm_ops ? pm_ops : "none ") __field(int, event) ), TP_fast_assign( __assign_str(device); __assign_str(driver); __assign_str(parent); __assign_str(pm_ops); __entry->event = event; ), TP_printk("%s %s, parent: %s, %s[%s]", __get_str(driver), __get_str(device), __get_str(parent), __get_str(pm_ops), pm_verb_symbolic(__entry->event)) ); TRACE_EVENT(device_pm_callback_end, TP_PROTO(struct device *dev, int error), TP_ARGS(dev, error), TP_STRUCT__entry( __string(device, dev_name(dev)) __string(driver, dev_driver_string(dev)) __field(int, error) ), TP_fast_assign( __assign_str(device); __assign_str(driver); __entry->error = error; ), TP_printk("%s %s, err=%d", __get_str(driver), __get_str(device), __entry->error) ); TRACE_EVENT(suspend_resume, TP_PROTO(const char *action, int val, bool start), TP_ARGS(action, val, start), TP_STRUCT__entry( __field(const char *, action) __field(int, val) __field(bool, start) ), TP_fast_assign( __entry->action = action; __entry->val = val; __entry->start = start; ), TP_printk("%s[%u] %s", __entry->action, (unsigned int)__entry->val, (__entry->start)?"begin":"end") ); DECLARE_EVENT_CLASS(wakeup_source, TP_PROTO(const char *name, unsigned int state), TP_ARGS(name, state), TP_STRUCT__entry( __string( name, name ) __field( u64, state ) ), TP_fast_assign( __assign_str(name); __entry->state = state; ), TP_printk("%s state=0x%lx", __get_str(name), (unsigned long)__entry->state) ); DEFINE_EVENT(wakeup_source, wakeup_source_activate, TP_PROTO(const char *name, unsigned int state), TP_ARGS(name, state) ); DEFINE_EVENT(wakeup_source, wakeup_source_deactivate, TP_PROTO(const char *name, unsigned int state), TP_ARGS(name, state) ); /* * The clock events are used for clock enable/disable and for * clock rate change */ DECLARE_EVENT_CLASS(clock, TP_PROTO(const char *name, unsigned int state, unsigned int cpu_id), TP_ARGS(name, state, cpu_id), TP_STRUCT__entry( __string( name, name ) __field( u64, state ) __field( u64, cpu_id ) ), TP_fast_assign( __assign_str(name); __entry->state = state; __entry->cpu_id = cpu_id; ), TP_printk("%s state=%lu cpu_id=%lu", __get_str(name), (unsigned long)__entry->state, (unsigned long)__entry->cpu_id) ); DEFINE_EVENT(clock, clock_enable, TP_PROTO(const char *name, unsigned int state, unsigned int cpu_id), TP_ARGS(name, state, cpu_id) ); DEFINE_EVENT(clock, clock_disable, TP_PROTO(const char *name, unsigned int state, unsigned int cpu_id), TP_ARGS(name, state, cpu_id) ); DEFINE_EVENT(clock, clock_set_rate, TP_PROTO(const char *name, unsigned int state, unsigned int cpu_id), TP_ARGS(name, state, cpu_id) ); /* * The power domain events are used for power domains transitions */ DECLARE_EVENT_CLASS(power_domain, TP_PROTO(const char *name, unsigned int state, unsigned int cpu_id), TP_ARGS(name, state, cpu_id), TP_STRUCT__entry( __string( name, name ) __field( u64, state ) __field( u64, cpu_id ) ), TP_fast_assign( __assign_str(name); __entry->state = state; __entry->cpu_id = cpu_id; ), TP_printk("%s state=%lu cpu_id=%lu", __get_str(name), (unsigned long)__entry->state, (unsigned long)__entry->cpu_id) ); DEFINE_EVENT(power_domain, power_domain_target, TP_PROTO(const char *name, unsigned int state, unsigned int cpu_id), TP_ARGS(name, state, cpu_id) ); /* * CPU latency QoS events used for global CPU latency QoS list updates */ DECLARE_EVENT_CLASS(cpu_latency_qos_request, TP_PROTO(s32 value), TP_ARGS(value), TP_STRUCT__entry( __field( s32, value ) ), TP_fast_assign( __entry->value = value; ), TP_printk("CPU_DMA_LATENCY value=%d", __entry->value) ); DEFINE_EVENT(cpu_latency_qos_request, pm_qos_add_request, TP_PROTO(s32 value), TP_ARGS(value) ); DEFINE_EVENT(cpu_latency_qos_request, pm_qos_update_request, TP_PROTO(s32 value), TP_ARGS(value) ); DEFINE_EVENT(cpu_latency_qos_request, pm_qos_remove_request, TP_PROTO(s32 value), TP_ARGS(value) ); /* * General PM QoS events used for updates of PM QoS request lists */ DECLARE_EVENT_CLASS(pm_qos_update, TP_PROTO(enum pm_qos_req_action action, int prev_value, int curr_value), TP_ARGS(action, prev_value, curr_value), TP_STRUCT__entry( __field( enum pm_qos_req_action, action ) __field( int, prev_value ) __field( int, curr_value ) ), TP_fast_assign( __entry->action = action; __entry->prev_value = prev_value; __entry->curr_value = curr_value; ), TP_printk("action=%s prev_value=%d curr_value=%d", __print_symbolic(__entry->action, { PM_QOS_ADD_REQ, "ADD_REQ" }, { PM_QOS_UPDATE_REQ, "UPDATE_REQ" }, { PM_QOS_REMOVE_REQ, "REMOVE_REQ" }), __entry->prev_value, __entry->curr_value) ); DEFINE_EVENT(pm_qos_update, pm_qos_update_target, TP_PROTO(enum pm_qos_req_action action, int prev_value, int curr_value), TP_ARGS(action, prev_value, curr_value) ); DEFINE_EVENT_PRINT(pm_qos_update, pm_qos_update_flags, TP_PROTO(enum pm_qos_req_action action, int prev_value, int curr_value), TP_ARGS(action, prev_value, curr_value), TP_printk("action=%s prev_value=0x%x curr_value=0x%x", __print_symbolic(__entry->action, { PM_QOS_ADD_REQ, "ADD_REQ" }, { PM_QOS_UPDATE_REQ, "UPDATE_REQ" }, { PM_QOS_REMOVE_REQ, "REMOVE_REQ" }), __entry->prev_value, __entry->curr_value) ); DECLARE_EVENT_CLASS(dev_pm_qos_request, TP_PROTO(const char *name, enum dev_pm_qos_req_type type, s32 new_value), TP_ARGS(name, type, new_value), TP_STRUCT__entry( __string( name, name ) __field( enum dev_pm_qos_req_type, type ) __field( s32, new_value ) ), TP_fast_assign( __assign_str(name); __entry->type = type; __entry->new_value = new_value; ), TP_printk("device=%s type=%s new_value=%d", __get_str(name), __print_symbolic(__entry->type, { DEV_PM_QOS_RESUME_LATENCY, "DEV_PM_QOS_RESUME_LATENCY" }, { DEV_PM_QOS_FLAGS, "DEV_PM_QOS_FLAGS" }), __entry->new_value) ); DEFINE_EVENT(dev_pm_qos_request, dev_pm_qos_add_request, TP_PROTO(const char *name, enum dev_pm_qos_req_type type, s32 new_value), TP_ARGS(name, type, new_value) ); DEFINE_EVENT(dev_pm_qos_request, dev_pm_qos_update_request, TP_PROTO(const char *name, enum dev_pm_qos_req_type type, s32 new_value), TP_ARGS(name, type, new_value) ); DEFINE_EVENT(dev_pm_qos_request, dev_pm_qos_remove_request, TP_PROTO(const char *name, enum dev_pm_qos_req_type type, s32 new_value), TP_ARGS(name, type, new_value) ); TRACE_EVENT(guest_halt_poll_ns, TP_PROTO(bool grow, unsigned int new, unsigned int old), TP_ARGS(grow, new, old), TP_STRUCT__entry( __field(bool, grow) __field(unsigned int, new) __field(unsigned int, old) ), TP_fast_assign( __entry->grow = grow; __entry->new = new; __entry->old = old; ), TP_printk("halt_poll_ns %u (%s %u)", __entry->new, __entry->grow ? "grow" : "shrink", __entry->old) ); #define trace_guest_halt_poll_ns_grow(new, old) \ trace_guest_halt_poll_ns(true, new, old) #define trace_guest_halt_poll_ns_shrink(new, old) \ trace_guest_halt_poll_ns(false, new, old) #endif /* _TRACE_POWER_H */ /* This part must be outside protection */ #include <trace/define_trace.h> |
| 2374 2297 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 | /* SPDX-License-Identifier: GPL-2.0 */ #ifndef __LINUX_USB_TYPEC_H #define __LINUX_USB_TYPEC_H #include <linux/types.h> /* USB Type-C Specification releases */ #define USB_TYPEC_REV_1_0 0x100 /* 1.0 */ #define USB_TYPEC_REV_1_1 0x110 /* 1.1 */ #define USB_TYPEC_REV_1_2 0x120 /* 1.2 */ #define USB_TYPEC_REV_1_3 0x130 /* 1.3 */ #define USB_TYPEC_REV_1_4 0x140 /* 1.4 */ #define USB_TYPEC_REV_2_0 0x200 /* 2.0 */ struct typec_partner; struct typec_cable; struct typec_plug; struct typec_port; struct typec_altmode_ops; struct typec_cable_ops; struct fwnode_handle; struct device; struct usb_power_delivery; struct usb_power_delivery_desc; enum typec_port_type { TYPEC_PORT_SRC, TYPEC_PORT_SNK, TYPEC_PORT_DRP, }; enum typec_port_data { TYPEC_PORT_DFP, TYPEC_PORT_UFP, TYPEC_PORT_DRD, }; enum typec_plug_type { USB_PLUG_NONE, USB_PLUG_TYPE_A, USB_PLUG_TYPE_B, USB_PLUG_TYPE_C, USB_PLUG_CAPTIVE, }; enum typec_data_role { TYPEC_DEVICE, TYPEC_HOST, }; enum typec_role { TYPEC_SINK, TYPEC_SOURCE, }; static inline int is_sink(enum typec_role role) { return role == TYPEC_SINK; } static inline int is_source(enum typec_role role) { return role == TYPEC_SOURCE; } enum typec_pwr_opmode { TYPEC_PWR_MODE_USB, TYPEC_PWR_MODE_1_5A, TYPEC_PWR_MODE_3_0A, TYPEC_PWR_MODE_PD, }; enum typec_accessory { TYPEC_ACCESSORY_NONE, TYPEC_ACCESSORY_AUDIO, TYPEC_ACCESSORY_DEBUG, }; #define TYPEC_MAX_ACCESSORY 3 enum typec_orientation { TYPEC_ORIENTATION_NONE, TYPEC_ORIENTATION_NORMAL, TYPEC_ORIENTATION_REVERSE, }; /* * struct enter_usb_data - Enter_USB Message details * @eudo: Enter_USB Data Object * @active_link_training: Active Cable Plug Link Training * * @active_link_training is a flag that should be set with uni-directional SBRX * communication, and left 0 with passive cables and with bi-directional SBRX * communication. */ struct enter_usb_data { u32 eudo; unsigned char active_link_training:1; }; /* * struct usb_pd_identity - USB Power Delivery identity data * @id_header: ID Header VDO * @cert_stat: Cert Stat VDO * @product: Product VDO * @vdo: Product Type Specific VDOs * * USB power delivery Discover Identity command response data. * * REVISIT: This is USB Power Delivery specific information, so this structure * probable belongs to USB Power Delivery header file once we have them. */ struct usb_pd_identity { u32 id_header; u32 cert_stat; u32 product; u32 vdo[3]; }; int typec_partner_set_identity(struct typec_partner *partner); int typec_cable_set_identity(struct typec_cable *cable); /* * struct typec_altmode_desc - USB Type-C Alternate Mode Descriptor * @svid: Standard or Vendor ID * @mode: Index of the Mode * @vdo: VDO returned by Discover Modes USB PD command * @roles: Only for ports. DRP if the mode is available in both roles * * Description of an Alternate Mode which a connector, cable plug or partner * supports. */ struct typec_altmode_desc { u16 svid; u8 mode; u32 vdo; /* Only used with ports */ enum typec_port_data roles; }; void typec_partner_set_pd_revision(struct typec_partner *partner, u16 pd_revision); int typec_partner_set_num_altmodes(struct typec_partner *partner, int num_altmodes); struct typec_altmode *typec_partner_register_altmode(struct typec_partner *partner, const struct typec_altmode_desc *desc); int typec_plug_set_num_altmodes(struct typec_plug *plug, int num_altmodes); struct typec_altmode *typec_plug_register_altmode(struct typec_plug *plug, const struct typec_altmode_desc *desc); struct typec_altmode *typec_port_register_altmode(struct typec_port *port, const struct typec_altmode_desc *desc); void typec_port_register_altmodes(struct typec_port *port, const struct typec_altmode_ops *ops, void *drvdata, struct typec_altmode **altmodes, size_t n); void typec_port_register_cable_ops(struct typec_altmode **altmodes, int max_altmodes, const struct typec_cable_ops *ops); void typec_unregister_altmode(struct typec_altmode *altmode); struct typec_port *typec_altmode2port(struct typec_altmode *alt); void typec_altmode_update_active(struct typec_altmode *alt, bool active); void typec_altmode_set_ops(struct typec_altmode *alt, const struct typec_altmode_ops *ops); enum typec_plug_index { TYPEC_PLUG_SOP_P, TYPEC_PLUG_SOP_PP, }; /* * struct typec_plug_desc - USB Type-C Cable Plug Descriptor * @index: SOP Prime for the plug connected to DFP and SOP Double Prime for the * plug connected to UFP * * Represents USB Type-C Cable Plug. */ struct typec_plug_desc { enum typec_plug_index index; }; /* * struct typec_cable_desc - USB Type-C Cable Descriptor * @type: The plug type from USB PD Cable VDO * @active: Is the cable active or passive * @identity: Result of Discover Identity command * @pd_revision: USB Power Delivery Specification revision if supported * * Represents USB Type-C Cable attached to USB Type-C port. */ struct typec_cable_desc { enum typec_plug_type type; unsigned int active:1; struct usb_pd_identity *identity; u16 pd_revision; /* 0300H = "3.0" */ }; /* * struct typec_partner_desc - USB Type-C Partner Descriptor * @usb_pd: USB Power Delivery support * @accessory: Audio, Debug or none. * @identity: Discover Identity command data * @pd_revision: USB Power Delivery Specification Revision if supported * @attach: Notification about attached USB device * @deattach: Notification about removed USB device * * Details about a partner that is attached to USB Type-C port. If @identity * member exists when partner is registered, a directory named "identity" is * created to sysfs for the partner device. * * @pd_revision is based on the setting of the "Specification Revision" field * in the message header on the initial "Source Capabilities" message received * from the partner, or a "Request" message received from the partner, depending * on whether our port is a Sink or a Source. */ struct typec_partner_desc { unsigned int usb_pd:1; enum typec_accessory accessory; struct usb_pd_identity *identity; u16 pd_revision; /* 0300H = "3.0" */ void (*attach)(struct typec_partner *partner, struct device *dev); void (*deattach)(struct typec_partner *partner, struct device *dev); }; /** * struct typec_operations - USB Type-C Port Operations * @try_role: Set data role preference for DRP port * @dr_set: Set Data Role * @pr_set: Set Power Role * @vconn_set: Source VCONN * @port_type_set: Set port type * @pd_get: Get available USB Power Delivery Capabilities. * @pd_set: Set USB Power Delivery Capabilities. */ struct typec_operations { int (*try_role)(struct typec_port *port, int role); int (*dr_set)(struct typec_port *port, enum typec_data_role role); int (*pr_set)(struct typec_port *port, enum typec_role role); int (*vconn_set)(struct typec_port *port, enum typec_role role); int (*port_type_set)(struct typec_port *port, enum typec_port_type type); struct usb_power_delivery **(*pd_get)(struct typec_port *port); int (*pd_set)(struct typec_port *port, struct usb_power_delivery *pd); }; enum usb_pd_svdm_ver { SVDM_VER_1_0 = 0, SVDM_VER_2_0 = 1, SVDM_VER_MAX = SVDM_VER_2_0, }; /* * struct typec_capability - USB Type-C Port Capabilities * @type: Supported power role of the port * @data: Supported data role of the port * @revision: USB Type-C Specification release. Binary coded decimal * @pd_revision: USB Power Delivery Specification revision if supported * @svdm_version: USB PD Structured VDM version if supported * @prefer_role: Initial role preference (DRP ports). * @accessory: Supported Accessory Modes * @fwnode: Optional fwnode of the port * @driver_data: Private pointer for driver specific info * @pd: Optional USB Power Delivery Support * @ops: Port operations vector * * Static capabilities of a single USB Type-C port. */ struct typec_capability { enum typec_port_type type; enum typec_port_data data; u16 revision; /* 0120H = "1.2" */ u16 pd_revision; /* 0300H = "3.0" */ enum usb_pd_svdm_ver svdm_version; int prefer_role; enum typec_accessory accessory[TYPEC_MAX_ACCESSORY]; unsigned int orientation_aware:1; struct fwnode_handle *fwnode; void *driver_data; struct usb_power_delivery *pd; const struct typec_operations *ops; }; /* Specific to try_role(). Indicates the user want's to clear the preference. */ #define TYPEC_NO_PREFERRED_ROLE (-1) struct typec_port *typec_register_port(struct device *parent, const struct typec_capability *cap); void typec_unregister_port(struct typec_port *port); struct typec_partner *typec_register_partner(struct typec_port *port, struct typec_partner_desc *desc); void typec_unregister_partner(struct typec_partner *partner); struct typec_cable *typec_register_cable(struct typec_port *port, struct typec_cable_desc *desc); void typec_unregister_cable(struct typec_cable *cable); struct typec_cable *typec_cable_get(struct typec_port *port); void typec_cable_put(struct typec_cable *cable); int typec_cable_is_active(struct typec_cable *cable); struct typec_plug *typec_register_plug(struct typec_cable *cable, struct typec_plug_desc *desc); void typec_unregister_plug(struct typec_plug *plug); void typec_set_data_role(struct typec_port *port, enum typec_data_role role); void typec_set_pwr_role(struct typec_port *port, enum typec_role role); void typec_set_vconn_role(struct typec_port *port, enum typec_role role); void typec_set_pwr_opmode(struct typec_port *port, enum typec_pwr_opmode mode); int typec_set_orientation(struct typec_port *port, enum typec_orientation orientation); enum typec_orientation typec_get_orientation(struct typec_port *port); int typec_set_mode(struct typec_port *port, int mode); void *typec_get_drvdata(struct typec_port *port); int typec_get_fw_cap(struct typec_capability *cap, struct fwnode_handle *fwnode); int typec_find_pwr_opmode(const char *name); int typec_find_orientation(const char *name); int typec_find_port_power_role(const char *name); int typec_find_power_role(const char *name); int typec_find_port_data_role(const char *name); void typec_partner_set_svdm_version(struct typec_partner *partner, enum usb_pd_svdm_ver svdm_version); int typec_get_negotiated_svdm_version(struct typec_port *port); int typec_get_cable_svdm_version(struct typec_port *port); void typec_cable_set_svdm_version(struct typec_cable *cable, enum usb_pd_svdm_ver svdm_version); struct usb_power_delivery *typec_partner_usb_power_delivery_register(struct typec_partner *partner, struct usb_power_delivery_desc *desc); int typec_port_set_usb_power_delivery(struct typec_port *port, struct usb_power_delivery *pd); int typec_partner_set_usb_power_delivery(struct typec_partner *partner, struct usb_power_delivery *pd); /** * struct typec_connector - Representation of Type-C port for external drivers * @attach: notification about device removal * @deattach: notification about device removal * * Drivers that control the USB and other ports (DisplayPorts, etc.), that are * connected to the Type-C connectors, can use these callbacks to inform the * Type-C connector class about connections and disconnections. That information * can then be used by the typec-port drivers to power on or off parts that are * needed or not needed - as an example, in USB mode if USB2 device is * enumerated, USB3 components (retimers, phys, and what have you) do not need * to be powered on. * * The attached (enumerated) devices will be liked with the typec-partner device. */ struct typec_connector { void (*attach)(struct typec_connector *con, struct device *dev); void (*deattach)(struct typec_connector *con, struct device *dev); }; static inline void typec_attach(struct typec_connector *con, struct device *dev) { if (con && con->attach) con->attach(con, dev); } static inline void typec_deattach(struct typec_connector *con, struct device *dev) { if (con && con->deattach) con->deattach(con, dev); } #endif /* __LINUX_USB_TYPEC_H */ |
| 235 236 234 4 738 28 207 234 2 6 6 1 8 2 2 2 49 6 52 52 42 52 45 1 48 2 1 30 45 29 238 240 240 239 2 240 25 465 14 491 490 491 555 557 557 556 184 184 184 184 139 491 492 491 489 17 485 344 165 545 49 842 10 10 10 10 150 139 30 840 846 842 842 841 1 841 845 842 10 842 509 6 341 565 839 1 778 70 4 2 6 829 13 1 2 1 11 13 11 11 9 2 6 5 1 1 1 492 491 2 488 1 1 2 242 240 240 242 241 242 242 240 3 2 3 3 6 217 224 245 223 56 11 11 7 233 231 67 224 247 3 8 4 78 78 55 24 5 3 2 1 3 2 6 11 43 15 25 9 6 1 27 6 27 8 23 2 17 6 2 18 15 6 9 8 18 4 6 4 4 15 17 67 11 75 3 48 30 78 60 5 3 3 2 2 2 2 620 717 723 14 620 631 632 628 629 200 199 200 200 182 7 734 734 736 737 736 348 346 348 28 16 16 1 3 7 5 10 2 2 2 1 1 35 35 2 3 1 24 5 4 2 22 1 26 20 4 1 2 3 1 17 15 1 2 14 3 12 5 10 1 10 10 1 2 1 6 1 1 10 10 1 1 10 1 3 2 10 10 10 8 2 6 6 5 1 1 3 2 14 8 6 8 6 11 10 10 10 7 10 269 1 1 12 266 267 1 269 264 1 4 2 8 7 7 7 7 6 6 3 3 1 1 2 3 1 4 2 1 12 5 8 8 4 1 7 21 3 18 21 14 12 11 12 9 9 3 7 10 2 2 2 1 4 9 1 7 7 1 5 1 1 18 18 3 3 1 2 2 1 1 1 1 8 8 8 8 8 8 4 4 4 4 4 4 6 3 4 6 6 2 5 4 5 2 6 6 8 8 8 8 4 4 5 8 260 261 742 742 740 641 741 741 353 2 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 2533 2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569 2570 2571 2572 2573 2574 2575 2576 2577 2578 2579 2580 2581 2582 2583 2584 2585 2586 2587 2588 2589 2590 2591 2592 2593 2594 2595 2596 2597 2598 2599 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 2701 2702 2703 2704 2705 2706 2707 2708 2709 2710 2711 2712 2713 2714 2715 2716 2717 2718 2719 2720 2721 2722 2723 2724 2725 2726 2727 2728 2729 2730 2731 2732 2733 2734 2735 2736 2737 2738 2739 2740 2741 2742 2743 2744 2745 2746 2747 2748 2749 2750 2751 2752 2753 2754 2755 2756 2757 2758 2759 2760 2761 2762 2763 2764 2765 2766 2767 2768 2769 2770 2771 2772 2773 2774 2775 2776 2777 2778 2779 2780 2781 2782 2783 2784 2785 2786 2787 2788 2789 2790 2791 2792 2793 2794 2795 2796 2797 2798 2799 2800 2801 2802 2803 2804 2805 2806 2807 2808 2809 2810 2811 2812 2813 2814 2815 2816 2817 2818 2819 2820 2821 2822 2823 2824 2825 2826 2827 2828 2829 2830 2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843 2844 2845 2846 2847 2848 2849 2850 2851 2852 2853 2854 2855 2856 2857 2858 2859 2860 2861 2862 2863 2864 2865 2866 2867 2868 2869 2870 2871 2872 2873 2874 2875 2876 2877 2878 2879 2880 2881 2882 2883 2884 2885 2886 2887 2888 2889 2890 2891 2892 2893 2894 2895 2896 2897 2898 2899 2900 2901 2902 2903 2904 2905 2906 2907 2908 2909 2910 2911 2912 2913 2914 2915 2916 2917 2918 2919 2920 2921 2922 2923 2924 2925 2926 2927 2928 2929 2930 2931 2932 2933 2934 2935 2936 2937 2938 2939 2940 2941 2942 2943 2944 2945 2946 2947 2948 2949 2950 2951 2952 2953 2954 2955 2956 2957 2958 2959 2960 2961 2962 2963 2964 2965 2966 2967 2968 2969 2970 2971 2972 2973 2974 2975 2976 2977 2978 2979 2980 2981 2982 2983 2984 2985 2986 2987 2988 2989 2990 2991 2992 2993 2994 2995 2996 2997 2998 2999 3000 3001 3002 3003 3004 3005 3006 3007 3008 3009 3010 3011 3012 3013 3014 3015 3016 3017 3018 3019 3020 3021 3022 3023 3024 3025 3026 3027 3028 3029 3030 3031 3032 3033 3034 3035 3036 3037 3038 3039 3040 3041 3042 3043 3044 3045 3046 3047 3048 3049 3050 3051 3052 3053 3054 3055 3056 3057 3058 3059 3060 3061 3062 3063 3064 3065 3066 3067 3068 3069 3070 3071 3072 3073 3074 3075 3076 3077 3078 3079 3080 3081 3082 3083 3084 3085 3086 3087 3088 3089 3090 3091 3092 3093 3094 3095 3096 3097 3098 3099 3100 3101 3102 3103 3104 3105 3106 3107 3108 3109 3110 3111 3112 3113 3114 3115 3116 3117 3118 3119 3120 3121 3122 3123 3124 3125 3126 3127 3128 3129 3130 3131 3132 3133 3134 3135 3136 3137 3138 3139 3140 3141 3142 3143 3144 3145 3146 3147 3148 3149 3150 3151 3152 3153 3154 3155 3156 3157 3158 3159 3160 3161 3162 3163 3164 3165 3166 3167 3168 3169 3170 3171 3172 3173 3174 3175 3176 3177 3178 3179 3180 3181 3182 3183 3184 3185 3186 3187 3188 3189 3190 3191 3192 3193 3194 3195 3196 3197 3198 3199 3200 3201 3202 3203 3204 3205 3206 3207 3208 3209 3210 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220 3221 3222 3223 3224 3225 3226 3227 3228 3229 3230 3231 3232 3233 3234 3235 3236 3237 3238 3239 3240 3241 3242 3243 3244 3245 3246 3247 3248 3249 3250 3251 3252 3253 3254 3255 3256 3257 3258 3259 3260 3261 3262 3263 3264 3265 3266 3267 3268 3269 3270 3271 3272 3273 3274 3275 3276 3277 3278 3279 3280 3281 3282 3283 3284 3285 3286 3287 3288 3289 3290 3291 3292 3293 3294 3295 3296 3297 3298 3299 3300 3301 3302 3303 3304 3305 3306 3307 3308 3309 3310 3311 3312 3313 3314 3315 3316 3317 3318 3319 3320 3321 3322 3323 3324 3325 3326 3327 3328 3329 3330 3331 3332 3333 3334 3335 3336 3337 3338 3339 3340 3341 3342 3343 3344 3345 3346 3347 3348 3349 3350 3351 3352 3353 3354 3355 3356 3357 3358 3359 3360 3361 3362 3363 3364 3365 3366 3367 3368 3369 3370 3371 3372 3373 3374 3375 3376 3377 3378 3379 3380 3381 3382 3383 3384 3385 3386 3387 3388 3389 3390 3391 3392 3393 3394 3395 3396 3397 3398 3399 3400 3401 3402 3403 3404 3405 3406 3407 3408 3409 3410 3411 3412 3413 3414 3415 3416 3417 3418 3419 3420 3421 3422 3423 3424 3425 3426 3427 3428 3429 3430 3431 3432 3433 3434 3435 3436 3437 3438 3439 3440 3441 3442 3443 3444 3445 3446 3447 3448 3449 3450 3451 3452 3453 3454 3455 3456 3457 3458 3459 3460 3461 3462 3463 3464 3465 3466 3467 3468 3469 3470 3471 3472 3473 3474 3475 3476 3477 3478 3479 3480 3481 3482 3483 3484 3485 3486 3487 3488 3489 3490 3491 3492 3493 3494 3495 3496 3497 3498 3499 3500 3501 3502 3503 3504 3505 3506 3507 3508 3509 3510 3511 3512 3513 3514 3515 3516 3517 3518 3519 3520 3521 3522 3523 3524 3525 3526 3527 3528 3529 3530 3531 3532 3533 3534 3535 3536 3537 3538 3539 3540 3541 3542 3543 3544 3545 3546 3547 3548 3549 3550 3551 3552 3553 3554 3555 3556 3557 3558 3559 3560 3561 3562 3563 3564 3565 3566 3567 3568 3569 3570 3571 3572 3573 3574 3575 3576 3577 3578 3579 3580 3581 3582 3583 3584 3585 3586 3587 3588 3589 3590 3591 3592 3593 3594 3595 3596 3597 3598 3599 3600 3601 3602 3603 3604 3605 3606 3607 3608 3609 3610 3611 3612 3613 3614 3615 3616 3617 3618 3619 3620 3621 3622 3623 3624 3625 3626 3627 3628 3629 3630 3631 3632 3633 3634 3635 3636 3637 3638 3639 3640 3641 3642 3643 3644 3645 3646 3647 3648 3649 3650 3651 3652 3653 3654 3655 3656 3657 3658 3659 3660 3661 3662 3663 3664 3665 3666 3667 3668 3669 3670 3671 3672 3673 3674 3675 3676 3677 3678 3679 3680 3681 3682 3683 3684 3685 3686 3687 3688 3689 3690 3691 3692 3693 3694 3695 3696 3697 3698 3699 3700 3701 3702 3703 3704 3705 3706 3707 3708 3709 3710 3711 3712 3713 3714 3715 3716 3717 3718 3719 3720 3721 3722 3723 3724 3725 3726 3727 3728 3729 3730 3731 3732 3733 3734 3735 3736 3737 3738 3739 3740 3741 3742 3743 3744 3745 3746 3747 3748 3749 3750 3751 3752 3753 3754 3755 3756 3757 3758 3759 3760 3761 3762 3763 3764 3765 3766 3767 3768 3769 3770 3771 3772 3773 3774 3775 3776 3777 3778 3779 3780 3781 3782 3783 3784 3785 3786 3787 3788 3789 3790 3791 3792 3793 3794 3795 3796 3797 3798 3799 3800 3801 3802 3803 3804 3805 3806 3807 3808 3809 3810 3811 3812 3813 3814 3815 3816 3817 3818 3819 3820 3821 3822 3823 3824 3825 3826 3827 3828 3829 3830 3831 3832 3833 3834 3835 3836 3837 3838 3839 3840 3841 3842 3843 3844 3845 3846 3847 3848 3849 3850 3851 3852 3853 3854 3855 3856 3857 3858 3859 3860 3861 3862 3863 3864 3865 3866 3867 3868 3869 3870 3871 3872 3873 3874 3875 3876 3877 3878 3879 3880 3881 3882 3883 3884 3885 3886 3887 3888 3889 3890 3891 3892 3893 3894 3895 3896 3897 3898 3899 3900 3901 3902 3903 3904 | // SPDX-License-Identifier: GPL-2.0-or-later /* * Generic address resolution entity * * Authors: * Pedro Roque <roque@di.fc.ul.pt> * Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> * * Fixes: * Vitaly E. Lavrov releasing NULL neighbor in neigh_add. * Harald Welte Add neighbour cache statistics like rtstat */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <linux/slab.h> #include <linux/kmemleak.h> #include <linux/types.h> #include <linux/kernel.h> #include <linux/module.h> #include <linux/socket.h> #include <linux/netdevice.h> #include <linux/proc_fs.h> #ifdef CONFIG_SYSCTL #include <linux/sysctl.h> #endif #include <linux/times.h> #include <net/net_namespace.h> #include <net/neighbour.h> #include <net/arp.h> #include <net/dst.h> #include <net/sock.h> #include <net/netevent.h> #include <net/netlink.h> #include <linux/rtnetlink.h> #include <linux/random.h> #include <linux/string.h> #include <linux/log2.h> #include <linux/inetdevice.h> #include <net/addrconf.h> #include <trace/events/neigh.h> #define NEIGH_DEBUG 1 #define neigh_dbg(level, fmt, ...) \ do { \ if (level <= NEIGH_DEBUG) \ pr_debug(fmt, ##__VA_ARGS__); \ } while (0) #define PNEIGH_HASHMASK 0xF static void neigh_timer_handler(struct timer_list *t); static void __neigh_notify(struct neighbour *n, int type, int flags, u32 pid); static void neigh_update_notify(struct neighbour *neigh, u32 nlmsg_pid); static int pneigh_ifdown_and_unlock(struct neigh_table *tbl, struct net_device *dev); #ifdef CONFIG_PROC_FS static const struct seq_operations neigh_stat_seq_ops; #endif /* Neighbour hash table buckets are protected with rwlock tbl->lock. - All the scans/updates to hash buckets MUST be made under this lock. - NOTHING clever should be made under this lock: no callbacks to protocol backends, no attempts to send something to network. It will result in deadlocks, if backend/driver wants to use neighbour cache. - If the entry requires some non-trivial actions, increase its reference count and release table lock. Neighbour entries are protected: - with reference count. - with rwlock neigh->lock Reference count prevents destruction. neigh->lock mainly serializes ll address data and its validity state. However, the same lock is used to protect another entry fields: - timer - resolution queue Again, nothing clever shall be made under neigh->lock, the most complicated procedure, which we allow is dev->hard_header. It is supposed, that dev->hard_header is simplistic and does not make callbacks to neighbour tables. */ static int neigh_blackhole(struct neighbour *neigh, struct sk_buff *skb) { kfree_skb(skb); return -ENETDOWN; } static void neigh_cleanup_and_release(struct neighbour *neigh) { trace_neigh_cleanup_and_release(neigh, 0); __neigh_notify(neigh, RTM_DELNEIGH, 0, 0); call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh); neigh_release(neigh); } /* * It is random distribution in the interval (1/2)*base...(3/2)*base. * It corresponds to default IPv6 settings and is not overridable, * because it is really reasonable choice. */ unsigned long neigh_rand_reach_time(unsigned long base) { return base ? get_random_u32_below(base) + (base >> 1) : 0; } EXPORT_SYMBOL(neigh_rand_reach_time); static void neigh_mark_dead(struct neighbour *n) { n->dead = 1; if (!list_empty(&n->gc_list)) { list_del_init(&n->gc_list); atomic_dec(&n->tbl->gc_entries); } if (!list_empty(&n->managed_list)) list_del_init(&n->managed_list); } static void neigh_update_gc_list(struct neighbour *n) { bool on_gc_list, exempt_from_gc; write_lock_bh(&n->tbl->lock); write_lock(&n->lock); if (n->dead) goto out; /* remove from the gc list if new state is permanent or if neighbor * is externally learned; otherwise entry should be on the gc list */ exempt_from_gc = n->nud_state & NUD_PERMANENT || n->flags & NTF_EXT_LEARNED; on_gc_list = !list_empty(&n->gc_list); if (exempt_from_gc && on_gc_list) { list_del_init(&n->gc_list); atomic_dec(&n->tbl->gc_entries); } else if (!exempt_from_gc && !on_gc_list) { /* add entries to the tail; cleaning removes from the front */ list_add_tail(&n->gc_list, &n->tbl->gc_list); atomic_inc(&n->tbl->gc_entries); } out: write_unlock(&n->lock); write_unlock_bh(&n->tbl->lock); } static void neigh_update_managed_list(struct neighbour *n) { bool on_managed_list, add_to_managed; write_lock_bh(&n->tbl->lock); write_lock(&n->lock); if (n->dead) goto out; add_to_managed = n->flags & NTF_MANAGED; on_managed_list = !list_empty(&n->managed_list); if (!add_to_managed && on_managed_list) list_del_init(&n->managed_list); else if (add_to_managed && !on_managed_list) list_add_tail(&n->managed_list, &n->tbl->managed_list); out: write_unlock(&n->lock); write_unlock_bh(&n->tbl->lock); } static void neigh_update_flags(struct neighbour *neigh, u32 flags, int *notify, bool *gc_update, bool *managed_update) { u32 ndm_flags, old_flags = neigh->flags; if (!(flags & NEIGH_UPDATE_F_ADMIN)) return; ndm_flags = (flags & NEIGH_UPDATE_F_EXT_LEARNED) ? NTF_EXT_LEARNED : 0; ndm_flags |= (flags & NEIGH_UPDATE_F_MANAGED) ? NTF_MANAGED : 0; if ((old_flags ^ ndm_flags) & NTF_EXT_LEARNED) { if (ndm_flags & NTF_EXT_LEARNED) neigh->flags |= NTF_EXT_LEARNED; else neigh->flags &= ~NTF_EXT_LEARNED; *notify = 1; *gc_update = true; } if ((old_flags ^ ndm_flags) & NTF_MANAGED) { if (ndm_flags & NTF_MANAGED) neigh->flags |= NTF_MANAGED; else neigh->flags &= ~NTF_MANAGED; *notify = 1; *managed_update = true; } } static bool neigh_del(struct neighbour *n, struct neighbour __rcu **np, struct neigh_table *tbl) { bool retval = false; write_lock(&n->lock); if (refcount_read(&n->refcnt) == 1) { struct neighbour *neigh; neigh = rcu_dereference_protected(n->next, lockdep_is_held(&tbl->lock)); rcu_assign_pointer(*np, neigh); neigh_mark_dead(n); retval = true; } write_unlock(&n->lock); if (retval) neigh_cleanup_and_release(n); return retval; } bool neigh_remove_one(struct neighbour *ndel, struct neigh_table *tbl) { struct neigh_hash_table *nht; void *pkey = ndel->primary_key; u32 hash_val; struct neighbour *n; struct neighbour __rcu **np; nht = rcu_dereference_protected(tbl->nht, lockdep_is_held(&tbl->lock)); hash_val = tbl->hash(pkey, ndel->dev, nht->hash_rnd); hash_val = hash_val >> (32 - nht->hash_shift); np = &nht->hash_buckets[hash_val]; while ((n = rcu_dereference_protected(*np, lockdep_is_held(&tbl->lock)))) { if (n == ndel) return neigh_del(n, np, tbl); np = &n->next; } return false; } static int neigh_forced_gc(struct neigh_table *tbl) { int max_clean = atomic_read(&tbl->gc_entries) - READ_ONCE(tbl->gc_thresh2); u64 tmax = ktime_get_ns() + NSEC_PER_MSEC; unsigned long tref = jiffies - 5 * HZ; struct neighbour *n, *tmp; int shrunk = 0; int loop = 0; NEIGH_CACHE_STAT_INC(tbl, forced_gc_runs); write_lock_bh(&tbl->lock); list_for_each_entry_safe(n, tmp, &tbl->gc_list, gc_list) { if (refcount_read(&n->refcnt) == 1) { bool remove = false; write_lock(&n->lock); if ((n->nud_state == NUD_FAILED) || (n->nud_state == NUD_NOARP) || (tbl->is_multicast && tbl->is_multicast(n->primary_key)) || !time_in_range(n->updated, tref, jiffies)) remove = true; write_unlock(&n->lock); if (remove && neigh_remove_one(n, tbl)) shrunk++; if (shrunk >= max_clean) break; if (++loop == 16) { if (ktime_get_ns() > tmax) goto unlock; loop = 0; } } } WRITE_ONCE(tbl->last_flush, jiffies); unlock: write_unlock_bh(&tbl->lock); return shrunk; } static void neigh_add_timer(struct neighbour *n, unsigned long when) { /* Use safe distance from the jiffies - LONG_MAX point while timer * is running in DELAY/PROBE state but still show to user space * large times in the past. */ unsigned long mint = jiffies - (LONG_MAX - 86400 * HZ); neigh_hold(n); if (!time_in_range(n->confirmed, mint, jiffies)) n->confirmed = mint; if (time_before(n->used, n->confirmed)) n->used = n->confirmed; if (unlikely(mod_timer(&n->timer, when))) { printk("NEIGH: BUG, double timer add, state is %x\n", n->nud_state); dump_stack(); } } static int neigh_del_timer(struct neighbour *n) { if ((n->nud_state & NUD_IN_TIMER) && del_timer(&n->timer)) { neigh_release(n); return 1; } return 0; } static struct neigh_parms *neigh_get_dev_parms_rcu(struct net_device *dev, int family) { switch (family) { case AF_INET: return __in_dev_arp_parms_get_rcu(dev); case AF_INET6: return __in6_dev_nd_parms_get_rcu(dev); } return NULL; } static void neigh_parms_qlen_dec(struct net_device *dev, int family) { struct neigh_parms *p; rcu_read_lock(); p = neigh_get_dev_parms_rcu(dev, family); if (p) p->qlen--; rcu_read_unlock(); } static void pneigh_queue_purge(struct sk_buff_head *list, struct net *net, int family) { struct sk_buff_head tmp; unsigned long flags; struct sk_buff *skb; skb_queue_head_init(&tmp); spin_lock_irqsave(&list->lock, flags); skb = skb_peek(list); while (skb != NULL) { struct sk_buff *skb_next = skb_peek_next(skb, list); struct net_device *dev = skb->dev; if (net == NULL || net_eq(dev_net(dev), net)) { neigh_parms_qlen_dec(dev, family); __skb_unlink(skb, list); __skb_queue_tail(&tmp, skb); } skb = skb_next; } spin_unlock_irqrestore(&list->lock, flags); while ((skb = __skb_dequeue(&tmp))) { dev_put(skb->dev); kfree_skb(skb); } } static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev, bool skip_perm) { int i; struct neigh_hash_table *nht; nht = rcu_dereference_protected(tbl->nht, lockdep_is_held(&tbl->lock)); for (i = 0; i < (1 << nht->hash_shift); i++) { struct neighbour *n; struct neighbour __rcu **np = &nht->hash_buckets[i]; while ((n = rcu_dereference_protected(*np, lockdep_is_held(&tbl->lock))) != NULL) { if (dev && n->dev != dev) { np = &n->next; continue; } if (skip_perm && n->nud_state & NUD_PERMANENT) { np = &n->next; continue; } rcu_assign_pointer(*np, rcu_dereference_protected(n->next, lockdep_is_held(&tbl->lock))); write_lock(&n->lock); neigh_del_timer(n); neigh_mark_dead(n); if (refcount_read(&n->refcnt) != 1) { /* The most unpleasant situation. We must destroy neighbour entry, but someone still uses it. The destroy will be delayed until the last user releases us, but we must kill timers etc. and move it to safe state. */ __skb_queue_purge(&n->arp_queue); n->arp_queue_len_bytes = 0; WRITE_ONCE(n->output, neigh_blackhole); if (n->nud_state & NUD_VALID) n->nud_state = NUD_NOARP; else n->nud_state = NUD_NONE; neigh_dbg(2, "neigh %p is stray\n", n); } write_unlock(&n->lock); neigh_cleanup_and_release(n); } } } void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev) { write_lock_bh(&tbl->lock); neigh_flush_dev(tbl, dev, false); write_unlock_bh(&tbl->lock); } EXPORT_SYMBOL(neigh_changeaddr); static int __neigh_ifdown(struct neigh_table *tbl, struct net_device *dev, bool skip_perm) { write_lock_bh(&tbl->lock); neigh_flush_dev(tbl, dev, skip_perm); pneigh_ifdown_and_unlock(tbl, dev); pneigh_queue_purge(&tbl->proxy_queue, dev ? dev_net(dev) : NULL, tbl->family); if (skb_queue_empty_lockless(&tbl->proxy_queue)) del_timer_sync(&tbl->proxy_timer); return 0; } int neigh_carrier_down(struct neigh_table *tbl, struct net_device *dev) { __neigh_ifdown(tbl, dev, true); return 0; } EXPORT_SYMBOL(neigh_carrier_down); int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev) { __neigh_ifdown(tbl, dev, false); return 0; } EXPORT_SYMBOL(neigh_ifdown); static struct neighbour *neigh_alloc(struct neigh_table *tbl, struct net_device *dev, u32 flags, bool exempt_from_gc) { struct neighbour *n = NULL; unsigned long now = jiffies; int entries, gc_thresh3; if (exempt_from_gc) goto do_alloc; entries = atomic_inc_return(&tbl->gc_entries) - 1; gc_thresh3 = READ_ONCE(tbl->gc_thresh3); if (entries >= gc_thresh3 || (entries >= READ_ONCE(tbl->gc_thresh2) && time_after(now, READ_ONCE(tbl->last_flush) + 5 * HZ))) { if (!neigh_forced_gc(tbl) && entries >= gc_thresh3) { net_info_ratelimited("%s: neighbor table overflow!\n", tbl->id); NEIGH_CACHE_STAT_INC(tbl, table_fulls); goto out_entries; } } do_alloc: n = kzalloc(tbl->entry_size + dev->neigh_priv_len, GFP_ATOMIC); if (!n) goto out_entries; __skb_queue_head_init(&n->arp_queue); rwlock_init(&n->lock); seqlock_init(&n->ha_lock); n->updated = n->used = now; n->nud_state = NUD_NONE; n->output = neigh_blackhole; n->flags = flags; seqlock_init(&n->hh.hh_lock); n->parms = neigh_parms_clone(&tbl->parms); timer_setup(&n->timer, neigh_timer_handler, 0); NEIGH_CACHE_STAT_INC(tbl, allocs); n->tbl = tbl; refcount_set(&n->refcnt, 1); n->dead = 1; INIT_LIST_HEAD(&n->gc_list); INIT_LIST_HEAD(&n->managed_list); atomic_inc(&tbl->entries); out: return n; out_entries: if (!exempt_from_gc) atomic_dec(&tbl->gc_entries); goto out; } static void neigh_get_hash_rnd(u32 *x) { *x = get_random_u32() | 1; } static struct neigh_hash_table *neigh_hash_alloc(unsigned int shift) { size_t size = (1 << shift) * sizeof(struct neighbour *); struct neigh_hash_table *ret; struct neighbour __rcu **buckets; int i; ret = kmalloc(sizeof(*ret), GFP_ATOMIC); if (!ret) return NULL; if (size <= PAGE_SIZE) { buckets = kzalloc(size, GFP_ATOMIC); } else { buckets = (struct neighbour __rcu **) __get_free_pages(GFP_ATOMIC | __GFP_ZERO, get_order(size)); kmemleak_alloc(buckets, size, 1, GFP_ATOMIC); } if (!buckets) { kfree(ret); return NULL; } ret->hash_buckets = buckets; ret->hash_shift = shift; for (i = 0; i < NEIGH_NUM_HASH_RND; i++) neigh_get_hash_rnd(&ret->hash_rnd[i]); return ret; } static void neigh_hash_free_rcu(struct rcu_head *head) { struct neigh_hash_table *nht = container_of(head, struct neigh_hash_table, rcu); size_t size = (1 << nht->hash_shift) * sizeof(struct neighbour *); struct neighbour __rcu **buckets = nht->hash_buckets; if (size <= PAGE_SIZE) { kfree(buckets); } else { kmemleak_free(buckets); free_pages((unsigned long)buckets, get_order(size)); } kfree(nht); } static struct neigh_hash_table *neigh_hash_grow(struct neigh_table *tbl, unsigned long new_shift) { unsigned int i, hash; struct neigh_hash_table *new_nht, *old_nht; NEIGH_CACHE_STAT_INC(tbl, hash_grows); old_nht = rcu_dereference_protected(tbl->nht, lockdep_is_held(&tbl->lock)); new_nht = neigh_hash_alloc(new_shift); if (!new_nht) return old_nht; for (i = 0; i < (1 << old_nht->hash_shift); i++) { struct neighbour *n, *next; for (n = rcu_dereference_protected(old_nht->hash_buckets[i], lockdep_is_held(&tbl->lock)); n != NULL; n = next) { hash = tbl->hash(n->primary_key, n->dev, new_nht->hash_rnd); hash >>= (32 - new_nht->hash_shift); next = rcu_dereference_protected(n->next, lockdep_is_held(&tbl->lock)); rcu_assign_pointer(n->next, rcu_dereference_protected( new_nht->hash_buckets[hash], lockdep_is_held(&tbl->lock))); rcu_assign_pointer(new_nht->hash_buckets[hash], n); } } rcu_assign_pointer(tbl->nht, new_nht); call_rcu(&old_nht->rcu, neigh_hash_free_rcu); return new_nht; } struct neighbour *neigh_lookup(struct neigh_table *tbl, const void *pkey, struct net_device *dev) { struct neighbour *n; NEIGH_CACHE_STAT_INC(tbl, lookups); rcu_read_lock(); n = __neigh_lookup_noref(tbl, pkey, dev); if (n) { if (!refcount_inc_not_zero(&n->refcnt)) n = NULL; NEIGH_CACHE_STAT_INC(tbl, hits); } rcu_read_unlock(); return n; } EXPORT_SYMBOL(neigh_lookup); static struct neighbour * ___neigh_create(struct neigh_table *tbl, const void *pkey, struct net_device *dev, u32 flags, bool exempt_from_gc, bool want_ref) { u32 hash_val, key_len = tbl->key_len; struct neighbour *n1, *rc, *n; struct neigh_hash_table *nht; int error; n = neigh_alloc(tbl, dev, flags, exempt_from_gc); trace_neigh_create(tbl, dev, pkey, n, exempt_from_gc); if (!n) { rc = ERR_PTR(-ENOBUFS); goto out; } memcpy(n->primary_key, pkey, key_len); n->dev = dev; netdev_hold(dev, &n->dev_tracker, GFP_ATOMIC); /* Protocol specific setup. */ if (tbl->constructor && (error = tbl->constructor(n)) < 0) { rc = ERR_PTR(error); goto out_neigh_release; } if (dev->netdev_ops->ndo_neigh_construct) { error = dev->netdev_ops->ndo_neigh_construct(dev, n); if (error < 0) { rc = ERR_PTR(error); goto out_neigh_release; } } /* Device specific setup. */ if (n->parms->neigh_setup && (error = n->parms->neigh_setup(n)) < 0) { rc = ERR_PTR(error); goto out_neigh_release; } n->confirmed = jiffies - (NEIGH_VAR(n->parms, BASE_REACHABLE_TIME) << 1); write_lock_bh(&tbl->lock); nht = rcu_dereference_protected(tbl->nht, lockdep_is_held(&tbl->lock)); if (atomic_read(&tbl->entries) > (1 << nht->hash_shift)) nht = neigh_hash_grow(tbl, nht->hash_shift + 1); hash_val = tbl->hash(n->primary_key, dev, nht->hash_rnd) >> (32 - nht->hash_shift); if (n->parms->dead) { rc = ERR_PTR(-EINVAL); goto out_tbl_unlock; } for (n1 = rcu_dereference_protected(nht->hash_buckets[hash_val], lockdep_is_held(&tbl->lock)); n1 != NULL; n1 = rcu_dereference_protected(n1->next, lockdep_is_held(&tbl->lock))) { if (dev == n1->dev && !memcmp(n1->primary_key, n->primary_key, key_len)) { if (want_ref) neigh_hold(n1); rc = n1; goto out_tbl_unlock; } } n->dead = 0; if (!exempt_from_gc) list_add_tail(&n->gc_list, &n->tbl->gc_list); if (n->flags & NTF_MANAGED) list_add_tail(&n->managed_list, &n->tbl->managed_list); if (want_ref) neigh_hold(n); rcu_assign_pointer(n->next, rcu_dereference_protected(nht->hash_buckets[hash_val], lockdep_is_held(&tbl->lock))); rcu_assign_pointer(nht->hash_buckets[hash_val], n); write_unlock_bh(&tbl->lock); neigh_dbg(2, "neigh %p is created\n", n); rc = n; out: return rc; out_tbl_unlock: write_unlock_bh(&tbl->lock); out_neigh_release: if (!exempt_from_gc) atomic_dec(&tbl->gc_entries); neigh_release(n); goto out; } struct neighbour *__neigh_create(struct neigh_table *tbl, const void *pkey, struct net_device *dev, bool want_ref) { bool exempt_from_gc = !!(dev->flags & IFF_LOOPBACK); return ___neigh_create(tbl, pkey, dev, 0, exempt_from_gc, want_ref); } EXPORT_SYMBOL(__neigh_create); static u32 pneigh_hash(const void *pkey, unsigned int key_len) { u32 hash_val = *(u32 *)(pkey + key_len - 4); hash_val ^= (hash_val >> 16); hash_val ^= hash_val >> 8; hash_val ^= hash_val >> 4; hash_val &= PNEIGH_HASHMASK; return hash_val; } static struct pneigh_entry *__pneigh_lookup_1(struct pneigh_entry *n, struct net *net, const void *pkey, unsigned int key_len, struct net_device *dev) { while (n) { if (!memcmp(n->key, pkey, key_len) && net_eq(pneigh_net(n), net) && (n->dev == dev || !n->dev)) return n; n = n->next; } return NULL; } struct pneigh_entry *__pneigh_lookup(struct neigh_table *tbl, struct net *net, const void *pkey, struct net_device *dev) { unsigned int key_len = tbl->key_len; u32 hash_val = pneigh_hash(pkey, key_len); return __pneigh_lookup_1(tbl->phash_buckets[hash_val], net, pkey, key_len, dev); } EXPORT_SYMBOL_GPL(__pneigh_lookup); struct pneigh_entry * pneigh_lookup(struct neigh_table *tbl, struct net *net, const void *pkey, struct net_device *dev, int creat) { struct pneigh_entry *n; unsigned int key_len = tbl->key_len; u32 hash_val = pneigh_hash(pkey, key_len); read_lock_bh(&tbl->lock); n = __pneigh_lookup_1(tbl->phash_buckets[hash_val], net, pkey, key_len, dev); read_unlock_bh(&tbl->lock); if (n || !creat) goto out; ASSERT_RTNL(); n = kzalloc(sizeof(*n) + key_len, GFP_KERNEL); if (!n) goto out; write_pnet(&n->net, net); memcpy(n->key, pkey, key_len); n->dev = dev; netdev_hold(dev, &n->dev_tracker, GFP_KERNEL); if (tbl->pconstructor && tbl->pconstructor(n)) { netdev_put(dev, &n->dev_tracker); kfree(n); n = NULL; goto out; } write_lock_bh(&tbl->lock); n->next = tbl->phash_buckets[hash_val]; tbl->phash_buckets[hash_val] = n; write_unlock_bh(&tbl->lock); out: return n; } EXPORT_SYMBOL(pneigh_lookup); int pneigh_delete(struct neigh_table *tbl, struct net *net, const void *pkey, struct net_device *dev) { struct pneigh_entry *n, **np; unsigned int key_len = tbl->key_len; u32 hash_val = pneigh_hash(pkey, key_len); write_lock_bh(&tbl->lock); for (np = &tbl->phash_buckets[hash_val]; (n = *np) != NULL; np = &n->next) { if (!memcmp(n->key, pkey, key_len) && n->dev == dev && net_eq(pneigh_net(n), net)) { *np = n->next; write_unlock_bh(&tbl->lock); if (tbl->pdestructor) tbl->pdestructor(n); netdev_put(n->dev, &n->dev_tracker); kfree(n); return 0; } } write_unlock_bh(&tbl->lock); return -ENOENT; } static int pneigh_ifdown_and_unlock(struct neigh_table *tbl, struct net_device *dev) { struct pneigh_entry *n, **np, *freelist = NULL; u32 h; for (h = 0; h <= PNEIGH_HASHMASK; h++) { np = &tbl->phash_buckets[h]; while ((n = *np) != NULL) { if (!dev || n->dev == dev) { *np = n->next; n->next = freelist; freelist = n; continue; } np = &n->next; } } write_unlock_bh(&tbl->lock); while ((n = freelist)) { freelist = n->next; n->next = NULL; if (tbl->pdestructor) tbl->pdestructor(n); netdev_put(n->dev, &n->dev_tracker); kfree(n); } return -ENOENT; } static void neigh_parms_destroy(struct neigh_parms *parms); static inline void neigh_parms_put(struct neigh_parms *parms) { if (refcount_dec_and_test(&parms->refcnt)) neigh_parms_destroy(parms); } /* * neighbour must already be out of the table; * */ void neigh_destroy(struct neighbour *neigh) { struct net_device *dev = neigh->dev; NEIGH_CACHE_STAT_INC(neigh->tbl, destroys); if (!neigh->dead) { pr_warn("Destroying alive neighbour %p\n", neigh); dump_stack(); return; } if (neigh_del_timer(neigh)) pr_warn("Impossible event\n"); write_lock_bh(&neigh->lock); __skb_queue_purge(&neigh->arp_queue); write_unlock_bh(&neigh->lock); neigh->arp_queue_len_bytes = 0; if (dev->netdev_ops->ndo_neigh_destroy) dev->netdev_ops->ndo_neigh_destroy(dev, neigh); netdev_put(dev, &neigh->dev_tracker); neigh_parms_put(neigh->parms); neigh_dbg(2, "neigh %p is destroyed\n", neigh); atomic_dec(&neigh->tbl->entries); kfree_rcu(neigh, rcu); } EXPORT_SYMBOL(neigh_destroy); /* Neighbour state is suspicious; disable fast path. Called with write_locked neigh. */ static void neigh_suspect(struct neighbour *neigh) { neigh_dbg(2, "neigh %p is suspected\n", neigh); WRITE_ONCE(neigh->output, neigh->ops->output); } /* Neighbour state is OK; enable fast path. Called with write_locked neigh. */ static void neigh_connect(struct neighbour *neigh) { neigh_dbg(2, "neigh %p is connected\n", neigh); WRITE_ONCE(neigh->output, neigh->ops->connected_output); } static void neigh_periodic_work(struct work_struct *work) { struct neigh_table *tbl = container_of(work, struct neigh_table, gc_work.work); struct neighbour *n; struct neighbour __rcu **np; unsigned int i; struct neigh_hash_table *nht; NEIGH_CACHE_STAT_INC(tbl, periodic_gc_runs); write_lock_bh(&tbl->lock); nht = rcu_dereference_protected(tbl->nht, lockdep_is_held(&tbl->lock)); /* * periodically recompute ReachableTime from random function */ if (time_after(jiffies, tbl->last_rand + 300 * HZ)) { struct neigh_parms *p; WRITE_ONCE(tbl->last_rand, jiffies); list_for_each_entry(p, &tbl->parms_list, list) p->reachable_time = neigh_rand_reach_time(NEIGH_VAR(p, BASE_REACHABLE_TIME)); } if (atomic_read(&tbl->entries) < READ_ONCE(tbl->gc_thresh1)) goto out; for (i = 0 ; i < (1 << nht->hash_shift); i++) { np = &nht->hash_buckets[i]; while ((n = rcu_dereference_protected(*np, lockdep_is_held(&tbl->lock))) != NULL) { unsigned int state; write_lock(&n->lock); state = n->nud_state; if ((state & (NUD_PERMANENT | NUD_IN_TIMER)) || (n->flags & NTF_EXT_LEARNED)) { write_unlock(&n->lock); goto next_elt; } if (time_before(n->used, n->confirmed) && time_is_before_eq_jiffies(n->confirmed)) n->used = n->confirmed; if (refcount_read(&n->refcnt) == 1 && (state == NUD_FAILED || !time_in_range_open(jiffies, n->used, n->used + NEIGH_VAR(n->parms, GC_STALETIME)))) { rcu_assign_pointer(*np, rcu_dereference_protected(n->next, lockdep_is_held(&tbl->lock))); neigh_mark_dead(n); write_unlock(&n->lock); neigh_cleanup_and_release(n); continue; } write_unlock(&n->lock); next_elt: np = &n->next; } /* * It's fine to release lock here, even if hash table * grows while we are preempted. */ write_unlock_bh(&tbl->lock); cond_resched(); write_lock_bh(&tbl->lock); nht = rcu_dereference_protected(tbl->nht, lockdep_is_held(&tbl->lock)); } out: /* Cycle through all hash buckets every BASE_REACHABLE_TIME/2 ticks. * ARP entry timeouts range from 1/2 BASE_REACHABLE_TIME to 3/2 * BASE_REACHABLE_TIME. */ queue_delayed_work(system_power_efficient_wq, &tbl->gc_work, NEIGH_VAR(&tbl->parms, BASE_REACHABLE_TIME) >> 1); write_unlock_bh(&tbl->lock); } static __inline__ int neigh_max_probes(struct neighbour *n) { struct neigh_parms *p = n->parms; return NEIGH_VAR(p, UCAST_PROBES) + NEIGH_VAR(p, APP_PROBES) + (n->nud_state & NUD_PROBE ? NEIGH_VAR(p, MCAST_REPROBES) : NEIGH_VAR(p, MCAST_PROBES)); } static void neigh_invalidate(struct neighbour *neigh) __releases(neigh->lock) __acquires(neigh->lock) { struct sk_buff *skb; NEIGH_CACHE_STAT_INC(neigh->tbl, res_failed); neigh_dbg(2, "neigh %p is failed\n", neigh); neigh->updated = jiffies; /* It is very thin place. report_unreachable is very complicated routine. Particularly, it can hit the same neighbour entry! So that, we try to be accurate and avoid dead loop. --ANK */ while (neigh->nud_state == NUD_FAILED && (skb = __skb_dequeue(&neigh->arp_queue)) != NULL) { write_unlock(&neigh->lock); neigh->ops->error_report(neigh, skb); write_lock(&neigh->lock); } __skb_queue_purge(&neigh->arp_queue); neigh->arp_queue_len_bytes = 0; } static void neigh_probe(struct neighbour *neigh) __releases(neigh->lock) { struct sk_buff *skb = skb_peek_tail(&neigh->arp_queue); /* keep skb alive even if arp_queue overflows */ if (skb) skb = skb_clone(skb, GFP_ATOMIC); write_unlock(&neigh->lock); if (neigh->ops->solicit) neigh->ops->solicit(neigh, skb); atomic_inc(&neigh->probes); consume_skb(skb); } /* Called when a timer expires for a neighbour entry. */ static void neigh_timer_handler(struct timer_list *t) { unsigned long now, next; struct neighbour *neigh = from_timer(neigh, t, timer); unsigned int state; int notify = 0; write_lock(&neigh->lock); state = neigh->nud_state; now = jiffies; next = now + HZ; if (!(state & NUD_IN_TIMER)) goto out; if (state & NUD_REACHABLE) { if (time_before_eq(now, neigh->confirmed + neigh->parms->reachable_time)) { neigh_dbg(2, "neigh %p is still alive\n", neigh); next = neigh->confirmed + neigh->parms->reachable_time; } else if (time_before_eq(now, neigh->used + NEIGH_VAR(neigh->parms, DELAY_PROBE_TIME))) { neigh_dbg(2, "neigh %p is delayed\n", neigh); WRITE_ONCE(neigh->nud_state, NUD_DELAY); neigh->updated = jiffies; neigh_suspect(neigh); next = now + NEIGH_VAR(neigh->parms, DELAY_PROBE_TIME); } else { neigh_dbg(2, "neigh %p is suspected\n", neigh); WRITE_ONCE(neigh->nud_state, NUD_STALE); neigh->updated = jiffies; neigh_suspect(neigh); notify = 1; } } else if (state & NUD_DELAY) { if (time_before_eq(now, neigh->confirmed + NEIGH_VAR(neigh->parms, DELAY_PROBE_TIME))) { neigh_dbg(2, "neigh %p is now reachable\n", neigh); WRITE_ONCE(neigh->nud_state, NUD_REACHABLE); neigh->updated = jiffies; neigh_connect(neigh); notify = 1; next = neigh->confirmed + neigh->parms->reachable_time; } else { neigh_dbg(2, "neigh %p is probed\n", neigh); WRITE_ONCE(neigh->nud_state, NUD_PROBE); neigh->updated = jiffies; atomic_set(&neigh->probes, 0); notify = 1; next = now + max(NEIGH_VAR(neigh->parms, RETRANS_TIME), HZ/100); } } else { /* NUD_PROBE|NUD_INCOMPLETE */ next = now + max(NEIGH_VAR(neigh->parms, RETRANS_TIME), HZ/100); } if ((neigh->nud_state & (NUD_INCOMPLETE | NUD_PROBE)) && atomic_read(&neigh->probes) >= neigh_max_probes(neigh)) { WRITE_ONCE(neigh->nud_state, NUD_FAILED); notify = 1; neigh_invalidate(neigh); goto out; } if (neigh->nud_state & NUD_IN_TIMER) { if (time_before(next, jiffies + HZ/100)) next = jiffies + HZ/100; if (!mod_timer(&neigh->timer, next)) neigh_hold(neigh); } if (neigh->nud_state & (NUD_INCOMPLETE | NUD_PROBE)) { neigh_probe(neigh); } else { out: write_unlock(&neigh->lock); } if (notify) neigh_update_notify(neigh, 0); trace_neigh_timer_handler(neigh, 0); neigh_release(neigh); } int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb, const bool immediate_ok) { int rc; bool immediate_probe = false; write_lock_bh(&neigh->lock); rc = 0; if (neigh->nud_state & (NUD_CONNECTED | NUD_DELAY | NUD_PROBE)) goto out_unlock_bh; if (neigh->dead) goto out_dead; if (!(neigh->nud_state & (NUD_STALE | NUD_INCOMPLETE))) { if (NEIGH_VAR(neigh->parms, MCAST_PROBES) + NEIGH_VAR(neigh->parms, APP_PROBES)) { unsigned long next, now = jiffies; atomic_set(&neigh->probes, NEIGH_VAR(neigh->parms, UCAST_PROBES)); neigh_del_timer(neigh); WRITE_ONCE(neigh->nud_state, NUD_INCOMPLETE); neigh->updated = now; if (!immediate_ok) { next = now + 1; } else { immediate_probe = true; next = now + max(NEIGH_VAR(neigh->parms, RETRANS_TIME), HZ / 100); } neigh_add_timer(neigh, next); } else { WRITE_ONCE(neigh->nud_state, NUD_FAILED); neigh->updated = jiffies; write_unlock_bh(&neigh->lock); kfree_skb_reason(skb, SKB_DROP_REASON_NEIGH_FAILED); return 1; } } else if (neigh->nud_state & NUD_STALE) { neigh_dbg(2, "neigh %p is delayed\n", neigh); neigh_del_timer(neigh); WRITE_ONCE(neigh->nud_state, NUD_DELAY); neigh->updated = jiffies; neigh_add_timer(neigh, jiffies + NEIGH_VAR(neigh->parms, DELAY_PROBE_TIME)); } if (neigh->nud_state == NUD_INCOMPLETE) { if (skb) { while (neigh->arp_queue_len_bytes + skb->truesize > NEIGH_VAR(neigh->parms, QUEUE_LEN_BYTES)) { struct sk_buff *buff; buff = __skb_dequeue(&neigh->arp_queue); if (!buff) break; neigh->arp_queue_len_bytes -= buff->truesize; kfree_skb_reason(buff, SKB_DROP_REASON_NEIGH_QUEUEFULL); NEIGH_CACHE_STAT_INC(neigh->tbl, unres_discards); } skb_dst_force(skb); __skb_queue_tail(&neigh->arp_queue, skb); neigh->arp_queue_len_bytes += skb->truesize; } rc = 1; } out_unlock_bh: if (immediate_probe) neigh_probe(neigh); else write_unlock(&neigh->lock); local_bh_enable(); trace_neigh_event_send_done(neigh, rc); return rc; out_dead: if (neigh->nud_state & NUD_STALE) goto out_unlock_bh; write_unlock_bh(&neigh->lock); kfree_skb_reason(skb, SKB_DROP_REASON_NEIGH_DEAD); trace_neigh_event_send_dead(neigh, 1); return 1; } EXPORT_SYMBOL(__neigh_event_send); static void neigh_update_hhs(struct neighbour *neigh) { struct hh_cache *hh; void (*update)(struct hh_cache*, const struct net_device*, const unsigned char *) = NULL; if (neigh->dev->header_ops) update = neigh->dev->header_ops->cache_update; if (update) { hh = &neigh->hh; if (READ_ONCE(hh->hh_len)) { write_seqlock_bh(&hh->hh_lock); update(hh, neigh->dev, neigh->ha); write_sequnlock_bh(&hh->hh_lock); } } } /* Generic update routine. -- lladdr is new lladdr or NULL, if it is not supplied. -- new is new state. -- flags NEIGH_UPDATE_F_OVERRIDE allows to override existing lladdr, if it is different. NEIGH_UPDATE_F_WEAK_OVERRIDE will suspect existing "connected" lladdr instead of overriding it if it is different. NEIGH_UPDATE_F_ADMIN means that the change is administrative. NEIGH_UPDATE_F_USE means that the entry is user triggered. NEIGH_UPDATE_F_MANAGED means that the entry will be auto-refreshed. NEIGH_UPDATE_F_OVERRIDE_ISROUTER allows to override existing NTF_ROUTER flag. NEIGH_UPDATE_F_ISROUTER indicates if the neighbour is known as a router. Caller MUST hold reference count on the entry. */ static int __neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new, u32 flags, u32 nlmsg_pid, struct netlink_ext_ack *extack) { bool gc_update = false, managed_update = false; int update_isrouter = 0; struct net_device *dev; int err, notify = 0; u8 old; trace_neigh_update(neigh, lladdr, new, flags, nlmsg_pid); write_lock_bh(&neigh->lock); dev = neigh->dev; old = neigh->nud_state; err = -EPERM; if (neigh->dead) { NL_SET_ERR_MSG(extack, "Neighbor entry is now dead"); new = old; goto out; } if (!(flags & NEIGH_UPDATE_F_ADMIN) && (old & (NUD_NOARP | NUD_PERMANENT))) goto out; neigh_update_flags(neigh, flags, ¬ify, &gc_update, &managed_update); if (flags & (NEIGH_UPDATE_F_USE | NEIGH_UPDATE_F_MANAGED)) { new = old & ~NUD_PERMANENT; WRITE_ONCE(neigh->nud_state, new); err = 0; goto out; } if (!(new & NUD_VALID)) { neigh_del_timer(neigh); if (old & NUD_CONNECTED) neigh_suspect(neigh); WRITE_ONCE(neigh->nud_state, new); err = 0; notify = old & NUD_VALID; if ((old & (NUD_INCOMPLETE | NUD_PROBE)) && (new & NUD_FAILED)) { neigh_invalidate(neigh); notify = 1; } goto out; } /* Compare new lladdr with cached one */ if (!dev->addr_len) { /* First case: device needs no address. */ lladdr = neigh->ha; } else if (lladdr) { /* The second case: if something is already cached and a new address is proposed: - compare new & old - if they are different, check override flag */ if ((old & NUD_VALID) && !memcmp(lladdr, neigh->ha, dev->addr_len)) lladdr = neigh->ha; } else { /* No address is supplied; if we know something, use it, otherwise discard the request. */ err = -EINVAL; if (!(old & NUD_VALID)) { NL_SET_ERR_MSG(extack, "No link layer address given"); goto out; } lladdr = neigh->ha; } /* Update confirmed timestamp for neighbour entry after we * received ARP packet even if it doesn't change IP to MAC binding. */ if (new & NUD_CONNECTED) neigh->confirmed = jiffies; /* If entry was valid and address is not changed, do not change entry state, if new one is STALE. */ err = 0; update_isrouter = flags & NEIGH_UPDATE_F_OVERRIDE_ISROUTER; if (old & NUD_VALID) { if (lladdr != neigh->ha && !(flags & NEIGH_UPDATE_F_OVERRIDE)) { update_isrouter = 0; if ((flags & NEIGH_UPDATE_F_WEAK_OVERRIDE) && (old & NUD_CONNECTED)) { lladdr = neigh->ha; new = NUD_STALE; } else goto out; } else { if (lladdr == neigh->ha && new == NUD_STALE && !(flags & NEIGH_UPDATE_F_ADMIN)) new = old; } } /* Update timestamp only once we know we will make a change to the * neighbour entry. Otherwise we risk to move the locktime window with * noop updates and ignore relevant ARP updates. */ if (new != old || lladdr != neigh->ha) neigh->updated = jiffies; if (new != old) { neigh_del_timer(neigh); if (new & NUD_PROBE) atomic_set(&neigh->probes, 0); if (new & NUD_IN_TIMER) neigh_add_timer(neigh, (jiffies + ((new & NUD_REACHABLE) ? neigh->parms->reachable_time : 0))); WRITE_ONCE(neigh->nud_state, new); notify = 1; } if (lladdr != neigh->ha) { write_seqlock(&neigh->ha_lock); memcpy(&neigh->ha, lladdr, dev->addr_len); write_sequnlock(&neigh->ha_lock); neigh_update_hhs(neigh); if (!(new & NUD_CONNECTED)) neigh->confirmed = jiffies - (NEIGH_VAR(neigh->parms, BASE_REACHABLE_TIME) << 1); notify = 1; } if (new == old) goto out; if (new & NUD_CONNECTED) neigh_connect(neigh); else neigh_suspect(neigh); if (!(old & NUD_VALID)) { struct sk_buff *skb; /* Again: avoid dead loop if something went wrong */ while (neigh->nud_state & NUD_VALID && (skb = __skb_dequeue(&neigh->arp_queue)) != NULL) { struct dst_entry *dst = skb_dst(skb); struct neighbour *n2, *n1 = neigh; write_unlock_bh(&neigh->lock); rcu_read_lock(); /* Why not just use 'neigh' as-is? The problem is that * things such as shaper, eql, and sch_teql can end up * using alternative, different, neigh objects to output * the packet in the output path. So what we need to do * here is re-lookup the top-level neigh in the path so * we can reinject the packet there. */ n2 = NULL; if (dst && dst->obsolete != DST_OBSOLETE_DEAD) { n2 = dst_neigh_lookup_skb(dst, skb); if (n2) n1 = n2; } READ_ONCE(n1->output)(n1, skb); if (n2) neigh_release(n2); rcu_read_unlock(); write_lock_bh(&neigh->lock); } __skb_queue_purge(&neigh->arp_queue); neigh->arp_queue_len_bytes = 0; } out: if (update_isrouter) neigh_update_is_router(neigh, flags, ¬ify); write_unlock_bh(&neigh->lock); if (((new ^ old) & NUD_PERMANENT) || gc_update) neigh_update_gc_list(neigh); if (managed_update) neigh_update_managed_list(neigh); if (notify) neigh_update_notify(neigh, nlmsg_pid); trace_neigh_update_done(neigh, err); return err; } int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new, u32 flags, u32 nlmsg_pid) { return __neigh_update(neigh, lladdr, new, flags, nlmsg_pid, NULL); } EXPORT_SYMBOL(neigh_update); /* Update the neigh to listen temporarily for probe responses, even if it is * in a NUD_FAILED state. The caller has to hold neigh->lock for writing. */ void __neigh_set_probe_once(struct neighbour *neigh) { if (neigh->dead) return; neigh->updated = jiffies; if (!(neigh->nud_state & NUD_FAILED)) return; WRITE_ONCE(neigh->nud_state, NUD_INCOMPLETE); atomic_set(&neigh->probes, neigh_max_probes(neigh)); neigh_add_timer(neigh, jiffies + max(NEIGH_VAR(neigh->parms, RETRANS_TIME), HZ/100)); } EXPORT_SYMBOL(__neigh_set_probe_once); struct neighbour *neigh_event_ns(struct neigh_table *tbl, u8 *lladdr, void *saddr, struct net_device *dev) { struct neighbour *neigh = __neigh_lookup(tbl, saddr, dev, lladdr || !dev->addr_len); if (neigh) neigh_update(neigh, lladdr, NUD_STALE, NEIGH_UPDATE_F_OVERRIDE, 0); return neigh; } EXPORT_SYMBOL(neigh_event_ns); /* called with read_lock_bh(&n->lock); */ static void neigh_hh_init(struct neighbour *n) { struct net_device *dev = n->dev; __be16 prot = n->tbl->protocol; struct hh_cache *hh = &n->hh; write_lock_bh(&n->lock); /* Only one thread can come in here and initialize the * hh_cache entry. */ if (!hh->hh_len) dev->header_ops->cache(n, hh, prot); write_unlock_bh(&n->lock); } /* Slow and careful. */ int neigh_resolve_output(struct neighbour *neigh, struct sk_buff *skb) { int rc = 0; if (!neigh_event_send(neigh, skb)) { int err; struct net_device *dev = neigh->dev; unsigned int seq; if (dev->header_ops->cache && !READ_ONCE(neigh->hh.hh_len)) neigh_hh_init(neigh); do { __skb_pull(skb, skb_network_offset(skb)); seq = read_seqbegin(&neigh->ha_lock); err = dev_hard_header(skb, dev, ntohs(skb->protocol), neigh->ha, NULL, skb->len); } while (read_seqretry(&neigh->ha_lock, seq)); if (err >= 0) rc = dev_queue_xmit(skb); else goto out_kfree_skb; } out: return rc; out_kfree_skb: rc = -EINVAL; kfree_skb(skb); goto out; } EXPORT_SYMBOL(neigh_resolve_output); /* As fast as possible without hh cache */ int neigh_connected_output(struct neighbour *neigh, struct sk_buff *skb) { struct net_device *dev = neigh->dev; unsigned int seq; int err; do { __skb_pull(skb, skb_network_offset(skb)); seq = read_seqbegin(&neigh->ha_lock); err = dev_hard_header(skb, dev, ntohs(skb->protocol), neigh->ha, NULL, skb->len); } while (read_seqretry(&neigh->ha_lock, seq)); if (err >= 0) err = dev_queue_xmit(skb); else { err = -EINVAL; kfree_skb(skb); } return err; } EXPORT_SYMBOL(neigh_connected_output); int neigh_direct_output(struct neighbour *neigh, struct sk_buff *skb) { return dev_queue_xmit(skb); } EXPORT_SYMBOL(neigh_direct_output); static void neigh_managed_work(struct work_struct *work) { struct neigh_table *tbl = container_of(work, struct neigh_table, managed_work.work); struct neighbour *neigh; write_lock_bh(&tbl->lock); list_for_each_entry(neigh, &tbl->managed_list, managed_list) neigh_event_send_probe(neigh, NULL, false); queue_delayed_work(system_power_efficient_wq, &tbl->managed_work, NEIGH_VAR(&tbl->parms, INTERVAL_PROBE_TIME_MS)); write_unlock_bh(&tbl->lock); } static void neigh_proxy_process(struct timer_list *t) { struct neigh_table *tbl = from_timer(tbl, t, proxy_timer); long sched_next = 0; unsigned long now = jiffies; struct sk_buff *skb, *n; spin_lock(&tbl->proxy_queue.lock); skb_queue_walk_safe(&tbl->proxy_queue, skb, n) { long tdif = NEIGH_CB(skb)->sched_next - now; if (tdif <= 0) { struct net_device *dev = skb->dev; neigh_parms_qlen_dec(dev, tbl->family); __skb_unlink(skb, &tbl->proxy_queue); if (tbl->proxy_redo && netif_running(dev)) { rcu_read_lock(); tbl->proxy_redo(skb); rcu_read_unlock(); } else { kfree_skb(skb); } dev_put(dev); } else if (!sched_next || tdif < sched_next) sched_next = tdif; } del_timer(&tbl->proxy_timer); if (sched_next) mod_timer(&tbl->proxy_timer, jiffies + sched_next); spin_unlock(&tbl->proxy_queue.lock); } static unsigned long neigh_proxy_delay(struct neigh_parms *p) { /* If proxy_delay is zero, do not call get_random_u32_below() * as it is undefined behavior. */ unsigned long proxy_delay = NEIGH_VAR(p, PROXY_DELAY); return proxy_delay ? jiffies + get_random_u32_below(proxy_delay) : jiffies; } void pneigh_enqueue(struct neigh_table *tbl, struct neigh_parms *p, struct sk_buff *skb) { unsigned long sched_next = neigh_proxy_delay(p); if (p->qlen > NEIGH_VAR(p, PROXY_QLEN)) { kfree_skb(skb); return; } NEIGH_CB(skb)->sched_next = sched_next; NEIGH_CB(skb)->flags |= LOCALLY_ENQUEUED; spin_lock(&tbl->proxy_queue.lock); if (del_timer(&tbl->proxy_timer)) { if (time_before(tbl->proxy_timer.expires, sched_next)) sched_next = tbl->proxy_timer.expires; } skb_dst_drop(skb); dev_hold(skb->dev); __skb_queue_tail(&tbl->proxy_queue, skb); p->qlen++; mod_timer(&tbl->proxy_timer, sched_next); spin_unlock(&tbl->proxy_queue.lock); } EXPORT_SYMBOL(pneigh_enqueue); static inline struct neigh_parms *lookup_neigh_parms(struct neigh_table *tbl, struct net *net, int ifindex) { struct neigh_parms *p; list_for_each_entry(p, &tbl->parms_list, list) { if ((p->dev && p->dev->ifindex == ifindex && net_eq(neigh_parms_net(p), net)) || (!p->dev && !ifindex && net_eq(net, &init_net))) return p; } return NULL; } struct neigh_parms *neigh_parms_alloc(struct net_device *dev, struct neigh_table *tbl) { struct neigh_parms *p; struct net *net = dev_net(dev); const struct net_device_ops *ops = dev->netdev_ops; p = kmemdup(&tbl->parms, sizeof(*p), GFP_KERNEL); if (p) { p->tbl = tbl; refcount_set(&p->refcnt, 1); p->reachable_time = neigh_rand_reach_time(NEIGH_VAR(p, BASE_REACHABLE_TIME)); p->qlen = 0; netdev_hold(dev, &p->dev_tracker, GFP_KERNEL); p->dev = dev; write_pnet(&p->net, net); p->sysctl_table = NULL; if (ops->ndo_neigh_setup && ops->ndo_neigh_setup(dev, p)) { netdev_put(dev, &p->dev_tracker); kfree(p); return NULL; } write_lock_bh(&tbl->lock); list_add(&p->list, &tbl->parms.list); write_unlock_bh(&tbl->lock); neigh_parms_data_state_cleanall(p); } return p; } EXPORT_SYMBOL(neigh_parms_alloc); static void neigh_rcu_free_parms(struct rcu_head *head) { struct neigh_parms *parms = container_of(head, struct neigh_parms, rcu_head); neigh_parms_put(parms); } void neigh_parms_release(struct neigh_table *tbl, struct neigh_parms *parms) { if (!parms || parms == &tbl->parms) return; write_lock_bh(&tbl->lock); list_del(&parms->list); parms->dead = 1; write_unlock_bh(&tbl->lock); netdev_put(parms->dev, &parms->dev_tracker); call_rcu(&parms->rcu_head, neigh_rcu_free_parms); } EXPORT_SYMBOL(neigh_parms_release); static void neigh_parms_destroy(struct neigh_parms *parms) { kfree(parms); } static struct lock_class_key neigh_table_proxy_queue_class; static struct neigh_table __rcu *neigh_tables[NEIGH_NR_TABLES] __read_mostly; void neigh_table_init(int index, struct neigh_table *tbl) { unsigned long now = jiffies; unsigned long phsize; INIT_LIST_HEAD(&tbl->parms_list); INIT_LIST_HEAD(&tbl->gc_list); INIT_LIST_HEAD(&tbl->managed_list); list_add(&tbl->parms.list, &tbl->parms_list); write_pnet(&tbl->parms.net, &init_net); refcount_set(&tbl->parms.refcnt, 1); tbl->parms.reachable_time = neigh_rand_reach_time(NEIGH_VAR(&tbl->parms, BASE_REACHABLE_TIME)); tbl->parms.qlen = 0; tbl->stats = alloc_percpu(struct neigh_statistics); if (!tbl->stats) panic("cannot create neighbour cache statistics"); #ifdef CONFIG_PROC_FS if (!proc_create_seq_data(tbl->id, 0, init_net.proc_net_stat, &neigh_stat_seq_ops, tbl)) panic("cannot create neighbour proc dir entry"); #endif RCU_INIT_POINTER(tbl->nht, neigh_hash_alloc(3)); phsize = (PNEIGH_HASHMASK + 1) * sizeof(struct pneigh_entry *); tbl->phash_buckets = kzalloc(phsize, GFP_KERNEL); if (!tbl->nht || !tbl->phash_buckets) panic("cannot allocate neighbour cache hashes"); if (!tbl->entry_size) tbl->entry_size = ALIGN(offsetof(struct neighbour, primary_key) + tbl->key_len, NEIGH_PRIV_ALIGN); else WARN_ON(tbl->entry_size % NEIGH_PRIV_ALIGN); rwlock_init(&tbl->lock); INIT_DEFERRABLE_WORK(&tbl->gc_work, neigh_periodic_work); queue_delayed_work(system_power_efficient_wq, &tbl->gc_work, tbl->parms.reachable_time); INIT_DEFERRABLE_WORK(&tbl->managed_work, neigh_managed_work); queue_delayed_work(system_power_efficient_wq, &tbl->managed_work, 0); timer_setup(&tbl->proxy_timer, neigh_proxy_process, 0); skb_queue_head_init_class(&tbl->proxy_queue, &neigh_table_proxy_queue_class); tbl->last_flush = now; tbl->last_rand = now + tbl->parms.reachable_time * 20; rcu_assign_pointer(neigh_tables[index], tbl); } EXPORT_SYMBOL(neigh_table_init); /* * Only called from ndisc_cleanup(), which means this is dead code * because we no longer can unload IPv6 module. */ int neigh_table_clear(int index, struct neigh_table *tbl) { RCU_INIT_POINTER(neigh_tables[index], NULL); synchronize_rcu(); /* It is not clean... Fix it to unload IPv6 module safely */ cancel_delayed_work_sync(&tbl->managed_work); cancel_delayed_work_sync(&tbl->gc_work); del_timer_sync(&tbl->proxy_timer); pneigh_queue_purge(&tbl->proxy_queue, NULL, tbl->family); neigh_ifdown(tbl, NULL); if (atomic_read(&tbl->entries)) pr_crit("neighbour leakage\n"); call_rcu(&rcu_dereference_protected(tbl->nht, 1)->rcu, neigh_hash_free_rcu); tbl->nht = NULL; kfree(tbl->phash_buckets); tbl->phash_buckets = NULL; remove_proc_entry(tbl->id, init_net.proc_net_stat); free_percpu(tbl->stats); tbl->stats = NULL; return 0; } EXPORT_SYMBOL(neigh_table_clear); static struct neigh_table *neigh_find_table(int family) { struct neigh_table *tbl = NULL; switch (family) { case AF_INET: tbl = rcu_dereference_rtnl(neigh_tables[NEIGH_ARP_TABLE]); break; case AF_INET6: tbl = rcu_dereference_rtnl(neigh_tables[NEIGH_ND_TABLE]); break; } return tbl; } const struct nla_policy nda_policy[NDA_MAX+1] = { [NDA_UNSPEC] = { .strict_start_type = NDA_NH_ID }, [NDA_DST] = { .type = NLA_BINARY, .len = MAX_ADDR_LEN }, [NDA_LLADDR] = { .type = NLA_BINARY, .len = MAX_ADDR_LEN }, [NDA_CACHEINFO] = { .len = sizeof(struct nda_cacheinfo) }, [NDA_PROBES] = { .type = NLA_U32 }, [NDA_VLAN] = { .type = NLA_U16 }, [NDA_PORT] = { .type = NLA_U16 }, [NDA_VNI] = { .type = NLA_U32 }, [NDA_IFINDEX] = { .type = NLA_U32 }, [NDA_MASTER] = { .type = NLA_U32 }, [NDA_PROTOCOL] = { .type = NLA_U8 }, [NDA_NH_ID] = { .type = NLA_U32 }, [NDA_FLAGS_EXT] = NLA_POLICY_MASK(NLA_U32, NTF_EXT_MASK), [NDA_FDB_EXT_ATTRS] = { .type = NLA_NESTED }, }; static int neigh_delete(struct sk_buff *skb, struct nlmsghdr *nlh, struct netlink_ext_ack *extack) { struct net *net = sock_net(skb->sk); struct ndmsg *ndm; struct nlattr *dst_attr; struct neigh_table *tbl; struct neighbour *neigh; struct net_device *dev = NULL; int err = -EINVAL; ASSERT_RTNL(); if (nlmsg_len(nlh) < sizeof(*ndm)) goto out; dst_attr = nlmsg_find_attr(nlh, sizeof(*ndm), NDA_DST); if (!dst_attr) { NL_SET_ERR_MSG(extack, "Network address not specified"); goto out; } ndm = nlmsg_data(nlh); if (ndm->ndm_ifindex) { dev = __dev_get_by_index(net, ndm->ndm_ifindex); if (dev == NULL) { err = -ENODEV; goto out; } } tbl = neigh_find_table(ndm->ndm_family); if (tbl == NULL) return -EAFNOSUPPORT; if (nla_len(dst_attr) < (int)tbl->key_len) { NL_SET_ERR_MSG(extack, "Invalid network address"); goto out; } if (ndm->ndm_flags & NTF_PROXY) { err = pneigh_delete(tbl, net, nla_data(dst_attr), dev); goto out; } if (dev == NULL) goto out; neigh = neigh_lookup(tbl, nla_data(dst_attr), dev); if (neigh == NULL) { err = -ENOENT; goto out; } err = __neigh_update(neigh, NULL, NUD_FAILED, NEIGH_UPDATE_F_OVERRIDE | NEIGH_UPDATE_F_ADMIN, NETLINK_CB(skb).portid, extack); write_lock_bh(&tbl->lock); neigh_release(neigh); neigh_remove_one(neigh, tbl); write_unlock_bh(&tbl->lock); out: return err; } static int neigh_add(struct sk_buff *skb, struct nlmsghdr *nlh, struct netlink_ext_ack *extack) { int flags = NEIGH_UPDATE_F_ADMIN | NEIGH_UPDATE_F_OVERRIDE | NEIGH_UPDATE_F_OVERRIDE_ISROUTER; struct net *net = sock_net(skb->sk); struct ndmsg *ndm; struct nlattr *tb[NDA_MAX+1]; struct neigh_table *tbl; struct net_device *dev = NULL; struct neighbour *neigh; void *dst, *lladdr; u8 protocol = 0; u32 ndm_flags; int err; ASSERT_RTNL(); err = nlmsg_parse_deprecated(nlh, sizeof(*ndm), tb, NDA_MAX, nda_policy, extack); if (err < 0) goto out; err = -EINVAL; if (!tb[NDA_DST]) { NL_SET_ERR_MSG(extack, "Network address not specified"); goto out; } ndm = nlmsg_data(nlh); ndm_flags = ndm->ndm_flags; if (tb[NDA_FLAGS_EXT]) { u32 ext = nla_get_u32(tb[NDA_FLAGS_EXT]); BUILD_BUG_ON(sizeof(neigh->flags) * BITS_PER_BYTE < (sizeof(ndm->ndm_flags) * BITS_PER_BYTE + hweight32(NTF_EXT_MASK))); ndm_flags |= (ext << NTF_EXT_SHIFT); } if (ndm->ndm_ifindex) { dev = __dev_get_by_index(net, ndm->ndm_ifindex); if (dev == NULL) { err = -ENODEV; goto out; } if (tb[NDA_LLADDR] && nla_len(tb[NDA_LLADDR]) < dev->addr_len) { NL_SET_ERR_MSG(extack, "Invalid link address"); goto out; } } tbl = neigh_find_table(ndm->ndm_family); if (tbl == NULL) return -EAFNOSUPPORT; if (nla_len(tb[NDA_DST]) < (int)tbl->key_len) { NL_SET_ERR_MSG(extack, "Invalid network address"); goto out; } dst = nla_data(tb[NDA_DST]); lladdr = tb[NDA_LLADDR] ? nla_data(tb[NDA_LLADDR]) : NULL; if (tb[NDA_PROTOCOL]) protocol = nla_get_u8(tb[NDA_PROTOCOL]); if (ndm_flags & NTF_PROXY) { struct pneigh_entry *pn; if (ndm_flags & NTF_MANAGED) { NL_SET_ERR_MSG(extack, "Invalid NTF_* flag combination"); goto out; } err = -ENOBUFS; pn = pneigh_lookup(tbl, net, dst, dev, 1); if (pn) { pn->flags = ndm_flags; if (protocol) pn->protocol = protocol; err = 0; } goto out; } if (!dev) { NL_SET_ERR_MSG(extack, "Device not specified"); goto out; } if (tbl->allow_add && !tbl->allow_add(dev, extack)) { err = -EINVAL; goto out; } neigh = neigh_lookup(tbl, dst, dev); if (neigh == NULL) { bool ndm_permanent = ndm->ndm_state & NUD_PERMANENT; bool exempt_from_gc = ndm_permanent || ndm_flags & NTF_EXT_LEARNED; if (!(nlh->nlmsg_flags & NLM_F_CREATE)) { err = -ENOENT; goto out; } if (ndm_permanent && (ndm_flags & NTF_MANAGED)) { NL_SET_ERR_MSG(extack, "Invalid NTF_* flag for permanent entry"); err = -EINVAL; goto out; } neigh = ___neigh_create(tbl, dst, dev, ndm_flags & (NTF_EXT_LEARNED | NTF_MANAGED), exempt_from_gc, true); if (IS_ERR(neigh)) { err = PTR_ERR(neigh); goto out; } } else { if (nlh->nlmsg_flags & NLM_F_EXCL) { err = -EEXIST; neigh_release(neigh); goto out; } if (!(nlh->nlmsg_flags & NLM_F_REPLACE)) flags &= ~(NEIGH_UPDATE_F_OVERRIDE | NEIGH_UPDATE_F_OVERRIDE_ISROUTER); } if (protocol) neigh->protocol = protocol; if (ndm_flags & NTF_EXT_LEARNED) flags |= NEIGH_UPDATE_F_EXT_LEARNED; if (ndm_flags & NTF_ROUTER) flags |= NEIGH_UPDATE_F_ISROUTER; if (ndm_flags & NTF_MANAGED) flags |= NEIGH_UPDATE_F_MANAGED; if (ndm_flags & NTF_USE) flags |= NEIGH_UPDATE_F_USE; err = __neigh_update(neigh, lladdr, ndm->ndm_state, flags, NETLINK_CB(skb).portid, extack); if (!err && ndm_flags & (NTF_USE | NTF_MANAGED)) { neigh_event_send(neigh, NULL); err = 0; } neigh_release(neigh); out: return err; } static int neightbl_fill_parms(struct sk_buff *skb, struct neigh_parms *parms) { struct nlattr *nest; nest = nla_nest_start_noflag(skb, NDTA_PARMS); if (nest == NULL) return -ENOBUFS; if ((parms->dev && nla_put_u32(skb, NDTPA_IFINDEX, parms->dev->ifindex)) || nla_put_u32(skb, NDTPA_REFCNT, refcount_read(&parms->refcnt)) || nla_put_u32(skb, NDTPA_QUEUE_LENBYTES, NEIGH_VAR(parms, QUEUE_LEN_BYTES)) || /* approximative value for deprecated QUEUE_LEN (in packets) */ nla_put_u32(skb, NDTPA_QUEUE_LEN, NEIGH_VAR(parms, QUEUE_LEN_BYTES) / SKB_TRUESIZE(ETH_FRAME_LEN)) || nla_put_u32(skb, NDTPA_PROXY_QLEN, NEIGH_VAR(parms, PROXY_QLEN)) || nla_put_u32(skb, NDTPA_APP_PROBES, NEIGH_VAR(parms, APP_PROBES)) || nla_put_u32(skb, NDTPA_UCAST_PROBES, NEIGH_VAR(parms, UCAST_PROBES)) || nla_put_u32(skb, NDTPA_MCAST_PROBES, NEIGH_VAR(parms, MCAST_PROBES)) || nla_put_u32(skb, NDTPA_MCAST_REPROBES, NEIGH_VAR(parms, MCAST_REPROBES)) || nla_put_msecs(skb, NDTPA_REACHABLE_TIME, parms->reachable_time, NDTPA_PAD) || nla_put_msecs(skb, NDTPA_BASE_REACHABLE_TIME, NEIGH_VAR(parms, BASE_REACHABLE_TIME), NDTPA_PAD) || nla_put_msecs(skb, NDTPA_GC_STALETIME, NEIGH_VAR(parms, GC_STALETIME), NDTPA_PAD) || nla_put_msecs(skb, NDTPA_DELAY_PROBE_TIME, NEIGH_VAR(parms, DELAY_PROBE_TIME), NDTPA_PAD) || nla_put_msecs(skb, NDTPA_RETRANS_TIME, NEIGH_VAR(parms, RETRANS_TIME), NDTPA_PAD) || nla_put_msecs(skb, NDTPA_ANYCAST_DELAY, NEIGH_VAR(parms, ANYCAST_DELAY), NDTPA_PAD) || nla_put_msecs(skb, NDTPA_PROXY_DELAY, NEIGH_VAR(parms, PROXY_DELAY), NDTPA_PAD) || nla_put_msecs(skb, NDTPA_LOCKTIME, NEIGH_VAR(parms, LOCKTIME), NDTPA_PAD) || nla_put_msecs(skb, NDTPA_INTERVAL_PROBE_TIME_MS, NEIGH_VAR(parms, INTERVAL_PROBE_TIME_MS), NDTPA_PAD)) goto nla_put_failure; return nla_nest_end(skb, nest); nla_put_failure: nla_nest_cancel(skb, nest); return -EMSGSIZE; } static int neightbl_fill_info(struct sk_buff *skb, struct neigh_table *tbl, u32 pid, u32 seq, int type, int flags) { struct nlmsghdr *nlh; struct ndtmsg *ndtmsg; nlh = nlmsg_put(skb, pid, seq, type, sizeof(*ndtmsg), flags); if (nlh == NULL) return -EMSGSIZE; ndtmsg = nlmsg_data(nlh); read_lock_bh(&tbl->lock); ndtmsg->ndtm_family = tbl->family; ndtmsg->ndtm_pad1 = 0; ndtmsg->ndtm_pad2 = 0; if (nla_put_string(skb, NDTA_NAME, tbl->id) || nla_put_msecs(skb, NDTA_GC_INTERVAL, READ_ONCE(tbl->gc_interval), NDTA_PAD) || nla_put_u32(skb, NDTA_THRESH1, READ_ONCE(tbl->gc_thresh1)) || nla_put_u32(skb, NDTA_THRESH2, READ_ONCE(tbl->gc_thresh2)) || nla_put_u32(skb, NDTA_THRESH3, READ_ONCE(tbl->gc_thresh3))) goto nla_put_failure; { unsigned long now = jiffies; long flush_delta = now - READ_ONCE(tbl->last_flush); long rand_delta = now - READ_ONCE(tbl->last_rand); struct neigh_hash_table *nht; struct ndt_config ndc = { .ndtc_key_len = tbl->key_len, .ndtc_entry_size = tbl->entry_size, .ndtc_entries = atomic_read(&tbl->entries), .ndtc_last_flush = jiffies_to_msecs(flush_delta), .ndtc_last_rand = jiffies_to_msecs(rand_delta), .ndtc_proxy_qlen = READ_ONCE(tbl->proxy_queue.qlen), }; rcu_read_lock(); nht = rcu_dereference(tbl->nht); ndc.ndtc_hash_rnd = nht->hash_rnd[0]; ndc.ndtc_hash_mask = ((1 << nht->hash_shift) - 1); rcu_read_unlock(); if (nla_put(skb, NDTA_CONFIG, sizeof(ndc), &ndc)) goto nla_put_failure; } { int cpu; struct ndt_stats ndst; memset(&ndst, 0, sizeof(ndst)); for_each_possible_cpu(cpu) { struct neigh_statistics *st; st = per_cpu_ptr(tbl->stats, cpu); ndst.ndts_allocs += READ_ONCE(st->allocs); ndst.ndts_destroys += READ_ONCE(st->destroys); ndst.ndts_hash_grows += READ_ONCE(st->hash_grows); ndst.ndts_res_failed += READ_ONCE(st->res_failed); ndst.ndts_lookups += READ_ONCE(st->lookups); ndst.ndts_hits += READ_ONCE(st->hits); ndst.ndts_rcv_probes_mcast += READ_ONCE(st->rcv_probes_mcast); ndst.ndts_rcv_probes_ucast += READ_ONCE(st->rcv_probes_ucast); ndst.ndts_periodic_gc_runs += READ_ONCE(st->periodic_gc_runs); ndst.ndts_forced_gc_runs += READ_ONCE(st->forced_gc_runs); ndst.ndts_table_fulls += READ_ONCE(st->table_fulls); } if (nla_put_64bit(skb, NDTA_STATS, sizeof(ndst), &ndst, NDTA_PAD)) goto nla_put_failure; } BUG_ON(tbl->parms.dev); if (neightbl_fill_parms(skb, &tbl->parms) < 0) goto nla_put_failure; read_unlock_bh(&tbl->lock); nlmsg_end(skb, nlh); return 0; nla_put_failure: read_unlock_bh(&tbl->lock); nlmsg_cancel(skb, nlh); return -EMSGSIZE; } static int neightbl_fill_param_info(struct sk_buff *skb, struct neigh_table *tbl, struct neigh_parms *parms, u32 pid, u32 seq, int type, unsigned int flags) { struct ndtmsg *ndtmsg; struct nlmsghdr *nlh; nlh = nlmsg_put(skb, pid, seq, type, sizeof(*ndtmsg), flags); if (nlh == NULL) return -EMSGSIZE; ndtmsg = nlmsg_data(nlh); read_lock_bh(&tbl->lock); ndtmsg->ndtm_family = tbl->family; ndtmsg->ndtm_pad1 = 0; ndtmsg->ndtm_pad2 = 0; if (nla_put_string(skb, NDTA_NAME, tbl->id) < 0 || neightbl_fill_parms(skb, parms) < 0) goto errout; read_unlock_bh(&tbl->lock); nlmsg_end(skb, nlh); return 0; errout: read_unlock_bh(&tbl->lock); nlmsg_cancel(skb, nlh); return -EMSGSIZE; } static const struct nla_policy nl_neightbl_policy[NDTA_MAX+1] = { [NDTA_NAME] = { .type = NLA_STRING }, [NDTA_THRESH1] = { .type = NLA_U32 }, [NDTA_THRESH2] = { .type = NLA_U32 }, [NDTA_THRESH3] = { .type = NLA_U32 }, [NDTA_GC_INTERVAL] = { .type = NLA_U64 }, [NDTA_PARMS] = { .type = NLA_NESTED }, }; static const struct nla_policy nl_ntbl_parm_policy[NDTPA_MAX+1] = { [NDTPA_IFINDEX] = { .type = NLA_U32 }, [NDTPA_QUEUE_LEN] = { .type = NLA_U32 }, [NDTPA_PROXY_QLEN] = { .type = NLA_U32 }, [NDTPA_APP_PROBES] = { .type = NLA_U32 }, [NDTPA_UCAST_PROBES] = { .type = NLA_U32 }, [NDTPA_MCAST_PROBES] = { .type = NLA_U32 }, [NDTPA_MCAST_REPROBES] = { .type = NLA_U32 }, [NDTPA_BASE_REACHABLE_TIME] = { .type = NLA_U64 }, [NDTPA_GC_STALETIME] = { .type = NLA_U64 }, [NDTPA_DELAY_PROBE_TIME] = { .type = NLA_U64 }, [NDTPA_RETRANS_TIME] = { .type = NLA_U64 }, [NDTPA_ANYCAST_DELAY] = { .type = NLA_U64 }, [NDTPA_PROXY_DELAY] = { .type = NLA_U64 }, [NDTPA_LOCKTIME] = { .type = NLA_U64 }, [NDTPA_INTERVAL_PROBE_TIME_MS] = { .type = NLA_U64, .min = 1 }, }; static int neightbl_set(struct sk_buff *skb, struct nlmsghdr *nlh, struct netlink_ext_ack *extack) { struct net *net = sock_net(skb->sk); struct neigh_table *tbl; struct ndtmsg *ndtmsg; struct nlattr *tb[NDTA_MAX+1]; bool found = false; int err, tidx; err = nlmsg_parse_deprecated(nlh, sizeof(*ndtmsg), tb, NDTA_MAX, nl_neightbl_policy, extack); if (err < 0) goto errout; if (tb[NDTA_NAME] == NULL) { err = -EINVAL; goto errout; } ndtmsg = nlmsg_data(nlh); for (tidx = 0; tidx < NEIGH_NR_TABLES; tidx++) { tbl = rcu_dereference_rtnl(neigh_tables[tidx]); if (!tbl) continue; if (ndtmsg->ndtm_family && tbl->family != ndtmsg->ndtm_family) continue; if (nla_strcmp(tb[NDTA_NAME], tbl->id) == 0) { found = true; break; } } if (!found) return -ENOENT; /* * We acquire tbl->lock to be nice to the periodic timers and * make sure they always see a consistent set of values. */ write_lock_bh(&tbl->lock); if (tb[NDTA_PARMS]) { struct nlattr *tbp[NDTPA_MAX+1]; struct neigh_parms *p; int i, ifindex = 0; err = nla_parse_nested_deprecated(tbp, NDTPA_MAX, tb[NDTA_PARMS], nl_ntbl_parm_policy, extack); if (err < 0) goto errout_tbl_lock; if (tbp[NDTPA_IFINDEX]) ifindex = nla_get_u32(tbp[NDTPA_IFINDEX]); p = lookup_neigh_parms(tbl, net, ifindex); if (p == NULL) { err = -ENOENT; goto errout_tbl_lock; } for (i = 1; i <= NDTPA_MAX; i++) { if (tbp[i] == NULL) continue; switch (i) { case NDTPA_QUEUE_LEN: NEIGH_VAR_SET(p, QUEUE_LEN_BYTES, nla_get_u32(tbp[i]) * SKB_TRUESIZE(ETH_FRAME_LEN)); break; case NDTPA_QUEUE_LENBYTES: NEIGH_VAR_SET(p, QUEUE_LEN_BYTES, nla_get_u32(tbp[i])); break; case NDTPA_PROXY_QLEN: NEIGH_VAR_SET(p, PROXY_QLEN, nla_get_u32(tbp[i])); break; case NDTPA_APP_PROBES: NEIGH_VAR_SET(p, APP_PROBES, nla_get_u32(tbp[i])); break; case NDTPA_UCAST_PROBES: NEIGH_VAR_SET(p, UCAST_PROBES, nla_get_u32(tbp[i])); break; case NDTPA_MCAST_PROBES: NEIGH_VAR_SET(p, MCAST_PROBES, nla_get_u32(tbp[i])); break; case NDTPA_MCAST_REPROBES: NEIGH_VAR_SET(p, MCAST_REPROBES, nla_get_u32(tbp[i])); break; case NDTPA_BASE_REACHABLE_TIME: NEIGH_VAR_SET(p, BASE_REACHABLE_TIME, nla_get_msecs(tbp[i])); /* update reachable_time as well, otherwise, the change will * only be effective after the next time neigh_periodic_work * decides to recompute it (can be multiple minutes) */ p->reachable_time = neigh_rand_reach_time(NEIGH_VAR(p, BASE_REACHABLE_TIME)); break; case NDTPA_GC_STALETIME: NEIGH_VAR_SET(p, GC_STALETIME, nla_get_msecs(tbp[i])); break; case NDTPA_DELAY_PROBE_TIME: NEIGH_VAR_SET(p, DELAY_PROBE_TIME, nla_get_msecs(tbp[i])); call_netevent_notifiers(NETEVENT_DELAY_PROBE_TIME_UPDATE, p); break; case NDTPA_INTERVAL_PROBE_TIME_MS: NEIGH_VAR_SET(p, INTERVAL_PROBE_TIME_MS, nla_get_msecs(tbp[i])); break; case NDTPA_RETRANS_TIME: NEIGH_VAR_SET(p, RETRANS_TIME, nla_get_msecs(tbp[i])); break; case NDTPA_ANYCAST_DELAY: NEIGH_VAR_SET(p, ANYCAST_DELAY, nla_get_msecs(tbp[i])); break; case NDTPA_PROXY_DELAY: NEIGH_VAR_SET(p, PROXY_DELAY, nla_get_msecs(tbp[i])); break; case NDTPA_LOCKTIME: NEIGH_VAR_SET(p, LOCKTIME, nla_get_msecs(tbp[i])); break; } } } err = -ENOENT; if ((tb[NDTA_THRESH1] || tb[NDTA_THRESH2] || tb[NDTA_THRESH3] || tb[NDTA_GC_INTERVAL]) && !net_eq(net, &init_net)) goto errout_tbl_lock; if (tb[NDTA_THRESH1]) WRITE_ONCE(tbl->gc_thresh1, nla_get_u32(tb[NDTA_THRESH1])); if (tb[NDTA_THRESH2]) WRITE_ONCE(tbl->gc_thresh2, nla_get_u32(tb[NDTA_THRESH2])); if (tb[NDTA_THRESH3]) WRITE_ONCE(tbl->gc_thresh3, nla_get_u32(tb[NDTA_THRESH3])); if (tb[NDTA_GC_INTERVAL]) WRITE_ONCE(tbl->gc_interval, nla_get_msecs(tb[NDTA_GC_INTERVAL])); err = 0; errout_tbl_lock: write_unlock_bh(&tbl->lock); errout: return err; } static int neightbl_valid_dump_info(const struct nlmsghdr *nlh, struct netlink_ext_ack *extack) { struct ndtmsg *ndtm; if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ndtm))) { NL_SET_ERR_MSG(extack, "Invalid header for neighbor table dump request"); return -EINVAL; } ndtm = nlmsg_data(nlh); if (ndtm->ndtm_pad1 || ndtm->ndtm_pad2) { NL_SET_ERR_MSG(extack, "Invalid values in header for neighbor table dump request"); return -EINVAL; } if (nlmsg_attrlen(nlh, sizeof(*ndtm))) { NL_SET_ERR_MSG(extack, "Invalid data after header in neighbor table dump request"); return -EINVAL; } return 0; } static int neightbl_dump_info(struct sk_buff *skb, struct netlink_callback *cb) { const struct nlmsghdr *nlh = cb->nlh; struct net *net = sock_net(skb->sk); int family, tidx, nidx = 0; int tbl_skip = cb->args[0]; int neigh_skip = cb->args[1]; struct neigh_table *tbl; if (cb->strict_check) { int err = neightbl_valid_dump_info(nlh, cb->extack); if (err < 0) return err; } family = ((struct rtgenmsg *)nlmsg_data(nlh))->rtgen_family; for (tidx = 0; tidx < NEIGH_NR_TABLES; tidx++) { struct neigh_parms *p; tbl = rcu_dereference_rtnl(neigh_tables[tidx]); if (!tbl) continue; if (tidx < tbl_skip || (family && tbl->family != family)) continue; if (neightbl_fill_info(skb, tbl, NETLINK_CB(cb->skb).portid, nlh->nlmsg_seq, RTM_NEWNEIGHTBL, NLM_F_MULTI) < 0) break; nidx = 0; p = list_next_entry(&tbl->parms, list); list_for_each_entry_from(p, &tbl->parms_list, list) { if (!net_eq(neigh_parms_net(p), net)) continue; if (nidx < neigh_skip) goto next; if (neightbl_fill_param_info(skb, tbl, p, NETLINK_CB(cb->skb).portid, nlh->nlmsg_seq, RTM_NEWNEIGHTBL, NLM_F_MULTI) < 0) goto out; next: nidx++; } neigh_skip = 0; } out: cb->args[0] = tidx; cb->args[1] = nidx; return skb->len; } static int neigh_fill_info(struct sk_buff *skb, struct neighbour *neigh, u32 pid, u32 seq, int type, unsigned int flags) { u32 neigh_flags, neigh_flags_ext; unsigned long now = jiffies; struct nda_cacheinfo ci; struct nlmsghdr *nlh; struct ndmsg *ndm; nlh = nlmsg_put(skb, pid, seq, type, sizeof(*ndm), flags); if (nlh == NULL) return -EMSGSIZE; neigh_flags_ext = neigh->flags >> NTF_EXT_SHIFT; neigh_flags = neigh->flags & NTF_OLD_MASK; ndm = nlmsg_data(nlh); ndm->ndm_family = neigh->ops->family; ndm->ndm_pad1 = 0; ndm->ndm_pad2 = 0; ndm->ndm_flags = neigh_flags; ndm->ndm_type = neigh->type; ndm->ndm_ifindex = neigh->dev->ifindex; if (nla_put(skb, NDA_DST, neigh->tbl->key_len, neigh->primary_key)) goto nla_put_failure; read_lock_bh(&neigh->lock); ndm->ndm_state = neigh->nud_state; if (neigh->nud_state & NUD_VALID) { char haddr[MAX_ADDR_LEN]; neigh_ha_snapshot(haddr, neigh, neigh->dev); if (nla_put(skb, NDA_LLADDR, neigh->dev->addr_len, haddr) < 0) { read_unlock_bh(&neigh->lock); goto nla_put_failure; } } ci.ndm_used = jiffies_to_clock_t(now - neigh->used); ci.ndm_confirmed = jiffies_to_clock_t(now - neigh->confirmed); ci.ndm_updated = jiffies_to_clock_t(now - neigh->updated); ci.ndm_refcnt = refcount_read(&neigh->refcnt) - 1; read_unlock_bh(&neigh->lock); if (nla_put_u32(skb, NDA_PROBES, atomic_read(&neigh->probes)) || nla_put(skb, NDA_CACHEINFO, sizeof(ci), &ci)) goto nla_put_failure; if (neigh->protocol && nla_put_u8(skb, NDA_PROTOCOL, neigh->protocol)) goto nla_put_failure; if (neigh_flags_ext && nla_put_u32(skb, NDA_FLAGS_EXT, neigh_flags_ext)) goto nla_put_failure; nlmsg_end(skb, nlh); return 0; nla_put_failure: nlmsg_cancel(skb, nlh); return -EMSGSIZE; } static int pneigh_fill_info(struct sk_buff *skb, struct pneigh_entry *pn, u32 pid, u32 seq, int type, unsigned int flags, struct neigh_table *tbl) { u32 neigh_flags, neigh_flags_ext; struct nlmsghdr *nlh; struct ndmsg *ndm; nlh = nlmsg_put(skb, pid, seq, type, sizeof(*ndm), flags); if (nlh == NULL) return -EMSGSIZE; neigh_flags_ext = pn->flags >> NTF_EXT_SHIFT; neigh_flags = pn->flags & NTF_OLD_MASK; ndm = nlmsg_data(nlh); ndm->ndm_family = tbl->family; ndm->ndm_pad1 = 0; ndm->ndm_pad2 = 0; ndm->ndm_flags = neigh_flags | NTF_PROXY; ndm->ndm_type = RTN_UNICAST; ndm->ndm_ifindex = pn->dev ? pn->dev->ifindex : 0; ndm->ndm_state = NUD_NONE; if (nla_put(skb, NDA_DST, tbl->key_len, pn->key)) goto nla_put_failure; if (pn->protocol && nla_put_u8(skb, NDA_PROTOCOL, pn->protocol)) goto nla_put_failure; if (neigh_flags_ext && nla_put_u32(skb, NDA_FLAGS_EXT, neigh_flags_ext)) goto nla_put_failure; nlmsg_end(skb, nlh); return 0; nla_put_failure: nlmsg_cancel(skb, nlh); return -EMSGSIZE; } static void neigh_update_notify(struct neighbour *neigh, u32 nlmsg_pid) { call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh); __neigh_notify(neigh, RTM_NEWNEIGH, 0, nlmsg_pid); } static bool neigh_master_filtered(struct net_device *dev, int master_idx) { struct net_device *master; if (!master_idx) return false; master = dev ? netdev_master_upper_dev_get_rcu(dev) : NULL; /* 0 is already used to denote NDA_MASTER wasn't passed, therefore need another * invalid value for ifindex to denote "no master". */ if (master_idx == -1) return !!master; if (!master || master->ifindex != master_idx) return true; return false; } static bool neigh_ifindex_filtered(struct net_device *dev, int filter_idx) { if (filter_idx && (!dev || dev->ifindex != filter_idx)) return true; return false; } struct neigh_dump_filter { int master_idx; int dev_idx; }; static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb, struct netlink_callback *cb, struct neigh_dump_filter *filter) { struct net *net = sock_net(skb->sk); struct neighbour *n; int err = 0, h, s_h = cb->args[1]; int idx, s_idx = idx = cb->args[2]; struct neigh_hash_table *nht; unsigned int flags = NLM_F_MULTI; if (filter->dev_idx || filter->master_idx) flags |= NLM_F_DUMP_FILTERED; nht = rcu_dereference(tbl->nht); for (h = s_h; h < (1 << nht->hash_shift); h++) { if (h > s_h) s_idx = 0; for (n = rcu_dereference(nht->hash_buckets[h]), idx = 0; n != NULL; n = rcu_dereference(n->next)) { if (idx < s_idx || !net_eq(dev_net(n->dev), net)) goto next; if (neigh_ifindex_filtered(n->dev, filter->dev_idx) || neigh_master_filtered(n->dev, filter->master_idx)) goto next; err = neigh_fill_info(skb, n, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, RTM_NEWNEIGH, flags); if (err < 0) goto out; next: idx++; } } out: cb->args[1] = h; cb->args[2] = idx; return err; } static int pneigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb, struct netlink_callback *cb, struct neigh_dump_filter *filter) { struct pneigh_entry *n; struct net *net = sock_net(skb->sk); int err = 0, h, s_h = cb->args[3]; int idx, s_idx = idx = cb->args[4]; unsigned int flags = NLM_F_MULTI; if (filter->dev_idx || filter->master_idx) flags |= NLM_F_DUMP_FILTERED; read_lock_bh(&tbl->lock); for (h = s_h; h <= PNEIGH_HASHMASK; h++) { if (h > s_h) s_idx = 0; for (n = tbl->phash_buckets[h], idx = 0; n; n = n->next) { if (idx < s_idx || pneigh_net(n) != net) goto next; if (neigh_ifindex_filtered(n->dev, filter->dev_idx) || neigh_master_filtered(n->dev, filter->master_idx)) goto next; err = pneigh_fill_info(skb, n, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, RTM_NEWNEIGH, flags, tbl); if (err < 0) { read_unlock_bh(&tbl->lock); goto out; } next: idx++; } } read_unlock_bh(&tbl->lock); out: cb->args[3] = h; cb->args[4] = idx; return err; } static int neigh_valid_dump_req(const struct nlmsghdr *nlh, bool strict_check, struct neigh_dump_filter *filter, struct netlink_ext_ack *extack) { struct nlattr *tb[NDA_MAX + 1]; int err, i; if (strict_check) { struct ndmsg *ndm; if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ndm))) { NL_SET_ERR_MSG(extack, "Invalid header for neighbor dump request"); return -EINVAL; } ndm = nlmsg_data(nlh); if (ndm->ndm_pad1 || ndm->ndm_pad2 || ndm->ndm_ifindex || ndm->ndm_state || ndm->ndm_type) { NL_SET_ERR_MSG(extack, "Invalid values in header for neighbor dump request"); return -EINVAL; } if (ndm->ndm_flags & ~NTF_PROXY) { NL_SET_ERR_MSG(extack, "Invalid flags in header for neighbor dump request"); return -EINVAL; } err = nlmsg_parse_deprecated_strict(nlh, sizeof(struct ndmsg), tb, NDA_MAX, nda_policy, extack); } else { err = nlmsg_parse_deprecated(nlh, sizeof(struct ndmsg), tb, NDA_MAX, nda_policy, extack); } if (err < 0) return err; for (i = 0; i <= NDA_MAX; ++i) { if (!tb[i]) continue; /* all new attributes should require strict_check */ switch (i) { case NDA_IFINDEX: filter->dev_idx = nla_get_u32(tb[i]); break; case NDA_MASTER: filter->master_idx = nla_get_u32(tb[i]); break; default: if (strict_check) { NL_SET_ERR_MSG(extack, "Unsupported attribute in neighbor dump request"); return -EINVAL; } } } return 0; } static int neigh_dump_info(struct sk_buff *skb, struct netlink_callback *cb) { const struct nlmsghdr *nlh = cb->nlh; struct neigh_dump_filter filter = {}; struct neigh_table *tbl; int t, family, s_t; int proxy = 0; int err; family = ((struct rtgenmsg *)nlmsg_data(nlh))->rtgen_family; /* check for full ndmsg structure presence, family member is * the same for both structures */ if (nlmsg_len(nlh) >= sizeof(struct ndmsg) && ((struct ndmsg *)nlmsg_data(nlh))->ndm_flags == NTF_PROXY) proxy = 1; err = neigh_valid_dump_req(nlh, cb->strict_check, &filter, cb->extack); if (err < 0 && cb->strict_check) return err; s_t = cb->args[0]; rcu_read_lock(); for (t = 0; t < NEIGH_NR_TABLES; t++) { tbl = rcu_dereference(neigh_tables[t]); if (!tbl) continue; if (t < s_t || (family && tbl->family != family)) continue; if (t > s_t) memset(&cb->args[1], 0, sizeof(cb->args) - sizeof(cb->args[0])); if (proxy) err = pneigh_dump_table(tbl, skb, cb, &filter); else err = neigh_dump_table(tbl, skb, cb, &filter); if (err < 0) break; } rcu_read_unlock(); cb->args[0] = t; return err; } static int neigh_valid_get_req(const struct nlmsghdr *nlh, struct neigh_table **tbl, void **dst, int *dev_idx, u8 *ndm_flags, struct netlink_ext_ack *extack) { struct nlattr *tb[NDA_MAX + 1]; struct ndmsg *ndm; int err, i; if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ndm))) { NL_SET_ERR_MSG(extack, "Invalid header for neighbor get request"); return -EINVAL; } ndm = nlmsg_data(nlh); if (ndm->ndm_pad1 || ndm->ndm_pad2 || ndm->ndm_state || ndm->ndm_type) { NL_SET_ERR_MSG(extack, "Invalid values in header for neighbor get request"); return -EINVAL; } if (ndm->ndm_flags & ~NTF_PROXY) { NL_SET_ERR_MSG(extack, "Invalid flags in header for neighbor get request"); return -EINVAL; } err = nlmsg_parse_deprecated_strict(nlh, sizeof(struct ndmsg), tb, NDA_MAX, nda_policy, extack); if (err < 0) return err; *ndm_flags = ndm->ndm_flags; *dev_idx = ndm->ndm_ifindex; *tbl = neigh_find_table(ndm->ndm_family); if (*tbl == NULL) { NL_SET_ERR_MSG(extack, "Unsupported family in header for neighbor get request"); return -EAFNOSUPPORT; } for (i = 0; i <= NDA_MAX; ++i) { if (!tb[i]) continue; switch (i) { case NDA_DST: if (nla_len(tb[i]) != (int)(*tbl)->key_len) { NL_SET_ERR_MSG(extack, "Invalid network address in neighbor get request"); return -EINVAL; } *dst = nla_data(tb[i]); break; default: NL_SET_ERR_MSG(extack, "Unsupported attribute in neighbor get request"); return -EINVAL; } } return 0; } static inline size_t neigh_nlmsg_size(void) { return NLMSG_ALIGN(sizeof(struct ndmsg)) + nla_total_size(MAX_ADDR_LEN) /* NDA_DST */ + nla_total_size(MAX_ADDR_LEN) /* NDA_LLADDR */ + nla_total_size(sizeof(struct nda_cacheinfo)) + nla_total_size(4) /* NDA_PROBES */ + nla_total_size(4) /* NDA_FLAGS_EXT */ + nla_total_size(1); /* NDA_PROTOCOL */ } static int neigh_get_reply(struct net *net, struct neighbour *neigh, u32 pid, u32 seq) { struct sk_buff *skb; int err = 0; skb = nlmsg_new(neigh_nlmsg_size(), GFP_KERNEL); if (!skb) return -ENOBUFS; err = neigh_fill_info(skb, neigh, pid, seq, RTM_NEWNEIGH, 0); if (err) { kfree_skb(skb); goto errout; } err = rtnl_unicast(skb, net, pid); errout: return err; } static inline size_t pneigh_nlmsg_size(void) { return NLMSG_ALIGN(sizeof(struct ndmsg)) + nla_total_size(MAX_ADDR_LEN) /* NDA_DST */ + nla_total_size(4) /* NDA_FLAGS_EXT */ + nla_total_size(1); /* NDA_PROTOCOL */ } static int pneigh_get_reply(struct net *net, struct pneigh_entry *neigh, u32 pid, u32 seq, struct neigh_table *tbl) { struct sk_buff *skb; int err = 0; skb = nlmsg_new(pneigh_nlmsg_size(), GFP_KERNEL); if (!skb) return -ENOBUFS; err = pneigh_fill_info(skb, neigh, pid, seq, RTM_NEWNEIGH, 0, tbl); if (err) { kfree_skb(skb); goto errout; } err = rtnl_unicast(skb, net, pid); errout: return err; } static int neigh_get(struct sk_buff *in_skb, struct nlmsghdr *nlh, struct netlink_ext_ack *extack) { struct net *net = sock_net(in_skb->sk); struct net_device *dev = NULL; struct neigh_table *tbl = NULL; struct neighbour *neigh; void *dst = NULL; u8 ndm_flags = 0; int dev_idx = 0; int err; err = neigh_valid_get_req(nlh, &tbl, &dst, &dev_idx, &ndm_flags, extack); if (err < 0) return err; if (dev_idx) { dev = __dev_get_by_index(net, dev_idx); if (!dev) { NL_SET_ERR_MSG(extack, "Unknown device ifindex"); return -ENODEV; } } if (!dst) { NL_SET_ERR_MSG(extack, "Network address not specified"); return -EINVAL; } if (ndm_flags & NTF_PROXY) { struct pneigh_entry *pn; pn = pneigh_lookup(tbl, net, dst, dev, 0); if (!pn) { NL_SET_ERR_MSG(extack, "Proxy neighbour entry not found"); return -ENOENT; } return pneigh_get_reply(net, pn, NETLINK_CB(in_skb).portid, nlh->nlmsg_seq, tbl); } if (!dev) { NL_SET_ERR_MSG(extack, "No device specified"); return -EINVAL; } neigh = neigh_lookup(tbl, dst, dev); if (!neigh) { NL_SET_ERR_MSG(extack, "Neighbour entry not found"); return -ENOENT; } err = neigh_get_reply(net, neigh, NETLINK_CB(in_skb).portid, nlh->nlmsg_seq); neigh_release(neigh); return err; } void neigh_for_each(struct neigh_table *tbl, void (*cb)(struct neighbour *, void *), void *cookie) { int chain; struct neigh_hash_table *nht; rcu_read_lock(); nht = rcu_dereference(tbl->nht); read_lock_bh(&tbl->lock); /* avoid resizes */ for (chain = 0; chain < (1 << nht->hash_shift); chain++) { struct neighbour *n; for (n = rcu_dereference(nht->hash_buckets[chain]); n != NULL; n = rcu_dereference(n->next)) cb(n, cookie); } read_unlock_bh(&tbl->lock); rcu_read_unlock(); } EXPORT_SYMBOL(neigh_for_each); /* The tbl->lock must be held as a writer and BH disabled. */ void __neigh_for_each_release(struct neigh_table *tbl, int (*cb)(struct neighbour *)) { int chain; struct neigh_hash_table *nht; nht = rcu_dereference_protected(tbl->nht, lockdep_is_held(&tbl->lock)); for (chain = 0; chain < (1 << nht->hash_shift); chain++) { struct neighbour *n; struct neighbour __rcu **np; np = &nht->hash_buckets[chain]; while ((n = rcu_dereference_protected(*np, lockdep_is_held(&tbl->lock))) != NULL) { int release; write_lock(&n->lock); release = cb(n); if (release) { rcu_assign_pointer(*np, rcu_dereference_protected(n->next, lockdep_is_held(&tbl->lock))); neigh_mark_dead(n); } else np = &n->next; write_unlock(&n->lock); if (release) neigh_cleanup_and_release(n); } } } EXPORT_SYMBOL(__neigh_for_each_release); int neigh_xmit(int index, struct net_device *dev, const void *addr, struct sk_buff *skb) { int err = -EAFNOSUPPORT; if (likely(index < NEIGH_NR_TABLES)) { struct neigh_table *tbl; struct neighbour *neigh; rcu_read_lock(); tbl = rcu_dereference(neigh_tables[index]); if (!tbl) goto out_unlock; if (index == NEIGH_ARP_TABLE) { u32 key = *((u32 *)addr); neigh = __ipv4_neigh_lookup_noref(dev, key); } else { neigh = __neigh_lookup_noref(tbl, addr, dev); } if (!neigh) neigh = __neigh_create(tbl, addr, dev, false); err = PTR_ERR(neigh); if (IS_ERR(neigh)) { rcu_read_unlock(); goto out_kfree_skb; } err = READ_ONCE(neigh->output)(neigh, skb); out_unlock: rcu_read_unlock(); } else if (index == NEIGH_LINK_TABLE) { err = dev_hard_header(skb, dev, ntohs(skb->protocol), addr, NULL, skb->len); if (err < 0) goto out_kfree_skb; err = dev_queue_xmit(skb); } out: return err; out_kfree_skb: kfree_skb(skb); goto out; } EXPORT_SYMBOL(neigh_xmit); #ifdef CONFIG_PROC_FS static struct neighbour *neigh_get_first(struct seq_file *seq) { struct neigh_seq_state *state = seq->private; struct net *net = seq_file_net(seq); struct neigh_hash_table *nht = state->nht; struct neighbour *n = NULL; int bucket; state->flags &= ~NEIGH_SEQ_IS_PNEIGH; for (bucket = 0; bucket < (1 << nht->hash_shift); bucket++) { n = rcu_dereference(nht->hash_buckets[bucket]); while (n) { if (!net_eq(dev_net(n->dev), net)) goto next; if (state->neigh_sub_iter) { loff_t fakep = 0; void *v; v = state->neigh_sub_iter(state, n, &fakep); if (!v) goto next; } if (!(state->flags & NEIGH_SEQ_SKIP_NOARP)) break; if (READ_ONCE(n->nud_state) & ~NUD_NOARP) break; next: n = rcu_dereference(n->next); } if (n) break; } state->bucket = bucket; return n; } static struct neighbour *neigh_get_next(struct seq_file *seq, struct neighbour *n, loff_t *pos) { struct neigh_seq_state *state = seq->private; struct net *net = seq_file_net(seq); struct neigh_hash_table *nht = state->nht; if (state->neigh_sub_iter) { void *v = state->neigh_sub_iter(state, n, pos); if (v) return n; } n = rcu_dereference(n->next); while (1) { while (n) { if (!net_eq(dev_net(n->dev), net)) goto next; if (state->neigh_sub_iter) { void *v = state->neigh_sub_iter(state, n, pos); if (v) return n; goto next; } if (!(state->flags & NEIGH_SEQ_SKIP_NOARP)) break; if (READ_ONCE(n->nud_state) & ~NUD_NOARP) break; next: n = rcu_dereference(n->next); } if (n) break; if (++state->bucket >= (1 << nht->hash_shift)) break; n = rcu_dereference(nht->hash_buckets[state->bucket]); } if (n && pos) --(*pos); return n; } static struct neighbour *neigh_get_idx(struct seq_file *seq, loff_t *pos) { struct neighbour *n = neigh_get_first(seq); if (n) { --(*pos); while (*pos) { n = neigh_get_next(seq, n, pos); if (!n) break; } } return *pos ? NULL : n; } static struct pneigh_entry *pneigh_get_first(struct seq_file *seq) { struct neigh_seq_state *state = seq->private; struct net *net = seq_file_net(seq); struct neigh_table *tbl = state->tbl; struct pneigh_entry *pn = NULL; int bucket; state->flags |= NEIGH_SEQ_IS_PNEIGH; for (bucket = 0; bucket <= PNEIGH_HASHMASK; bucket++) { pn = tbl->phash_buckets[bucket]; while (pn && !net_eq(pneigh_net(pn), net)) pn = pn->next; if (pn) break; } state->bucket = bucket; return pn; } static struct pneigh_entry *pneigh_get_next(struct seq_file *seq, struct pneigh_entry *pn, loff_t *pos) { struct neigh_seq_state *state = seq->private; struct net *net = seq_file_net(seq); struct neigh_table *tbl = state->tbl; do { pn = pn->next; } while (pn && !net_eq(pneigh_net(pn), net)); while (!pn) { if (++state->bucket > PNEIGH_HASHMASK) break; pn = tbl->phash_buckets[state->bucket]; while (pn && !net_eq(pneigh_net(pn), net)) pn = pn->next; if (pn) break; } if (pn && pos) --(*pos); return pn; } static struct pneigh_entry *pneigh_get_idx(struct seq_file *seq, loff_t *pos) { struct pneigh_entry *pn = pneigh_get_first(seq); if (pn) { --(*pos); while (*pos) { pn = pneigh_get_next(seq, pn, pos); if (!pn) break; } } return *pos ? NULL : pn; } static void *neigh_get_idx_any(struct seq_file *seq, loff_t *pos) { struct neigh_seq_state *state = seq->private; void *rc; loff_t idxpos = *pos; rc = neigh_get_idx(seq, &idxpos); if (!rc && !(state->flags & NEIGH_SEQ_NEIGH_ONLY)) rc = pneigh_get_idx(seq, &idxpos); return rc; } void *neigh_seq_start(struct seq_file *seq, loff_t *pos, struct neigh_table *tbl, unsigned int neigh_seq_flags) __acquires(tbl->lock) __acquires(rcu) { struct neigh_seq_state *state = seq->private; state->tbl = tbl; state->bucket = 0; state->flags = (neigh_seq_flags & ~NEIGH_SEQ_IS_PNEIGH); rcu_read_lock(); state->nht = rcu_dereference(tbl->nht); read_lock_bh(&tbl->lock); return *pos ? neigh_get_idx_any(seq, pos) : SEQ_START_TOKEN; } EXPORT_SYMBOL(neigh_seq_start); void *neigh_seq_next(struct seq_file *seq, void *v, loff_t *pos) { struct neigh_seq_state *state; void *rc; if (v == SEQ_START_TOKEN) { rc = neigh_get_first(seq); goto out; } state = seq->private; if (!(state->flags & NEIGH_SEQ_IS_PNEIGH)) { rc = neigh_get_next(seq, v, NULL); if (rc) goto out; if (!(state->flags & NEIGH_SEQ_NEIGH_ONLY)) rc = pneigh_get_first(seq); } else { BUG_ON(state->flags & NEIGH_SEQ_NEIGH_ONLY); rc = pneigh_get_next(seq, v, NULL); } out: ++(*pos); return rc; } EXPORT_SYMBOL(neigh_seq_next); void neigh_seq_stop(struct seq_file *seq, void *v) __releases(tbl->lock) __releases(rcu) { struct neigh_seq_state *state = seq->private; struct neigh_table *tbl = state->tbl; read_unlock_bh(&tbl->lock); rcu_read_unlock(); } EXPORT_SYMBOL(neigh_seq_stop); /* statistics via seq_file */ static void *neigh_stat_seq_start(struct seq_file *seq, loff_t *pos) { struct neigh_table *tbl = pde_data(file_inode(seq->file)); int cpu; if (*pos == 0) return SEQ_START_TOKEN; for (cpu = *pos-1; cpu < nr_cpu_ids; ++cpu) { if (!cpu_possible(cpu)) continue; *pos = cpu+1; return per_cpu_ptr(tbl->stats, cpu); } return NULL; } static void *neigh_stat_seq_next(struct seq_file *seq, void *v, loff_t *pos) { struct neigh_table *tbl = pde_data(file_inode(seq->file)); int cpu; for (cpu = *pos; cpu < nr_cpu_ids; ++cpu) { if (!cpu_possible(cpu)) continue; *pos = cpu+1; return per_cpu_ptr(tbl->stats, cpu); } (*pos)++; return NULL; } static void neigh_stat_seq_stop(struct seq_file *seq, void *v) { } static int neigh_stat_seq_show(struct seq_file *seq, void *v) { struct neigh_table *tbl = pde_data(file_inode(seq->file)); struct neigh_statistics *st = v; if (v == SEQ_START_TOKEN) { seq_puts(seq, "entries allocs destroys hash_grows lookups hits res_failed rcv_probes_mcast rcv_probes_ucast periodic_gc_runs forced_gc_runs unresolved_discards table_fulls\n"); return 0; } seq_printf(seq, "%08x %08lx %08lx %08lx %08lx %08lx %08lx " "%08lx %08lx %08lx " "%08lx %08lx %08lx\n", atomic_read(&tbl->entries), st->allocs, st->destroys, st->hash_grows, st->lookups, st->hits, st->res_failed, st->rcv_probes_mcast, st->rcv_probes_ucast, st->periodic_gc_runs, st->forced_gc_runs, st->unres_discards, st->table_fulls ); return 0; } static const struct seq_operations neigh_stat_seq_ops = { .start = neigh_stat_seq_start, .next = neigh_stat_seq_next, .stop = neigh_stat_seq_stop, .show = neigh_stat_seq_show, }; #endif /* CONFIG_PROC_FS */ static void __neigh_notify(struct neighbour *n, int type, int flags, u32 pid) { struct net *net = dev_net(n->dev); struct sk_buff *skb; int err = -ENOBUFS; skb = nlmsg_new(neigh_nlmsg_size(), GFP_ATOMIC); if (skb == NULL) goto errout; err = neigh_fill_info(skb, n, pid, 0, type, flags); if (err < 0) { /* -EMSGSIZE implies BUG in neigh_nlmsg_size() */ WARN_ON(err == -EMSGSIZE); kfree_skb(skb); goto errout; } rtnl_notify(skb, net, 0, RTNLGRP_NEIGH, NULL, GFP_ATOMIC); return; errout: if (err < 0) rtnl_set_sk_err(net, RTNLGRP_NEIGH, err); } void neigh_app_ns(struct neighbour *n) { __neigh_notify(n, RTM_GETNEIGH, NLM_F_REQUEST, 0); } EXPORT_SYMBOL(neigh_app_ns); #ifdef CONFIG_SYSCTL static int unres_qlen_max = INT_MAX / SKB_TRUESIZE(ETH_FRAME_LEN); static int proc_unres_qlen(const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos) { int size, ret; struct ctl_table tmp = *ctl; tmp.extra1 = SYSCTL_ZERO; tmp.extra2 = &unres_qlen_max; tmp.data = &size; size = *(int *)ctl->data / SKB_TRUESIZE(ETH_FRAME_LEN); ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos); if (write && !ret) *(int *)ctl->data = size * SKB_TRUESIZE(ETH_FRAME_LEN); return ret; } static void neigh_copy_dflt_parms(struct net *net, struct neigh_parms *p, int index) { struct net_device *dev; int family = neigh_parms_family(p); rcu_read_lock(); for_each_netdev_rcu(net, dev) { struct neigh_parms *dst_p = neigh_get_dev_parms_rcu(dev, family); if (dst_p && !test_bit(index, dst_p->data_state)) dst_p->data[index] = p->data[index]; } rcu_read_unlock(); } static void neigh_proc_update(const struct ctl_table *ctl, int write) { struct net_device *dev = ctl->extra1; struct neigh_parms *p = ctl->extra2; struct net *net = neigh_parms_net(p); int index = (int *) ctl->data - p->data; if (!write) return; set_bit(index, p->data_state); if (index == NEIGH_VAR_DELAY_PROBE_TIME) call_netevent_notifiers(NETEVENT_DELAY_PROBE_TIME_UPDATE, p); if (!dev) /* NULL dev means this is default value */ neigh_copy_dflt_parms(net, p, index); } static int neigh_proc_dointvec_zero_intmax(const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos) { struct ctl_table tmp = *ctl; int ret; tmp.extra1 = SYSCTL_ZERO; tmp.extra2 = SYSCTL_INT_MAX; ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos); neigh_proc_update(ctl, write); return ret; } static int neigh_proc_dointvec_ms_jiffies_positive(const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos) { struct ctl_table tmp = *ctl; int ret; int min = msecs_to_jiffies(1); tmp.extra1 = &min; tmp.extra2 = NULL; ret = proc_dointvec_ms_jiffies_minmax(&tmp, write, buffer, lenp, ppos); neigh_proc_update(ctl, write); return ret; } int neigh_proc_dointvec(const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos) { int ret = proc_dointvec(ctl, write, buffer, lenp, ppos); neigh_proc_update(ctl, write); return ret; } EXPORT_SYMBOL(neigh_proc_dointvec); int neigh_proc_dointvec_jiffies(const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos) { int ret = proc_dointvec_jiffies(ctl, write, buffer, lenp, ppos); neigh_proc_update(ctl, write); return ret; } EXPORT_SYMBOL(neigh_proc_dointvec_jiffies); static int neigh_proc_dointvec_userhz_jiffies(const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos) { int ret = proc_dointvec_userhz_jiffies(ctl, write, buffer, lenp, ppos); neigh_proc_update(ctl, write); return ret; } int neigh_proc_dointvec_ms_jiffies(const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos) { int ret = proc_dointvec_ms_jiffies(ctl, write, buffer, lenp, ppos); neigh_proc_update(ctl, write); return ret; } EXPORT_SYMBOL(neigh_proc_dointvec_ms_jiffies); static int neigh_proc_dointvec_unres_qlen(const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos) { int ret = proc_unres_qlen(ctl, write, buffer, lenp, ppos); neigh_proc_update(ctl, write); return ret; } static int neigh_proc_base_reachable_time(const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos) { struct neigh_parms *p = ctl->extra2; int ret; if (strcmp(ctl->procname, "base_reachable_time") == 0) ret = neigh_proc_dointvec_jiffies(ctl, write, buffer, lenp, ppos); else if (strcmp(ctl->procname, "base_reachable_time_ms") == 0) ret = neigh_proc_dointvec_ms_jiffies(ctl, write, buffer, lenp, ppos); else ret = -1; if (write && ret == 0) { /* update reachable_time as well, otherwise, the change will * only be effective after the next time neigh_periodic_work * decides to recompute it */ p->reachable_time = neigh_rand_reach_time(NEIGH_VAR(p, BASE_REACHABLE_TIME)); } return ret; } #define NEIGH_PARMS_DATA_OFFSET(index) \ (&((struct neigh_parms *) 0)->data[index]) #define NEIGH_SYSCTL_ENTRY(attr, data_attr, name, mval, proc) \ [NEIGH_VAR_ ## attr] = { \ .procname = name, \ .data = NEIGH_PARMS_DATA_OFFSET(NEIGH_VAR_ ## data_attr), \ .maxlen = sizeof(int), \ .mode = mval, \ .proc_handler = proc, \ } #define NEIGH_SYSCTL_ZERO_INTMAX_ENTRY(attr, name) \ NEIGH_SYSCTL_ENTRY(attr, attr, name, 0644, neigh_proc_dointvec_zero_intmax) #define NEIGH_SYSCTL_JIFFIES_ENTRY(attr, name) \ NEIGH_SYSCTL_ENTRY(attr, attr, name, 0644, neigh_proc_dointvec_jiffies) #define NEIGH_SYSCTL_USERHZ_JIFFIES_ENTRY(attr, name) \ NEIGH_SYSCTL_ENTRY(attr, attr, name, 0644, neigh_proc_dointvec_userhz_jiffies) #define NEIGH_SYSCTL_MS_JIFFIES_POSITIVE_ENTRY(attr, name) \ NEIGH_SYSCTL_ENTRY(attr, attr, name, 0644, neigh_proc_dointvec_ms_jiffies_positive) #define NEIGH_SYSCTL_MS_JIFFIES_REUSED_ENTRY(attr, data_attr, name) \ NEIGH_SYSCTL_ENTRY(attr, data_attr, name, 0644, neigh_proc_dointvec_ms_jiffies) #define NEIGH_SYSCTL_UNRES_QLEN_REUSED_ENTRY(attr, data_attr, name) \ NEIGH_SYSCTL_ENTRY(attr, data_attr, name, 0644, neigh_proc_dointvec_unres_qlen) static struct neigh_sysctl_table { struct ctl_table_header *sysctl_header; struct ctl_table neigh_vars[NEIGH_VAR_MAX]; } neigh_sysctl_template __read_mostly = { .neigh_vars = { NEIGH_SYSCTL_ZERO_INTMAX_ENTRY(MCAST_PROBES, "mcast_solicit"), NEIGH_SYSCTL_ZERO_INTMAX_ENTRY(UCAST_PROBES, "ucast_solicit"), NEIGH_SYSCTL_ZERO_INTMAX_ENTRY(APP_PROBES, "app_solicit"), NEIGH_SYSCTL_ZERO_INTMAX_ENTRY(MCAST_REPROBES, "mcast_resolicit"), NEIGH_SYSCTL_USERHZ_JIFFIES_ENTRY(RETRANS_TIME, "retrans_time"), NEIGH_SYSCTL_JIFFIES_ENTRY(BASE_REACHABLE_TIME, "base_reachable_time"), NEIGH_SYSCTL_JIFFIES_ENTRY(DELAY_PROBE_TIME, "delay_first_probe_time"), NEIGH_SYSCTL_MS_JIFFIES_POSITIVE_ENTRY(INTERVAL_PROBE_TIME_MS, "interval_probe_time_ms"), NEIGH_SYSCTL_JIFFIES_ENTRY(GC_STALETIME, "gc_stale_time"), NEIGH_SYSCTL_ZERO_INTMAX_ENTRY(QUEUE_LEN_BYTES, "unres_qlen_bytes"), NEIGH_SYSCTL_ZERO_INTMAX_ENTRY(PROXY_QLEN, "proxy_qlen"), NEIGH_SYSCTL_USERHZ_JIFFIES_ENTRY(ANYCAST_DELAY, "anycast_delay"), NEIGH_SYSCTL_USERHZ_JIFFIES_ENTRY(PROXY_DELAY, "proxy_delay"), NEIGH_SYSCTL_USERHZ_JIFFIES_ENTRY(LOCKTIME, "locktime"), NEIGH_SYSCTL_UNRES_QLEN_REUSED_ENTRY(QUEUE_LEN, QUEUE_LEN_BYTES, "unres_qlen"), NEIGH_SYSCTL_MS_JIFFIES_REUSED_ENTRY(RETRANS_TIME_MS, RETRANS_TIME, "retrans_time_ms"), NEIGH_SYSCTL_MS_JIFFIES_REUSED_ENTRY(BASE_REACHABLE_TIME_MS, BASE_REACHABLE_TIME, "base_reachable_time_ms"), [NEIGH_VAR_GC_INTERVAL] = { .procname = "gc_interval", .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_jiffies, }, [NEIGH_VAR_GC_THRESH1] = { .procname = "gc_thresh1", .maxlen = sizeof(int), .mode = 0644, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_INT_MAX, .proc_handler = proc_dointvec_minmax, }, [NEIGH_VAR_GC_THRESH2] = { .procname = "gc_thresh2", .maxlen = sizeof(int), .mode = 0644, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_INT_MAX, .proc_handler = proc_dointvec_minmax, }, [NEIGH_VAR_GC_THRESH3] = { .procname = "gc_thresh3", .maxlen = sizeof(int), .mode = 0644, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_INT_MAX, .proc_handler = proc_dointvec_minmax, }, }, }; int neigh_sysctl_register(struct net_device *dev, struct neigh_parms *p, proc_handler *handler) { int i; struct neigh_sysctl_table *t; const char *dev_name_source; char neigh_path[ sizeof("net//neigh/") + IFNAMSIZ + IFNAMSIZ ]; char *p_name; size_t neigh_vars_size; t = kmemdup(&neigh_sysctl_template, sizeof(*t), GFP_KERNEL_ACCOUNT); if (!t) goto err; for (i = 0; i < NEIGH_VAR_GC_INTERVAL; i++) { t->neigh_vars[i].data += (long) p; t->neigh_vars[i].extra1 = dev; t->neigh_vars[i].extra2 = p; } neigh_vars_size = ARRAY_SIZE(t->neigh_vars); if (dev) { dev_name_source = dev->name; /* Terminate the table early */ neigh_vars_size = NEIGH_VAR_BASE_REACHABLE_TIME_MS + 1; } else { struct neigh_table *tbl = p->tbl; dev_name_source = "default"; t->neigh_vars[NEIGH_VAR_GC_INTERVAL].data = &tbl->gc_interval; t->neigh_vars[NEIGH_VAR_GC_THRESH1].data = &tbl->gc_thresh1; t->neigh_vars[NEIGH_VAR_GC_THRESH2].data = &tbl->gc_thresh2; t->neigh_vars[NEIGH_VAR_GC_THRESH3].data = &tbl->gc_thresh3; } if (handler) { /* RetransTime */ t->neigh_vars[NEIGH_VAR_RETRANS_TIME].proc_handler = handler; /* ReachableTime */ t->neigh_vars[NEIGH_VAR_BASE_REACHABLE_TIME].proc_handler = handler; /* RetransTime (in milliseconds)*/ t->neigh_vars[NEIGH_VAR_RETRANS_TIME_MS].proc_handler = handler; /* ReachableTime (in milliseconds) */ t->neigh_vars[NEIGH_VAR_BASE_REACHABLE_TIME_MS].proc_handler = handler; } else { /* Those handlers will update p->reachable_time after * base_reachable_time(_ms) is set to ensure the new timer starts being * applied after the next neighbour update instead of waiting for * neigh_periodic_work to update its value (can be multiple minutes) * So any handler that replaces them should do this as well */ /* ReachableTime */ t->neigh_vars[NEIGH_VAR_BASE_REACHABLE_TIME].proc_handler = neigh_proc_base_reachable_time; /* ReachableTime (in milliseconds) */ t->neigh_vars[NEIGH_VAR_BASE_REACHABLE_TIME_MS].proc_handler = neigh_proc_base_reachable_time; } switch (neigh_parms_family(p)) { case AF_INET: p_name = "ipv4"; break; case AF_INET6: p_name = "ipv6"; break; default: BUG(); } snprintf(neigh_path, sizeof(neigh_path), "net/%s/neigh/%s", p_name, dev_name_source); t->sysctl_header = register_net_sysctl_sz(neigh_parms_net(p), neigh_path, t->neigh_vars, neigh_vars_size); if (!t->sysctl_header) goto free; p->sysctl_table = t; return 0; free: kfree(t); err: return -ENOBUFS; } EXPORT_SYMBOL(neigh_sysctl_register); void neigh_sysctl_unregister(struct neigh_parms *p) { if (p->sysctl_table) { struct neigh_sysctl_table *t = p->sysctl_table; p->sysctl_table = NULL; unregister_net_sysctl_table(t->sysctl_header); kfree(t); } } EXPORT_SYMBOL(neigh_sysctl_unregister); #endif /* CONFIG_SYSCTL */ static int __init neigh_init(void) { rtnl_register(PF_UNSPEC, RTM_NEWNEIGH, neigh_add, NULL, 0); rtnl_register(PF_UNSPEC, RTM_DELNEIGH, neigh_delete, NULL, 0); rtnl_register(PF_UNSPEC, RTM_GETNEIGH, neigh_get, neigh_dump_info, RTNL_FLAG_DUMP_UNLOCKED); rtnl_register(PF_UNSPEC, RTM_GETNEIGHTBL, NULL, neightbl_dump_info, 0); rtnl_register(PF_UNSPEC, RTM_SETNEIGHTBL, neightbl_set, NULL, 0); return 0; } subsys_initcall(neigh_init); |
| 4 3 3 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 | /* * Cryptographic API. * * T10 Data Integrity Field CRC16 Crypto Transform * * Copyright (c) 2007 Oracle Corporation. All rights reserved. * Written by Martin K. Petersen <martin.petersen@oracle.com> * Copyright (C) 2013 Intel Corporation * Author: Tim Chen <tim.c.chen@linux.intel.com> * * This program is free software; you can redistribute it and/or modify it * under the terms of the GNU General Public License as published by the Free * Software Foundation; either version 2 of the License, or (at your option) * any later version. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. * */ #include <linux/module.h> #include <linux/crc-t10dif.h> #include <crypto/internal/hash.h> #include <linux/init.h> #include <linux/kernel.h> struct chksum_desc_ctx { __u16 crc; }; /* * Steps through buffer one byte at a time, calculates reflected * crc using table. */ static int chksum_init(struct shash_desc *desc) { struct chksum_desc_ctx *ctx = shash_desc_ctx(desc); ctx->crc = 0; return 0; } static int chksum_update(struct shash_desc *desc, const u8 *data, unsigned int length) { struct chksum_desc_ctx *ctx = shash_desc_ctx(desc); ctx->crc = crc_t10dif_generic(ctx->crc, data, length); return 0; } static int chksum_final(struct shash_desc *desc, u8 *out) { struct chksum_desc_ctx *ctx = shash_desc_ctx(desc); *(__u16 *)out = ctx->crc; return 0; } static int __chksum_finup(__u16 crc, const u8 *data, unsigned int len, u8 *out) { *(__u16 *)out = crc_t10dif_generic(crc, data, len); return 0; } static int chksum_finup(struct shash_desc *desc, const u8 *data, unsigned int len, u8 *out) { struct chksum_desc_ctx *ctx = shash_desc_ctx(desc); return __chksum_finup(ctx->crc, data, len, out); } static int chksum_digest(struct shash_desc *desc, const u8 *data, unsigned int length, u8 *out) { return __chksum_finup(0, data, length, out); } static struct shash_alg alg = { .digestsize = CRC_T10DIF_DIGEST_SIZE, .init = chksum_init, .update = chksum_update, .final = chksum_final, .finup = chksum_finup, .digest = chksum_digest, .descsize = sizeof(struct chksum_desc_ctx), .base = { .cra_name = "crct10dif", .cra_driver_name = "crct10dif-generic", .cra_priority = 100, .cra_blocksize = CRC_T10DIF_BLOCK_SIZE, .cra_module = THIS_MODULE, } }; static int __init crct10dif_mod_init(void) { return crypto_register_shash(&alg); } static void __exit crct10dif_mod_fini(void) { crypto_unregister_shash(&alg); } subsys_initcall(crct10dif_mod_init); module_exit(crct10dif_mod_fini); MODULE_AUTHOR("Tim Chen <tim.c.chen@linux.intel.com>"); MODULE_DESCRIPTION("T10 DIF CRC calculation."); MODULE_LICENSE("GPL"); MODULE_ALIAS_CRYPTO("crct10dif"); MODULE_ALIAS_CRYPTO("crct10dif-generic"); |
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 | /* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */ /* * Copyright (c) 2004 Topspin Communications. All rights reserved. * Copyright (c) 2005 Voltaire, Inc. All rights reserved. * Copyright (c) 2006 Intel Corporation. All rights reserved. */ #ifndef IB_SA_H #define IB_SA_H #include <linux/completion.h> #include <linux/compiler.h> #include <linux/atomic.h> #include <linux/netdevice.h> #include <rdma/ib_verbs.h> #include <rdma/ib_mad.h> #include <rdma/ib_addr.h> #include <rdma/opa_addr.h> enum { IB_SA_CLASS_VERSION = 2, /* IB spec version 1.1/1.2 */ IB_SA_METHOD_GET_TABLE = 0x12, IB_SA_METHOD_GET_TABLE_RESP = 0x92, IB_SA_METHOD_DELETE = 0x15, IB_SA_METHOD_DELETE_RESP = 0x95, IB_SA_METHOD_GET_MULTI = 0x14, IB_SA_METHOD_GET_MULTI_RESP = 0x94, IB_SA_METHOD_GET_TRACE_TBL = 0x13 }; #define OPA_SA_CLASS_VERSION 0x80 enum { IB_SA_ATTR_CLASS_PORTINFO = 0x01, IB_SA_ATTR_NOTICE = 0x02, IB_SA_ATTR_INFORM_INFO = 0x03, IB_SA_ATTR_NODE_REC = 0x11, IB_SA_ATTR_PORT_INFO_REC = 0x12, IB_SA_ATTR_SL2VL_REC = 0x13, IB_SA_ATTR_SWITCH_REC = 0x14, IB_SA_ATTR_LINEAR_FDB_REC = 0x15, IB_SA_ATTR_RANDOM_FDB_REC = 0x16, IB_SA_ATTR_MCAST_FDB_REC = 0x17, IB_SA_ATTR_SM_INFO_REC = 0x18, IB_SA_ATTR_LINK_REC = 0x20, IB_SA_ATTR_GUID_INFO_REC = 0x30, IB_SA_ATTR_SERVICE_REC = 0x31, IB_SA_ATTR_PARTITION_REC = 0x33, IB_SA_ATTR_PATH_REC = 0x35, IB_SA_ATTR_VL_ARB_REC = 0x36, IB_SA_ATTR_MC_MEMBER_REC = 0x38, IB_SA_ATTR_TRACE_REC = 0x39, IB_SA_ATTR_MULTI_PATH_REC = 0x3a, IB_SA_ATTR_SERVICE_ASSOC_REC = 0x3b, IB_SA_ATTR_INFORM_INFO_REC = 0xf3 }; enum ib_sa_selector { IB_SA_GT = 0, IB_SA_LT = 1, IB_SA_EQ = 2, /* * The meaning of "best" depends on the attribute: for * example, for MTU best will return the largest available * MTU, while for packet life time, best will return the * smallest available life time. */ IB_SA_BEST = 3 }; /* * There are 4 types of join states: * FullMember, NonMember, SendOnlyNonMember, SendOnlyFullMember. * The order corresponds to JoinState bits in MCMemberRecord. */ enum ib_sa_mc_join_states { FULLMEMBER_JOIN, NONMEMBER_JOIN, SENDONLY_NONMEBER_JOIN, SENDONLY_FULLMEMBER_JOIN, NUM_JOIN_MEMBERSHIP_TYPES, }; #define IB_SA_CAP_MASK2_SENDONLY_FULL_MEM_SUPPORT BIT(12) /* * Structures for SA records are named "struct ib_sa_xxx_rec." No * attempt is made to pack structures to match the physical layout of * SA records in SA MADs; all packing and unpacking is handled by the * SA query code. * * For a record with structure ib_sa_xxx_rec, the naming convention * for the component mask value for field yyy is IB_SA_XXX_REC_YYY (we * never use different abbreviations or otherwise change the spelling * of xxx/yyy between ib_sa_xxx_rec.yyy and IB_SA_XXX_REC_YYY). * * Reserved rows are indicated with comments to help maintainability. */ #define IB_SA_PATH_REC_SERVICE_ID (IB_SA_COMP_MASK( 0) |\ IB_SA_COMP_MASK( 1)) #define IB_SA_PATH_REC_DGID IB_SA_COMP_MASK( 2) #define IB_SA_PATH_REC_SGID IB_SA_COMP_MASK( 3) #define IB_SA_PATH_REC_DLID IB_SA_COMP_MASK( 4) #define IB_SA_PATH_REC_SLID IB_SA_COMP_MASK( 5) #define IB_SA_PATH_REC_RAW_TRAFFIC IB_SA_COMP_MASK( 6) /* reserved: 7 */ #define IB_SA_PATH_REC_FLOW_LABEL IB_SA_COMP_MASK( 8) #define IB_SA_PATH_REC_HOP_LIMIT IB_SA_COMP_MASK( 9) #define IB_SA_PATH_REC_TRAFFIC_CLASS IB_SA_COMP_MASK(10) #define IB_SA_PATH_REC_REVERSIBLE IB_SA_COMP_MASK(11) #define IB_SA_PATH_REC_NUMB_PATH IB_SA_COMP_MASK(12) #define IB_SA_PATH_REC_PKEY IB_SA_COMP_MASK(13) #define IB_SA_PATH_REC_QOS_CLASS IB_SA_COMP_MASK(14) #define IB_SA_PATH_REC_SL IB_SA_COMP_MASK(15) #define IB_SA_PATH_REC_MTU_SELECTOR IB_SA_COMP_MASK(16) #define IB_SA_PATH_REC_MTU IB_SA_COMP_MASK(17) #define IB_SA_PATH_REC_RATE_SELECTOR IB_SA_COMP_MASK(18) #define IB_SA_PATH_REC_RATE IB_SA_COMP_MASK(19) #define IB_SA_PATH_REC_PACKET_LIFE_TIME_SELECTOR IB_SA_COMP_MASK(20) #define IB_SA_PATH_REC_PACKET_LIFE_TIME IB_SA_COMP_MASK(21) #define IB_SA_PATH_REC_PREFERENCE IB_SA_COMP_MASK(22) enum sa_path_rec_type { SA_PATH_REC_TYPE_IB, SA_PATH_REC_TYPE_ROCE_V1, SA_PATH_REC_TYPE_ROCE_V2, SA_PATH_REC_TYPE_OPA }; struct sa_path_rec_ib { __be16 dlid; __be16 slid; u8 raw_traffic; }; /** * struct sa_path_rec_roce - RoCE specific portion of the path record entry * @route_resolved: When set, it indicates that this route is already * resolved for this path record entry. * @dmac: Destination mac address for the given DGID entry * of the path record entry. */ struct sa_path_rec_roce { bool route_resolved; u8 dmac[ETH_ALEN]; }; struct sa_path_rec_opa { __be32 dlid; __be32 slid; u8 raw_traffic; u8 l2_8B; u8 l2_10B; u8 l2_9B; u8 l2_16B; u8 qos_type; u8 qos_priority; }; struct sa_path_rec { union ib_gid dgid; union ib_gid sgid; __be64 service_id; /* reserved */ __be32 flow_label; u8 hop_limit; u8 traffic_class; u8 reversible; u8 numb_path; __be16 pkey; __be16 qos_class; u8 sl; u8 mtu_selector; u8 mtu; u8 rate_selector; u8 rate; u8 packet_life_time_selector; u8 packet_life_time; u8 preference; union { struct sa_path_rec_ib ib; struct sa_path_rec_roce roce; struct sa_path_rec_opa opa; }; enum sa_path_rec_type rec_type; u32 flags; }; static inline enum ib_gid_type sa_conv_pathrec_to_gid_type(struct sa_path_rec *rec) { switch (rec->rec_type) { case SA_PATH_REC_TYPE_ROCE_V1: return IB_GID_TYPE_ROCE; case SA_PATH_REC_TYPE_ROCE_V2: return IB_GID_TYPE_ROCE_UDP_ENCAP; default: return IB_GID_TYPE_IB; } } static inline enum sa_path_rec_type sa_conv_gid_to_pathrec_type(enum ib_gid_type type) { switch (type) { case IB_GID_TYPE_ROCE: return SA_PATH_REC_TYPE_ROCE_V1; case IB_GID_TYPE_ROCE_UDP_ENCAP: return SA_PATH_REC_TYPE_ROCE_V2; default: return SA_PATH_REC_TYPE_IB; } } static inline void path_conv_opa_to_ib(struct sa_path_rec *ib, struct sa_path_rec *opa) { if ((be32_to_cpu(opa->opa.dlid) >= be16_to_cpu(IB_MULTICAST_LID_BASE)) || (be32_to_cpu(opa->opa.slid) >= be16_to_cpu(IB_MULTICAST_LID_BASE))) { /* Create OPA GID and zero out the LID */ ib->dgid.global.interface_id = OPA_MAKE_ID(be32_to_cpu(opa->opa.dlid)); ib->dgid.global.subnet_prefix = opa->dgid.global.subnet_prefix; ib->sgid.global.interface_id = OPA_MAKE_ID(be32_to_cpu(opa->opa.slid)); ib->dgid.global.subnet_prefix = opa->dgid.global.subnet_prefix; ib->ib.dlid = 0; ib->ib.slid = 0; } else { ib->ib.dlid = htons(ntohl(opa->opa.dlid)); ib->ib.slid = htons(ntohl(opa->opa.slid)); } ib->service_id = opa->service_id; ib->ib.raw_traffic = opa->opa.raw_traffic; } static inline void path_conv_ib_to_opa(struct sa_path_rec *opa, struct sa_path_rec *ib) { __be32 slid, dlid; if ((ib_is_opa_gid(&ib->sgid)) || (ib_is_opa_gid(&ib->dgid))) { slid = htonl(opa_get_lid_from_gid(&ib->sgid)); dlid = htonl(opa_get_lid_from_gid(&ib->dgid)); } else { slid = htonl(ntohs(ib->ib.slid)); dlid = htonl(ntohs(ib->ib.dlid)); } opa->opa.slid = slid; opa->opa.dlid = dlid; opa->service_id = ib->service_id; opa->opa.raw_traffic = ib->ib.raw_traffic; } /* Convert from OPA to IB path record */ static inline void sa_convert_path_opa_to_ib(struct sa_path_rec *dest, struct sa_path_rec *src) { if (src->rec_type != SA_PATH_REC_TYPE_OPA) return; *dest = *src; dest->rec_type = SA_PATH_REC_TYPE_IB; path_conv_opa_to_ib(dest, src); } /* Convert from IB to OPA path record */ static inline void sa_convert_path_ib_to_opa(struct sa_path_rec *dest, struct sa_path_rec *src) { if (src->rec_type != SA_PATH_REC_TYPE_IB) return; /* Do a structure copy and overwrite the relevant fields */ *dest = *src; dest->rec_type = SA_PATH_REC_TYPE_OPA; path_conv_ib_to_opa(dest, src); } #define IB_SA_MCMEMBER_REC_MGID IB_SA_COMP_MASK( 0) #define IB_SA_MCMEMBER_REC_PORT_GID IB_SA_COMP_MASK( 1) #define IB_SA_MCMEMBER_REC_QKEY IB_SA_COMP_MASK( 2) #define IB_SA_MCMEMBER_REC_MLID IB_SA_COMP_MASK( 3) #define IB_SA_MCMEMBER_REC_MTU_SELECTOR IB_SA_COMP_MASK( 4) #define IB_SA_MCMEMBER_REC_MTU IB_SA_COMP_MASK( 5) #define IB_SA_MCMEMBER_REC_TRAFFIC_CLASS IB_SA_COMP_MASK( 6) #define IB_SA_MCMEMBER_REC_PKEY IB_SA_COMP_MASK( 7) #define IB_SA_MCMEMBER_REC_RATE_SELECTOR IB_SA_COMP_MASK( 8) #define IB_SA_MCMEMBER_REC_RATE IB_SA_COMP_MASK( 9) #define IB_SA_MCMEMBER_REC_PACKET_LIFE_TIME_SELECTOR IB_SA_COMP_MASK(10) #define IB_SA_MCMEMBER_REC_PACKET_LIFE_TIME IB_SA_COMP_MASK(11) #define IB_SA_MCMEMBER_REC_SL IB_SA_COMP_MASK(12) #define IB_SA_MCMEMBER_REC_FLOW_LABEL IB_SA_COMP_MASK(13) #define IB_SA_MCMEMBER_REC_HOP_LIMIT IB_SA_COMP_MASK(14) #define IB_SA_MCMEMBER_REC_SCOPE IB_SA_COMP_MASK(15) #define IB_SA_MCMEMBER_REC_JOIN_STATE IB_SA_COMP_MASK(16) #define IB_SA_MCMEMBER_REC_PROXY_JOIN IB_SA_COMP_MASK(17) struct ib_sa_mcmember_rec { union ib_gid mgid; union ib_gid port_gid; __be32 qkey; __be16 mlid; u8 mtu_selector; u8 mtu; u8 traffic_class; __be16 pkey; u8 rate_selector; u8 rate; u8 packet_life_time_selector; u8 packet_life_time; u8 sl; __be32 flow_label; u8 hop_limit; u8 scope; u8 join_state; u8 proxy_join; }; /* Service Record Component Mask Sec 15.2.5.14 Ver 1.1 */ #define IB_SA_SERVICE_REC_SERVICE_ID IB_SA_COMP_MASK( 0) #define IB_SA_SERVICE_REC_SERVICE_GID IB_SA_COMP_MASK( 1) #define IB_SA_SERVICE_REC_SERVICE_PKEY IB_SA_COMP_MASK( 2) /* reserved: 3 */ #define IB_SA_SERVICE_REC_SERVICE_LEASE IB_SA_COMP_MASK( 4) #define IB_SA_SERVICE_REC_SERVICE_KEY IB_SA_COMP_MASK( 5) #define IB_SA_SERVICE_REC_SERVICE_NAME IB_SA_COMP_MASK( 6) #define IB_SA_SERVICE_REC_SERVICE_DATA8_0 IB_SA_COMP_MASK( 7) #define IB_SA_SERVICE_REC_SERVICE_DATA8_1 IB_SA_COMP_MASK( 8) #define IB_SA_SERVICE_REC_SERVICE_DATA8_2 IB_SA_COMP_MASK( 9) #define IB_SA_SERVICE_REC_SERVICE_DATA8_3 IB_SA_COMP_MASK(10) #define IB_SA_SERVICE_REC_SERVICE_DATA8_4 IB_SA_COMP_MASK(11) #define IB_SA_SERVICE_REC_SERVICE_DATA8_5 IB_SA_COMP_MASK(12) #define IB_SA_SERVICE_REC_SERVICE_DATA8_6 IB_SA_COMP_MASK(13) #define IB_SA_SERVICE_REC_SERVICE_DATA8_7 IB_SA_COMP_MASK(14) #define IB_SA_SERVICE_REC_SERVICE_DATA8_8 IB_SA_COMP_MASK(15) #define IB_SA_SERVICE_REC_SERVICE_DATA8_9 IB_SA_COMP_MASK(16) #define IB_SA_SERVICE_REC_SERVICE_DATA8_10 IB_SA_COMP_MASK(17) #define IB_SA_SERVICE_REC_SERVICE_DATA8_11 IB_SA_COMP_MASK(18) #define IB_SA_SERVICE_REC_SERVICE_DATA8_12 IB_SA_COMP_MASK(19) #define IB_SA_SERVICE_REC_SERVICE_DATA8_13 IB_SA_COMP_MASK(20) #define IB_SA_SERVICE_REC_SERVICE_DATA8_14 IB_SA_COMP_MASK(21) #define IB_SA_SERVICE_REC_SERVICE_DATA8_15 IB_SA_COMP_MASK(22) #define IB_SA_SERVICE_REC_SERVICE_DATA16_0 IB_SA_COMP_MASK(23) #define IB_SA_SERVICE_REC_SERVICE_DATA16_1 IB_SA_COMP_MASK(24) #define IB_SA_SERVICE_REC_SERVICE_DATA16_2 IB_SA_COMP_MASK(25) #define IB_SA_SERVICE_REC_SERVICE_DATA16_3 IB_SA_COMP_MASK(26) #define IB_SA_SERVICE_REC_SERVICE_DATA16_4 IB_SA_COMP_MASK(27) #define IB_SA_SERVICE_REC_SERVICE_DATA16_5 IB_SA_COMP_MASK(28) #define IB_SA_SERVICE_REC_SERVICE_DATA16_6 IB_SA_COMP_MASK(29) #define IB_SA_SERVICE_REC_SERVICE_DATA16_7 IB_SA_COMP_MASK(30) #define IB_SA_SERVICE_REC_SERVICE_DATA32_0 IB_SA_COMP_MASK(31) #define IB_SA_SERVICE_REC_SERVICE_DATA32_1 IB_SA_COMP_MASK(32) #define IB_SA_SERVICE_REC_SERVICE_DATA32_2 IB_SA_COMP_MASK(33) #define IB_SA_SERVICE_REC_SERVICE_DATA32_3 IB_SA_COMP_MASK(34) #define IB_SA_SERVICE_REC_SERVICE_DATA64_0 IB_SA_COMP_MASK(35) #define IB_SA_SERVICE_REC_SERVICE_DATA64_1 IB_SA_COMP_MASK(36) #define IB_DEFAULT_SERVICE_LEASE 0xFFFFFFFF #define IB_SA_GUIDINFO_REC_LID IB_SA_COMP_MASK(0) #define IB_SA_GUIDINFO_REC_BLOCK_NUM IB_SA_COMP_MASK(1) #define IB_SA_GUIDINFO_REC_RES1 IB_SA_COMP_MASK(2) #define IB_SA_GUIDINFO_REC_RES2 IB_SA_COMP_MASK(3) #define IB_SA_GUIDINFO_REC_GID0 IB_SA_COMP_MASK(4) #define IB_SA_GUIDINFO_REC_GID1 IB_SA_COMP_MASK(5) #define IB_SA_GUIDINFO_REC_GID2 IB_SA_COMP_MASK(6) #define IB_SA_GUIDINFO_REC_GID3 IB_SA_COMP_MASK(7) #define IB_SA_GUIDINFO_REC_GID4 IB_SA_COMP_MASK(8) #define IB_SA_GUIDINFO_REC_GID5 IB_SA_COMP_MASK(9) #define IB_SA_GUIDINFO_REC_GID6 IB_SA_COMP_MASK(10) #define IB_SA_GUIDINFO_REC_GID7 IB_SA_COMP_MASK(11) struct ib_sa_guidinfo_rec { __be16 lid; u8 block_num; /* reserved */ u8 res1; __be32 res2; u8 guid_info_list[64]; }; struct ib_sa_client { atomic_t users; struct completion comp; }; /** * ib_sa_register_client - Register an SA client. */ void ib_sa_register_client(struct ib_sa_client *client); /** * ib_sa_unregister_client - Deregister an SA client. * @client: Client object to deregister. */ void ib_sa_unregister_client(struct ib_sa_client *client); struct ib_sa_query; void ib_sa_cancel_query(int id, struct ib_sa_query *query); int ib_sa_path_rec_get(struct ib_sa_client *client, struct ib_device *device, u32 port_num, struct sa_path_rec *rec, ib_sa_comp_mask comp_mask, unsigned long timeout_ms, gfp_t gfp_mask, void (*callback)(int status, struct sa_path_rec *resp, unsigned int num_prs, void *context), void *context, struct ib_sa_query **query); struct ib_sa_multicast { struct ib_sa_mcmember_rec rec; ib_sa_comp_mask comp_mask; int (*callback)(int status, struct ib_sa_multicast *multicast); void *context; }; /** * ib_sa_join_multicast - Initiates a join request to the specified multicast * group. * @client: SA client * @device: Device associated with the multicast group. * @port_num: Port on the specified device to associate with the multicast * group. * @rec: SA multicast member record specifying group attributes. * @comp_mask: Component mask indicating which group attributes of %rec are * valid. * @gfp_mask: GFP mask for memory allocations. * @callback: User callback invoked once the join operation completes. * @context: User specified context stored with the ib_sa_multicast structure. * * This call initiates a multicast join request with the SA for the specified * multicast group. If the join operation is started successfully, it returns * an ib_sa_multicast structure that is used to track the multicast operation. * Users must free this structure by calling ib_free_multicast, even if the * join operation later fails. (The callback status is non-zero.) * * If the join operation fails; status will be non-zero, with the following * failures possible: * -ETIMEDOUT: The request timed out. * -EIO: An error occurred sending the query. * -EINVAL: The MCMemberRecord values differed from the existing group's. * -ENETRESET: Indicates that an fatal error has occurred on the multicast * group, and the user must rejoin the group to continue using it. */ struct ib_sa_multicast *ib_sa_join_multicast(struct ib_sa_client *client, struct ib_device *device, u32 port_num, struct ib_sa_mcmember_rec *rec, ib_sa_comp_mask comp_mask, gfp_t gfp_mask, int (*callback)(int status, struct ib_sa_multicast *multicast), void *context); /** * ib_free_multicast - Frees the multicast tracking structure, and releases * any reference on the multicast group. * @multicast: Multicast tracking structure allocated by ib_join_multicast. * * This call blocks until the multicast identifier is destroyed. It may * not be called from within the multicast callback; however, returning a non- * zero value from the callback will result in destroying the multicast * tracking structure. */ void ib_sa_free_multicast(struct ib_sa_multicast *multicast); /** * ib_get_mcmember_rec - Looks up a multicast member record by its MGID and * returns it if found. * @device: Device associated with the multicast group. * @port_num: Port on the specified device to associate with the multicast * group. * @mgid: MGID of multicast group. * @rec: Location to copy SA multicast member record. */ int ib_sa_get_mcmember_rec(struct ib_device *device, u32 port_num, union ib_gid *mgid, struct ib_sa_mcmember_rec *rec); /** * ib_init_ah_from_mcmember - Initialize address handle attributes based on * an SA multicast member record. */ int ib_init_ah_from_mcmember(struct ib_device *device, u32 port_num, struct ib_sa_mcmember_rec *rec, struct net_device *ndev, enum ib_gid_type gid_type, struct rdma_ah_attr *ah_attr); int ib_init_ah_attr_from_path(struct ib_device *device, u32 port_num, struct sa_path_rec *rec, struct rdma_ah_attr *ah_attr, const struct ib_gid_attr *sgid_attr); /** * ib_sa_pack_path - Conert a path record from struct ib_sa_path_rec * to IB MAD wire format. */ void ib_sa_pack_path(struct sa_path_rec *rec, void *attribute); /** * ib_sa_unpack_path - Convert a path record from MAD format to struct * ib_sa_path_rec. */ void ib_sa_unpack_path(void *attribute, struct sa_path_rec *rec); /* Support GuidInfoRecord */ int ib_sa_guid_info_rec_query(struct ib_sa_client *client, struct ib_device *device, u32 port_num, struct ib_sa_guidinfo_rec *rec, ib_sa_comp_mask comp_mask, u8 method, unsigned long timeout_ms, gfp_t gfp_mask, void (*callback)(int status, struct ib_sa_guidinfo_rec *resp, void *context), void *context, struct ib_sa_query **sa_query); static inline bool sa_path_is_roce(struct sa_path_rec *rec) { return ((rec->rec_type == SA_PATH_REC_TYPE_ROCE_V1) || (rec->rec_type == SA_PATH_REC_TYPE_ROCE_V2)); } static inline bool sa_path_is_opa(struct sa_path_rec *rec) { return (rec->rec_type == SA_PATH_REC_TYPE_OPA); } static inline void sa_path_set_slid(struct sa_path_rec *rec, u32 slid) { if (rec->rec_type == SA_PATH_REC_TYPE_IB) rec->ib.slid = cpu_to_be16(slid); else if (rec->rec_type == SA_PATH_REC_TYPE_OPA) rec->opa.slid = cpu_to_be32(slid); } static inline void sa_path_set_dlid(struct sa_path_rec *rec, u32 dlid) { if (rec->rec_type == SA_PATH_REC_TYPE_IB) rec->ib.dlid = cpu_to_be16(dlid); else if (rec->rec_type == SA_PATH_REC_TYPE_OPA) rec->opa.dlid = cpu_to_be32(dlid); } static inline void sa_path_set_raw_traffic(struct sa_path_rec *rec, u8 raw_traffic) { if (rec->rec_type == SA_PATH_REC_TYPE_IB) rec->ib.raw_traffic = raw_traffic; else if (rec->rec_type == SA_PATH_REC_TYPE_OPA) rec->opa.raw_traffic = raw_traffic; } static inline __be32 sa_path_get_slid(struct sa_path_rec *rec) { if (rec->rec_type == SA_PATH_REC_TYPE_IB) return htonl(ntohs(rec->ib.slid)); else if (rec->rec_type == SA_PATH_REC_TYPE_OPA) return rec->opa.slid; return 0; } static inline __be32 sa_path_get_dlid(struct sa_path_rec *rec) { if (rec->rec_type == SA_PATH_REC_TYPE_IB) return htonl(ntohs(rec->ib.dlid)); else if (rec->rec_type == SA_PATH_REC_TYPE_OPA) return rec->opa.dlid; return 0; } static inline u8 sa_path_get_raw_traffic(struct sa_path_rec *rec) { if (rec->rec_type == SA_PATH_REC_TYPE_IB) return rec->ib.raw_traffic; else if (rec->rec_type == SA_PATH_REC_TYPE_OPA) return rec->opa.raw_traffic; return 0; } static inline void sa_path_set_dmac(struct sa_path_rec *rec, u8 *dmac) { if (sa_path_is_roce(rec)) memcpy(rec->roce.dmac, dmac, ETH_ALEN); } static inline void sa_path_set_dmac_zero(struct sa_path_rec *rec) { if (sa_path_is_roce(rec)) eth_zero_addr(rec->roce.dmac); } static inline u8 *sa_path_get_dmac(struct sa_path_rec *rec) { if (sa_path_is_roce(rec)) return rec->roce.dmac; return NULL; } #endif /* IB_SA_H */ |
| 3 8 5 5 1 7 1 1 7 82 9 15 9 9 15 16 72 2 70 4 66 13 2 3 23 30 24 2 1 32 10 1 4 68 17 13 49 45 44 49 11 4 13 18 17 26 15 12 12 3 2 2 2 13 4 13 14 34 18 18 14 4 29 8 5 6 14 14 12 7 4 7 3 2 14 9 5 14 8 5 2 10 4 4 10 17 17 16 5 1 2 3 6 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 | // SPDX-License-Identifier: GPL-2.0-only /* * sysctl.c: General linux system control interface * * Begun 24 March 1995, Stephen Tweedie * Added /proc support, Dec 1995 * Added bdflush entry and intvec min/max checking, 2/23/96, Tom Dyas. * Added hooks for /proc/sys/net (minor, minor patch), 96/4/1, Mike Shaver. * Added kernel/java-{interpreter,appletviewer}, 96/5/10, Mike Shaver. * Dynamic registration fixes, Stephen Tweedie. * Added kswapd-interval, ctrl-alt-del, printk stuff, 1/8/97, Chris Horn. * Made sysctl support optional via CONFIG_SYSCTL, 1/10/97, Chris * Horn. * Added proc_doulongvec_ms_jiffies_minmax, 09/08/99, Carlos H. Bauer. * Added proc_doulongvec_minmax, 09/08/99, Carlos H. Bauer. * Changed linked lists to use list.h instead of lists.h, 02/24/00, Bill * Wendling. * The list_for_each() macro wasn't appropriate for the sysctl loop. * Removed it and replaced it with older style, 03/23/00, Bill Wendling */ #include <linux/module.h> #include <linux/mm.h> #include <linux/swap.h> #include <linux/slab.h> #include <linux/sysctl.h> #include <linux/bitmap.h> #include <linux/signal.h> #include <linux/panic.h> #include <linux/printk.h> #include <linux/proc_fs.h> #include <linux/security.h> #include <linux/ctype.h> #include <linux/kmemleak.h> #include <linux/filter.h> #include <linux/fs.h> #include <linux/init.h> #include <linux/kernel.h> #include <linux/kobject.h> #include <linux/net.h> #include <linux/sysrq.h> #include <linux/highuid.h> #include <linux/writeback.h> #include <linux/ratelimit.h> #include <linux/hugetlb.h> #include <linux/initrd.h> #include <linux/key.h> #include <linux/times.h> #include <linux/limits.h> #include <linux/dcache.h> #include <linux/syscalls.h> #include <linux/vmstat.h> #include <linux/nfs_fs.h> #include <linux/acpi.h> #include <linux/reboot.h> #include <linux/ftrace.h> #include <linux/perf_event.h> #include <linux/oom.h> #include <linux/kmod.h> #include <linux/capability.h> #include <linux/binfmts.h> #include <linux/sched/sysctl.h> #include <linux/mount.h> #include <linux/userfaultfd_k.h> #include <linux/pid.h> #include "../lib/kstrtox.h" #include <linux/uaccess.h> #include <asm/processor.h> #ifdef CONFIG_X86 #include <asm/nmi.h> #include <asm/stacktrace.h> #include <asm/io.h> #endif #ifdef CONFIG_SPARC #include <asm/setup.h> #endif #ifdef CONFIG_RT_MUTEXES #include <linux/rtmutex.h> #endif /* shared constants to be used in various sysctls */ const int sysctl_vals[] = { 0, 1, 2, 3, 4, 100, 200, 1000, 3000, INT_MAX, 65535, -1 }; EXPORT_SYMBOL(sysctl_vals); const unsigned long sysctl_long_vals[] = { 0, 1, LONG_MAX }; EXPORT_SYMBOL_GPL(sysctl_long_vals); #if defined(CONFIG_SYSCTL) /* Constants used for minimum and maximum */ #ifdef CONFIG_PERF_EVENTS static const int six_hundred_forty_kb = 640 * 1024; #endif static const int ngroups_max = NGROUPS_MAX; static const int cap_last_cap = CAP_LAST_CAP; #ifdef CONFIG_PROC_SYSCTL /** * enum sysctl_writes_mode - supported sysctl write modes * * @SYSCTL_WRITES_LEGACY: each write syscall must fully contain the sysctl value * to be written, and multiple writes on the same sysctl file descriptor * will rewrite the sysctl value, regardless of file position. No warning * is issued when the initial position is not 0. * @SYSCTL_WRITES_WARN: same as above but warn when the initial file position is * not 0. * @SYSCTL_WRITES_STRICT: writes to numeric sysctl entries must always be at * file position 0 and the value must be fully contained in the buffer * sent to the write syscall. If dealing with strings respect the file * position, but restrict this to the max length of the buffer, anything * passed the max length will be ignored. Multiple writes will append * to the buffer. * * These write modes control how current file position affects the behavior of * updating sysctl values through the proc interface on each write. */ enum sysctl_writes_mode { SYSCTL_WRITES_LEGACY = -1, SYSCTL_WRITES_WARN = 0, SYSCTL_WRITES_STRICT = 1, }; static enum sysctl_writes_mode sysctl_writes_strict = SYSCTL_WRITES_STRICT; #endif /* CONFIG_PROC_SYSCTL */ #if defined(HAVE_ARCH_PICK_MMAP_LAYOUT) || \ defined(CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT) int sysctl_legacy_va_layout; #endif #endif /* CONFIG_SYSCTL */ /* * /proc/sys support */ #ifdef CONFIG_PROC_SYSCTL static int _proc_do_string(char *data, int maxlen, int write, char *buffer, size_t *lenp, loff_t *ppos) { size_t len; char c, *p; if (!data || !maxlen || !*lenp) { *lenp = 0; return 0; } if (write) { if (sysctl_writes_strict == SYSCTL_WRITES_STRICT) { /* Only continue writes not past the end of buffer. */ len = strlen(data); if (len > maxlen - 1) len = maxlen - 1; if (*ppos > len) return 0; len = *ppos; } else { /* Start writing from beginning of buffer. */ len = 0; } *ppos += *lenp; p = buffer; while ((p - buffer) < *lenp && len < maxlen - 1) { c = *(p++); if (c == 0 || c == '\n') break; data[len++] = c; } data[len] = 0; } else { len = strlen(data); if (len > maxlen) len = maxlen; if (*ppos > len) { *lenp = 0; return 0; } data += *ppos; len -= *ppos; if (len > *lenp) len = *lenp; if (len) memcpy(buffer, data, len); if (len < *lenp) { buffer[len] = '\n'; len++; } *lenp = len; *ppos += len; } return 0; } static void warn_sysctl_write(const struct ctl_table *table) { pr_warn_once("%s wrote to %s when file position was not 0!\n" "This will not be supported in the future. To silence this\n" "warning, set kernel.sysctl_writes_strict = -1\n", current->comm, table->procname); } /** * proc_first_pos_non_zero_ignore - check if first position is allowed * @ppos: file position * @table: the sysctl table * * Returns true if the first position is non-zero and the sysctl_writes_strict * mode indicates this is not allowed for numeric input types. String proc * handlers can ignore the return value. */ static bool proc_first_pos_non_zero_ignore(loff_t *ppos, const struct ctl_table *table) { if (!*ppos) return false; switch (sysctl_writes_strict) { case SYSCTL_WRITES_STRICT: return true; case SYSCTL_WRITES_WARN: warn_sysctl_write(table); return false; default: return false; } } /** * proc_dostring - read a string sysctl * @table: the sysctl table * @write: %TRUE if this is a write to the sysctl file * @buffer: the user buffer * @lenp: the size of the user buffer * @ppos: file position * * Reads/writes a string from/to the user buffer. If the kernel * buffer provided is not large enough to hold the string, the * string is truncated. The copied string is %NULL-terminated. * If the string is being read by the user process, it is copied * and a newline '\n' is added. It is truncated if the buffer is * not large enough. * * Returns 0 on success. */ int proc_dostring(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { if (write) proc_first_pos_non_zero_ignore(ppos, table); return _proc_do_string(table->data, table->maxlen, write, buffer, lenp, ppos); } static void proc_skip_spaces(char **buf, size_t *size) { while (*size) { if (!isspace(**buf)) break; (*size)--; (*buf)++; } } static void proc_skip_char(char **buf, size_t *size, const char v) { while (*size) { if (**buf != v) break; (*size)--; (*buf)++; } } /** * strtoul_lenient - parse an ASCII formatted integer from a buffer and only * fail on overflow * * @cp: kernel buffer containing the string to parse * @endp: pointer to store the trailing characters * @base: the base to use * @res: where the parsed integer will be stored * * In case of success 0 is returned and @res will contain the parsed integer, * @endp will hold any trailing characters. * This function will fail the parse on overflow. If there wasn't an overflow * the function will defer the decision what characters count as invalid to the * caller. */ static int strtoul_lenient(const char *cp, char **endp, unsigned int base, unsigned long *res) { unsigned long long result; unsigned int rv; cp = _parse_integer_fixup_radix(cp, &base); rv = _parse_integer(cp, base, &result); if ((rv & KSTRTOX_OVERFLOW) || (result != (unsigned long)result)) return -ERANGE; cp += rv; if (endp) *endp = (char *)cp; *res = (unsigned long)result; return 0; } #define TMPBUFLEN 22 /** * proc_get_long - reads an ASCII formatted integer from a user buffer * * @buf: a kernel buffer * @size: size of the kernel buffer * @val: this is where the number will be stored * @neg: set to %TRUE if number is negative * @perm_tr: a vector which contains the allowed trailers * @perm_tr_len: size of the perm_tr vector * @tr: pointer to store the trailer character * * In case of success %0 is returned and @buf and @size are updated with * the amount of bytes read. If @tr is non-NULL and a trailing * character exists (size is non-zero after returning from this * function), @tr is updated with the trailing character. */ static int proc_get_long(char **buf, size_t *size, unsigned long *val, bool *neg, const char *perm_tr, unsigned perm_tr_len, char *tr) { char *p, tmp[TMPBUFLEN]; ssize_t len = *size; if (len <= 0) return -EINVAL; if (len > TMPBUFLEN - 1) len = TMPBUFLEN - 1; memcpy(tmp, *buf, len); tmp[len] = 0; p = tmp; if (*p == '-' && *size > 1) { *neg = true; p++; } else *neg = false; if (!isdigit(*p)) return -EINVAL; if (strtoul_lenient(p, &p, 0, val)) return -EINVAL; len = p - tmp; /* We don't know if the next char is whitespace thus we may accept * invalid integers (e.g. 1234...a) or two integers instead of one * (e.g. 123...1). So lets not allow such large numbers. */ if (len == TMPBUFLEN - 1) return -EINVAL; if (len < *size && perm_tr_len && !memchr(perm_tr, *p, perm_tr_len)) return -EINVAL; if (tr && (len < *size)) *tr = *p; *buf += len; *size -= len; return 0; } /** * proc_put_long - converts an integer to a decimal ASCII formatted string * * @buf: the user buffer * @size: the size of the user buffer * @val: the integer to be converted * @neg: sign of the number, %TRUE for negative * * In case of success @buf and @size are updated with the amount of bytes * written. */ static void proc_put_long(void **buf, size_t *size, unsigned long val, bool neg) { int len; char tmp[TMPBUFLEN], *p = tmp; sprintf(p, "%s%lu", neg ? "-" : "", val); len = strlen(tmp); if (len > *size) len = *size; memcpy(*buf, tmp, len); *size -= len; *buf += len; } #undef TMPBUFLEN static void proc_put_char(void **buf, size_t *size, char c) { if (*size) { char **buffer = (char **)buf; **buffer = c; (*size)--; (*buffer)++; *buf = *buffer; } } static int do_proc_dointvec_conv(bool *negp, unsigned long *lvalp, int *valp, int write, void *data) { if (write) { if (*negp) { if (*lvalp > (unsigned long) INT_MAX + 1) return -EINVAL; WRITE_ONCE(*valp, -*lvalp); } else { if (*lvalp > (unsigned long) INT_MAX) return -EINVAL; WRITE_ONCE(*valp, *lvalp); } } else { int val = READ_ONCE(*valp); if (val < 0) { *negp = true; *lvalp = -(unsigned long)val; } else { *negp = false; *lvalp = (unsigned long)val; } } return 0; } static int do_proc_douintvec_conv(unsigned long *lvalp, unsigned int *valp, int write, void *data) { if (write) { if (*lvalp > UINT_MAX) return -EINVAL; WRITE_ONCE(*valp, *lvalp); } else { unsigned int val = READ_ONCE(*valp); *lvalp = (unsigned long)val; } return 0; } static const char proc_wspace_sep[] = { ' ', '\t', '\n' }; static int __do_proc_dointvec(void *tbl_data, const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos, int (*conv)(bool *negp, unsigned long *lvalp, int *valp, int write, void *data), void *data) { int *i, vleft, first = 1, err = 0; size_t left; char *p; if (!tbl_data || !table->maxlen || !*lenp || (*ppos && !write)) { *lenp = 0; return 0; } i = (int *) tbl_data; vleft = table->maxlen / sizeof(*i); left = *lenp; if (!conv) conv = do_proc_dointvec_conv; if (write) { if (proc_first_pos_non_zero_ignore(ppos, table)) goto out; if (left > PAGE_SIZE - 1) left = PAGE_SIZE - 1; p = buffer; } for (; left && vleft--; i++, first=0) { unsigned long lval; bool neg; if (write) { proc_skip_spaces(&p, &left); if (!left) break; err = proc_get_long(&p, &left, &lval, &neg, proc_wspace_sep, sizeof(proc_wspace_sep), NULL); if (err) break; if (conv(&neg, &lval, i, 1, data)) { err = -EINVAL; break; } } else { if (conv(&neg, &lval, i, 0, data)) { err = -EINVAL; break; } if (!first) proc_put_char(&buffer, &left, '\t'); proc_put_long(&buffer, &left, lval, neg); } } if (!write && !first && left && !err) proc_put_char(&buffer, &left, '\n'); if (write && !err && left) proc_skip_spaces(&p, &left); if (write && first) return err ? : -EINVAL; *lenp -= left; out: *ppos += *lenp; return err; } static int do_proc_dointvec(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos, int (*conv)(bool *negp, unsigned long *lvalp, int *valp, int write, void *data), void *data) { return __do_proc_dointvec(table->data, table, write, buffer, lenp, ppos, conv, data); } static int do_proc_douintvec_w(unsigned int *tbl_data, const struct ctl_table *table, void *buffer, size_t *lenp, loff_t *ppos, int (*conv)(unsigned long *lvalp, unsigned int *valp, int write, void *data), void *data) { unsigned long lval; int err = 0; size_t left; bool neg; char *p = buffer; left = *lenp; if (proc_first_pos_non_zero_ignore(ppos, table)) goto bail_early; if (left > PAGE_SIZE - 1) left = PAGE_SIZE - 1; proc_skip_spaces(&p, &left); if (!left) { err = -EINVAL; goto out_free; } err = proc_get_long(&p, &left, &lval, &neg, proc_wspace_sep, sizeof(proc_wspace_sep), NULL); if (err || neg) { err = -EINVAL; goto out_free; } if (conv(&lval, tbl_data, 1, data)) { err = -EINVAL; goto out_free; } if (!err && left) proc_skip_spaces(&p, &left); out_free: if (err) return -EINVAL; return 0; /* This is in keeping with old __do_proc_dointvec() */ bail_early: *ppos += *lenp; return err; } static int do_proc_douintvec_r(unsigned int *tbl_data, void *buffer, size_t *lenp, loff_t *ppos, int (*conv)(unsigned long *lvalp, unsigned int *valp, int write, void *data), void *data) { unsigned long lval; int err = 0; size_t left; left = *lenp; if (conv(&lval, tbl_data, 0, data)) { err = -EINVAL; goto out; } proc_put_long(&buffer, &left, lval, false); if (!left) goto out; proc_put_char(&buffer, &left, '\n'); out: *lenp -= left; *ppos += *lenp; return err; } static int __do_proc_douintvec(void *tbl_data, const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos, int (*conv)(unsigned long *lvalp, unsigned int *valp, int write, void *data), void *data) { unsigned int *i, vleft; if (!tbl_data || !table->maxlen || !*lenp || (*ppos && !write)) { *lenp = 0; return 0; } i = (unsigned int *) tbl_data; vleft = table->maxlen / sizeof(*i); /* * Arrays are not supported, keep this simple. *Do not* add * support for them. */ if (vleft != 1) { *lenp = 0; return -EINVAL; } if (!conv) conv = do_proc_douintvec_conv; if (write) return do_proc_douintvec_w(i, table, buffer, lenp, ppos, conv, data); return do_proc_douintvec_r(i, buffer, lenp, ppos, conv, data); } int do_proc_douintvec(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos, int (*conv)(unsigned long *lvalp, unsigned int *valp, int write, void *data), void *data) { return __do_proc_douintvec(table->data, table, write, buffer, lenp, ppos, conv, data); } /** * proc_dobool - read/write a bool * @table: the sysctl table * @write: %TRUE if this is a write to the sysctl file * @buffer: the user buffer * @lenp: the size of the user buffer * @ppos: file position * * Reads/writes one integer value from/to the user buffer, * treated as an ASCII string. * * table->data must point to a bool variable and table->maxlen must * be sizeof(bool). * * Returns 0 on success. */ int proc_dobool(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { struct ctl_table tmp; bool *data = table->data; int res, val; /* Do not support arrays yet. */ if (table->maxlen != sizeof(bool)) return -EINVAL; tmp = *table; tmp.maxlen = sizeof(val); tmp.data = &val; val = READ_ONCE(*data); res = proc_dointvec(&tmp, write, buffer, lenp, ppos); if (res) return res; if (write) WRITE_ONCE(*data, val); return 0; } /** * proc_dointvec - read a vector of integers * @table: the sysctl table * @write: %TRUE if this is a write to the sysctl file * @buffer: the user buffer * @lenp: the size of the user buffer * @ppos: file position * * Reads/writes up to table->maxlen/sizeof(unsigned int) integer * values from/to the user buffer, treated as an ASCII string. * * Returns 0 on success. */ int proc_dointvec(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { return do_proc_dointvec(table, write, buffer, lenp, ppos, NULL, NULL); } /** * proc_douintvec - read a vector of unsigned integers * @table: the sysctl table * @write: %TRUE if this is a write to the sysctl file * @buffer: the user buffer * @lenp: the size of the user buffer * @ppos: file position * * Reads/writes up to table->maxlen/sizeof(unsigned int) unsigned integer * values from/to the user buffer, treated as an ASCII string. * * Returns 0 on success. */ int proc_douintvec(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { return do_proc_douintvec(table, write, buffer, lenp, ppos, do_proc_douintvec_conv, NULL); } /* * Taint values can only be increased * This means we can safely use a temporary. */ static int proc_taint(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { struct ctl_table t; unsigned long tmptaint = get_taint(); int err; if (write && !capable(CAP_SYS_ADMIN)) return -EPERM; t = *table; t.data = &tmptaint; err = proc_doulongvec_minmax(&t, write, buffer, lenp, ppos); if (err < 0) return err; if (write) { int i; /* * If we are relying on panic_on_taint not producing * false positives due to userspace input, bail out * before setting the requested taint flags. */ if (panic_on_taint_nousertaint && (tmptaint & panic_on_taint)) return -EINVAL; /* * Poor man's atomic or. Not worth adding a primitive * to everyone's atomic.h for this */ for (i = 0; i < TAINT_FLAGS_COUNT; i++) if ((1UL << i) & tmptaint) add_taint(i, LOCKDEP_STILL_OK); } return err; } /** * struct do_proc_dointvec_minmax_conv_param - proc_dointvec_minmax() range checking structure * @min: pointer to minimum allowable value * @max: pointer to maximum allowable value * * The do_proc_dointvec_minmax_conv_param structure provides the * minimum and maximum values for doing range checking for those sysctl * parameters that use the proc_dointvec_minmax() handler. */ struct do_proc_dointvec_minmax_conv_param { int *min; int *max; }; static int do_proc_dointvec_minmax_conv(bool *negp, unsigned long *lvalp, int *valp, int write, void *data) { int tmp, ret; struct do_proc_dointvec_minmax_conv_param *param = data; /* * If writing, first do so via a temporary local int so we can * bounds-check it before touching *valp. */ int *ip = write ? &tmp : valp; ret = do_proc_dointvec_conv(negp, lvalp, ip, write, data); if (ret) return ret; if (write) { if ((param->min && *param->min > tmp) || (param->max && *param->max < tmp)) return -EINVAL; WRITE_ONCE(*valp, tmp); } return 0; } /** * proc_dointvec_minmax - read a vector of integers with min/max values * @table: the sysctl table * @write: %TRUE if this is a write to the sysctl file * @buffer: the user buffer * @lenp: the size of the user buffer * @ppos: file position * * Reads/writes up to table->maxlen/sizeof(unsigned int) integer * values from/to the user buffer, treated as an ASCII string. * * This routine will ensure the values are within the range specified by * table->extra1 (min) and table->extra2 (max). * * Returns 0 on success or -EINVAL on write when the range check fails. */ int proc_dointvec_minmax(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { struct do_proc_dointvec_minmax_conv_param param = { .min = (int *) table->extra1, .max = (int *) table->extra2, }; return do_proc_dointvec(table, write, buffer, lenp, ppos, do_proc_dointvec_minmax_conv, ¶m); } /** * struct do_proc_douintvec_minmax_conv_param - proc_douintvec_minmax() range checking structure * @min: pointer to minimum allowable value * @max: pointer to maximum allowable value * * The do_proc_douintvec_minmax_conv_param structure provides the * minimum and maximum values for doing range checking for those sysctl * parameters that use the proc_douintvec_minmax() handler. */ struct do_proc_douintvec_minmax_conv_param { unsigned int *min; unsigned int *max; }; static int do_proc_douintvec_minmax_conv(unsigned long *lvalp, unsigned int *valp, int write, void *data) { int ret; unsigned int tmp; struct do_proc_douintvec_minmax_conv_param *param = data; /* write via temporary local uint for bounds-checking */ unsigned int *up = write ? &tmp : valp; ret = do_proc_douintvec_conv(lvalp, up, write, data); if (ret) return ret; if (write) { if ((param->min && *param->min > tmp) || (param->max && *param->max < tmp)) return -ERANGE; WRITE_ONCE(*valp, tmp); } return 0; } /** * proc_douintvec_minmax - read a vector of unsigned ints with min/max values * @table: the sysctl table * @write: %TRUE if this is a write to the sysctl file * @buffer: the user buffer * @lenp: the size of the user buffer * @ppos: file position * * Reads/writes up to table->maxlen/sizeof(unsigned int) unsigned integer * values from/to the user buffer, treated as an ASCII string. Negative * strings are not allowed. * * This routine will ensure the values are within the range specified by * table->extra1 (min) and table->extra2 (max). There is a final sanity * check for UINT_MAX to avoid having to support wrap around uses from * userspace. * * Returns 0 on success or -ERANGE on write when the range check fails. */ int proc_douintvec_minmax(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { struct do_proc_douintvec_minmax_conv_param param = { .min = (unsigned int *) table->extra1, .max = (unsigned int *) table->extra2, }; return do_proc_douintvec(table, write, buffer, lenp, ppos, do_proc_douintvec_minmax_conv, ¶m); } /** * proc_dou8vec_minmax - read a vector of unsigned chars with min/max values * @table: the sysctl table * @write: %TRUE if this is a write to the sysctl file * @buffer: the user buffer * @lenp: the size of the user buffer * @ppos: file position * * Reads/writes up to table->maxlen/sizeof(u8) unsigned chars * values from/to the user buffer, treated as an ASCII string. Negative * strings are not allowed. * * This routine will ensure the values are within the range specified by * table->extra1 (min) and table->extra2 (max). * * Returns 0 on success or an error on write when the range check fails. */ int proc_dou8vec_minmax(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { struct ctl_table tmp; unsigned int min = 0, max = 255U, val; u8 *data = table->data; struct do_proc_douintvec_minmax_conv_param param = { .min = &min, .max = &max, }; int res; /* Do not support arrays yet. */ if (table->maxlen != sizeof(u8)) return -EINVAL; if (table->extra1) min = *(unsigned int *) table->extra1; if (table->extra2) max = *(unsigned int *) table->extra2; tmp = *table; tmp.maxlen = sizeof(val); tmp.data = &val; val = READ_ONCE(*data); res = do_proc_douintvec(&tmp, write, buffer, lenp, ppos, do_proc_douintvec_minmax_conv, ¶m); if (res) return res; if (write) WRITE_ONCE(*data, val); return 0; } EXPORT_SYMBOL_GPL(proc_dou8vec_minmax); #ifdef CONFIG_MAGIC_SYSRQ static int sysrq_sysctl_handler(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { int tmp, ret; tmp = sysrq_mask(); ret = __do_proc_dointvec(&tmp, table, write, buffer, lenp, ppos, NULL, NULL); if (ret || !write) return ret; if (write) sysrq_toggle_support(tmp); return 0; } #endif static int __do_proc_doulongvec_minmax(void *data, const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos, unsigned long convmul, unsigned long convdiv) { unsigned long *i, *min, *max; int vleft, first = 1, err = 0; size_t left; char *p; if (!data || !table->maxlen || !*lenp || (*ppos && !write)) { *lenp = 0; return 0; } i = data; min = table->extra1; max = table->extra2; vleft = table->maxlen / sizeof(unsigned long); left = *lenp; if (write) { if (proc_first_pos_non_zero_ignore(ppos, table)) goto out; if (left > PAGE_SIZE - 1) left = PAGE_SIZE - 1; p = buffer; } for (; left && vleft--; i++, first = 0) { unsigned long val; if (write) { bool neg; proc_skip_spaces(&p, &left); if (!left) break; err = proc_get_long(&p, &left, &val, &neg, proc_wspace_sep, sizeof(proc_wspace_sep), NULL); if (err || neg) { err = -EINVAL; break; } val = convmul * val / convdiv; if ((min && val < *min) || (max && val > *max)) { err = -EINVAL; break; } WRITE_ONCE(*i, val); } else { val = convdiv * READ_ONCE(*i) / convmul; if (!first) proc_put_char(&buffer, &left, '\t'); proc_put_long(&buffer, &left, val, false); } } if (!write && !first && left && !err) proc_put_char(&buffer, &left, '\n'); if (write && !err) proc_skip_spaces(&p, &left); if (write && first) return err ? : -EINVAL; *lenp -= left; out: *ppos += *lenp; return err; } static int do_proc_doulongvec_minmax(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos, unsigned long convmul, unsigned long convdiv) { return __do_proc_doulongvec_minmax(table->data, table, write, buffer, lenp, ppos, convmul, convdiv); } /** * proc_doulongvec_minmax - read a vector of long integers with min/max values * @table: the sysctl table * @write: %TRUE if this is a write to the sysctl file * @buffer: the user buffer * @lenp: the size of the user buffer * @ppos: file position * * Reads/writes up to table->maxlen/sizeof(unsigned long) unsigned long * values from/to the user buffer, treated as an ASCII string. * * This routine will ensure the values are within the range specified by * table->extra1 (min) and table->extra2 (max). * * Returns 0 on success. */ int proc_doulongvec_minmax(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { return do_proc_doulongvec_minmax(table, write, buffer, lenp, ppos, 1l, 1l); } /** * proc_doulongvec_ms_jiffies_minmax - read a vector of millisecond values with min/max values * @table: the sysctl table * @write: %TRUE if this is a write to the sysctl file * @buffer: the user buffer * @lenp: the size of the user buffer * @ppos: file position * * Reads/writes up to table->maxlen/sizeof(unsigned long) unsigned long * values from/to the user buffer, treated as an ASCII string. The values * are treated as milliseconds, and converted to jiffies when they are stored. * * This routine will ensure the values are within the range specified by * table->extra1 (min) and table->extra2 (max). * * Returns 0 on success. */ int proc_doulongvec_ms_jiffies_minmax(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { return do_proc_doulongvec_minmax(table, write, buffer, lenp, ppos, HZ, 1000l); } static int do_proc_dointvec_jiffies_conv(bool *negp, unsigned long *lvalp, int *valp, int write, void *data) { if (write) { if (*lvalp > INT_MAX / HZ) return 1; if (*negp) WRITE_ONCE(*valp, -*lvalp * HZ); else WRITE_ONCE(*valp, *lvalp * HZ); } else { int val = READ_ONCE(*valp); unsigned long lval; if (val < 0) { *negp = true; lval = -(unsigned long)val; } else { *negp = false; lval = (unsigned long)val; } *lvalp = lval / HZ; } return 0; } static int do_proc_dointvec_userhz_jiffies_conv(bool *negp, unsigned long *lvalp, int *valp, int write, void *data) { if (write) { if (USER_HZ < HZ && *lvalp > (LONG_MAX / HZ) * USER_HZ) return 1; *valp = clock_t_to_jiffies(*negp ? -*lvalp : *lvalp); } else { int val = *valp; unsigned long lval; if (val < 0) { *negp = true; lval = -(unsigned long)val; } else { *negp = false; lval = (unsigned long)val; } *lvalp = jiffies_to_clock_t(lval); } return 0; } static int do_proc_dointvec_ms_jiffies_conv(bool *negp, unsigned long *lvalp, int *valp, int write, void *data) { if (write) { unsigned long jif = msecs_to_jiffies(*negp ? -*lvalp : *lvalp); if (jif > INT_MAX) return 1; WRITE_ONCE(*valp, (int)jif); } else { int val = READ_ONCE(*valp); unsigned long lval; if (val < 0) { *negp = true; lval = -(unsigned long)val; } else { *negp = false; lval = (unsigned long)val; } *lvalp = jiffies_to_msecs(lval); } return 0; } static int do_proc_dointvec_ms_jiffies_minmax_conv(bool *negp, unsigned long *lvalp, int *valp, int write, void *data) { int tmp, ret; struct do_proc_dointvec_minmax_conv_param *param = data; /* * If writing, first do so via a temporary local int so we can * bounds-check it before touching *valp. */ int *ip = write ? &tmp : valp; ret = do_proc_dointvec_ms_jiffies_conv(negp, lvalp, ip, write, data); if (ret) return ret; if (write) { if ((param->min && *param->min > tmp) || (param->max && *param->max < tmp)) return -EINVAL; *valp = tmp; } return 0; } /** * proc_dointvec_jiffies - read a vector of integers as seconds * @table: the sysctl table * @write: %TRUE if this is a write to the sysctl file * @buffer: the user buffer * @lenp: the size of the user buffer * @ppos: file position * * Reads/writes up to table->maxlen/sizeof(unsigned int) integer * values from/to the user buffer, treated as an ASCII string. * The values read are assumed to be in seconds, and are converted into * jiffies. * * Returns 0 on success. */ int proc_dointvec_jiffies(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { return do_proc_dointvec(table,write,buffer,lenp,ppos, do_proc_dointvec_jiffies_conv,NULL); } int proc_dointvec_ms_jiffies_minmax(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { struct do_proc_dointvec_minmax_conv_param param = { .min = (int *) table->extra1, .max = (int *) table->extra2, }; return do_proc_dointvec(table, write, buffer, lenp, ppos, do_proc_dointvec_ms_jiffies_minmax_conv, ¶m); } /** * proc_dointvec_userhz_jiffies - read a vector of integers as 1/USER_HZ seconds * @table: the sysctl table * @write: %TRUE if this is a write to the sysctl file * @buffer: the user buffer * @lenp: the size of the user buffer * @ppos: pointer to the file position * * Reads/writes up to table->maxlen/sizeof(unsigned int) integer * values from/to the user buffer, treated as an ASCII string. * The values read are assumed to be in 1/USER_HZ seconds, and * are converted into jiffies. * * Returns 0 on success. */ int proc_dointvec_userhz_jiffies(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { return do_proc_dointvec(table, write, buffer, lenp, ppos, do_proc_dointvec_userhz_jiffies_conv, NULL); } /** * proc_dointvec_ms_jiffies - read a vector of integers as 1 milliseconds * @table: the sysctl table * @write: %TRUE if this is a write to the sysctl file * @buffer: the user buffer * @lenp: the size of the user buffer * @ppos: file position * @ppos: the current position in the file * * Reads/writes up to table->maxlen/sizeof(unsigned int) integer * values from/to the user buffer, treated as an ASCII string. * The values read are assumed to be in 1/1000 seconds, and * are converted into jiffies. * * Returns 0 on success. */ int proc_dointvec_ms_jiffies(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { return do_proc_dointvec(table, write, buffer, lenp, ppos, do_proc_dointvec_ms_jiffies_conv, NULL); } static int proc_do_cad_pid(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { struct pid *new_pid; pid_t tmp; int r; tmp = pid_vnr(cad_pid); r = __do_proc_dointvec(&tmp, table, write, buffer, lenp, ppos, NULL, NULL); if (r || !write) return r; new_pid = find_get_pid(tmp); if (!new_pid) return -ESRCH; put_pid(xchg(&cad_pid, new_pid)); return 0; } /** * proc_do_large_bitmap - read/write from/to a large bitmap * @table: the sysctl table * @write: %TRUE if this is a write to the sysctl file * @buffer: the user buffer * @lenp: the size of the user buffer * @ppos: file position * * The bitmap is stored at table->data and the bitmap length (in bits) * in table->maxlen. * * We use a range comma separated format (e.g. 1,3-4,10-10) so that * large bitmaps may be represented in a compact manner. Writing into * the file will clear the bitmap then update it with the given input. * * Returns 0 on success. */ int proc_do_large_bitmap(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { int err = 0; size_t left = *lenp; unsigned long bitmap_len = table->maxlen; unsigned long *bitmap = *(unsigned long **) table->data; unsigned long *tmp_bitmap = NULL; char tr_a[] = { '-', ',', '\n' }, tr_b[] = { ',', '\n', 0 }, c; if (!bitmap || !bitmap_len || !left || (*ppos && !write)) { *lenp = 0; return 0; } if (write) { char *p = buffer; size_t skipped = 0; if (left > PAGE_SIZE - 1) { left = PAGE_SIZE - 1; /* How much of the buffer we'll skip this pass */ skipped = *lenp - left; } tmp_bitmap = bitmap_zalloc(bitmap_len, GFP_KERNEL); if (!tmp_bitmap) return -ENOMEM; proc_skip_char(&p, &left, '\n'); while (!err && left) { unsigned long val_a, val_b; bool neg; size_t saved_left; /* In case we stop parsing mid-number, we can reset */ saved_left = left; err = proc_get_long(&p, &left, &val_a, &neg, tr_a, sizeof(tr_a), &c); /* * If we consumed the entirety of a truncated buffer or * only one char is left (may be a "-"), then stop here, * reset, & come back for more. */ if ((left <= 1) && skipped) { left = saved_left; break; } if (err) break; if (val_a >= bitmap_len || neg) { err = -EINVAL; break; } val_b = val_a; if (left) { p++; left--; } if (c == '-') { err = proc_get_long(&p, &left, &val_b, &neg, tr_b, sizeof(tr_b), &c); /* * If we consumed all of a truncated buffer or * then stop here, reset, & come back for more. */ if (!left && skipped) { left = saved_left; break; } if (err) break; if (val_b >= bitmap_len || neg || val_a > val_b) { err = -EINVAL; break; } if (left) { p++; left--; } } bitmap_set(tmp_bitmap, val_a, val_b - val_a + 1); proc_skip_char(&p, &left, '\n'); } left += skipped; } else { unsigned long bit_a, bit_b = 0; bool first = 1; while (left) { bit_a = find_next_bit(bitmap, bitmap_len, bit_b); if (bit_a >= bitmap_len) break; bit_b = find_next_zero_bit(bitmap, bitmap_len, bit_a + 1) - 1; if (!first) proc_put_char(&buffer, &left, ','); proc_put_long(&buffer, &left, bit_a, false); if (bit_a != bit_b) { proc_put_char(&buffer, &left, '-'); proc_put_long(&buffer, &left, bit_b, false); } first = 0; bit_b++; } proc_put_char(&buffer, &left, '\n'); } if (!err) { if (write) { if (*ppos) bitmap_or(bitmap, bitmap, tmp_bitmap, bitmap_len); else bitmap_copy(bitmap, tmp_bitmap, bitmap_len); } *lenp -= left; *ppos += *lenp; } bitmap_free(tmp_bitmap); return err; } #else /* CONFIG_PROC_SYSCTL */ int proc_dostring(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { return -ENOSYS; } int proc_dobool(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { return -ENOSYS; } int proc_dointvec(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { return -ENOSYS; } int proc_douintvec(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { return -ENOSYS; } int proc_dointvec_minmax(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { return -ENOSYS; } int proc_douintvec_minmax(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { return -ENOSYS; } int proc_dou8vec_minmax(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { return -ENOSYS; } int proc_dointvec_jiffies(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { return -ENOSYS; } int proc_dointvec_ms_jiffies_minmax(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { return -ENOSYS; } int proc_dointvec_userhz_jiffies(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { return -ENOSYS; } int proc_dointvec_ms_jiffies(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { return -ENOSYS; } int proc_doulongvec_minmax(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { return -ENOSYS; } int proc_doulongvec_ms_jiffies_minmax(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { return -ENOSYS; } int proc_do_large_bitmap(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { return -ENOSYS; } #endif /* CONFIG_PROC_SYSCTL */ #if defined(CONFIG_SYSCTL) int proc_do_static_key(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { struct static_key *key = (struct static_key *)table->data; static DEFINE_MUTEX(static_key_mutex); int val, ret; struct ctl_table tmp = { .data = &val, .maxlen = sizeof(val), .mode = table->mode, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, }; if (write && !capable(CAP_SYS_ADMIN)) return -EPERM; mutex_lock(&static_key_mutex); val = static_key_enabled(key); ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos); if (write && !ret) { if (val) static_key_enable(key); else static_key_disable(key); } mutex_unlock(&static_key_mutex); return ret; } static struct ctl_table kern_table[] = { { .procname = "panic", .data = &panic_timeout, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, #ifdef CONFIG_PROC_SYSCTL { .procname = "tainted", .maxlen = sizeof(long), .mode = 0644, .proc_handler = proc_taint, }, { .procname = "sysctl_writes_strict", .data = &sysctl_writes_strict, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_NEG_ONE, .extra2 = SYSCTL_ONE, }, #endif { .procname = "print-fatal-signals", .data = &print_fatal_signals, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, #ifdef CONFIG_SPARC { .procname = "reboot-cmd", .data = reboot_command, .maxlen = 256, .mode = 0644, .proc_handler = proc_dostring, }, { .procname = "stop-a", .data = &stop_a_enabled, .maxlen = sizeof (int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "scons-poweroff", .data = &scons_pwroff, .maxlen = sizeof (int), .mode = 0644, .proc_handler = proc_dointvec, }, #endif #ifdef CONFIG_SPARC64 { .procname = "tsb-ratio", .data = &sysctl_tsb_ratio, .maxlen = sizeof (int), .mode = 0644, .proc_handler = proc_dointvec, }, #endif #ifdef CONFIG_PARISC { .procname = "soft-power", .data = &pwrsw_enabled, .maxlen = sizeof (int), .mode = 0644, .proc_handler = proc_dointvec, }, #endif #ifdef CONFIG_SYSCTL_ARCH_UNALIGN_ALLOW { .procname = "unaligned-trap", .data = &unaligned_enabled, .maxlen = sizeof (int), .mode = 0644, .proc_handler = proc_dointvec, }, #endif #ifdef CONFIG_STACK_TRACER { .procname = "stack_tracer_enabled", .data = &stack_tracer_enabled, .maxlen = sizeof(int), .mode = 0644, .proc_handler = stack_trace_sysctl, }, #endif #ifdef CONFIG_TRACING { .procname = "ftrace_dump_on_oops", .data = &ftrace_dump_on_oops, .maxlen = MAX_TRACER_SIZE, .mode = 0644, .proc_handler = proc_dostring, }, { .procname = "traceoff_on_warning", .data = &__disable_trace_on_warning, .maxlen = sizeof(__disable_trace_on_warning), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "tracepoint_printk", .data = &tracepoint_printk, .maxlen = sizeof(tracepoint_printk), .mode = 0644, .proc_handler = tracepoint_printk_sysctl, }, #endif #ifdef CONFIG_MODULES { .procname = "modprobe", .data = &modprobe_path, .maxlen = KMOD_PATH_LEN, .mode = 0644, .proc_handler = proc_dostring, }, { .procname = "modules_disabled", .data = &modules_disabled, .maxlen = sizeof(int), .mode = 0644, /* only handle a transition from default "0" to "1" */ .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ONE, .extra2 = SYSCTL_ONE, }, #endif #ifdef CONFIG_UEVENT_HELPER { .procname = "hotplug", .data = &uevent_helper, .maxlen = UEVENT_HELPER_PATH_LEN, .mode = 0644, .proc_handler = proc_dostring, }, #endif #ifdef CONFIG_MAGIC_SYSRQ { .procname = "sysrq", .data = NULL, .maxlen = sizeof (int), .mode = 0644, .proc_handler = sysrq_sysctl_handler, }, #endif #ifdef CONFIG_PROC_SYSCTL { .procname = "cad_pid", .data = NULL, .maxlen = sizeof (int), .mode = 0600, .proc_handler = proc_do_cad_pid, }, #endif { .procname = "threads-max", .data = NULL, .maxlen = sizeof(int), .mode = 0644, .proc_handler = sysctl_max_threads, }, { .procname = "overflowuid", .data = &overflowuid, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_MAXOLDUID, }, { .procname = "overflowgid", .data = &overflowgid, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_MAXOLDUID, }, #ifdef CONFIG_S390 { .procname = "userprocess_debug", .data = &show_unhandled_signals, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, #endif { .procname = "pid_max", .data = &pid_max, .maxlen = sizeof (int), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = &pid_max_min, .extra2 = &pid_max_max, }, { .procname = "panic_on_oops", .data = &panic_on_oops, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "panic_print", .data = &panic_print, .maxlen = sizeof(unsigned long), .mode = 0644, .proc_handler = proc_doulongvec_minmax, }, { .procname = "ngroups_max", .data = (void *)&ngroups_max, .maxlen = sizeof (int), .mode = 0444, .proc_handler = proc_dointvec, }, { .procname = "cap_last_cap", .data = (void *)&cap_last_cap, .maxlen = sizeof(int), .mode = 0444, .proc_handler = proc_dointvec, }, #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86) { .procname = "unknown_nmi_panic", .data = &unknown_nmi_panic, .maxlen = sizeof (int), .mode = 0644, .proc_handler = proc_dointvec, }, #endif #if (defined(CONFIG_X86_32) || defined(CONFIG_PARISC)) && \ defined(CONFIG_DEBUG_STACKOVERFLOW) { .procname = "panic_on_stackoverflow", .data = &sysctl_panic_on_stackoverflow, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, #endif #if defined(CONFIG_X86) { .procname = "panic_on_unrecovered_nmi", .data = &panic_on_unrecovered_nmi, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "panic_on_io_nmi", .data = &panic_on_io_nmi, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "bootloader_type", .data = &bootloader_type, .maxlen = sizeof (int), .mode = 0444, .proc_handler = proc_dointvec, }, { .procname = "bootloader_version", .data = &bootloader_version, .maxlen = sizeof (int), .mode = 0444, .proc_handler = proc_dointvec, }, { .procname = "io_delay_type", .data = &io_delay_type, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, #endif #if defined(CONFIG_MMU) { .procname = "randomize_va_space", .data = &randomize_va_space, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, #endif #if defined(CONFIG_S390) && defined(CONFIG_SMP) { .procname = "spin_retry", .data = &spin_retry, .maxlen = sizeof (int), .mode = 0644, .proc_handler = proc_dointvec, }, #endif #if defined(CONFIG_ACPI_SLEEP) && defined(CONFIG_X86) { .procname = "acpi_video_flags", .data = &acpi_realmode_flags, .maxlen = sizeof (unsigned long), .mode = 0644, .proc_handler = proc_doulongvec_minmax, }, #endif #ifdef CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN { .procname = "ignore-unaligned-usertrap", .data = &no_unaligned_warning, .maxlen = sizeof (int), .mode = 0644, .proc_handler = proc_dointvec, }, #endif #ifdef CONFIG_RT_MUTEXES { .procname = "max_lock_depth", .data = &max_lock_depth, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, #endif #ifdef CONFIG_PERF_EVENTS /* * User-space scripts rely on the existence of this file * as a feature check for perf_events being enabled. * * So it's an ABI, do not remove! */ { .procname = "perf_event_paranoid", .data = &sysctl_perf_event_paranoid, .maxlen = sizeof(sysctl_perf_event_paranoid), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "perf_event_mlock_kb", .data = &sysctl_perf_event_mlock, .maxlen = sizeof(sysctl_perf_event_mlock), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "perf_event_max_sample_rate", .data = &sysctl_perf_event_sample_rate, .maxlen = sizeof(sysctl_perf_event_sample_rate), .mode = 0644, .proc_handler = perf_event_max_sample_rate_handler, .extra1 = SYSCTL_ONE, }, { .procname = "perf_cpu_time_max_percent", .data = &sysctl_perf_cpu_time_max_percent, .maxlen = sizeof(sysctl_perf_cpu_time_max_percent), .mode = 0644, .proc_handler = perf_cpu_time_max_percent_handler, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE_HUNDRED, }, { .procname = "perf_event_max_stack", .data = &sysctl_perf_event_max_stack, .maxlen = sizeof(sysctl_perf_event_max_stack), .mode = 0644, .proc_handler = perf_event_max_stack_handler, .extra1 = SYSCTL_ZERO, .extra2 = (void *)&six_hundred_forty_kb, }, { .procname = "perf_event_max_contexts_per_stack", .data = &sysctl_perf_event_max_contexts_per_stack, .maxlen = sizeof(sysctl_perf_event_max_contexts_per_stack), .mode = 0644, .proc_handler = perf_event_max_stack_handler, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE_THOUSAND, }, #endif { .procname = "panic_on_warn", .data = &panic_on_warn, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, }, #ifdef CONFIG_TREE_RCU { .procname = "panic_on_rcu_stall", .data = &sysctl_panic_on_rcu_stall, .maxlen = sizeof(sysctl_panic_on_rcu_stall), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, }, { .procname = "max_rcu_stall_to_panic", .data = &sysctl_max_rcu_stall_to_panic, .maxlen = sizeof(sysctl_max_rcu_stall_to_panic), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ONE, .extra2 = SYSCTL_INT_MAX, }, #endif }; static struct ctl_table vm_table[] = { { .procname = "overcommit_memory", .data = &sysctl_overcommit_memory, .maxlen = sizeof(sysctl_overcommit_memory), .mode = 0644, .proc_handler = overcommit_policy_handler, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_TWO, }, { .procname = "overcommit_ratio", .data = &sysctl_overcommit_ratio, .maxlen = sizeof(sysctl_overcommit_ratio), .mode = 0644, .proc_handler = overcommit_ratio_handler, }, { .procname = "overcommit_kbytes", .data = &sysctl_overcommit_kbytes, .maxlen = sizeof(sysctl_overcommit_kbytes), .mode = 0644, .proc_handler = overcommit_kbytes_handler, }, { .procname = "page-cluster", .data = &page_cluster, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = (void *)&page_cluster_max, }, { .procname = "dirtytime_expire_seconds", .data = &dirtytime_expire_interval, .maxlen = sizeof(dirtytime_expire_interval), .mode = 0644, .proc_handler = dirtytime_interval_handler, .extra1 = SYSCTL_ZERO, }, { .procname = "swappiness", .data = &vm_swappiness, .maxlen = sizeof(vm_swappiness), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_TWO_HUNDRED, }, #ifdef CONFIG_NUMA { .procname = "numa_stat", .data = &sysctl_vm_numa_stat, .maxlen = sizeof(int), .mode = 0644, .proc_handler = sysctl_vm_numa_stat_handler, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, }, #endif { .procname = "drop_caches", .data = &sysctl_drop_caches, .maxlen = sizeof(int), .mode = 0200, .proc_handler = drop_caches_sysctl_handler, .extra1 = SYSCTL_ONE, .extra2 = SYSCTL_FOUR, }, { .procname = "page_lock_unfairness", .data = &sysctl_page_lock_unfairness, .maxlen = sizeof(sysctl_page_lock_unfairness), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, }, #ifdef CONFIG_MMU { .procname = "max_map_count", .data = &sysctl_max_map_count, .maxlen = sizeof(sysctl_max_map_count), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, }, #else { .procname = "nr_trim_pages", .data = &sysctl_nr_trim_pages, .maxlen = sizeof(sysctl_nr_trim_pages), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, }, #endif { .procname = "vfs_cache_pressure", .data = &sysctl_vfs_cache_pressure, .maxlen = sizeof(sysctl_vfs_cache_pressure), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, }, #if defined(HAVE_ARCH_PICK_MMAP_LAYOUT) || \ defined(CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT) { .procname = "legacy_va_layout", .data = &sysctl_legacy_va_layout, .maxlen = sizeof(sysctl_legacy_va_layout), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, }, #endif #ifdef CONFIG_NUMA { .procname = "zone_reclaim_mode", .data = &node_reclaim_mode, .maxlen = sizeof(node_reclaim_mode), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, }, #endif #ifdef CONFIG_SMP { .procname = "stat_interval", .data = &sysctl_stat_interval, .maxlen = sizeof(sysctl_stat_interval), .mode = 0644, .proc_handler = proc_dointvec_jiffies, }, { .procname = "stat_refresh", .data = NULL, .maxlen = 0, .mode = 0600, .proc_handler = vmstat_refresh, }, #endif #ifdef CONFIG_MMU { .procname = "mmap_min_addr", .data = &dac_mmap_min_addr, .maxlen = sizeof(unsigned long), .mode = 0644, .proc_handler = mmap_min_addr_handler, }, #endif #if (defined(CONFIG_X86_32) && !defined(CONFIG_UML))|| \ (defined(CONFIG_SUPERH) && defined(CONFIG_VSYSCALL)) { .procname = "vdso_enabled", #ifdef CONFIG_X86_32 .data = &vdso32_enabled, .maxlen = sizeof(vdso32_enabled), #else .data = &vdso_enabled, .maxlen = sizeof(vdso_enabled), #endif .mode = 0644, .proc_handler = proc_dointvec, .extra1 = SYSCTL_ZERO, }, #endif { .procname = "user_reserve_kbytes", .data = &sysctl_user_reserve_kbytes, .maxlen = sizeof(sysctl_user_reserve_kbytes), .mode = 0644, .proc_handler = proc_doulongvec_minmax, }, { .procname = "admin_reserve_kbytes", .data = &sysctl_admin_reserve_kbytes, .maxlen = sizeof(sysctl_admin_reserve_kbytes), .mode = 0644, .proc_handler = proc_doulongvec_minmax, }, #ifdef CONFIG_HAVE_ARCH_MMAP_RND_BITS { .procname = "mmap_rnd_bits", .data = &mmap_rnd_bits, .maxlen = sizeof(mmap_rnd_bits), .mode = 0600, .proc_handler = proc_dointvec_minmax, .extra1 = (void *)&mmap_rnd_bits_min, .extra2 = (void *)&mmap_rnd_bits_max, }, #endif #ifdef CONFIG_HAVE_ARCH_MMAP_RND_COMPAT_BITS { .procname = "mmap_rnd_compat_bits", .data = &mmap_rnd_compat_bits, .maxlen = sizeof(mmap_rnd_compat_bits), .mode = 0600, .proc_handler = proc_dointvec_minmax, .extra1 = (void *)&mmap_rnd_compat_bits_min, .extra2 = (void *)&mmap_rnd_compat_bits_max, }, #endif }; int __init sysctl_init_bases(void) { register_sysctl_init("kernel", kern_table); register_sysctl_init("vm", vm_table); return 0; } #endif /* CONFIG_SYSCTL */ /* * No sense putting this after each symbol definition, twice, * exception granted :-) */ EXPORT_SYMBOL(proc_dobool); EXPORT_SYMBOL(proc_dointvec); EXPORT_SYMBOL(proc_douintvec); EXPORT_SYMBOL(proc_dointvec_jiffies); EXPORT_SYMBOL(proc_dointvec_minmax); EXPORT_SYMBOL_GPL(proc_douintvec_minmax); EXPORT_SYMBOL(proc_dointvec_userhz_jiffies); EXPORT_SYMBOL(proc_dointvec_ms_jiffies); EXPORT_SYMBOL(proc_dostring); EXPORT_SYMBOL(proc_doulongvec_minmax); EXPORT_SYMBOL(proc_doulongvec_ms_jiffies_minmax); EXPORT_SYMBOL(proc_do_large_bitmap); |
| 24 24 96 98 98 16 96 9 96 71 96 96 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 | // SPDX-License-Identifier: GPL-2.0 /* * Manage cache of swap slots to be used for and returned from * swap. * * Copyright(c) 2016 Intel Corporation. * * Author: Tim Chen <tim.c.chen@linux.intel.com> * * We allocate the swap slots from the global pool and put * it into local per cpu caches. This has the advantage * of no needing to acquire the swap_info lock every time * we need a new slot. * * There is also opportunity to simply return the slot * to local caches without needing to acquire swap_info * lock. We do not reuse the returned slots directly but * move them back to the global pool in a batch. This * allows the slots to coalesce and reduce fragmentation. * * The swap entry allocated is marked with SWAP_HAS_CACHE * flag in map_count that prevents it from being allocated * again from the global pool. * * The swap slots cache is protected by a mutex instead of * a spin lock as when we search for slots with scan_swap_map, * we can possibly sleep. */ #include <linux/swap_slots.h> #include <linux/cpu.h> #include <linux/cpumask.h> #include <linux/slab.h> #include <linux/vmalloc.h> #include <linux/mutex.h> #include <linux/mm.h> static DEFINE_PER_CPU(struct swap_slots_cache, swp_slots); static bool swap_slot_cache_active; bool swap_slot_cache_enabled; static bool swap_slot_cache_initialized; static DEFINE_MUTEX(swap_slots_cache_mutex); /* Serialize swap slots cache enable/disable operations */ static DEFINE_MUTEX(swap_slots_cache_enable_mutex); static void __drain_swap_slots_cache(unsigned int type); #define use_swap_slot_cache (swap_slot_cache_active && swap_slot_cache_enabled) #define SLOTS_CACHE 0x1 #define SLOTS_CACHE_RET 0x2 static void deactivate_swap_slots_cache(void) { mutex_lock(&swap_slots_cache_mutex); swap_slot_cache_active = false; __drain_swap_slots_cache(SLOTS_CACHE|SLOTS_CACHE_RET); mutex_unlock(&swap_slots_cache_mutex); } static void reactivate_swap_slots_cache(void) { mutex_lock(&swap_slots_cache_mutex); swap_slot_cache_active = true; mutex_unlock(&swap_slots_cache_mutex); } /* Must not be called with cpu hot plug lock */ void disable_swap_slots_cache_lock(void) { mutex_lock(&swap_slots_cache_enable_mutex); swap_slot_cache_enabled = false; if (swap_slot_cache_initialized) { /* serialize with cpu hotplug operations */ cpus_read_lock(); __drain_swap_slots_cache(SLOTS_CACHE|SLOTS_CACHE_RET); cpus_read_unlock(); } } static void __reenable_swap_slots_cache(void) { swap_slot_cache_enabled = has_usable_swap(); } void reenable_swap_slots_cache_unlock(void) { __reenable_swap_slots_cache(); mutex_unlock(&swap_slots_cache_enable_mutex); } static bool check_cache_active(void) { long pages; if (!swap_slot_cache_enabled) return false; pages = get_nr_swap_pages(); if (!swap_slot_cache_active) { if (pages > num_online_cpus() * THRESHOLD_ACTIVATE_SWAP_SLOTS_CACHE) reactivate_swap_slots_cache(); goto out; } /* if global pool of slot caches too low, deactivate cache */ if (pages < num_online_cpus() * THRESHOLD_DEACTIVATE_SWAP_SLOTS_CACHE) deactivate_swap_slots_cache(); out: return swap_slot_cache_active; } static int alloc_swap_slot_cache(unsigned int cpu) { struct swap_slots_cache *cache; swp_entry_t *slots, *slots_ret; /* * Do allocation outside swap_slots_cache_mutex * as kvzalloc could trigger reclaim and folio_alloc_swap, * which can lock swap_slots_cache_mutex. */ slots = kvcalloc(SWAP_SLOTS_CACHE_SIZE, sizeof(swp_entry_t), GFP_KERNEL); if (!slots) return -ENOMEM; slots_ret = kvcalloc(SWAP_SLOTS_CACHE_SIZE, sizeof(swp_entry_t), GFP_KERNEL); if (!slots_ret) { kvfree(slots); return -ENOMEM; } mutex_lock(&swap_slots_cache_mutex); cache = &per_cpu(swp_slots, cpu); if (cache->slots || cache->slots_ret) { /* cache already allocated */ mutex_unlock(&swap_slots_cache_mutex); kvfree(slots); kvfree(slots_ret); return 0; } if (!cache->lock_initialized) { mutex_init(&cache->alloc_lock); spin_lock_init(&cache->free_lock); cache->lock_initialized = true; } cache->nr = 0; cache->cur = 0; cache->n_ret = 0; /* * We initialized alloc_lock and free_lock earlier. We use * !cache->slots or !cache->slots_ret to know if it is safe to acquire * the corresponding lock and use the cache. Memory barrier below * ensures the assumption. */ mb(); cache->slots = slots; cache->slots_ret = slots_ret; mutex_unlock(&swap_slots_cache_mutex); return 0; } static void drain_slots_cache_cpu(unsigned int cpu, unsigned int type, bool free_slots) { struct swap_slots_cache *cache; swp_entry_t *slots = NULL; cache = &per_cpu(swp_slots, cpu); if ((type & SLOTS_CACHE) && cache->slots) { mutex_lock(&cache->alloc_lock); swapcache_free_entries(cache->slots + cache->cur, cache->nr); cache->cur = 0; cache->nr = 0; if (free_slots && cache->slots) { kvfree(cache->slots); cache->slots = NULL; } mutex_unlock(&cache->alloc_lock); } if ((type & SLOTS_CACHE_RET) && cache->slots_ret) { spin_lock_irq(&cache->free_lock); swapcache_free_entries(cache->slots_ret, cache->n_ret); cache->n_ret = 0; if (free_slots && cache->slots_ret) { slots = cache->slots_ret; cache->slots_ret = NULL; } spin_unlock_irq(&cache->free_lock); kvfree(slots); } } static void __drain_swap_slots_cache(unsigned int type) { unsigned int cpu; /* * This function is called during * 1) swapoff, when we have to make sure no * left over slots are in cache when we remove * a swap device; * 2) disabling of swap slot cache, when we run low * on swap slots when allocating memory and need * to return swap slots to global pool. * * We cannot acquire cpu hot plug lock here as * this function can be invoked in the cpu * hot plug path: * cpu_up -> lock cpu_hotplug -> cpu hotplug state callback * -> memory allocation -> direct reclaim -> folio_alloc_swap * -> drain_swap_slots_cache * * Hence the loop over current online cpu below could miss cpu that * is being brought online but not yet marked as online. * That is okay as we do not schedule and run anything on a * cpu before it has been marked online. Hence, we will not * fill any swap slots in slots cache of such cpu. * There are no slots on such cpu that need to be drained. */ for_each_online_cpu(cpu) drain_slots_cache_cpu(cpu, type, false); } static int free_slot_cache(unsigned int cpu) { mutex_lock(&swap_slots_cache_mutex); drain_slots_cache_cpu(cpu, SLOTS_CACHE | SLOTS_CACHE_RET, true); mutex_unlock(&swap_slots_cache_mutex); return 0; } void enable_swap_slots_cache(void) { mutex_lock(&swap_slots_cache_enable_mutex); if (!swap_slot_cache_initialized) { int ret; ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "swap_slots_cache", alloc_swap_slot_cache, free_slot_cache); if (WARN_ONCE(ret < 0, "Cache allocation failed (%s), operating " "without swap slots cache.\n", __func__)) goto out_unlock; swap_slot_cache_initialized = true; } __reenable_swap_slots_cache(); out_unlock: mutex_unlock(&swap_slots_cache_enable_mutex); } /* called with swap slot cache's alloc lock held */ static int refill_swap_slots_cache(struct swap_slots_cache *cache) { if (!use_swap_slot_cache) return 0; cache->cur = 0; if (swap_slot_cache_active) cache->nr = get_swap_pages(SWAP_SLOTS_CACHE_SIZE, cache->slots, 0); return cache->nr; } void free_swap_slot(swp_entry_t entry) { struct swap_slots_cache *cache; /* Large folio swap slot is not covered. */ zswap_invalidate(entry); cache = raw_cpu_ptr(&swp_slots); if (likely(use_swap_slot_cache && cache->slots_ret)) { spin_lock_irq(&cache->free_lock); /* Swap slots cache may be deactivated before acquiring lock */ if (!use_swap_slot_cache || !cache->slots_ret) { spin_unlock_irq(&cache->free_lock); goto direct_free; } if (cache->n_ret >= SWAP_SLOTS_CACHE_SIZE) { /* * Return slots to global pool. * The current swap_map value is SWAP_HAS_CACHE. * Set it to 0 to indicate it is available for * allocation in global pool */ swapcache_free_entries(cache->slots_ret, cache->n_ret); cache->n_ret = 0; } cache->slots_ret[cache->n_ret++] = entry; spin_unlock_irq(&cache->free_lock); } else { direct_free: swapcache_free_entries(&entry, 1); } } swp_entry_t folio_alloc_swap(struct folio *folio) { swp_entry_t entry; struct swap_slots_cache *cache; entry.val = 0; if (folio_test_large(folio)) { if (IS_ENABLED(CONFIG_THP_SWAP)) get_swap_pages(1, &entry, folio_order(folio)); goto out; } /* * Preemption is allowed here, because we may sleep * in refill_swap_slots_cache(). But it is safe, because * accesses to the per-CPU data structure are protected by the * mutex cache->alloc_lock. * * The alloc path here does not touch cache->slots_ret * so cache->free_lock is not taken. */ cache = raw_cpu_ptr(&swp_slots); if (likely(check_cache_active() && cache->slots)) { mutex_lock(&cache->alloc_lock); if (cache->slots) { repeat: if (cache->nr) { entry = cache->slots[cache->cur]; cache->slots[cache->cur++].val = 0; cache->nr--; } else if (refill_swap_slots_cache(cache)) { goto repeat; } } mutex_unlock(&cache->alloc_lock); if (entry.val) goto out; } get_swap_pages(1, &entry, 0); out: if (mem_cgroup_try_charge_swap(folio, entry)) { put_swap_folio(folio, entry); entry.val = 0; } return entry; } |
| 3 3 3 3 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 | // SPDX-License-Identifier: GPL-2.0 #include <linux/ceph/ceph_debug.h> #include <linux/bug.h> #include <linux/err.h> #include <linux/random.h> #include <linux/slab.h> #include <linux/types.h> #include <linux/ceph/messenger.h> #include <linux/ceph/decode.h> #include "mdsmap.h" #include "mds_client.h" #include "super.h" #define CEPH_MDS_IS_READY(i, ignore_laggy) \ (m->m_info[i].state > 0 && ignore_laggy ? true : !m->m_info[i].laggy) static int __mdsmap_get_random_mds(struct ceph_mdsmap *m, bool ignore_laggy) { int n = 0; int i, j; /* count */ for (i = 0; i < m->possible_max_rank; i++) if (CEPH_MDS_IS_READY(i, ignore_laggy)) n++; if (n == 0) return -1; /* pick */ n = get_random_u32_below(n); for (j = 0, i = 0; i < m->possible_max_rank; i++) { if (CEPH_MDS_IS_READY(i, ignore_laggy)) j++; if (j > n) break; } return i; } /* * choose a random mds that is "up" (i.e. has a state > 0), or -1. */ int ceph_mdsmap_get_random_mds(struct ceph_mdsmap *m) { int mds; mds = __mdsmap_get_random_mds(m, false); if (mds == m->possible_max_rank || mds == -1) mds = __mdsmap_get_random_mds(m, true); return mds == m->possible_max_rank ? -1 : mds; } #define __decode_and_drop_type(p, end, type, bad) \ do { \ if (*p + sizeof(type) > end) \ goto bad; \ *p += sizeof(type); \ } while (0) #define __decode_and_drop_set(p, end, type, bad) \ do { \ u32 n; \ size_t need; \ ceph_decode_32_safe(p, end, n, bad); \ need = sizeof(type) * n; \ ceph_decode_need(p, end, need, bad); \ *p += need; \ } while (0) #define __decode_and_drop_map(p, end, ktype, vtype, bad) \ do { \ u32 n; \ size_t need; \ ceph_decode_32_safe(p, end, n, bad); \ need = (sizeof(ktype) + sizeof(vtype)) * n; \ ceph_decode_need(p, end, need, bad); \ *p += need; \ } while (0) static int __decode_and_drop_compat_set(void **p, void* end) { int i; /* compat, ro_compat, incompat*/ for (i = 0; i < 3; i++) { u32 n; ceph_decode_need(p, end, sizeof(u64) + sizeof(u32), bad); /* mask */ *p += sizeof(u64); /* names (map<u64, string>) */ n = ceph_decode_32(p); while (n-- > 0) { u32 len; ceph_decode_need(p, end, sizeof(u64) + sizeof(u32), bad); *p += sizeof(u64); len = ceph_decode_32(p); ceph_decode_need(p, end, len, bad); *p += len; } } return 0; bad: return -1; } /* * Decode an MDS map * * Ignore any fields we don't care about (there are quite a few of * them). */ struct ceph_mdsmap *ceph_mdsmap_decode(struct ceph_mds_client *mdsc, void **p, void *end, bool msgr2) { struct ceph_client *cl = mdsc->fsc->client; struct ceph_mdsmap *m; const void *start = *p; int i, j, n; int err; u8 mdsmap_v; u16 mdsmap_ev; u32 target; m = kzalloc(sizeof(*m), GFP_NOFS); if (!m) return ERR_PTR(-ENOMEM); ceph_decode_need(p, end, 1 + 1, bad); mdsmap_v = ceph_decode_8(p); *p += sizeof(u8); /* mdsmap_cv */ if (mdsmap_v >= 4) { u32 mdsmap_len; ceph_decode_32_safe(p, end, mdsmap_len, bad); if (end < *p + mdsmap_len) goto bad; end = *p + mdsmap_len; } ceph_decode_need(p, end, 8*sizeof(u32) + sizeof(u64), bad); m->m_epoch = ceph_decode_32(p); m->m_client_epoch = ceph_decode_32(p); m->m_last_failure = ceph_decode_32(p); m->m_root = ceph_decode_32(p); m->m_session_timeout = ceph_decode_32(p); m->m_session_autoclose = ceph_decode_32(p); m->m_max_file_size = ceph_decode_64(p); m->m_max_mds = ceph_decode_32(p); /* * pick out the active nodes as the m_num_active_mds, the * m_num_active_mds maybe larger than m_max_mds when decreasing * the max_mds in cluster side, in other case it should less * than or equal to m_max_mds. */ m->m_num_active_mds = n = ceph_decode_32(p); /* * the possible max rank, it maybe larger than the m_num_active_mds, * for example if the mds_max == 2 in the cluster, when the MDS(0) * was laggy and being replaced by a new MDS, we will temporarily * receive a new mds map with n_num_mds == 1 and the active MDS(1), * and the mds rank >= m_num_active_mds. */ m->possible_max_rank = max(m->m_num_active_mds, m->m_max_mds); m->m_info = kcalloc(m->possible_max_rank, sizeof(*m->m_info), GFP_NOFS); if (!m->m_info) goto nomem; /* pick out active nodes from mds_info (state > 0) */ for (i = 0; i < n; i++) { u64 global_id; u32 namelen; s32 mds, inc, state; u8 info_v; void *info_end = NULL; struct ceph_entity_addr addr; u32 num_export_targets; void *pexport_targets = NULL; struct ceph_timespec laggy_since; struct ceph_mds_info *info; bool laggy; ceph_decode_need(p, end, sizeof(u64) + 1, bad); global_id = ceph_decode_64(p); info_v= ceph_decode_8(p); if (info_v >= 4) { u32 info_len; ceph_decode_need(p, end, 1 + sizeof(u32), bad); *p += sizeof(u8); /* info_cv */ info_len = ceph_decode_32(p); info_end = *p + info_len; if (info_end > end) goto bad; } ceph_decode_need(p, end, sizeof(u64) + sizeof(u32), bad); *p += sizeof(u64); namelen = ceph_decode_32(p); /* skip mds name */ *p += namelen; ceph_decode_32_safe(p, end, mds, bad); ceph_decode_32_safe(p, end, inc, bad); ceph_decode_32_safe(p, end, state, bad); *p += sizeof(u64); /* state_seq */ if (info_v >= 8) err = ceph_decode_entity_addrvec(p, end, msgr2, &addr); else err = ceph_decode_entity_addr(p, end, &addr); if (err) goto corrupt; ceph_decode_copy_safe(p, end, &laggy_since, sizeof(laggy_since), bad); laggy = laggy_since.tv_sec != 0 || laggy_since.tv_nsec != 0; *p += sizeof(u32); ceph_decode_32_safe(p, end, namelen, bad); *p += namelen; if (info_v >= 2) { ceph_decode_32_safe(p, end, num_export_targets, bad); pexport_targets = *p; *p += num_export_targets * sizeof(u32); } else { num_export_targets = 0; } if (info_end && *p != info_end) { if (*p > info_end) goto bad; *p = info_end; } doutc(cl, "%d/%d %lld mds%d.%d %s %s%s\n", i+1, n, global_id, mds, inc, ceph_pr_addr(&addr), ceph_mds_state_name(state), laggy ? "(laggy)" : ""); if (mds < 0 || mds >= m->possible_max_rank) { pr_warn_client(cl, "got incorrect mds(%d)\n", mds); continue; } if (state <= 0) { doutc(cl, "got incorrect state(%s)\n", ceph_mds_state_name(state)); continue; } info = &m->m_info[mds]; info->global_id = global_id; info->state = state; info->addr = addr; info->laggy = laggy; info->num_export_targets = num_export_targets; if (num_export_targets) { info->export_targets = kcalloc(num_export_targets, sizeof(u32), GFP_NOFS); if (!info->export_targets) goto nomem; for (j = 0; j < num_export_targets; j++) { target = ceph_decode_32(&pexport_targets); info->export_targets[j] = target; } } else { info->export_targets = NULL; } } /* pg_pools */ ceph_decode_32_safe(p, end, n, bad); m->m_num_data_pg_pools = n; m->m_data_pg_pools = kcalloc(n, sizeof(u64), GFP_NOFS); if (!m->m_data_pg_pools) goto nomem; ceph_decode_need(p, end, sizeof(u64)*(n+1), bad); for (i = 0; i < n; i++) m->m_data_pg_pools[i] = ceph_decode_64(p); m->m_cas_pg_pool = ceph_decode_64(p); m->m_enabled = m->m_epoch > 1; mdsmap_ev = 1; if (mdsmap_v >= 2) { ceph_decode_16_safe(p, end, mdsmap_ev, bad_ext); } if (mdsmap_ev >= 3) { if (__decode_and_drop_compat_set(p, end) < 0) goto bad_ext; } /* metadata_pool */ if (mdsmap_ev < 5) { __decode_and_drop_type(p, end, u32, bad_ext); } else { __decode_and_drop_type(p, end, u64, bad_ext); } /* created + modified + tableserver */ __decode_and_drop_type(p, end, struct ceph_timespec, bad_ext); __decode_and_drop_type(p, end, struct ceph_timespec, bad_ext); __decode_and_drop_type(p, end, u32, bad_ext); /* in */ { int num_laggy = 0; ceph_decode_32_safe(p, end, n, bad_ext); ceph_decode_need(p, end, sizeof(u32) * n, bad_ext); for (i = 0; i < n; i++) { s32 mds = ceph_decode_32(p); if (mds >= 0 && mds < m->possible_max_rank) { if (m->m_info[mds].laggy) num_laggy++; } } m->m_num_laggy = num_laggy; if (n > m->possible_max_rank) { void *new_m_info = krealloc(m->m_info, n * sizeof(*m->m_info), GFP_NOFS | __GFP_ZERO); if (!new_m_info) goto nomem; m->m_info = new_m_info; } m->possible_max_rank = n; } /* inc */ __decode_and_drop_map(p, end, u32, u32, bad_ext); /* up */ __decode_and_drop_map(p, end, u32, u64, bad_ext); /* failed */ __decode_and_drop_set(p, end, u32, bad_ext); /* stopped */ __decode_and_drop_set(p, end, u32, bad_ext); if (mdsmap_ev >= 4) { /* last_failure_osd_epoch */ __decode_and_drop_type(p, end, u32, bad_ext); } if (mdsmap_ev >= 6) { /* ever_allowed_snaps */ __decode_and_drop_type(p, end, u8, bad_ext); /* explicitly_allowed_snaps */ __decode_and_drop_type(p, end, u8, bad_ext); } if (mdsmap_ev >= 7) { /* inline_data_enabled */ __decode_and_drop_type(p, end, u8, bad_ext); } if (mdsmap_ev >= 8) { /* enabled */ ceph_decode_8_safe(p, end, m->m_enabled, bad_ext); /* fs_name */ ceph_decode_skip_string(p, end, bad_ext); } /* damaged */ if (mdsmap_ev >= 9) { size_t need; ceph_decode_32_safe(p, end, n, bad_ext); need = sizeof(u32) * n; ceph_decode_need(p, end, need, bad_ext); *p += need; m->m_damaged = n > 0; } else { m->m_damaged = false; } if (mdsmap_ev >= 17) { /* balancer */ ceph_decode_skip_string(p, end, bad_ext); /* standby_count_wanted */ ceph_decode_skip_32(p, end, bad_ext); /* old_max_mds */ ceph_decode_skip_32(p, end, bad_ext); /* min_compat_client */ ceph_decode_skip_8(p, end, bad_ext); /* required_client_features */ ceph_decode_skip_set(p, end, 64, bad_ext); /* bal_rank_mask */ ceph_decode_skip_string(p, end, bad_ext); } if (mdsmap_ev >= 18) { ceph_decode_64_safe(p, end, m->m_max_xattr_size, bad_ext); } bad_ext: doutc(cl, "m_enabled: %d, m_damaged: %d, m_num_laggy: %d\n", !!m->m_enabled, !!m->m_damaged, m->m_num_laggy); *p = end; doutc(cl, "success epoch %u\n", m->m_epoch); return m; nomem: err = -ENOMEM; goto out_err; corrupt: pr_err_client(cl, "corrupt mdsmap\n"); print_hex_dump(KERN_DEBUG, "mdsmap: ", DUMP_PREFIX_OFFSET, 16, 1, start, end - start, true); out_err: ceph_mdsmap_destroy(m); return ERR_PTR(err); bad: err = -EINVAL; goto corrupt; } void ceph_mdsmap_destroy(struct ceph_mdsmap *m) { int i; if (m->m_info) { for (i = 0; i < m->possible_max_rank; i++) kfree(m->m_info[i].export_targets); kfree(m->m_info); } kfree(m->m_data_pg_pools); kfree(m); } bool ceph_mdsmap_is_cluster_available(struct ceph_mdsmap *m) { int i, nr_active = 0; if (!m->m_enabled) return false; if (m->m_damaged) return false; if (m->m_num_laggy == m->m_num_active_mds) return false; for (i = 0; i < m->possible_max_rank; i++) { if (m->m_info[i].state == CEPH_MDS_STATE_ACTIVE) nr_active++; } return nr_active > 0; } |
| 60 60 9 16 78 26 100 2 83 7 118 1 108 117 83 76 81 73 26 51 10 18 1 4 1 14 67 72 3 3 7 7 7 1 15 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 | // SPDX-License-Identifier: GPL-2.0-only /* * DCCP connection tracking protocol helper * * Copyright (c) 2005, 2006, 2008 Patrick McHardy <kaber@trash.net> */ #include <linux/kernel.h> #include <linux/init.h> #include <linux/sysctl.h> #include <linux/spinlock.h> #include <linux/skbuff.h> #include <linux/dccp.h> #include <linux/slab.h> #include <net/net_namespace.h> #include <net/netns/generic.h> #include <linux/netfilter/nfnetlink_conntrack.h> #include <net/netfilter/nf_conntrack.h> #include <net/netfilter/nf_conntrack_l4proto.h> #include <net/netfilter/nf_conntrack_ecache.h> #include <net/netfilter/nf_conntrack_timeout.h> #include <net/netfilter/nf_log.h> /* Timeouts are based on values from RFC4340: * * - REQUEST: * * 8.1.2. Client Request * * A client MAY give up on its DCCP-Requests after some time * (3 minutes, for example). * * - RESPOND: * * 8.1.3. Server Response * * It MAY also leave the RESPOND state for CLOSED after a timeout of * not less than 4MSL (8 minutes); * * - PARTOPEN: * * 8.1.5. Handshake Completion * * If the client remains in PARTOPEN for more than 4MSL (8 minutes), * it SHOULD reset the connection with Reset Code 2, "Aborted". * * - OPEN: * * The DCCP timestamp overflows after 11.9 hours. If the connection * stays idle this long the sequence number won't be recognized * as valid anymore. * * - CLOSEREQ/CLOSING: * * 8.3. Termination * * The retransmission timer should initially be set to go off in two * round-trip times and should back off to not less than once every * 64 seconds ... * * - TIMEWAIT: * * 4.3. States * * A server or client socket remains in this state for 2MSL (4 minutes) * after the connection has been town down, ... */ #define DCCP_MSL (2 * 60 * HZ) #ifdef CONFIG_NF_CONNTRACK_PROCFS static const char * const dccp_state_names[] = { [CT_DCCP_NONE] = "NONE", [CT_DCCP_REQUEST] = "REQUEST", [CT_DCCP_RESPOND] = "RESPOND", [CT_DCCP_PARTOPEN] = "PARTOPEN", [CT_DCCP_OPEN] = "OPEN", [CT_DCCP_CLOSEREQ] = "CLOSEREQ", [CT_DCCP_CLOSING] = "CLOSING", [CT_DCCP_TIMEWAIT] = "TIMEWAIT", [CT_DCCP_IGNORE] = "IGNORE", [CT_DCCP_INVALID] = "INVALID", }; #endif #define sNO CT_DCCP_NONE #define sRQ CT_DCCP_REQUEST #define sRS CT_DCCP_RESPOND #define sPO CT_DCCP_PARTOPEN #define sOP CT_DCCP_OPEN #define sCR CT_DCCP_CLOSEREQ #define sCG CT_DCCP_CLOSING #define sTW CT_DCCP_TIMEWAIT #define sIG CT_DCCP_IGNORE #define sIV CT_DCCP_INVALID /* * DCCP state transition table * * The assumption is the same as for TCP tracking: * * We are the man in the middle. All the packets go through us but might * get lost in transit to the destination. It is assumed that the destination * can't receive segments we haven't seen. * * The following states exist: * * NONE: Initial state, expecting Request * REQUEST: Request seen, waiting for Response from server * RESPOND: Response from server seen, waiting for Ack from client * PARTOPEN: Ack after Response seen, waiting for packet other than Response, * Reset or Sync from server * OPEN: Packet other than Response, Reset or Sync seen * CLOSEREQ: CloseReq from server seen, expecting Close from client * CLOSING: Close seen, expecting Reset * TIMEWAIT: Reset seen * IGNORE: Not determinable whether packet is valid * * Some states exist only on one side of the connection: REQUEST, RESPOND, * PARTOPEN, CLOSEREQ. For the other side these states are equivalent to * the one it was in before. * * Packets are marked as ignored (sIG) if we don't know if they're valid * (for example a reincarnation of a connection we didn't notice is dead * already) and the server may send back a connection closing Reset or a * Response. They're also used for Sync/SyncAck packets, which we don't * care about. */ static const u_int8_t dccp_state_table[CT_DCCP_ROLE_MAX + 1][DCCP_PKT_SYNCACK + 1][CT_DCCP_MAX + 1] = { [CT_DCCP_ROLE_CLIENT] = { [DCCP_PKT_REQUEST] = { /* * sNO -> sRQ Regular Request * sRQ -> sRQ Retransmitted Request or reincarnation * sRS -> sRS Retransmitted Request (apparently Response * got lost after we saw it) or reincarnation * sPO -> sIG Ignore, conntrack might be out of sync * sOP -> sIG Ignore, conntrack might be out of sync * sCR -> sIG Ignore, conntrack might be out of sync * sCG -> sIG Ignore, conntrack might be out of sync * sTW -> sRQ Reincarnation * * sNO, sRQ, sRS, sPO. sOP, sCR, sCG, sTW, */ sRQ, sRQ, sRS, sIG, sIG, sIG, sIG, sRQ, }, [DCCP_PKT_RESPONSE] = { /* * sNO -> sIV Invalid * sRQ -> sIG Ignore, might be response to ignored Request * sRS -> sIG Ignore, might be response to ignored Request * sPO -> sIG Ignore, might be response to ignored Request * sOP -> sIG Ignore, might be response to ignored Request * sCR -> sIG Ignore, might be response to ignored Request * sCG -> sIG Ignore, might be response to ignored Request * sTW -> sIV Invalid, reincarnation in reverse direction * goes through sRQ * * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */ sIV, sIG, sIG, sIG, sIG, sIG, sIG, sIV, }, [DCCP_PKT_ACK] = { /* * sNO -> sIV No connection * sRQ -> sIV No connection * sRS -> sPO Ack for Response, move to PARTOPEN (8.1.5.) * sPO -> sPO Retransmitted Ack for Response, remain in PARTOPEN * sOP -> sOP Regular ACK, remain in OPEN * sCR -> sCR Ack in CLOSEREQ MAY be processed (8.3.) * sCG -> sCG Ack in CLOSING MAY be processed (8.3.) * sTW -> sIV * * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */ sIV, sIV, sPO, sPO, sOP, sCR, sCG, sIV }, [DCCP_PKT_DATA] = { /* * sNO -> sIV No connection * sRQ -> sIV No connection * sRS -> sIV No connection * sPO -> sIV MUST use DataAck in PARTOPEN state (8.1.5.) * sOP -> sOP Regular Data packet * sCR -> sCR Data in CLOSEREQ MAY be processed (8.3.) * sCG -> sCG Data in CLOSING MAY be processed (8.3.) * sTW -> sIV * * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */ sIV, sIV, sIV, sIV, sOP, sCR, sCG, sIV, }, [DCCP_PKT_DATAACK] = { /* * sNO -> sIV No connection * sRQ -> sIV No connection * sRS -> sPO Ack for Response, move to PARTOPEN (8.1.5.) * sPO -> sPO Remain in PARTOPEN state * sOP -> sOP Regular DataAck packet in OPEN state * sCR -> sCR DataAck in CLOSEREQ MAY be processed (8.3.) * sCG -> sCG DataAck in CLOSING MAY be processed (8.3.) * sTW -> sIV * * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */ sIV, sIV, sPO, sPO, sOP, sCR, sCG, sIV }, [DCCP_PKT_CLOSEREQ] = { /* * CLOSEREQ may only be sent by the server. * * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */ sIV, sIV, sIV, sIV, sIV, sIV, sIV, sIV }, [DCCP_PKT_CLOSE] = { /* * sNO -> sIV No connection * sRQ -> sIV No connection * sRS -> sIV No connection * sPO -> sCG Client-initiated close * sOP -> sCG Client-initiated close * sCR -> sCG Close in response to CloseReq (8.3.) * sCG -> sCG Retransmit * sTW -> sIV Late retransmit, already in TIME_WAIT * * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */ sIV, sIV, sIV, sCG, sCG, sCG, sIV, sIV }, [DCCP_PKT_RESET] = { /* * sNO -> sIV No connection * sRQ -> sTW Sync received or timeout, SHOULD send Reset (8.1.1.) * sRS -> sTW Response received without Request * sPO -> sTW Timeout, SHOULD send Reset (8.1.5.) * sOP -> sTW Connection reset * sCR -> sTW Connection reset * sCG -> sTW Connection reset * sTW -> sIG Ignore (don't refresh timer) * * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */ sIV, sTW, sTW, sTW, sTW, sTW, sTW, sIG }, [DCCP_PKT_SYNC] = { /* * We currently ignore Sync packets * * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */ sIV, sIG, sIG, sIG, sIG, sIG, sIG, sIG, }, [DCCP_PKT_SYNCACK] = { /* * We currently ignore SyncAck packets * * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */ sIV, sIG, sIG, sIG, sIG, sIG, sIG, sIG, }, }, [CT_DCCP_ROLE_SERVER] = { [DCCP_PKT_REQUEST] = { /* * sNO -> sIV Invalid * sRQ -> sIG Ignore, conntrack might be out of sync * sRS -> sIG Ignore, conntrack might be out of sync * sPO -> sIG Ignore, conntrack might be out of sync * sOP -> sIG Ignore, conntrack might be out of sync * sCR -> sIG Ignore, conntrack might be out of sync * sCG -> sIG Ignore, conntrack might be out of sync * sTW -> sRQ Reincarnation, must reverse roles * * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */ sIV, sIG, sIG, sIG, sIG, sIG, sIG, sRQ }, [DCCP_PKT_RESPONSE] = { /* * sNO -> sIV Response without Request * sRQ -> sRS Response to clients Request * sRS -> sRS Retransmitted Response (8.1.3. SHOULD NOT) * sPO -> sIG Response to an ignored Request or late retransmit * sOP -> sIG Ignore, might be response to ignored Request * sCR -> sIG Ignore, might be response to ignored Request * sCG -> sIG Ignore, might be response to ignored Request * sTW -> sIV Invalid, Request from client in sTW moves to sRQ * * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */ sIV, sRS, sRS, sIG, sIG, sIG, sIG, sIV }, [DCCP_PKT_ACK] = { /* * sNO -> sIV No connection * sRQ -> sIV No connection * sRS -> sIV No connection * sPO -> sOP Enter OPEN state (8.1.5.) * sOP -> sOP Regular Ack in OPEN state * sCR -> sIV Waiting for Close from client * sCG -> sCG Ack in CLOSING MAY be processed (8.3.) * sTW -> sIV * * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */ sIV, sIV, sIV, sOP, sOP, sIV, sCG, sIV }, [DCCP_PKT_DATA] = { /* * sNO -> sIV No connection * sRQ -> sIV No connection * sRS -> sIV No connection * sPO -> sOP Enter OPEN state (8.1.5.) * sOP -> sOP Regular Data packet in OPEN state * sCR -> sIV Waiting for Close from client * sCG -> sCG Data in CLOSING MAY be processed (8.3.) * sTW -> sIV * * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */ sIV, sIV, sIV, sOP, sOP, sIV, sCG, sIV }, [DCCP_PKT_DATAACK] = { /* * sNO -> sIV No connection * sRQ -> sIV No connection * sRS -> sIV No connection * sPO -> sOP Enter OPEN state (8.1.5.) * sOP -> sOP Regular DataAck in OPEN state * sCR -> sIV Waiting for Close from client * sCG -> sCG Data in CLOSING MAY be processed (8.3.) * sTW -> sIV * * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */ sIV, sIV, sIV, sOP, sOP, sIV, sCG, sIV }, [DCCP_PKT_CLOSEREQ] = { /* * sNO -> sIV No connection * sRQ -> sIV No connection * sRS -> sIV No connection * sPO -> sOP -> sCR Move directly to CLOSEREQ (8.1.5.) * sOP -> sCR CloseReq in OPEN state * sCR -> sCR Retransmit * sCG -> sCR Simultaneous close, client sends another Close * sTW -> sIV Already closed * * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */ sIV, sIV, sIV, sCR, sCR, sCR, sCR, sIV }, [DCCP_PKT_CLOSE] = { /* * sNO -> sIV No connection * sRQ -> sIV No connection * sRS -> sIV No connection * sPO -> sOP -> sCG Move direcly to CLOSING * sOP -> sCG Move to CLOSING * sCR -> sIV Close after CloseReq is invalid * sCG -> sCG Retransmit * sTW -> sIV Already closed * * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */ sIV, sIV, sIV, sCG, sCG, sIV, sCG, sIV }, [DCCP_PKT_RESET] = { /* * sNO -> sIV No connection * sRQ -> sTW Reset in response to Request * sRS -> sTW Timeout, SHOULD send Reset (8.1.3.) * sPO -> sTW Timeout, SHOULD send Reset (8.1.3.) * sOP -> sTW * sCR -> sTW * sCG -> sTW * sTW -> sIG Ignore (don't refresh timer) * * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW, sTW */ sIV, sTW, sTW, sTW, sTW, sTW, sTW, sTW, sIG }, [DCCP_PKT_SYNC] = { /* * We currently ignore Sync packets * * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */ sIV, sIG, sIG, sIG, sIG, sIG, sIG, sIG, }, [DCCP_PKT_SYNCACK] = { /* * We currently ignore SyncAck packets * * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */ sIV, sIG, sIG, sIG, sIG, sIG, sIG, sIG, }, }, }; static noinline bool dccp_new(struct nf_conn *ct, const struct sk_buff *skb, const struct dccp_hdr *dh, const struct nf_hook_state *hook_state) { struct net *net = nf_ct_net(ct); struct nf_dccp_net *dn; const char *msg; u_int8_t state; state = dccp_state_table[CT_DCCP_ROLE_CLIENT][dh->dccph_type][CT_DCCP_NONE]; switch (state) { default: dn = nf_dccp_pernet(net); if (dn->dccp_loose == 0) { msg = "not picking up existing connection "; goto out_invalid; } break; case CT_DCCP_REQUEST: break; case CT_DCCP_INVALID: msg = "invalid state transition "; goto out_invalid; } ct->proto.dccp.role[IP_CT_DIR_ORIGINAL] = CT_DCCP_ROLE_CLIENT; ct->proto.dccp.role[IP_CT_DIR_REPLY] = CT_DCCP_ROLE_SERVER; ct->proto.dccp.state = CT_DCCP_NONE; ct->proto.dccp.last_pkt = DCCP_PKT_REQUEST; ct->proto.dccp.last_dir = IP_CT_DIR_ORIGINAL; ct->proto.dccp.handshake_seq = 0; return true; out_invalid: nf_ct_l4proto_log_invalid(skb, ct, hook_state, "%s", msg); return false; } static u64 dccp_ack_seq(const struct dccp_hdr *dh) { const struct dccp_hdr_ack_bits *dhack; dhack = (void *)dh + __dccp_basic_hdr_len(dh); return ((u64)ntohs(dhack->dccph_ack_nr_high) << 32) + ntohl(dhack->dccph_ack_nr_low); } static bool dccp_error(const struct dccp_hdr *dh, struct sk_buff *skb, unsigned int dataoff, const struct nf_hook_state *state) { static const unsigned long require_seq48 = 1 << DCCP_PKT_REQUEST | 1 << DCCP_PKT_RESPONSE | 1 << DCCP_PKT_CLOSEREQ | 1 << DCCP_PKT_CLOSE | 1 << DCCP_PKT_RESET | 1 << DCCP_PKT_SYNC | 1 << DCCP_PKT_SYNCACK; unsigned int dccp_len = skb->len - dataoff; unsigned int cscov; const char *msg; u8 type; BUILD_BUG_ON(DCCP_PKT_INVALID >= BITS_PER_LONG); if (dh->dccph_doff * 4 < sizeof(struct dccp_hdr) || dh->dccph_doff * 4 > dccp_len) { msg = "nf_ct_dccp: truncated/malformed packet "; goto out_invalid; } cscov = dccp_len; if (dh->dccph_cscov) { cscov = (dh->dccph_cscov - 1) * 4; if (cscov > dccp_len) { msg = "nf_ct_dccp: bad checksum coverage "; goto out_invalid; } } if (state->hook == NF_INET_PRE_ROUTING && state->net->ct.sysctl_checksum && nf_checksum_partial(skb, state->hook, dataoff, cscov, IPPROTO_DCCP, state->pf)) { msg = "nf_ct_dccp: bad checksum "; goto out_invalid; } type = dh->dccph_type; if (type >= DCCP_PKT_INVALID) { msg = "nf_ct_dccp: reserved packet type "; goto out_invalid; } if (test_bit(type, &require_seq48) && !dh->dccph_x) { msg = "nf_ct_dccp: type lacks 48bit sequence numbers"; goto out_invalid; } return false; out_invalid: nf_l4proto_log_invalid(skb, state, IPPROTO_DCCP, "%s", msg); return true; } struct nf_conntrack_dccp_buf { struct dccp_hdr dh; /* generic header part */ struct dccp_hdr_ext ext; /* optional depending dh->dccph_x */ union { /* depends on header type */ struct dccp_hdr_ack_bits ack; struct dccp_hdr_request req; struct dccp_hdr_response response; struct dccp_hdr_reset rst; } u; }; static struct dccp_hdr * dccp_header_pointer(const struct sk_buff *skb, int offset, const struct dccp_hdr *dh, struct nf_conntrack_dccp_buf *buf) { unsigned int hdrlen = __dccp_hdr_len(dh); if (hdrlen > sizeof(*buf)) return NULL; return skb_header_pointer(skb, offset, hdrlen, buf); } int nf_conntrack_dccp_packet(struct nf_conn *ct, struct sk_buff *skb, unsigned int dataoff, enum ip_conntrack_info ctinfo, const struct nf_hook_state *state) { enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo); struct nf_conntrack_dccp_buf _dh; u_int8_t type, old_state, new_state; enum ct_dccp_roles role; unsigned int *timeouts; struct dccp_hdr *dh; dh = skb_header_pointer(skb, dataoff, sizeof(*dh), &_dh.dh); if (!dh) return -NF_ACCEPT; if (dccp_error(dh, skb, dataoff, state)) return -NF_ACCEPT; /* pull again, including possible 48 bit sequences and subtype header */ dh = dccp_header_pointer(skb, dataoff, dh, &_dh); if (!dh) return -NF_ACCEPT; type = dh->dccph_type; if (!nf_ct_is_confirmed(ct) && !dccp_new(ct, skb, dh, state)) return -NF_ACCEPT; if (type == DCCP_PKT_RESET && !test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) { /* Tear down connection immediately if only reply is a RESET */ nf_ct_kill_acct(ct, ctinfo, skb); return NF_ACCEPT; } spin_lock_bh(&ct->lock); role = ct->proto.dccp.role[dir]; old_state = ct->proto.dccp.state; new_state = dccp_state_table[role][type][old_state]; switch (new_state) { case CT_DCCP_REQUEST: if (old_state == CT_DCCP_TIMEWAIT && role == CT_DCCP_ROLE_SERVER) { /* Reincarnation in the reverse direction: reopen and * reverse client/server roles. */ ct->proto.dccp.role[dir] = CT_DCCP_ROLE_CLIENT; ct->proto.dccp.role[!dir] = CT_DCCP_ROLE_SERVER; } break; case CT_DCCP_RESPOND: if (old_state == CT_DCCP_REQUEST) ct->proto.dccp.handshake_seq = dccp_hdr_seq(dh); break; case CT_DCCP_PARTOPEN: if (old_state == CT_DCCP_RESPOND && type == DCCP_PKT_ACK && dccp_ack_seq(dh) == ct->proto.dccp.handshake_seq) set_bit(IPS_ASSURED_BIT, &ct->status); break; case CT_DCCP_IGNORE: /* * Connection tracking might be out of sync, so we ignore * packets that might establish a new connection and resync * if the server responds with a valid Response. */ if (ct->proto.dccp.last_dir == !dir && ct->proto.dccp.last_pkt == DCCP_PKT_REQUEST && type == DCCP_PKT_RESPONSE) { ct->proto.dccp.role[!dir] = CT_DCCP_ROLE_CLIENT; ct->proto.dccp.role[dir] = CT_DCCP_ROLE_SERVER; ct->proto.dccp.handshake_seq = dccp_hdr_seq(dh); new_state = CT_DCCP_RESPOND; break; } ct->proto.dccp.last_dir = dir; ct->proto.dccp.last_pkt = type; spin_unlock_bh(&ct->lock); nf_ct_l4proto_log_invalid(skb, ct, state, "%s", "invalid packet"); return NF_ACCEPT; case CT_DCCP_INVALID: spin_unlock_bh(&ct->lock); nf_ct_l4proto_log_invalid(skb, ct, state, "%s", "invalid state transition"); return -NF_ACCEPT; } ct->proto.dccp.last_dir = dir; ct->proto.dccp.last_pkt = type; ct->proto.dccp.state = new_state; spin_unlock_bh(&ct->lock); if (new_state != old_state) nf_conntrack_event_cache(IPCT_PROTOINFO, ct); timeouts = nf_ct_timeout_lookup(ct); if (!timeouts) timeouts = nf_dccp_pernet(nf_ct_net(ct))->dccp_timeout; nf_ct_refresh_acct(ct, ctinfo, skb, timeouts[new_state]); return NF_ACCEPT; } static bool dccp_can_early_drop(const struct nf_conn *ct) { switch (ct->proto.dccp.state) { case CT_DCCP_CLOSEREQ: case CT_DCCP_CLOSING: case CT_DCCP_TIMEWAIT: return true; default: break; } return false; } #ifdef CONFIG_NF_CONNTRACK_PROCFS static void dccp_print_conntrack(struct seq_file *s, struct nf_conn *ct) { seq_printf(s, "%s ", dccp_state_names[ct->proto.dccp.state]); } #endif #if IS_ENABLED(CONFIG_NF_CT_NETLINK) static int dccp_to_nlattr(struct sk_buff *skb, struct nlattr *nla, struct nf_conn *ct, bool destroy) { struct nlattr *nest_parms; spin_lock_bh(&ct->lock); nest_parms = nla_nest_start(skb, CTA_PROTOINFO_DCCP); if (!nest_parms) goto nla_put_failure; if (nla_put_u8(skb, CTA_PROTOINFO_DCCP_STATE, ct->proto.dccp.state)) goto nla_put_failure; if (destroy) goto skip_state; if (nla_put_u8(skb, CTA_PROTOINFO_DCCP_ROLE, ct->proto.dccp.role[IP_CT_DIR_ORIGINAL]) || nla_put_be64(skb, CTA_PROTOINFO_DCCP_HANDSHAKE_SEQ, cpu_to_be64(ct->proto.dccp.handshake_seq), CTA_PROTOINFO_DCCP_PAD)) goto nla_put_failure; skip_state: nla_nest_end(skb, nest_parms); spin_unlock_bh(&ct->lock); return 0; nla_put_failure: spin_unlock_bh(&ct->lock); return -1; } static const struct nla_policy dccp_nla_policy[CTA_PROTOINFO_DCCP_MAX + 1] = { [CTA_PROTOINFO_DCCP_STATE] = { .type = NLA_U8 }, [CTA_PROTOINFO_DCCP_ROLE] = { .type = NLA_U8 }, [CTA_PROTOINFO_DCCP_HANDSHAKE_SEQ] = { .type = NLA_U64 }, [CTA_PROTOINFO_DCCP_PAD] = { .type = NLA_UNSPEC }, }; #define DCCP_NLATTR_SIZE ( \ NLA_ALIGN(NLA_HDRLEN + 1) + \ NLA_ALIGN(NLA_HDRLEN + 1) + \ NLA_ALIGN(NLA_HDRLEN + sizeof(u64)) + \ NLA_ALIGN(NLA_HDRLEN + 0)) static int nlattr_to_dccp(struct nlattr *cda[], struct nf_conn *ct) { struct nlattr *attr = cda[CTA_PROTOINFO_DCCP]; struct nlattr *tb[CTA_PROTOINFO_DCCP_MAX + 1]; int err; if (!attr) return 0; err = nla_parse_nested_deprecated(tb, CTA_PROTOINFO_DCCP_MAX, attr, dccp_nla_policy, NULL); if (err < 0) return err; if (!tb[CTA_PROTOINFO_DCCP_STATE] || !tb[CTA_PROTOINFO_DCCP_ROLE] || nla_get_u8(tb[CTA_PROTOINFO_DCCP_ROLE]) > CT_DCCP_ROLE_MAX || nla_get_u8(tb[CTA_PROTOINFO_DCCP_STATE]) >= CT_DCCP_IGNORE) { return -EINVAL; } spin_lock_bh(&ct->lock); ct->proto.dccp.state = nla_get_u8(tb[CTA_PROTOINFO_DCCP_STATE]); if (nla_get_u8(tb[CTA_PROTOINFO_DCCP_ROLE]) == CT_DCCP_ROLE_CLIENT) { ct->proto.dccp.role[IP_CT_DIR_ORIGINAL] = CT_DCCP_ROLE_CLIENT; ct->proto.dccp.role[IP_CT_DIR_REPLY] = CT_DCCP_ROLE_SERVER; } else { ct->proto.dccp.role[IP_CT_DIR_ORIGINAL] = CT_DCCP_ROLE_SERVER; ct->proto.dccp.role[IP_CT_DIR_REPLY] = CT_DCCP_ROLE_CLIENT; } if (tb[CTA_PROTOINFO_DCCP_HANDSHAKE_SEQ]) { ct->proto.dccp.handshake_seq = be64_to_cpu(nla_get_be64(tb[CTA_PROTOINFO_DCCP_HANDSHAKE_SEQ])); } spin_unlock_bh(&ct->lock); return 0; } #endif #ifdef CONFIG_NF_CONNTRACK_TIMEOUT #include <linux/netfilter/nfnetlink.h> #include <linux/netfilter/nfnetlink_cttimeout.h> static int dccp_timeout_nlattr_to_obj(struct nlattr *tb[], struct net *net, void *data) { struct nf_dccp_net *dn = nf_dccp_pernet(net); unsigned int *timeouts = data; int i; if (!timeouts) timeouts = dn->dccp_timeout; /* set default DCCP timeouts. */ for (i=0; i<CT_DCCP_MAX; i++) timeouts[i] = dn->dccp_timeout[i]; /* there's a 1:1 mapping between attributes and protocol states. */ for (i=CTA_TIMEOUT_DCCP_UNSPEC+1; i<CTA_TIMEOUT_DCCP_MAX+1; i++) { if (tb[i]) { timeouts[i] = ntohl(nla_get_be32(tb[i])) * HZ; } } timeouts[CTA_TIMEOUT_DCCP_UNSPEC] = timeouts[CTA_TIMEOUT_DCCP_REQUEST]; return 0; } static int dccp_timeout_obj_to_nlattr(struct sk_buff *skb, const void *data) { const unsigned int *timeouts = data; int i; for (i=CTA_TIMEOUT_DCCP_UNSPEC+1; i<CTA_TIMEOUT_DCCP_MAX+1; i++) { if (nla_put_be32(skb, i, htonl(timeouts[i] / HZ))) goto nla_put_failure; } return 0; nla_put_failure: return -ENOSPC; } static const struct nla_policy dccp_timeout_nla_policy[CTA_TIMEOUT_DCCP_MAX+1] = { [CTA_TIMEOUT_DCCP_REQUEST] = { .type = NLA_U32 }, [CTA_TIMEOUT_DCCP_RESPOND] = { .type = NLA_U32 }, [CTA_TIMEOUT_DCCP_PARTOPEN] = { .type = NLA_U32 }, [CTA_TIMEOUT_DCCP_OPEN] = { .type = NLA_U32 }, [CTA_TIMEOUT_DCCP_CLOSEREQ] = { .type = NLA_U32 }, [CTA_TIMEOUT_DCCP_CLOSING] = { .type = NLA_U32 }, [CTA_TIMEOUT_DCCP_TIMEWAIT] = { .type = NLA_U32 }, }; #endif /* CONFIG_NF_CONNTRACK_TIMEOUT */ void nf_conntrack_dccp_init_net(struct net *net) { struct nf_dccp_net *dn = nf_dccp_pernet(net); /* default values */ dn->dccp_loose = 1; dn->dccp_timeout[CT_DCCP_REQUEST] = 2 * DCCP_MSL; dn->dccp_timeout[CT_DCCP_RESPOND] = 4 * DCCP_MSL; dn->dccp_timeout[CT_DCCP_PARTOPEN] = 4 * DCCP_MSL; dn->dccp_timeout[CT_DCCP_OPEN] = 12 * 3600 * HZ; dn->dccp_timeout[CT_DCCP_CLOSEREQ] = 64 * HZ; dn->dccp_timeout[CT_DCCP_CLOSING] = 64 * HZ; dn->dccp_timeout[CT_DCCP_TIMEWAIT] = 2 * DCCP_MSL; /* timeouts[0] is unused, make it same as SYN_SENT so * ->timeouts[0] contains 'new' timeout, like udp or icmp. */ dn->dccp_timeout[CT_DCCP_NONE] = dn->dccp_timeout[CT_DCCP_REQUEST]; } const struct nf_conntrack_l4proto nf_conntrack_l4proto_dccp = { .l4proto = IPPROTO_DCCP, .can_early_drop = dccp_can_early_drop, #ifdef CONFIG_NF_CONNTRACK_PROCFS .print_conntrack = dccp_print_conntrack, #endif #if IS_ENABLED(CONFIG_NF_CT_NETLINK) .nlattr_size = DCCP_NLATTR_SIZE, .to_nlattr = dccp_to_nlattr, .from_nlattr = nlattr_to_dccp, .tuple_to_nlattr = nf_ct_port_tuple_to_nlattr, .nlattr_tuple_size = nf_ct_port_nlattr_tuple_size, .nlattr_to_tuple = nf_ct_port_nlattr_to_tuple, .nla_policy = nf_ct_port_nla_policy, #endif #ifdef CONFIG_NF_CONNTRACK_TIMEOUT .ctnl_timeout = { .nlattr_to_obj = dccp_timeout_nlattr_to_obj, .obj_to_nlattr = dccp_timeout_obj_to_nlattr, .nlattr_max = CTA_TIMEOUT_DCCP_MAX, .obj_size = sizeof(unsigned int) * CT_DCCP_MAX, .nla_policy = dccp_timeout_nla_policy, }, #endif /* CONFIG_NF_CONNTRACK_TIMEOUT */ }; |
| 2 1 1 1 4 4 4 2 1 1 1 1 4 4 4 4 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 | // SPDX-License-Identifier: GPL-2.0-only /* Copyright (c) 2016 Tom Herbert <tom@herbertland.com> */ #include <linux/skbuff.h> #include <linux/skbuff_ref.h> #include <linux/workqueue.h> #include <net/strparser.h> #include <net/tcp.h> #include <net/sock.h> #include <net/tls.h> #include "tls.h" static struct workqueue_struct *tls_strp_wq; static void tls_strp_abort_strp(struct tls_strparser *strp, int err) { if (strp->stopped) return; strp->stopped = 1; /* Report an error on the lower socket */ WRITE_ONCE(strp->sk->sk_err, -err); /* Paired with smp_rmb() in tcp_poll() */ smp_wmb(); sk_error_report(strp->sk); } static void tls_strp_anchor_free(struct tls_strparser *strp) { struct skb_shared_info *shinfo = skb_shinfo(strp->anchor); DEBUG_NET_WARN_ON_ONCE(atomic_read(&shinfo->dataref) != 1); if (!strp->copy_mode) shinfo->frag_list = NULL; consume_skb(strp->anchor); strp->anchor = NULL; } static struct sk_buff * tls_strp_skb_copy(struct tls_strparser *strp, struct sk_buff *in_skb, int offset, int len) { struct sk_buff *skb; int i, err; skb = alloc_skb_with_frags(0, len, TLS_PAGE_ORDER, &err, strp->sk->sk_allocation); if (!skb) return NULL; for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { skb_frag_t *frag = &skb_shinfo(skb)->frags[i]; WARN_ON_ONCE(skb_copy_bits(in_skb, offset, skb_frag_address(frag), skb_frag_size(frag))); offset += skb_frag_size(frag); } skb->len = len; skb->data_len = len; skb_copy_header(skb, in_skb); return skb; } /* Create a new skb with the contents of input copied to its page frags */ static struct sk_buff *tls_strp_msg_make_copy(struct tls_strparser *strp) { struct strp_msg *rxm; struct sk_buff *skb; skb = tls_strp_skb_copy(strp, strp->anchor, strp->stm.offset, strp->stm.full_len); if (!skb) return NULL; rxm = strp_msg(skb); rxm->offset = 0; return skb; } /* Steal the input skb, input msg is invalid after calling this function */ struct sk_buff *tls_strp_msg_detach(struct tls_sw_context_rx *ctx) { struct tls_strparser *strp = &ctx->strp; #ifdef CONFIG_TLS_DEVICE DEBUG_NET_WARN_ON_ONCE(!strp->anchor->decrypted); #else /* This function turns an input into an output, * that can only happen if we have offload. */ WARN_ON(1); #endif if (strp->copy_mode) { struct sk_buff *skb; /* Replace anchor with an empty skb, this is a little * dangerous but __tls_cur_msg() warns on empty skbs * so hopefully we'll catch abuses. */ skb = alloc_skb(0, strp->sk->sk_allocation); if (!skb) return NULL; swap(strp->anchor, skb); return skb; } return tls_strp_msg_make_copy(strp); } /* Force the input skb to be in copy mode. The data ownership remains * with the input skb itself (meaning unpause will wipe it) but it can * be modified. */ int tls_strp_msg_cow(struct tls_sw_context_rx *ctx) { struct tls_strparser *strp = &ctx->strp; struct sk_buff *skb; if (strp->copy_mode) return 0; skb = tls_strp_msg_make_copy(strp); if (!skb) return -ENOMEM; tls_strp_anchor_free(strp); strp->anchor = skb; tcp_read_done(strp->sk, strp->stm.full_len); strp->copy_mode = 1; return 0; } /* Make a clone (in the skb sense) of the input msg to keep a reference * to the underlying data. The reference-holding skbs get placed on * @dst. */ int tls_strp_msg_hold(struct tls_strparser *strp, struct sk_buff_head *dst) { struct skb_shared_info *shinfo = skb_shinfo(strp->anchor); if (strp->copy_mode) { struct sk_buff *skb; WARN_ON_ONCE(!shinfo->nr_frags); /* We can't skb_clone() the anchor, it gets wiped by unpause */ skb = alloc_skb(0, strp->sk->sk_allocation); if (!skb) return -ENOMEM; __skb_queue_tail(dst, strp->anchor); strp->anchor = skb; } else { struct sk_buff *iter, *clone; int chunk, len, offset; offset = strp->stm.offset; len = strp->stm.full_len; iter = shinfo->frag_list; while (len > 0) { if (iter->len <= offset) { offset -= iter->len; goto next; } chunk = iter->len - offset; offset = 0; clone = skb_clone(iter, strp->sk->sk_allocation); if (!clone) return -ENOMEM; __skb_queue_tail(dst, clone); len -= chunk; next: iter = iter->next; } } return 0; } static void tls_strp_flush_anchor_copy(struct tls_strparser *strp) { struct skb_shared_info *shinfo = skb_shinfo(strp->anchor); int i; DEBUG_NET_WARN_ON_ONCE(atomic_read(&shinfo->dataref) != 1); for (i = 0; i < shinfo->nr_frags; i++) __skb_frag_unref(&shinfo->frags[i], false); shinfo->nr_frags = 0; if (strp->copy_mode) { kfree_skb_list(shinfo->frag_list); shinfo->frag_list = NULL; } strp->copy_mode = 0; strp->mixed_decrypted = 0; } static int tls_strp_copyin_frag(struct tls_strparser *strp, struct sk_buff *skb, struct sk_buff *in_skb, unsigned int offset, size_t in_len) { size_t len, chunk; skb_frag_t *frag; int sz; frag = &skb_shinfo(skb)->frags[skb->len / PAGE_SIZE]; len = in_len; /* First make sure we got the header */ if (!strp->stm.full_len) { /* Assume one page is more than enough for headers */ chunk = min_t(size_t, len, PAGE_SIZE - skb_frag_size(frag)); WARN_ON_ONCE(skb_copy_bits(in_skb, offset, skb_frag_address(frag) + skb_frag_size(frag), chunk)); skb->len += chunk; skb->data_len += chunk; skb_frag_size_add(frag, chunk); sz = tls_rx_msg_size(strp, skb); if (sz < 0) return sz; /* We may have over-read, sz == 0 is guaranteed under-read */ if (unlikely(sz && sz < skb->len)) { int over = skb->len - sz; WARN_ON_ONCE(over > chunk); skb->len -= over; skb->data_len -= over; skb_frag_size_add(frag, -over); chunk -= over; } frag++; len -= chunk; offset += chunk; strp->stm.full_len = sz; if (!strp->stm.full_len) goto read_done; } /* Load up more data */ while (len && strp->stm.full_len > skb->len) { chunk = min_t(size_t, len, strp->stm.full_len - skb->len); chunk = min_t(size_t, chunk, PAGE_SIZE - skb_frag_size(frag)); WARN_ON_ONCE(skb_copy_bits(in_skb, offset, skb_frag_address(frag) + skb_frag_size(frag), chunk)); skb->len += chunk; skb->data_len += chunk; skb_frag_size_add(frag, chunk); frag++; len -= chunk; offset += chunk; } read_done: return in_len - len; } static int tls_strp_copyin_skb(struct tls_strparser *strp, struct sk_buff *skb, struct sk_buff *in_skb, unsigned int offset, size_t in_len) { struct sk_buff *nskb, *first, *last; struct skb_shared_info *shinfo; size_t chunk; int sz; if (strp->stm.full_len) chunk = strp->stm.full_len - skb->len; else chunk = TLS_MAX_PAYLOAD_SIZE + PAGE_SIZE; chunk = min(chunk, in_len); nskb = tls_strp_skb_copy(strp, in_skb, offset, chunk); if (!nskb) return -ENOMEM; shinfo = skb_shinfo(skb); if (!shinfo->frag_list) { shinfo->frag_list = nskb; nskb->prev = nskb; } else { first = shinfo->frag_list; last = first->prev; last->next = nskb; first->prev = nskb; } skb->len += chunk; skb->data_len += chunk; if (!strp->stm.full_len) { sz = tls_rx_msg_size(strp, skb); if (sz < 0) return sz; /* We may have over-read, sz == 0 is guaranteed under-read */ if (unlikely(sz && sz < skb->len)) { int over = skb->len - sz; WARN_ON_ONCE(over > chunk); skb->len -= over; skb->data_len -= over; __pskb_trim(nskb, nskb->len - over); chunk -= over; } strp->stm.full_len = sz; } return chunk; } static int tls_strp_copyin(read_descriptor_t *desc, struct sk_buff *in_skb, unsigned int offset, size_t in_len) { struct tls_strparser *strp = (struct tls_strparser *)desc->arg.data; struct sk_buff *skb; int ret; if (strp->msg_ready) return 0; skb = strp->anchor; if (!skb->len) skb_copy_decrypted(skb, in_skb); else strp->mixed_decrypted |= !!skb_cmp_decrypted(skb, in_skb); if (IS_ENABLED(CONFIG_TLS_DEVICE) && strp->mixed_decrypted) ret = tls_strp_copyin_skb(strp, skb, in_skb, offset, in_len); else ret = tls_strp_copyin_frag(strp, skb, in_skb, offset, in_len); if (ret < 0) { desc->error = ret; ret = 0; } if (strp->stm.full_len && strp->stm.full_len == skb->len) { desc->count = 0; WRITE_ONCE(strp->msg_ready, 1); tls_rx_msg_ready(strp); } return ret; } static int tls_strp_read_copyin(struct tls_strparser *strp) { read_descriptor_t desc; desc.arg.data = strp; desc.error = 0; desc.count = 1; /* give more than one skb per call */ /* sk should be locked here, so okay to do read_sock */ tcp_read_sock(strp->sk, &desc, tls_strp_copyin); return desc.error; } static int tls_strp_read_copy(struct tls_strparser *strp, bool qshort) { struct skb_shared_info *shinfo; struct page *page; int need_spc, len; /* If the rbuf is small or rcv window has collapsed to 0 we need * to read the data out. Otherwise the connection will stall. * Without pressure threshold of INT_MAX will never be ready. */ if (likely(qshort && !tcp_epollin_ready(strp->sk, INT_MAX))) return 0; shinfo = skb_shinfo(strp->anchor); shinfo->frag_list = NULL; /* If we don't know the length go max plus page for cipher overhead */ need_spc = strp->stm.full_len ?: TLS_MAX_PAYLOAD_SIZE + PAGE_SIZE; for (len = need_spc; len > 0; len -= PAGE_SIZE) { page = alloc_page(strp->sk->sk_allocation); if (!page) { tls_strp_flush_anchor_copy(strp); return -ENOMEM; } skb_fill_page_desc(strp->anchor, shinfo->nr_frags++, page, 0, 0); } strp->copy_mode = 1; strp->stm.offset = 0; strp->anchor->len = 0; strp->anchor->data_len = 0; strp->anchor->truesize = round_up(need_spc, PAGE_SIZE); tls_strp_read_copyin(strp); return 0; } static bool tls_strp_check_queue_ok(struct tls_strparser *strp) { unsigned int len = strp->stm.offset + strp->stm.full_len; struct sk_buff *first, *skb; u32 seq; first = skb_shinfo(strp->anchor)->frag_list; skb = first; seq = TCP_SKB_CB(first)->seq; /* Make sure there's no duplicate data in the queue, * and the decrypted status matches. */ while (skb->len < len) { seq += skb->len; len -= skb->len; skb = skb->next; if (TCP_SKB_CB(skb)->seq != seq) return false; if (skb_cmp_decrypted(first, skb)) return false; } return true; } static void tls_strp_load_anchor_with_queue(struct tls_strparser *strp, int len) { struct tcp_sock *tp = tcp_sk(strp->sk); struct sk_buff *first; u32 offset; first = tcp_recv_skb(strp->sk, tp->copied_seq, &offset); if (WARN_ON_ONCE(!first)) return; /* Bestow the state onto the anchor */ strp->anchor->len = offset + len; strp->anchor->data_len = offset + len; strp->anchor->truesize = offset + len; skb_shinfo(strp->anchor)->frag_list = first; skb_copy_header(strp->anchor, first); strp->anchor->destructor = NULL; strp->stm.offset = offset; } void tls_strp_msg_load(struct tls_strparser *strp, bool force_refresh) { struct strp_msg *rxm; struct tls_msg *tlm; DEBUG_NET_WARN_ON_ONCE(!strp->msg_ready); DEBUG_NET_WARN_ON_ONCE(!strp->stm.full_len); if (!strp->copy_mode && force_refresh) { if (WARN_ON(tcp_inq(strp->sk) < strp->stm.full_len)) return; tls_strp_load_anchor_with_queue(strp, strp->stm.full_len); } rxm = strp_msg(strp->anchor); rxm->full_len = strp->stm.full_len; rxm->offset = strp->stm.offset; tlm = tls_msg(strp->anchor); tlm->control = strp->mark; } /* Called with lock held on lower socket */ static int tls_strp_read_sock(struct tls_strparser *strp) { int sz, inq; inq = tcp_inq(strp->sk); if (inq < 1) return 0; if (unlikely(strp->copy_mode)) return tls_strp_read_copyin(strp); if (inq < strp->stm.full_len) return tls_strp_read_copy(strp, true); if (!strp->stm.full_len) { tls_strp_load_anchor_with_queue(strp, inq); sz = tls_rx_msg_size(strp, strp->anchor); if (sz < 0) { tls_strp_abort_strp(strp, sz); return sz; } strp->stm.full_len = sz; if (!strp->stm.full_len || inq < strp->stm.full_len) return tls_strp_read_copy(strp, true); } if (!tls_strp_check_queue_ok(strp)) return tls_strp_read_copy(strp, false); WRITE_ONCE(strp->msg_ready, 1); tls_rx_msg_ready(strp); return 0; } void tls_strp_check_rcv(struct tls_strparser *strp) { if (unlikely(strp->stopped) || strp->msg_ready) return; if (tls_strp_read_sock(strp) == -ENOMEM) queue_work(tls_strp_wq, &strp->work); } /* Lower sock lock held */ void tls_strp_data_ready(struct tls_strparser *strp) { /* This check is needed to synchronize with do_tls_strp_work. * do_tls_strp_work acquires a process lock (lock_sock) whereas * the lock held here is bh_lock_sock. The two locks can be * held by different threads at the same time, but bh_lock_sock * allows a thread in BH context to safely check if the process * lock is held. In this case, if the lock is held, queue work. */ if (sock_owned_by_user_nocheck(strp->sk)) { queue_work(tls_strp_wq, &strp->work); return; } tls_strp_check_rcv(strp); } static void tls_strp_work(struct work_struct *w) { struct tls_strparser *strp = container_of(w, struct tls_strparser, work); lock_sock(strp->sk); tls_strp_check_rcv(strp); release_sock(strp->sk); } void tls_strp_msg_done(struct tls_strparser *strp) { WARN_ON(!strp->stm.full_len); if (likely(!strp->copy_mode)) tcp_read_done(strp->sk, strp->stm.full_len); else tls_strp_flush_anchor_copy(strp); WRITE_ONCE(strp->msg_ready, 0); memset(&strp->stm, 0, sizeof(strp->stm)); tls_strp_check_rcv(strp); } void tls_strp_stop(struct tls_strparser *strp) { strp->stopped = 1; } int tls_strp_init(struct tls_strparser *strp, struct sock *sk) { memset(strp, 0, sizeof(*strp)); strp->sk = sk; strp->anchor = alloc_skb(0, GFP_KERNEL); if (!strp->anchor) return -ENOMEM; INIT_WORK(&strp->work, tls_strp_work); return 0; } /* strp must already be stopped so that tls_strp_recv will no longer be called. * Note that tls_strp_done is not called with the lower socket held. */ void tls_strp_done(struct tls_strparser *strp) { WARN_ON(!strp->stopped); cancel_work_sync(&strp->work); tls_strp_anchor_free(strp); } int __init tls_strp_dev_init(void) { tls_strp_wq = create_workqueue("tls-strp"); if (unlikely(!tls_strp_wq)) return -ENOMEM; return 0; } void tls_strp_dev_exit(void) { destroy_workqueue(tls_strp_wq); } |
| 6 45 46 254 226 2 73 217 86 285 363 218 66 249 220 9 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 | /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _FAT_H #define _FAT_H #include <linux/buffer_head.h> #include <linux/nls.h> #include <linux/hash.h> #include <linux/ratelimit.h> #include <linux/msdos_fs.h> #include <linux/fs_context.h> #include <linux/fs_parser.h> /* * vfat shortname flags */ #define VFAT_SFN_DISPLAY_LOWER 0x0001 /* convert to lowercase for display */ #define VFAT_SFN_DISPLAY_WIN95 0x0002 /* emulate win95 rule for display */ #define VFAT_SFN_DISPLAY_WINNT 0x0004 /* emulate winnt rule for display */ #define VFAT_SFN_CREATE_WIN95 0x0100 /* emulate win95 rule for create */ #define VFAT_SFN_CREATE_WINNT 0x0200 /* emulate winnt rule for create */ #define FAT_ERRORS_CONT 1 /* ignore error and continue */ #define FAT_ERRORS_PANIC 2 /* panic on error */ #define FAT_ERRORS_RO 3 /* remount r/o on error */ #define FAT_NFS_STALE_RW 1 /* NFS RW support, can cause ESTALE */ #define FAT_NFS_NOSTALE_RO 2 /* NFS RO support, no ESTALE issue */ struct fat_mount_options { kuid_t fs_uid; kgid_t fs_gid; unsigned short fs_fmask; unsigned short fs_dmask; unsigned short codepage; /* Codepage for shortname conversions */ int time_offset; /* Offset of timestamps from UTC (in minutes) */ char *iocharset; /* Charset used for filename input/display */ unsigned short shortname; /* flags for shortname display/create rule */ unsigned char name_check; /* r = relaxed, n = normal, s = strict */ unsigned char errors; /* On error: continue, panic, remount-ro */ unsigned char nfs; /* NFS support: nostale_ro, stale_rw */ unsigned short allow_utime;/* permission for setting the [am]time */ unsigned quiet:1, /* set = fake successful chmods and chowns */ showexec:1, /* set = only set x bit for com/exe/bat */ sys_immutable:1, /* set = system files are immutable */ dotsOK:1, /* set = hidden and system files are named '.filename' */ isvfat:1, /* 0=no vfat long filename support, 1=vfat support */ utf8:1, /* Use of UTF-8 character set (Default) */ unicode_xlate:1, /* create escape sequences for unhandled Unicode */ numtail:1, /* Does first alias have a numeric '~1' type tail? */ flush:1, /* write things quickly */ nocase:1, /* Does this need case conversion? 0=need case conversion*/ usefree:1, /* Use free_clusters for FAT32 */ tz_set:1, /* Filesystem timestamps' offset set */ rodir:1, /* allow ATTR_RO for directory */ discard:1, /* Issue discard requests on deletions */ dos1xfloppy:1, /* Assume default BPB for DOS 1.x floppies */ debug:1; /* Not currently used */ }; #define FAT_HASH_BITS 8 #define FAT_HASH_SIZE (1UL << FAT_HASH_BITS) /* * MS-DOS file system in-core superblock data */ struct msdos_sb_info { unsigned short sec_per_clus; /* sectors/cluster */ unsigned short cluster_bits; /* log2(cluster_size) */ unsigned int cluster_size; /* cluster size */ unsigned char fats, fat_bits; /* number of FATs, FAT bits (12,16 or 32) */ unsigned short fat_start; unsigned long fat_length; /* FAT start & length (sec.) */ unsigned long dir_start; unsigned short dir_entries; /* root dir start & entries */ unsigned long data_start; /* first data sector */ unsigned long max_cluster; /* maximum cluster number */ unsigned long root_cluster; /* first cluster of the root directory */ unsigned long fsinfo_sector; /* sector number of FAT32 fsinfo */ struct mutex fat_lock; struct mutex nfs_build_inode_lock; struct mutex s_lock; unsigned int prev_free; /* previously allocated cluster number */ unsigned int free_clusters; /* -1 if undefined */ unsigned int free_clus_valid; /* is free_clusters valid? */ struct fat_mount_options options; struct nls_table *nls_disk; /* Codepage used on disk */ struct nls_table *nls_io; /* Charset used for input and display */ const void *dir_ops; /* Opaque; default directory operations */ int dir_per_block; /* dir entries per block */ int dir_per_block_bits; /* log2(dir_per_block) */ unsigned int vol_id; /*volume ID*/ int fatent_shift; const struct fatent_operations *fatent_ops; struct inode *fat_inode; struct inode *fsinfo_inode; struct ratelimit_state ratelimit; spinlock_t inode_hash_lock; struct hlist_head inode_hashtable[FAT_HASH_SIZE]; spinlock_t dir_hash_lock; struct hlist_head dir_hashtable[FAT_HASH_SIZE]; unsigned int dirty; /* fs state before mount */ struct rcu_head rcu; }; #define FAT_CACHE_VALID 0 /* special case for valid cache */ /* * MS-DOS file system inode data in memory */ struct msdos_inode_info { spinlock_t cache_lru_lock; struct list_head cache_lru; int nr_caches; /* for avoiding the race between fat_free() and fat_get_cluster() */ unsigned int cache_valid_id; /* NOTE: mmu_private is 64bits, so must hold ->i_mutex to access */ loff_t mmu_private; /* physically allocated size */ int i_start; /* first cluster or 0 */ int i_logstart; /* logical first cluster */ int i_attrs; /* unused attribute bits */ loff_t i_pos; /* on-disk position of directory entry or 0 */ struct hlist_node i_fat_hash; /* hash by i_location */ struct hlist_node i_dir_hash; /* hash by i_logstart */ struct rw_semaphore truncate_lock; /* protect bmap against truncate */ struct timespec64 i_crtime; /* File creation (birth) time */ struct inode vfs_inode; }; struct fat_slot_info { loff_t i_pos; /* on-disk position of directory entry */ loff_t slot_off; /* offset for slot or de start */ int nr_slots; /* number of slots + 1(de) in filename */ struct msdos_dir_entry *de; struct buffer_head *bh; }; static inline struct msdos_sb_info *MSDOS_SB(struct super_block *sb) { return sb->s_fs_info; } /* * Functions that determine the variant of the FAT file system (i.e., * whether this is FAT12, FAT16 or FAT32. */ static inline bool is_fat12(const struct msdos_sb_info *sbi) { return sbi->fat_bits == 12; } static inline bool is_fat16(const struct msdos_sb_info *sbi) { return sbi->fat_bits == 16; } static inline bool is_fat32(const struct msdos_sb_info *sbi) { return sbi->fat_bits == 32; } /* Maximum number of clusters */ static inline u32 max_fat(struct super_block *sb) { struct msdos_sb_info *sbi = MSDOS_SB(sb); return is_fat32(sbi) ? MAX_FAT32 : is_fat16(sbi) ? MAX_FAT16 : MAX_FAT12; } static inline struct msdos_inode_info *MSDOS_I(struct inode *inode) { return container_of(inode, struct msdos_inode_info, vfs_inode); } /* * If ->i_mode can't hold S_IWUGO (i.e. ATTR_RO), we use ->i_attrs to * save ATTR_RO instead of ->i_mode. * * If it's directory and !sbi->options.rodir, ATTR_RO isn't read-only * bit, it's just used as flag for app. */ static inline int fat_mode_can_hold_ro(struct inode *inode) { struct msdos_sb_info *sbi = MSDOS_SB(inode->i_sb); umode_t mask; if (S_ISDIR(inode->i_mode)) { if (!sbi->options.rodir) return 0; mask = ~sbi->options.fs_dmask; } else mask = ~sbi->options.fs_fmask; if (!(mask & S_IWUGO)) return 0; return 1; } /* Convert attribute bits and a mask to the UNIX mode. */ static inline umode_t fat_make_mode(struct msdos_sb_info *sbi, u8 attrs, umode_t mode) { if (attrs & ATTR_RO && !((attrs & ATTR_DIR) && !sbi->options.rodir)) mode &= ~S_IWUGO; if (attrs & ATTR_DIR) return (mode & ~sbi->options.fs_dmask) | S_IFDIR; else return (mode & ~sbi->options.fs_fmask) | S_IFREG; } /* Return the FAT attribute byte for this inode */ static inline u8 fat_make_attrs(struct inode *inode) { u8 attrs = MSDOS_I(inode)->i_attrs; if (S_ISDIR(inode->i_mode)) attrs |= ATTR_DIR; if (fat_mode_can_hold_ro(inode) && !(inode->i_mode & S_IWUGO)) attrs |= ATTR_RO; return attrs; } static inline void fat_save_attrs(struct inode *inode, u8 attrs) { if (fat_mode_can_hold_ro(inode)) MSDOS_I(inode)->i_attrs = attrs & ATTR_UNUSED; else MSDOS_I(inode)->i_attrs = attrs & (ATTR_UNUSED | ATTR_RO); } static inline unsigned char fat_checksum(const __u8 *name) { unsigned char s = name[0]; s = (s<<7) + (s>>1) + name[1]; s = (s<<7) + (s>>1) + name[2]; s = (s<<7) + (s>>1) + name[3]; s = (s<<7) + (s>>1) + name[4]; s = (s<<7) + (s>>1) + name[5]; s = (s<<7) + (s>>1) + name[6]; s = (s<<7) + (s>>1) + name[7]; s = (s<<7) + (s>>1) + name[8]; s = (s<<7) + (s>>1) + name[9]; s = (s<<7) + (s>>1) + name[10]; return s; } static inline sector_t fat_clus_to_blknr(struct msdos_sb_info *sbi, int clus) { return ((sector_t)clus - FAT_START_ENT) * sbi->sec_per_clus + sbi->data_start; } static inline void fat_get_blknr_offset(struct msdos_sb_info *sbi, loff_t i_pos, sector_t *blknr, int *offset) { *blknr = i_pos >> sbi->dir_per_block_bits; *offset = i_pos & (sbi->dir_per_block - 1); } static inline loff_t fat_i_pos_read(struct msdos_sb_info *sbi, struct inode *inode) { loff_t i_pos; #if BITS_PER_LONG == 32 spin_lock(&sbi->inode_hash_lock); #endif i_pos = MSDOS_I(inode)->i_pos; #if BITS_PER_LONG == 32 spin_unlock(&sbi->inode_hash_lock); #endif return i_pos; } static inline void fat16_towchar(wchar_t *dst, const __u8 *src, size_t len) { #ifdef __BIG_ENDIAN while (len--) { *dst++ = src[0] | (src[1] << 8); src += 2; } #else memcpy(dst, src, len * 2); #endif } static inline int fat_get_start(const struct msdos_sb_info *sbi, const struct msdos_dir_entry *de) { int cluster = le16_to_cpu(de->start); if (is_fat32(sbi)) cluster |= (le16_to_cpu(de->starthi) << 16); return cluster; } static inline void fat_set_start(struct msdos_dir_entry *de, int cluster) { de->start = cpu_to_le16(cluster); de->starthi = cpu_to_le16(cluster >> 16); } static inline void fatwchar_to16(__u8 *dst, const wchar_t *src, size_t len) { #ifdef __BIG_ENDIAN while (len--) { dst[0] = *src & 0x00FF; dst[1] = (*src & 0xFF00) >> 8; dst += 2; src++; } #else memcpy(dst, src, len * 2); #endif } /* fat/cache.c */ extern void fat_cache_inval_inode(struct inode *inode); extern int fat_get_cluster(struct inode *inode, int cluster, int *fclus, int *dclus); extern int fat_get_mapped_cluster(struct inode *inode, sector_t sector, sector_t last_block, unsigned long *mapped_blocks, sector_t *bmap); extern int fat_bmap(struct inode *inode, sector_t sector, sector_t *phys, unsigned long *mapped_blocks, int create, bool from_bmap); /* fat/dir.c */ extern const struct file_operations fat_dir_operations; extern int fat_search_long(struct inode *inode, const unsigned char *name, int name_len, struct fat_slot_info *sinfo); extern int fat_dir_empty(struct inode *dir); extern int fat_subdirs(struct inode *dir); extern int fat_scan(struct inode *dir, const unsigned char *name, struct fat_slot_info *sinfo); extern int fat_scan_logstart(struct inode *dir, int i_logstart, struct fat_slot_info *sinfo); extern int fat_get_dotdot_entry(struct inode *dir, struct buffer_head **bh, struct msdos_dir_entry **de); extern int fat_alloc_new_dir(struct inode *dir, struct timespec64 *ts); extern int fat_add_entries(struct inode *dir, void *slots, int nr_slots, struct fat_slot_info *sinfo); extern int fat_remove_entries(struct inode *dir, struct fat_slot_info *sinfo); /* fat/fatent.c */ struct fat_entry { int entry; union { u8 *ent12_p[2]; __le16 *ent16_p; __le32 *ent32_p; } u; int nr_bhs; struct buffer_head *bhs[2]; struct inode *fat_inode; }; static inline void fatent_init(struct fat_entry *fatent) { fatent->nr_bhs = 0; fatent->entry = 0; fatent->u.ent32_p = NULL; fatent->bhs[0] = fatent->bhs[1] = NULL; fatent->fat_inode = NULL; } static inline void fatent_set_entry(struct fat_entry *fatent, int entry) { fatent->entry = entry; fatent->u.ent32_p = NULL; } static inline void fatent_brelse(struct fat_entry *fatent) { int i; fatent->u.ent32_p = NULL; for (i = 0; i < fatent->nr_bhs; i++) brelse(fatent->bhs[i]); fatent->nr_bhs = 0; fatent->bhs[0] = fatent->bhs[1] = NULL; fatent->fat_inode = NULL; } static inline bool fat_valid_entry(struct msdos_sb_info *sbi, int entry) { return FAT_START_ENT <= entry && entry < sbi->max_cluster; } extern void fat_ent_access_init(struct super_block *sb); extern int fat_ent_read(struct inode *inode, struct fat_entry *fatent, int entry); extern int fat_ent_write(struct inode *inode, struct fat_entry *fatent, int new, int wait); extern int fat_alloc_clusters(struct inode *inode, int *cluster, int nr_cluster); extern int fat_free_clusters(struct inode *inode, int cluster); extern int fat_count_free_clusters(struct super_block *sb); extern int fat_trim_fs(struct inode *inode, struct fstrim_range *range); /* fat/file.c */ extern long fat_generic_ioctl(struct file *filp, unsigned int cmd, unsigned long arg); extern const struct file_operations fat_file_operations; extern const struct inode_operations fat_file_inode_operations; extern int fat_setattr(struct mnt_idmap *idmap, struct dentry *dentry, struct iattr *attr); extern void fat_truncate_blocks(struct inode *inode, loff_t offset); extern int fat_getattr(struct mnt_idmap *idmap, const struct path *path, struct kstat *stat, u32 request_mask, unsigned int flags); extern int fat_file_fsync(struct file *file, loff_t start, loff_t end, int datasync); /* fat/inode.c */ extern int fat_block_truncate_page(struct inode *inode, loff_t from); extern void fat_attach(struct inode *inode, loff_t i_pos); extern void fat_detach(struct inode *inode); extern struct inode *fat_iget(struct super_block *sb, loff_t i_pos); extern struct inode *fat_build_inode(struct super_block *sb, struct msdos_dir_entry *de, loff_t i_pos); extern int fat_sync_inode(struct inode *inode); extern int fat_fill_super(struct super_block *sb, struct fs_context *fc, void (*setup)(struct super_block *)); extern int fat_fill_inode(struct inode *inode, struct msdos_dir_entry *de); extern int fat_flush_inodes(struct super_block *sb, struct inode *i1, struct inode *i2); extern const struct fs_parameter_spec fat_param_spec[]; int fat_init_fs_context(struct fs_context *fc, bool is_vfat); void fat_free_fc(struct fs_context *fc); int fat_parse_param(struct fs_context *fc, struct fs_parameter *param, bool is_vfat); int fat_reconfigure(struct fs_context *fc); static inline unsigned long fat_dir_hash(int logstart) { return hash_32(logstart, FAT_HASH_BITS); } extern int fat_add_cluster(struct inode *inode); /* fat/misc.c */ extern __printf(3, 4) __cold void __fat_fs_error(struct super_block *sb, int report, const char *fmt, ...); #define fat_fs_error(sb, fmt, args...) \ __fat_fs_error(sb, 1, fmt , ## args) #define fat_fs_error_ratelimit(sb, fmt, args...) \ __fat_fs_error(sb, __ratelimit(&MSDOS_SB(sb)->ratelimit), fmt , ## args) #define FAT_PRINTK_PREFIX "%sFAT-fs (%s): " #define fat_msg(sb, level, fmt, args...) \ do { \ printk_index_subsys_emit(FAT_PRINTK_PREFIX, level, fmt, ##args);\ _fat_msg(sb, level, fmt, ##args); \ } while (0) __printf(3, 4) __cold void _fat_msg(struct super_block *sb, const char *level, const char *fmt, ...); #define fat_msg_ratelimit(sb, level, fmt, args...) \ do { \ if (__ratelimit(&MSDOS_SB(sb)->ratelimit)) \ fat_msg(sb, level, fmt, ## args); \ } while (0) extern int fat_clusters_flush(struct super_block *sb); extern int fat_chain_add(struct inode *inode, int new_dclus, int nr_cluster); extern void fat_time_fat2unix(struct msdos_sb_info *sbi, struct timespec64 *ts, __le16 __time, __le16 __date, u8 time_cs); extern void fat_time_unix2fat(struct msdos_sb_info *sbi, struct timespec64 *ts, __le16 *time, __le16 *date, u8 *time_cs); extern struct timespec64 fat_truncate_atime(const struct msdos_sb_info *sbi, const struct timespec64 *ts); extern struct timespec64 fat_truncate_mtime(const struct msdos_sb_info *sbi, const struct timespec64 *ts); extern int fat_truncate_time(struct inode *inode, struct timespec64 *now, int flags); extern int fat_update_time(struct inode *inode, int flags); extern int fat_sync_bhs(struct buffer_head **bhs, int nr_bhs); int fat_cache_init(void); void fat_cache_destroy(void); /* fat/nfs.c */ extern const struct export_operations fat_export_ops; extern const struct export_operations fat_export_ops_nostale; /* helper for printk */ typedef unsigned long long llu; #endif /* !_FAT_H */ |
| 3 3 3 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 | // SPDX-License-Identifier: GPL-2.0 #ifndef __KVM_X86_MMU_TDP_MMU_H #define __KVM_X86_MMU_TDP_MMU_H #include <linux/kvm_host.h> #include "spte.h" void kvm_mmu_init_tdp_mmu(struct kvm *kvm); void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm); int kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu); __must_check static inline bool kvm_tdp_mmu_get_root(struct kvm_mmu_page *root) { return refcount_inc_not_zero(&root->tdp_mmu_root_count); } void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root); bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, gfn_t start, gfn_t end, bool flush); bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp); void kvm_tdp_mmu_zap_all(struct kvm *kvm); void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm); void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm); int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault); bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range, bool flush); bool kvm_tdp_mmu_age_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range); bool kvm_tdp_mmu_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range); bool kvm_tdp_mmu_wrprot_slot(struct kvm *kvm, const struct kvm_memory_slot *slot, int min_level); bool kvm_tdp_mmu_clear_dirty_slot(struct kvm *kvm, const struct kvm_memory_slot *slot); void kvm_tdp_mmu_clear_dirty_pt_masked(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn, unsigned long mask, bool wrprot); void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm, const struct kvm_memory_slot *slot); bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn, int min_level); void kvm_tdp_mmu_try_split_huge_pages(struct kvm *kvm, const struct kvm_memory_slot *slot, gfn_t start, gfn_t end, int target_level, bool shared); static inline void kvm_tdp_mmu_walk_lockless_begin(void) { rcu_read_lock(); } static inline void kvm_tdp_mmu_walk_lockless_end(void) { rcu_read_unlock(); } int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, int *root_level); u64 *kvm_tdp_mmu_fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, gfn_t gfn, u64 *spte); #ifdef CONFIG_X86_64 static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return sp->tdp_mmu_page; } #else static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return false; } #endif #endif /* __KVM_X86_MMU_TDP_MMU_H */ |
| 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 23 23 1 1 1 1 1 1 1 1 1 1 2 2 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 | // SPDX-License-Identifier: GPL-2.0 /* * net/tipc/crypto.c: TIPC crypto for key handling & packet en/decryption * * Copyright (c) 2019, Ericsson AB * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the names of the copyright holders nor the names of its * contributors may be used to endorse or promote products derived from * this software without specific prior written permission. * * Alternatively, this software may be distributed under the terms of the * GNU General Public License ("GPL") version 2 as published by the Free * Software Foundation. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. */ #include <crypto/aead.h> #include <crypto/aes.h> #include <crypto/rng.h> #include "crypto.h" #include "msg.h" #include "bcast.h" #define TIPC_TX_GRACE_PERIOD msecs_to_jiffies(5000) /* 5s */ #define TIPC_TX_LASTING_TIME msecs_to_jiffies(10000) /* 10s */ #define TIPC_RX_ACTIVE_LIM msecs_to_jiffies(3000) /* 3s */ #define TIPC_RX_PASSIVE_LIM msecs_to_jiffies(15000) /* 15s */ #define TIPC_MAX_TFMS_DEF 10 #define TIPC_MAX_TFMS_LIM 1000 #define TIPC_REKEYING_INTV_DEF (60 * 24) /* default: 1 day */ /* * TIPC Key ids */ enum { KEY_MASTER = 0, KEY_MIN = KEY_MASTER, KEY_1 = 1, KEY_2, KEY_3, KEY_MAX = KEY_3, }; /* * TIPC Crypto statistics */ enum { STAT_OK, STAT_NOK, STAT_ASYNC, STAT_ASYNC_OK, STAT_ASYNC_NOK, STAT_BADKEYS, /* tx only */ STAT_BADMSGS = STAT_BADKEYS, /* rx only */ STAT_NOKEYS, STAT_SWITCHES, MAX_STATS, }; /* TIPC crypto statistics' header */ static const char *hstats[MAX_STATS] = {"ok", "nok", "async", "async_ok", "async_nok", "badmsgs", "nokeys", "switches"}; /* Max TFMs number per key */ int sysctl_tipc_max_tfms __read_mostly = TIPC_MAX_TFMS_DEF; /* Key exchange switch, default: on */ int sysctl_tipc_key_exchange_enabled __read_mostly = 1; /* * struct tipc_key - TIPC keys' status indicator * * 7 6 5 4 3 2 1 0 * +-----+-----+-----+-----+-----+-----+-----+-----+ * key: | (reserved)|passive idx| active idx|pending idx| * +-----+-----+-----+-----+-----+-----+-----+-----+ */ struct tipc_key { #define KEY_BITS (2) #define KEY_MASK ((1 << KEY_BITS) - 1) union { struct { #if defined(__LITTLE_ENDIAN_BITFIELD) u8 pending:2, active:2, passive:2, /* rx only */ reserved:2; #elif defined(__BIG_ENDIAN_BITFIELD) u8 reserved:2, passive:2, /* rx only */ active:2, pending:2; #else #error "Please fix <asm/byteorder.h>" #endif } __packed; u8 keys; }; }; /** * struct tipc_tfm - TIPC TFM structure to form a list of TFMs * @tfm: cipher handle/key * @list: linked list of TFMs */ struct tipc_tfm { struct crypto_aead *tfm; struct list_head list; }; /** * struct tipc_aead - TIPC AEAD key structure * @tfm_entry: per-cpu pointer to one entry in TFM list * @crypto: TIPC crypto owns this key * @cloned: reference to the source key in case cloning * @users: the number of the key users (TX/RX) * @salt: the key's SALT value * @authsize: authentication tag size (max = 16) * @mode: crypto mode is applied to the key * @hint: a hint for user key * @rcu: struct rcu_head * @key: the aead key * @gen: the key's generation * @seqno: the key seqno (cluster scope) * @refcnt: the key reference counter */ struct tipc_aead { #define TIPC_AEAD_HINT_LEN (5) struct tipc_tfm * __percpu *tfm_entry; struct tipc_crypto *crypto; struct tipc_aead *cloned; atomic_t users; u32 salt; u8 authsize; u8 mode; char hint[2 * TIPC_AEAD_HINT_LEN + 1]; struct rcu_head rcu; struct tipc_aead_key *key; u16 gen; atomic64_t seqno ____cacheline_aligned; refcount_t refcnt ____cacheline_aligned; } ____cacheline_aligned; /** * struct tipc_crypto_stats - TIPC Crypto statistics * @stat: array of crypto statistics */ struct tipc_crypto_stats { unsigned int stat[MAX_STATS]; }; /** * struct tipc_crypto - TIPC TX/RX crypto structure * @net: struct net * @node: TIPC node (RX) * @aead: array of pointers to AEAD keys for encryption/decryption * @peer_rx_active: replicated peer RX active key index * @key_gen: TX/RX key generation * @key: the key states * @skey_mode: session key's mode * @skey: received session key * @wq: common workqueue on TX crypto * @work: delayed work sched for TX/RX * @key_distr: key distributing state * @rekeying_intv: rekeying interval (in minutes) * @stats: the crypto statistics * @name: the crypto name * @sndnxt: the per-peer sndnxt (TX) * @timer1: general timer 1 (jiffies) * @timer2: general timer 2 (jiffies) * @working: the crypto is working or not * @key_master: flag indicates if master key exists * @legacy_user: flag indicates if a peer joins w/o master key (for bwd comp.) * @nokey: no key indication * @flags: combined flags field * @lock: tipc_key lock */ struct tipc_crypto { struct net *net; struct tipc_node *node; struct tipc_aead __rcu *aead[KEY_MAX + 1]; atomic_t peer_rx_active; u16 key_gen; struct tipc_key key; u8 skey_mode; struct tipc_aead_key *skey; struct workqueue_struct *wq; struct delayed_work work; #define KEY_DISTR_SCHED 1 #define KEY_DISTR_COMPL 2 atomic_t key_distr; u32 rekeying_intv; struct tipc_crypto_stats __percpu *stats; char name[48]; atomic64_t sndnxt ____cacheline_aligned; unsigned long timer1; unsigned long timer2; union { struct { u8 working:1; u8 key_master:1; u8 legacy_user:1; u8 nokey: 1; }; u8 flags; }; spinlock_t lock; /* crypto lock */ } ____cacheline_aligned; /* struct tipc_crypto_tx_ctx - TX context for callbacks */ struct tipc_crypto_tx_ctx { struct tipc_aead *aead; struct tipc_bearer *bearer; struct tipc_media_addr dst; }; /* struct tipc_crypto_rx_ctx - RX context for callbacks */ struct tipc_crypto_rx_ctx { struct tipc_aead *aead; struct tipc_bearer *bearer; }; static struct tipc_aead *tipc_aead_get(struct tipc_aead __rcu *aead); static inline void tipc_aead_put(struct tipc_aead *aead); static void tipc_aead_free(struct rcu_head *rp); static int tipc_aead_users(struct tipc_aead __rcu *aead); static void tipc_aead_users_inc(struct tipc_aead __rcu *aead, int lim); static void tipc_aead_users_dec(struct tipc_aead __rcu *aead, int lim); static void tipc_aead_users_set(struct tipc_aead __rcu *aead, int val); static struct crypto_aead *tipc_aead_tfm_next(struct tipc_aead *aead); static int tipc_aead_init(struct tipc_aead **aead, struct tipc_aead_key *ukey, u8 mode); static int tipc_aead_clone(struct tipc_aead **dst, struct tipc_aead *src); static void *tipc_aead_mem_alloc(struct crypto_aead *tfm, unsigned int crypto_ctx_size, u8 **iv, struct aead_request **req, struct scatterlist **sg, int nsg); static int tipc_aead_encrypt(struct tipc_aead *aead, struct sk_buff *skb, struct tipc_bearer *b, struct tipc_media_addr *dst, struct tipc_node *__dnode); static void tipc_aead_encrypt_done(void *data, int err); static int tipc_aead_decrypt(struct net *net, struct tipc_aead *aead, struct sk_buff *skb, struct tipc_bearer *b); static void tipc_aead_decrypt_done(void *data, int err); static inline int tipc_ehdr_size(struct tipc_ehdr *ehdr); static int tipc_ehdr_build(struct net *net, struct tipc_aead *aead, u8 tx_key, struct sk_buff *skb, struct tipc_crypto *__rx); static inline void tipc_crypto_key_set_state(struct tipc_crypto *c, u8 new_passive, u8 new_active, u8 new_pending); static int tipc_crypto_key_attach(struct tipc_crypto *c, struct tipc_aead *aead, u8 pos, bool master_key); static bool tipc_crypto_key_try_align(struct tipc_crypto *rx, u8 new_pending); static struct tipc_aead *tipc_crypto_key_pick_tx(struct tipc_crypto *tx, struct tipc_crypto *rx, struct sk_buff *skb, u8 tx_key); static void tipc_crypto_key_synch(struct tipc_crypto *rx, struct sk_buff *skb); static int tipc_crypto_key_revoke(struct net *net, u8 tx_key); static inline void tipc_crypto_clone_msg(struct net *net, struct sk_buff *_skb, struct tipc_bearer *b, struct tipc_media_addr *dst, struct tipc_node *__dnode, u8 type); static void tipc_crypto_rcv_complete(struct net *net, struct tipc_aead *aead, struct tipc_bearer *b, struct sk_buff **skb, int err); static void tipc_crypto_do_cmd(struct net *net, int cmd); static char *tipc_crypto_key_dump(struct tipc_crypto *c, char *buf); static char *tipc_key_change_dump(struct tipc_key old, struct tipc_key new, char *buf); static int tipc_crypto_key_xmit(struct net *net, struct tipc_aead_key *skey, u16 gen, u8 mode, u32 dnode); static bool tipc_crypto_key_rcv(struct tipc_crypto *rx, struct tipc_msg *hdr); static void tipc_crypto_work_tx(struct work_struct *work); static void tipc_crypto_work_rx(struct work_struct *work); static int tipc_aead_key_generate(struct tipc_aead_key *skey); #define is_tx(crypto) (!(crypto)->node) #define is_rx(crypto) (!is_tx(crypto)) #define key_next(cur) ((cur) % KEY_MAX + 1) #define tipc_aead_rcu_ptr(rcu_ptr, lock) \ rcu_dereference_protected((rcu_ptr), lockdep_is_held(lock)) #define tipc_aead_rcu_replace(rcu_ptr, ptr, lock) \ do { \ struct tipc_aead *__tmp = rcu_dereference_protected((rcu_ptr), \ lockdep_is_held(lock)); \ rcu_assign_pointer((rcu_ptr), (ptr)); \ tipc_aead_put(__tmp); \ } while (0) #define tipc_crypto_key_detach(rcu_ptr, lock) \ tipc_aead_rcu_replace((rcu_ptr), NULL, lock) /** * tipc_aead_key_validate - Validate a AEAD user key * @ukey: pointer to user key data * @info: netlink info pointer */ int tipc_aead_key_validate(struct tipc_aead_key *ukey, struct genl_info *info) { int keylen; /* Check if algorithm exists */ if (unlikely(!crypto_has_alg(ukey->alg_name, 0, 0))) { GENL_SET_ERR_MSG(info, "unable to load the algorithm (module existed?)"); return -ENODEV; } /* Currently, we only support the "gcm(aes)" cipher algorithm */ if (strcmp(ukey->alg_name, "gcm(aes)")) { GENL_SET_ERR_MSG(info, "not supported yet the algorithm"); return -ENOTSUPP; } /* Check if key size is correct */ keylen = ukey->keylen - TIPC_AES_GCM_SALT_SIZE; if (unlikely(keylen != TIPC_AES_GCM_KEY_SIZE_128 && keylen != TIPC_AES_GCM_KEY_SIZE_192 && keylen != TIPC_AES_GCM_KEY_SIZE_256)) { GENL_SET_ERR_MSG(info, "incorrect key length (20, 28 or 36 octets?)"); return -EKEYREJECTED; } return 0; } /** * tipc_aead_key_generate - Generate new session key * @skey: input/output key with new content * * Return: 0 in case of success, otherwise < 0 */ static int tipc_aead_key_generate(struct tipc_aead_key *skey) { int rc = 0; /* Fill the key's content with a random value via RNG cipher */ rc = crypto_get_default_rng(); if (likely(!rc)) { rc = crypto_rng_get_bytes(crypto_default_rng, skey->key, skey->keylen); crypto_put_default_rng(); } return rc; } static struct tipc_aead *tipc_aead_get(struct tipc_aead __rcu *aead) { struct tipc_aead *tmp; rcu_read_lock(); tmp = rcu_dereference(aead); if (unlikely(!tmp || !refcount_inc_not_zero(&tmp->refcnt))) tmp = NULL; rcu_read_unlock(); return tmp; } static inline void tipc_aead_put(struct tipc_aead *aead) { if (aead && refcount_dec_and_test(&aead->refcnt)) call_rcu(&aead->rcu, tipc_aead_free); } /** * tipc_aead_free - Release AEAD key incl. all the TFMs in the list * @rp: rcu head pointer */ static void tipc_aead_free(struct rcu_head *rp) { struct tipc_aead *aead = container_of(rp, struct tipc_aead, rcu); struct tipc_tfm *tfm_entry, *head, *tmp; if (aead->cloned) { tipc_aead_put(aead->cloned); } else { head = *get_cpu_ptr(aead->tfm_entry); put_cpu_ptr(aead->tfm_entry); list_for_each_entry_safe(tfm_entry, tmp, &head->list, list) { crypto_free_aead(tfm_entry->tfm); list_del(&tfm_entry->list); kfree(tfm_entry); } /* Free the head */ crypto_free_aead(head->tfm); list_del(&head->list); kfree(head); } free_percpu(aead->tfm_entry); kfree_sensitive(aead->key); kfree(aead); } static int tipc_aead_users(struct tipc_aead __rcu *aead) { struct tipc_aead *tmp; int users = 0; rcu_read_lock(); tmp = rcu_dereference(aead); if (tmp) users = atomic_read(&tmp->users); rcu_read_unlock(); return users; } static void tipc_aead_users_inc(struct tipc_aead __rcu *aead, int lim) { struct tipc_aead *tmp; rcu_read_lock(); tmp = rcu_dereference(aead); if (tmp) atomic_add_unless(&tmp->users, 1, lim); rcu_read_unlock(); } static void tipc_aead_users_dec(struct tipc_aead __rcu *aead, int lim) { struct tipc_aead *tmp; rcu_read_lock(); tmp = rcu_dereference(aead); if (tmp) atomic_add_unless(&rcu_dereference(aead)->users, -1, lim); rcu_read_unlock(); } static void tipc_aead_users_set(struct tipc_aead __rcu *aead, int val) { struct tipc_aead *tmp; int cur; rcu_read_lock(); tmp = rcu_dereference(aead); if (tmp) { do { cur = atomic_read(&tmp->users); if (cur == val) break; } while (atomic_cmpxchg(&tmp->users, cur, val) != cur); } rcu_read_unlock(); } /** * tipc_aead_tfm_next - Move TFM entry to the next one in list and return it * @aead: the AEAD key pointer */ static struct crypto_aead *tipc_aead_tfm_next(struct tipc_aead *aead) { struct tipc_tfm **tfm_entry; struct crypto_aead *tfm; tfm_entry = get_cpu_ptr(aead->tfm_entry); *tfm_entry = list_next_entry(*tfm_entry, list); tfm = (*tfm_entry)->tfm; put_cpu_ptr(tfm_entry); return tfm; } /** * tipc_aead_init - Initiate TIPC AEAD * @aead: returned new TIPC AEAD key handle pointer * @ukey: pointer to user key data * @mode: the key mode * * Allocate a (list of) new cipher transformation (TFM) with the specific user * key data if valid. The number of the allocated TFMs can be set via the sysfs * "net/tipc/max_tfms" first. * Also, all the other AEAD data are also initialized. * * Return: 0 if the initiation is successful, otherwise: < 0 */ static int tipc_aead_init(struct tipc_aead **aead, struct tipc_aead_key *ukey, u8 mode) { struct tipc_tfm *tfm_entry, *head; struct crypto_aead *tfm; struct tipc_aead *tmp; int keylen, err, cpu; int tfm_cnt = 0; if (unlikely(*aead)) return -EEXIST; /* Allocate a new AEAD */ tmp = kzalloc(sizeof(*tmp), GFP_ATOMIC); if (unlikely(!tmp)) return -ENOMEM; /* The key consists of two parts: [AES-KEY][SALT] */ keylen = ukey->keylen - TIPC_AES_GCM_SALT_SIZE; /* Allocate per-cpu TFM entry pointer */ tmp->tfm_entry = alloc_percpu(struct tipc_tfm *); if (!tmp->tfm_entry) { kfree_sensitive(tmp); return -ENOMEM; } /* Make a list of TFMs with the user key data */ do { tfm = crypto_alloc_aead(ukey->alg_name, 0, 0); if (IS_ERR(tfm)) { err = PTR_ERR(tfm); break; } if (unlikely(!tfm_cnt && crypto_aead_ivsize(tfm) != TIPC_AES_GCM_IV_SIZE)) { crypto_free_aead(tfm); err = -ENOTSUPP; break; } err = crypto_aead_setauthsize(tfm, TIPC_AES_GCM_TAG_SIZE); err |= crypto_aead_setkey(tfm, ukey->key, keylen); if (unlikely(err)) { crypto_free_aead(tfm); break; } tfm_entry = kmalloc(sizeof(*tfm_entry), GFP_KERNEL); if (unlikely(!tfm_entry)) { crypto_free_aead(tfm); err = -ENOMEM; break; } INIT_LIST_HEAD(&tfm_entry->list); tfm_entry->tfm = tfm; /* First entry? */ if (!tfm_cnt) { head = tfm_entry; for_each_possible_cpu(cpu) { *per_cpu_ptr(tmp->tfm_entry, cpu) = head; } } else { list_add_tail(&tfm_entry->list, &head->list); } } while (++tfm_cnt < sysctl_tipc_max_tfms); /* Not any TFM is allocated? */ if (!tfm_cnt) { free_percpu(tmp->tfm_entry); kfree_sensitive(tmp); return err; } /* Form a hex string of some last bytes as the key's hint */ bin2hex(tmp->hint, ukey->key + keylen - TIPC_AEAD_HINT_LEN, TIPC_AEAD_HINT_LEN); /* Initialize the other data */ tmp->mode = mode; tmp->cloned = NULL; tmp->authsize = TIPC_AES_GCM_TAG_SIZE; tmp->key = kmemdup(ukey, tipc_aead_key_size(ukey), GFP_KERNEL); if (!tmp->key) { tipc_aead_free(&tmp->rcu); return -ENOMEM; } memcpy(&tmp->salt, ukey->key + keylen, TIPC_AES_GCM_SALT_SIZE); atomic_set(&tmp->users, 0); atomic64_set(&tmp->seqno, 0); refcount_set(&tmp->refcnt, 1); *aead = tmp; return 0; } /** * tipc_aead_clone - Clone a TIPC AEAD key * @dst: dest key for the cloning * @src: source key to clone from * * Make a "copy" of the source AEAD key data to the dest, the TFMs list is * common for the keys. * A reference to the source is hold in the "cloned" pointer for the later * freeing purposes. * * Note: this must be done in cluster-key mode only! * Return: 0 in case of success, otherwise < 0 */ static int tipc_aead_clone(struct tipc_aead **dst, struct tipc_aead *src) { struct tipc_aead *aead; int cpu; if (!src) return -ENOKEY; if (src->mode != CLUSTER_KEY) return -EINVAL; if (unlikely(*dst)) return -EEXIST; aead = kzalloc(sizeof(*aead), GFP_ATOMIC); if (unlikely(!aead)) return -ENOMEM; aead->tfm_entry = alloc_percpu_gfp(struct tipc_tfm *, GFP_ATOMIC); if (unlikely(!aead->tfm_entry)) { kfree_sensitive(aead); return -ENOMEM; } for_each_possible_cpu(cpu) { *per_cpu_ptr(aead->tfm_entry, cpu) = *per_cpu_ptr(src->tfm_entry, cpu); } memcpy(aead->hint, src->hint, sizeof(src->hint)); aead->mode = src->mode; aead->salt = src->salt; aead->authsize = src->authsize; atomic_set(&aead->users, 0); atomic64_set(&aead->seqno, 0); refcount_set(&aead->refcnt, 1); WARN_ON(!refcount_inc_not_zero(&src->refcnt)); aead->cloned = src; *dst = aead; return 0; } /** * tipc_aead_mem_alloc - Allocate memory for AEAD request operations * @tfm: cipher handle to be registered with the request * @crypto_ctx_size: size of crypto context for callback * @iv: returned pointer to IV data * @req: returned pointer to AEAD request data * @sg: returned pointer to SG lists * @nsg: number of SG lists to be allocated * * Allocate memory to store the crypto context data, AEAD request, IV and SG * lists, the memory layout is as follows: * crypto_ctx || iv || aead_req || sg[] * * Return: the pointer to the memory areas in case of success, otherwise NULL */ static void *tipc_aead_mem_alloc(struct crypto_aead *tfm, unsigned int crypto_ctx_size, u8 **iv, struct aead_request **req, struct scatterlist **sg, int nsg) { unsigned int iv_size, req_size; unsigned int len; u8 *mem; iv_size = crypto_aead_ivsize(tfm); req_size = sizeof(**req) + crypto_aead_reqsize(tfm); len = crypto_ctx_size; len += iv_size; len += crypto_aead_alignmask(tfm) & ~(crypto_tfm_ctx_alignment() - 1); len = ALIGN(len, crypto_tfm_ctx_alignment()); len += req_size; len = ALIGN(len, __alignof__(struct scatterlist)); len += nsg * sizeof(**sg); mem = kmalloc(len, GFP_ATOMIC); if (!mem) return NULL; *iv = (u8 *)PTR_ALIGN(mem + crypto_ctx_size, crypto_aead_alignmask(tfm) + 1); *req = (struct aead_request *)PTR_ALIGN(*iv + iv_size, crypto_tfm_ctx_alignment()); *sg = (struct scatterlist *)PTR_ALIGN((u8 *)*req + req_size, __alignof__(struct scatterlist)); return (void *)mem; } /** * tipc_aead_encrypt - Encrypt a message * @aead: TIPC AEAD key for the message encryption * @skb: the input/output skb * @b: TIPC bearer where the message will be delivered after the encryption * @dst: the destination media address * @__dnode: TIPC dest node if "known" * * Return: * * 0 : if the encryption has completed * * -EINPROGRESS/-EBUSY : if a callback will be performed * * < 0 : the encryption has failed */ static int tipc_aead_encrypt(struct tipc_aead *aead, struct sk_buff *skb, struct tipc_bearer *b, struct tipc_media_addr *dst, struct tipc_node *__dnode) { struct crypto_aead *tfm = tipc_aead_tfm_next(aead); struct tipc_crypto_tx_ctx *tx_ctx; struct aead_request *req; struct sk_buff *trailer; struct scatterlist *sg; struct tipc_ehdr *ehdr; int ehsz, len, tailen, nsg, rc; void *ctx; u32 salt; u8 *iv; /* Make sure message len at least 4-byte aligned */ len = ALIGN(skb->len, 4); tailen = len - skb->len + aead->authsize; /* Expand skb tail for authentication tag: * As for simplicity, we'd have made sure skb having enough tailroom * for authentication tag @skb allocation. Even when skb is nonlinear * but there is no frag_list, it should be still fine! * Otherwise, we must cow it to be a writable buffer with the tailroom. */ SKB_LINEAR_ASSERT(skb); if (tailen > skb_tailroom(skb)) { pr_debug("TX(): skb tailroom is not enough: %d, requires: %d\n", skb_tailroom(skb), tailen); } nsg = skb_cow_data(skb, tailen, &trailer); if (unlikely(nsg < 0)) { pr_err("TX: skb_cow_data() returned %d\n", nsg); return nsg; } pskb_put(skb, trailer, tailen); /* Allocate memory for the AEAD operation */ ctx = tipc_aead_mem_alloc(tfm, sizeof(*tx_ctx), &iv, &req, &sg, nsg); if (unlikely(!ctx)) return -ENOMEM; TIPC_SKB_CB(skb)->crypto_ctx = ctx; /* Map skb to the sg lists */ sg_init_table(sg, nsg); rc = skb_to_sgvec(skb, sg, 0, skb->len); if (unlikely(rc < 0)) { pr_err("TX: skb_to_sgvec() returned %d, nsg %d!\n", rc, nsg); goto exit; } /* Prepare IV: [SALT (4 octets)][SEQNO (8 octets)] * In case we're in cluster-key mode, SALT is varied by xor-ing with * the source address (or w0 of id), otherwise with the dest address * if dest is known. */ ehdr = (struct tipc_ehdr *)skb->data; salt = aead->salt; if (aead->mode == CLUSTER_KEY) salt ^= __be32_to_cpu(ehdr->addr); else if (__dnode) salt ^= tipc_node_get_addr(__dnode); memcpy(iv, &salt, 4); memcpy(iv + 4, (u8 *)&ehdr->seqno, 8); /* Prepare request */ ehsz = tipc_ehdr_size(ehdr); aead_request_set_tfm(req, tfm); aead_request_set_ad(req, ehsz); aead_request_set_crypt(req, sg, sg, len - ehsz, iv); /* Set callback function & data */ aead_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG, tipc_aead_encrypt_done, skb); tx_ctx = (struct tipc_crypto_tx_ctx *)ctx; tx_ctx->aead = aead; tx_ctx->bearer = b; memcpy(&tx_ctx->dst, dst, sizeof(*dst)); /* Hold bearer */ if (unlikely(!tipc_bearer_hold(b))) { rc = -ENODEV; goto exit; } /* Now, do encrypt */ rc = crypto_aead_encrypt(req); if (rc == -EINPROGRESS || rc == -EBUSY) return rc; tipc_bearer_put(b); exit: kfree(ctx); TIPC_SKB_CB(skb)->crypto_ctx = NULL; return rc; } static void tipc_aead_encrypt_done(void *data, int err) { struct sk_buff *skb = data; struct tipc_crypto_tx_ctx *tx_ctx = TIPC_SKB_CB(skb)->crypto_ctx; struct tipc_bearer *b = tx_ctx->bearer; struct tipc_aead *aead = tx_ctx->aead; struct tipc_crypto *tx = aead->crypto; struct net *net = tx->net; switch (err) { case 0: this_cpu_inc(tx->stats->stat[STAT_ASYNC_OK]); rcu_read_lock(); if (likely(test_bit(0, &b->up))) b->media->send_msg(net, skb, b, &tx_ctx->dst); else kfree_skb(skb); rcu_read_unlock(); break; case -EINPROGRESS: return; default: this_cpu_inc(tx->stats->stat[STAT_ASYNC_NOK]); kfree_skb(skb); break; } kfree(tx_ctx); tipc_bearer_put(b); tipc_aead_put(aead); } /** * tipc_aead_decrypt - Decrypt an encrypted message * @net: struct net * @aead: TIPC AEAD for the message decryption * @skb: the input/output skb * @b: TIPC bearer where the message has been received * * Return: * * 0 : if the decryption has completed * * -EINPROGRESS/-EBUSY : if a callback will be performed * * < 0 : the decryption has failed */ static int tipc_aead_decrypt(struct net *net, struct tipc_aead *aead, struct sk_buff *skb, struct tipc_bearer *b) { struct tipc_crypto_rx_ctx *rx_ctx; struct aead_request *req; struct crypto_aead *tfm; struct sk_buff *unused; struct scatterlist *sg; struct tipc_ehdr *ehdr; int ehsz, nsg, rc; void *ctx; u32 salt; u8 *iv; if (unlikely(!aead)) return -ENOKEY; nsg = skb_cow_data(skb, 0, &unused); if (unlikely(nsg < 0)) { pr_err("RX: skb_cow_data() returned %d\n", nsg); return nsg; } /* Allocate memory for the AEAD operation */ tfm = tipc_aead_tfm_next(aead); ctx = tipc_aead_mem_alloc(tfm, sizeof(*rx_ctx), &iv, &req, &sg, nsg); if (unlikely(!ctx)) return -ENOMEM; TIPC_SKB_CB(skb)->crypto_ctx = ctx; /* Map skb to the sg lists */ sg_init_table(sg, nsg); rc = skb_to_sgvec(skb, sg, 0, skb->len); if (unlikely(rc < 0)) { pr_err("RX: skb_to_sgvec() returned %d, nsg %d\n", rc, nsg); goto exit; } /* Reconstruct IV: */ ehdr = (struct tipc_ehdr *)skb->data; salt = aead->salt; if (aead->mode == CLUSTER_KEY) salt ^= __be32_to_cpu(ehdr->addr); else if (ehdr->destined) salt ^= tipc_own_addr(net); memcpy(iv, &salt, 4); memcpy(iv + 4, (u8 *)&ehdr->seqno, 8); /* Prepare request */ ehsz = tipc_ehdr_size(ehdr); aead_request_set_tfm(req, tfm); aead_request_set_ad(req, ehsz); aead_request_set_crypt(req, sg, sg, skb->len - ehsz, iv); /* Set callback function & data */ aead_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG, tipc_aead_decrypt_done, skb); rx_ctx = (struct tipc_crypto_rx_ctx *)ctx; rx_ctx->aead = aead; rx_ctx->bearer = b; /* Hold bearer */ if (unlikely(!tipc_bearer_hold(b))) { rc = -ENODEV; goto exit; } /* Now, do decrypt */ rc = crypto_aead_decrypt(req); if (rc == -EINPROGRESS || rc == -EBUSY) return rc; tipc_bearer_put(b); exit: kfree(ctx); TIPC_SKB_CB(skb)->crypto_ctx = NULL; return rc; } static void tipc_aead_decrypt_done(void *data, int err) { struct sk_buff *skb = data; struct tipc_crypto_rx_ctx *rx_ctx = TIPC_SKB_CB(skb)->crypto_ctx; struct tipc_bearer *b = rx_ctx->bearer; struct tipc_aead *aead = rx_ctx->aead; struct tipc_crypto_stats __percpu *stats = aead->crypto->stats; struct net *net = aead->crypto->net; switch (err) { case 0: this_cpu_inc(stats->stat[STAT_ASYNC_OK]); break; case -EINPROGRESS: return; default: this_cpu_inc(stats->stat[STAT_ASYNC_NOK]); break; } kfree(rx_ctx); tipc_crypto_rcv_complete(net, aead, b, &skb, err); if (likely(skb)) { if (likely(test_bit(0, &b->up))) tipc_rcv(net, skb, b); else kfree_skb(skb); } tipc_bearer_put(b); } static inline int tipc_ehdr_size(struct tipc_ehdr *ehdr) { return (ehdr->user != LINK_CONFIG) ? EHDR_SIZE : EHDR_CFG_SIZE; } /** * tipc_ehdr_validate - Validate an encryption message * @skb: the message buffer * * Return: "true" if this is a valid encryption message, otherwise "false" */ bool tipc_ehdr_validate(struct sk_buff *skb) { struct tipc_ehdr *ehdr; int ehsz; if (unlikely(!pskb_may_pull(skb, EHDR_MIN_SIZE))) return false; ehdr = (struct tipc_ehdr *)skb->data; if (unlikely(ehdr->version != TIPC_EVERSION)) return false; ehsz = tipc_ehdr_size(ehdr); if (unlikely(!pskb_may_pull(skb, ehsz))) return false; if (unlikely(skb->len <= ehsz + TIPC_AES_GCM_TAG_SIZE)) return false; return true; } /** * tipc_ehdr_build - Build TIPC encryption message header * @net: struct net * @aead: TX AEAD key to be used for the message encryption * @tx_key: key id used for the message encryption * @skb: input/output message skb * @__rx: RX crypto handle if dest is "known" * * Return: the header size if the building is successful, otherwise < 0 */ static int tipc_ehdr_build(struct net *net, struct tipc_aead *aead, u8 tx_key, struct sk_buff *skb, struct tipc_crypto *__rx) { struct tipc_msg *hdr = buf_msg(skb); struct tipc_ehdr *ehdr; u32 user = msg_user(hdr); u64 seqno; int ehsz; /* Make room for encryption header */ ehsz = (user != LINK_CONFIG) ? EHDR_SIZE : EHDR_CFG_SIZE; WARN_ON(skb_headroom(skb) < ehsz); ehdr = (struct tipc_ehdr *)skb_push(skb, ehsz); /* Obtain a seqno first: * Use the key seqno (= cluster wise) if dest is unknown or we're in * cluster key mode, otherwise it's better for a per-peer seqno! */ if (!__rx || aead->mode == CLUSTER_KEY) seqno = atomic64_inc_return(&aead->seqno); else seqno = atomic64_inc_return(&__rx->sndnxt); /* Revoke the key if seqno is wrapped around */ if (unlikely(!seqno)) return tipc_crypto_key_revoke(net, tx_key); /* Word 1-2 */ ehdr->seqno = cpu_to_be64(seqno); /* Words 0, 3- */ ehdr->version = TIPC_EVERSION; ehdr->user = 0; ehdr->keepalive = 0; ehdr->tx_key = tx_key; ehdr->destined = (__rx) ? 1 : 0; ehdr->rx_key_active = (__rx) ? __rx->key.active : 0; ehdr->rx_nokey = (__rx) ? __rx->nokey : 0; ehdr->master_key = aead->crypto->key_master; ehdr->reserved_1 = 0; ehdr->reserved_2 = 0; switch (user) { case LINK_CONFIG: ehdr->user = LINK_CONFIG; memcpy(ehdr->id, tipc_own_id(net), NODE_ID_LEN); break; default: if (user == LINK_PROTOCOL && msg_type(hdr) == STATE_MSG) { ehdr->user = LINK_PROTOCOL; ehdr->keepalive = msg_is_keepalive(hdr); } ehdr->addr = hdr->hdr[3]; break; } return ehsz; } static inline void tipc_crypto_key_set_state(struct tipc_crypto *c, u8 new_passive, u8 new_active, u8 new_pending) { struct tipc_key old = c->key; char buf[32]; c->key.keys = ((new_passive & KEY_MASK) << (KEY_BITS * 2)) | ((new_active & KEY_MASK) << (KEY_BITS)) | ((new_pending & KEY_MASK)); pr_debug("%s: key changing %s ::%pS\n", c->name, tipc_key_change_dump(old, c->key, buf), __builtin_return_address(0)); } /** * tipc_crypto_key_init - Initiate a new user / AEAD key * @c: TIPC crypto to which new key is attached * @ukey: the user key * @mode: the key mode (CLUSTER_KEY or PER_NODE_KEY) * @master_key: specify this is a cluster master key * * A new TIPC AEAD key will be allocated and initiated with the specified user * key, then attached to the TIPC crypto. * * Return: new key id in case of success, otherwise: < 0 */ int tipc_crypto_key_init(struct tipc_crypto *c, struct tipc_aead_key *ukey, u8 mode, bool master_key) { struct tipc_aead *aead = NULL; int rc = 0; /* Initiate with the new user key */ rc = tipc_aead_init(&aead, ukey, mode); /* Attach it to the crypto */ if (likely(!rc)) { rc = tipc_crypto_key_attach(c, aead, 0, master_key); if (rc < 0) tipc_aead_free(&aead->rcu); } return rc; } /** * tipc_crypto_key_attach - Attach a new AEAD key to TIPC crypto * @c: TIPC crypto to which the new AEAD key is attached * @aead: the new AEAD key pointer * @pos: desired slot in the crypto key array, = 0 if any! * @master_key: specify this is a cluster master key * * Return: new key id in case of success, otherwise: -EBUSY */ static int tipc_crypto_key_attach(struct tipc_crypto *c, struct tipc_aead *aead, u8 pos, bool master_key) { struct tipc_key key; int rc = -EBUSY; u8 new_key; spin_lock_bh(&c->lock); key = c->key; if (master_key) { new_key = KEY_MASTER; goto attach; } if (key.active && key.passive) goto exit; if (key.pending) { if (tipc_aead_users(c->aead[key.pending]) > 0) goto exit; /* if (pos): ok with replacing, will be aligned when needed */ /* Replace it */ new_key = key.pending; } else { if (pos) { if (key.active && pos != key_next(key.active)) { key.passive = pos; new_key = pos; goto attach; } else if (!key.active && !key.passive) { key.pending = pos; new_key = pos; goto attach; } } key.pending = key_next(key.active ?: key.passive); new_key = key.pending; } attach: aead->crypto = c; aead->gen = (is_tx(c)) ? ++c->key_gen : c->key_gen; tipc_aead_rcu_replace(c->aead[new_key], aead, &c->lock); if (likely(c->key.keys != key.keys)) tipc_crypto_key_set_state(c, key.passive, key.active, key.pending); c->working = 1; c->nokey = 0; c->key_master |= master_key; rc = new_key; exit: spin_unlock_bh(&c->lock); return rc; } void tipc_crypto_key_flush(struct tipc_crypto *c) { struct tipc_crypto *tx, *rx; int k; spin_lock_bh(&c->lock); if (is_rx(c)) { /* Try to cancel pending work */ rx = c; tx = tipc_net(rx->net)->crypto_tx; if (cancel_delayed_work(&rx->work)) { kfree(rx->skey); rx->skey = NULL; atomic_xchg(&rx->key_distr, 0); tipc_node_put(rx->node); } /* RX stopping => decrease TX key users if any */ k = atomic_xchg(&rx->peer_rx_active, 0); if (k) { tipc_aead_users_dec(tx->aead[k], 0); /* Mark the point TX key users changed */ tx->timer1 = jiffies; } } c->flags = 0; tipc_crypto_key_set_state(c, 0, 0, 0); for (k = KEY_MIN; k <= KEY_MAX; k++) tipc_crypto_key_detach(c->aead[k], &c->lock); atomic64_set(&c->sndnxt, 0); spin_unlock_bh(&c->lock); } /** * tipc_crypto_key_try_align - Align RX keys if possible * @rx: RX crypto handle * @new_pending: new pending slot if aligned (= TX key from peer) * * Peer has used an unknown key slot, this only happens when peer has left and * rejoned, or we are newcomer. * That means, there must be no active key but a pending key at unaligned slot. * If so, we try to move the pending key to the new slot. * Note: A potential passive key can exist, it will be shifted correspondingly! * * Return: "true" if key is successfully aligned, otherwise "false" */ static bool tipc_crypto_key_try_align(struct tipc_crypto *rx, u8 new_pending) { struct tipc_aead *tmp1, *tmp2 = NULL; struct tipc_key key; bool aligned = false; u8 new_passive = 0; int x; spin_lock(&rx->lock); key = rx->key; if (key.pending == new_pending) { aligned = true; goto exit; } if (key.active) goto exit; if (!key.pending) goto exit; if (tipc_aead_users(rx->aead[key.pending]) > 0) goto exit; /* Try to "isolate" this pending key first */ tmp1 = tipc_aead_rcu_ptr(rx->aead[key.pending], &rx->lock); if (!refcount_dec_if_one(&tmp1->refcnt)) goto exit; rcu_assign_pointer(rx->aead[key.pending], NULL); /* Move passive key if any */ if (key.passive) { tmp2 = rcu_replace_pointer(rx->aead[key.passive], tmp2, lockdep_is_held(&rx->lock)); x = (key.passive - key.pending + new_pending) % KEY_MAX; new_passive = (x <= 0) ? x + KEY_MAX : x; } /* Re-allocate the key(s) */ tipc_crypto_key_set_state(rx, new_passive, 0, new_pending); rcu_assign_pointer(rx->aead[new_pending], tmp1); if (new_passive) rcu_assign_pointer(rx->aead[new_passive], tmp2); refcount_set(&tmp1->refcnt, 1); aligned = true; pr_info_ratelimited("%s: key[%d] -> key[%d]\n", rx->name, key.pending, new_pending); exit: spin_unlock(&rx->lock); return aligned; } /** * tipc_crypto_key_pick_tx - Pick one TX key for message decryption * @tx: TX crypto handle * @rx: RX crypto handle (can be NULL) * @skb: the message skb which will be decrypted later * @tx_key: peer TX key id * * This function looks up the existing TX keys and pick one which is suitable * for the message decryption, that must be a cluster key and not used before * on the same message (i.e. recursive). * * Return: the TX AEAD key handle in case of success, otherwise NULL */ static struct tipc_aead *tipc_crypto_key_pick_tx(struct tipc_crypto *tx, struct tipc_crypto *rx, struct sk_buff *skb, u8 tx_key) { struct tipc_skb_cb *skb_cb = TIPC_SKB_CB(skb); struct tipc_aead *aead = NULL; struct tipc_key key = tx->key; u8 k, i = 0; /* Initialize data if not yet */ if (!skb_cb->tx_clone_deferred) { skb_cb->tx_clone_deferred = 1; memset(&skb_cb->tx_clone_ctx, 0, sizeof(skb_cb->tx_clone_ctx)); } skb_cb->tx_clone_ctx.rx = rx; if (++skb_cb->tx_clone_ctx.recurs > 2) return NULL; /* Pick one TX key */ spin_lock(&tx->lock); if (tx_key == KEY_MASTER) { aead = tipc_aead_rcu_ptr(tx->aead[KEY_MASTER], &tx->lock); goto done; } do { k = (i == 0) ? key.pending : ((i == 1) ? key.active : key.passive); if (!k) continue; aead = tipc_aead_rcu_ptr(tx->aead[k], &tx->lock); if (!aead) continue; if (aead->mode != CLUSTER_KEY || aead == skb_cb->tx_clone_ctx.last) { aead = NULL; continue; } /* Ok, found one cluster key */ skb_cb->tx_clone_ctx.last = aead; WARN_ON(skb->next); skb->next = skb_clone(skb, GFP_ATOMIC); if (unlikely(!skb->next)) pr_warn("Failed to clone skb for next round if any\n"); break; } while (++i < 3); done: if (likely(aead)) WARN_ON(!refcount_inc_not_zero(&aead->refcnt)); spin_unlock(&tx->lock); return aead; } /** * tipc_crypto_key_synch: Synch own key data according to peer key status * @rx: RX crypto handle * @skb: TIPCv2 message buffer (incl. the ehdr from peer) * * This function updates the peer node related data as the peer RX active key * has changed, so the number of TX keys' users on this node are increased and * decreased correspondingly. * * It also considers if peer has no key, then we need to make own master key * (if any) taking over i.e. starting grace period and also trigger key * distributing process. * * The "per-peer" sndnxt is also reset when the peer key has switched. */ static void tipc_crypto_key_synch(struct tipc_crypto *rx, struct sk_buff *skb) { struct tipc_ehdr *ehdr = (struct tipc_ehdr *)skb_network_header(skb); struct tipc_crypto *tx = tipc_net(rx->net)->crypto_tx; struct tipc_msg *hdr = buf_msg(skb); u32 self = tipc_own_addr(rx->net); u8 cur, new; unsigned long delay; /* Update RX 'key_master' flag according to peer, also mark "legacy" if * a peer has no master key. */ rx->key_master = ehdr->master_key; if (!rx->key_master) tx->legacy_user = 1; /* For later cases, apply only if message is destined to this node */ if (!ehdr->destined || msg_short(hdr) || msg_destnode(hdr) != self) return; /* Case 1: Peer has no keys, let's make master key take over */ if (ehdr->rx_nokey) { /* Set or extend grace period */ tx->timer2 = jiffies; /* Schedule key distributing for the peer if not yet */ if (tx->key.keys && !atomic_cmpxchg(&rx->key_distr, 0, KEY_DISTR_SCHED)) { get_random_bytes(&delay, 2); delay %= 5; delay = msecs_to_jiffies(500 * ++delay); if (queue_delayed_work(tx->wq, &rx->work, delay)) tipc_node_get(rx->node); } } else { /* Cancel a pending key distributing if any */ atomic_xchg(&rx->key_distr, 0); } /* Case 2: Peer RX active key has changed, let's update own TX users */ cur = atomic_read(&rx->peer_rx_active); new = ehdr->rx_key_active; if (tx->key.keys && cur != new && atomic_cmpxchg(&rx->peer_rx_active, cur, new) == cur) { if (new) tipc_aead_users_inc(tx->aead[new], INT_MAX); if (cur) tipc_aead_users_dec(tx->aead[cur], 0); atomic64_set(&rx->sndnxt, 0); /* Mark the point TX key users changed */ tx->timer1 = jiffies; pr_debug("%s: key users changed %d-- %d++, peer %s\n", tx->name, cur, new, rx->name); } } static int tipc_crypto_key_revoke(struct net *net, u8 tx_key) { struct tipc_crypto *tx = tipc_net(net)->crypto_tx; struct tipc_key key; spin_lock_bh(&tx->lock); key = tx->key; WARN_ON(!key.active || tx_key != key.active); /* Free the active key */ tipc_crypto_key_set_state(tx, key.passive, 0, key.pending); tipc_crypto_key_detach(tx->aead[key.active], &tx->lock); spin_unlock_bh(&tx->lock); pr_warn("%s: key is revoked\n", tx->name); return -EKEYREVOKED; } int tipc_crypto_start(struct tipc_crypto **crypto, struct net *net, struct tipc_node *node) { struct tipc_crypto *c; if (*crypto) return -EEXIST; /* Allocate crypto */ c = kzalloc(sizeof(*c), GFP_ATOMIC); if (!c) return -ENOMEM; /* Allocate workqueue on TX */ if (!node) { c->wq = alloc_ordered_workqueue("tipc_crypto", 0); if (!c->wq) { kfree(c); return -ENOMEM; } } /* Allocate statistic structure */ c->stats = alloc_percpu_gfp(struct tipc_crypto_stats, GFP_ATOMIC); if (!c->stats) { if (c->wq) destroy_workqueue(c->wq); kfree_sensitive(c); return -ENOMEM; } c->flags = 0; c->net = net; c->node = node; get_random_bytes(&c->key_gen, 2); tipc_crypto_key_set_state(c, 0, 0, 0); atomic_set(&c->key_distr, 0); atomic_set(&c->peer_rx_active, 0); atomic64_set(&c->sndnxt, 0); c->timer1 = jiffies; c->timer2 = jiffies; c->rekeying_intv = TIPC_REKEYING_INTV_DEF; spin_lock_init(&c->lock); scnprintf(c->name, 48, "%s(%s)", (is_rx(c)) ? "RX" : "TX", (is_rx(c)) ? tipc_node_get_id_str(c->node) : tipc_own_id_string(c->net)); if (is_rx(c)) INIT_DELAYED_WORK(&c->work, tipc_crypto_work_rx); else INIT_DELAYED_WORK(&c->work, tipc_crypto_work_tx); *crypto = c; return 0; } void tipc_crypto_stop(struct tipc_crypto **crypto) { struct tipc_crypto *c = *crypto; u8 k; if (!c) return; /* Flush any queued works & destroy wq */ if (is_tx(c)) { c->rekeying_intv = 0; cancel_delayed_work_sync(&c->work); destroy_workqueue(c->wq); } /* Release AEAD keys */ rcu_read_lock(); for (k = KEY_MIN; k <= KEY_MAX; k++) tipc_aead_put(rcu_dereference(c->aead[k])); rcu_read_unlock(); pr_debug("%s: has been stopped\n", c->name); /* Free this crypto statistics */ free_percpu(c->stats); *crypto = NULL; kfree_sensitive(c); } void tipc_crypto_timeout(struct tipc_crypto *rx) { struct tipc_net *tn = tipc_net(rx->net); struct tipc_crypto *tx = tn->crypto_tx; struct tipc_key key; int cmd; /* TX pending: taking all users & stable -> active */ spin_lock(&tx->lock); key = tx->key; if (key.active && tipc_aead_users(tx->aead[key.active]) > 0) goto s1; if (!key.pending || tipc_aead_users(tx->aead[key.pending]) <= 0) goto s1; if (time_before(jiffies, tx->timer1 + TIPC_TX_LASTING_TIME)) goto s1; tipc_crypto_key_set_state(tx, key.passive, key.pending, 0); if (key.active) tipc_crypto_key_detach(tx->aead[key.active], &tx->lock); this_cpu_inc(tx->stats->stat[STAT_SWITCHES]); pr_info("%s: key[%d] is activated\n", tx->name, key.pending); s1: spin_unlock(&tx->lock); /* RX pending: having user -> active */ spin_lock(&rx->lock); key = rx->key; if (!key.pending || tipc_aead_users(rx->aead[key.pending]) <= 0) goto s2; if (key.active) key.passive = key.active; key.active = key.pending; rx->timer2 = jiffies; tipc_crypto_key_set_state(rx, key.passive, key.active, 0); this_cpu_inc(rx->stats->stat[STAT_SWITCHES]); pr_info("%s: key[%d] is activated\n", rx->name, key.pending); goto s5; s2: /* RX pending: not working -> remove */ if (!key.pending || tipc_aead_users(rx->aead[key.pending]) > -10) goto s3; tipc_crypto_key_set_state(rx, key.passive, key.active, 0); tipc_crypto_key_detach(rx->aead[key.pending], &rx->lock); pr_debug("%s: key[%d] is removed\n", rx->name, key.pending); goto s5; s3: /* RX active: timed out or no user -> pending */ if (!key.active) goto s4; if (time_before(jiffies, rx->timer1 + TIPC_RX_ACTIVE_LIM) && tipc_aead_users(rx->aead[key.active]) > 0) goto s4; if (key.pending) key.passive = key.active; else key.pending = key.active; rx->timer2 = jiffies; tipc_crypto_key_set_state(rx, key.passive, 0, key.pending); tipc_aead_users_set(rx->aead[key.pending], 0); pr_debug("%s: key[%d] is deactivated\n", rx->name, key.active); goto s5; s4: /* RX passive: outdated or not working -> free */ if (!key.passive) goto s5; if (time_before(jiffies, rx->timer2 + TIPC_RX_PASSIVE_LIM) && tipc_aead_users(rx->aead[key.passive]) > -10) goto s5; tipc_crypto_key_set_state(rx, 0, key.active, key.pending); tipc_crypto_key_detach(rx->aead[key.passive], &rx->lock); pr_debug("%s: key[%d] is freed\n", rx->name, key.passive); s5: spin_unlock(&rx->lock); /* Relax it here, the flag will be set again if it really is, but only * when we are not in grace period for safety! */ if (time_after(jiffies, tx->timer2 + TIPC_TX_GRACE_PERIOD)) tx->legacy_user = 0; /* Limit max_tfms & do debug commands if needed */ if (likely(sysctl_tipc_max_tfms <= TIPC_MAX_TFMS_LIM)) return; cmd = sysctl_tipc_max_tfms; sysctl_tipc_max_tfms = TIPC_MAX_TFMS_DEF; tipc_crypto_do_cmd(rx->net, cmd); } static inline void tipc_crypto_clone_msg(struct net *net, struct sk_buff *_skb, struct tipc_bearer *b, struct tipc_media_addr *dst, struct tipc_node *__dnode, u8 type) { struct sk_buff *skb; skb = skb_clone(_skb, GFP_ATOMIC); if (skb) { TIPC_SKB_CB(skb)->xmit_type = type; tipc_crypto_xmit(net, &skb, b, dst, __dnode); if (skb) b->media->send_msg(net, skb, b, dst); } } /** * tipc_crypto_xmit - Build & encrypt TIPC message for xmit * @net: struct net * @skb: input/output message skb pointer * @b: bearer used for xmit later * @dst: destination media address * @__dnode: destination node for reference if any * * First, build an encryption message header on the top of the message, then * encrypt the original TIPC message by using the pending, master or active * key with this preference order. * If the encryption is successful, the encrypted skb is returned directly or * via the callback. * Otherwise, the skb is freed! * * Return: * * 0 : the encryption has succeeded (or no encryption) * * -EINPROGRESS/-EBUSY : the encryption is ongoing, a callback will be made * * -ENOKEK : the encryption has failed due to no key * * -EKEYREVOKED : the encryption has failed due to key revoked * * -ENOMEM : the encryption has failed due to no memory * * < 0 : the encryption has failed due to other reasons */ int tipc_crypto_xmit(struct net *net, struct sk_buff **skb, struct tipc_bearer *b, struct tipc_media_addr *dst, struct tipc_node *__dnode) { struct tipc_crypto *__rx = tipc_node_crypto_rx(__dnode); struct tipc_crypto *tx = tipc_net(net)->crypto_tx; struct tipc_crypto_stats __percpu *stats = tx->stats; struct tipc_msg *hdr = buf_msg(*skb); struct tipc_key key = tx->key; struct tipc_aead *aead = NULL; u32 user = msg_user(hdr); u32 type = msg_type(hdr); int rc = -ENOKEY; u8 tx_key = 0; /* No encryption? */ if (!tx->working) return 0; /* Pending key if peer has active on it or probing time */ if (unlikely(key.pending)) { tx_key = key.pending; if (!tx->key_master && !key.active) goto encrypt; if (__rx && atomic_read(&__rx->peer_rx_active) == tx_key) goto encrypt; if (TIPC_SKB_CB(*skb)->xmit_type == SKB_PROBING) { pr_debug("%s: probing for key[%d]\n", tx->name, key.pending); goto encrypt; } if (user == LINK_CONFIG || user == LINK_PROTOCOL) tipc_crypto_clone_msg(net, *skb, b, dst, __dnode, SKB_PROBING); } /* Master key if this is a *vital* message or in grace period */ if (tx->key_master) { tx_key = KEY_MASTER; if (!key.active) goto encrypt; if (TIPC_SKB_CB(*skb)->xmit_type == SKB_GRACING) { pr_debug("%s: gracing for msg (%d %d)\n", tx->name, user, type); goto encrypt; } if (user == LINK_CONFIG || (user == LINK_PROTOCOL && type == RESET_MSG) || (user == MSG_CRYPTO && type == KEY_DISTR_MSG) || time_before(jiffies, tx->timer2 + TIPC_TX_GRACE_PERIOD)) { if (__rx && __rx->key_master && !atomic_read(&__rx->peer_rx_active)) goto encrypt; if (!__rx) { if (likely(!tx->legacy_user)) goto encrypt; tipc_crypto_clone_msg(net, *skb, b, dst, __dnode, SKB_GRACING); } } } /* Else, use the active key if any */ if (likely(key.active)) { tx_key = key.active; goto encrypt; } goto exit; encrypt: aead = tipc_aead_get(tx->aead[tx_key]); if (unlikely(!aead)) goto exit; rc = tipc_ehdr_build(net, aead, tx_key, *skb, __rx); if (likely(rc > 0)) rc = tipc_aead_encrypt(aead, *skb, b, dst, __dnode); exit: switch (rc) { case 0: this_cpu_inc(stats->stat[STAT_OK]); break; case -EINPROGRESS: case -EBUSY: this_cpu_inc(stats->stat[STAT_ASYNC]); *skb = NULL; return rc; default: this_cpu_inc(stats->stat[STAT_NOK]); if (rc == -ENOKEY) this_cpu_inc(stats->stat[STAT_NOKEYS]); else if (rc == -EKEYREVOKED) this_cpu_inc(stats->stat[STAT_BADKEYS]); kfree_skb(*skb); *skb = NULL; break; } tipc_aead_put(aead); return rc; } /** * tipc_crypto_rcv - Decrypt an encrypted TIPC message from peer * @net: struct net * @rx: RX crypto handle * @skb: input/output message skb pointer * @b: bearer where the message has been received * * If the decryption is successful, the decrypted skb is returned directly or * as the callback, the encryption header and auth tag will be trimed out * before forwarding to tipc_rcv() via the tipc_crypto_rcv_complete(). * Otherwise, the skb will be freed! * Note: RX key(s) can be re-aligned, or in case of no key suitable, TX * cluster key(s) can be taken for decryption (- recursive). * * Return: * * 0 : the decryption has successfully completed * * -EINPROGRESS/-EBUSY : the decryption is ongoing, a callback will be made * * -ENOKEY : the decryption has failed due to no key * * -EBADMSG : the decryption has failed due to bad message * * -ENOMEM : the decryption has failed due to no memory * * < 0 : the decryption has failed due to other reasons */ int tipc_crypto_rcv(struct net *net, struct tipc_crypto *rx, struct sk_buff **skb, struct tipc_bearer *b) { struct tipc_crypto *tx = tipc_net(net)->crypto_tx; struct tipc_crypto_stats __percpu *stats; struct tipc_aead *aead = NULL; struct tipc_key key; int rc = -ENOKEY; u8 tx_key, n; tx_key = ((struct tipc_ehdr *)(*skb)->data)->tx_key; /* New peer? * Let's try with TX key (i.e. cluster mode) & verify the skb first! */ if (unlikely(!rx || tx_key == KEY_MASTER)) goto pick_tx; /* Pick RX key according to TX key if any */ key = rx->key; if (tx_key == key.active || tx_key == key.pending || tx_key == key.passive) goto decrypt; /* Unknown key, let's try to align RX key(s) */ if (tipc_crypto_key_try_align(rx, tx_key)) goto decrypt; pick_tx: /* No key suitable? Try to pick one from TX... */ aead = tipc_crypto_key_pick_tx(tx, rx, *skb, tx_key); if (aead) goto decrypt; goto exit; decrypt: rcu_read_lock(); if (!aead) aead = tipc_aead_get(rx->aead[tx_key]); rc = tipc_aead_decrypt(net, aead, *skb, b); rcu_read_unlock(); exit: stats = ((rx) ?: tx)->stats; switch (rc) { case 0: this_cpu_inc(stats->stat[STAT_OK]); break; case -EINPROGRESS: case -EBUSY: this_cpu_inc(stats->stat[STAT_ASYNC]); *skb = NULL; return rc; default: this_cpu_inc(stats->stat[STAT_NOK]); if (rc == -ENOKEY) { kfree_skb(*skb); *skb = NULL; if (rx) { /* Mark rx->nokey only if we dont have a * pending received session key, nor a newer * one i.e. in the next slot. */ n = key_next(tx_key); rx->nokey = !(rx->skey || rcu_access_pointer(rx->aead[n])); pr_debug_ratelimited("%s: nokey %d, key %d/%x\n", rx->name, rx->nokey, tx_key, rx->key.keys); tipc_node_put(rx->node); } this_cpu_inc(stats->stat[STAT_NOKEYS]); return rc; } else if (rc == -EBADMSG) { this_cpu_inc(stats->stat[STAT_BADMSGS]); } break; } tipc_crypto_rcv_complete(net, aead, b, skb, rc); return rc; } static void tipc_crypto_rcv_complete(struct net *net, struct tipc_aead *aead, struct tipc_bearer *b, struct sk_buff **skb, int err) { struct tipc_skb_cb *skb_cb = TIPC_SKB_CB(*skb); struct tipc_crypto *rx = aead->crypto; struct tipc_aead *tmp = NULL; struct tipc_ehdr *ehdr; struct tipc_node *n; /* Is this completed by TX? */ if (unlikely(is_tx(aead->crypto))) { rx = skb_cb->tx_clone_ctx.rx; pr_debug("TX->RX(%s): err %d, aead %p, skb->next %p, flags %x\n", (rx) ? tipc_node_get_id_str(rx->node) : "-", err, aead, (*skb)->next, skb_cb->flags); pr_debug("skb_cb [recurs %d, last %p], tx->aead [%p %p %p]\n", skb_cb->tx_clone_ctx.recurs, skb_cb->tx_clone_ctx.last, aead->crypto->aead[1], aead->crypto->aead[2], aead->crypto->aead[3]); if (unlikely(err)) { if (err == -EBADMSG && (*skb)->next) tipc_rcv(net, (*skb)->next, b); goto free_skb; } if (likely((*skb)->next)) { kfree_skb((*skb)->next); (*skb)->next = NULL; } ehdr = (struct tipc_ehdr *)(*skb)->data; if (!rx) { WARN_ON(ehdr->user != LINK_CONFIG); n = tipc_node_create(net, 0, ehdr->id, 0xffffu, 0, true); rx = tipc_node_crypto_rx(n); if (unlikely(!rx)) goto free_skb; } /* Ignore cloning if it was TX master key */ if (ehdr->tx_key == KEY_MASTER) goto rcv; if (tipc_aead_clone(&tmp, aead) < 0) goto rcv; WARN_ON(!refcount_inc_not_zero(&tmp->refcnt)); if (tipc_crypto_key_attach(rx, tmp, ehdr->tx_key, false) < 0) { tipc_aead_free(&tmp->rcu); goto rcv; } tipc_aead_put(aead); aead = tmp; } if (unlikely(err)) { tipc_aead_users_dec((struct tipc_aead __force __rcu *)aead, INT_MIN); goto free_skb; } /* Set the RX key's user */ tipc_aead_users_set((struct tipc_aead __force __rcu *)aead, 1); /* Mark this point, RX works */ rx->timer1 = jiffies; rcv: /* Remove ehdr & auth. tag prior to tipc_rcv() */ ehdr = (struct tipc_ehdr *)(*skb)->data; /* Mark this point, RX passive still works */ if (rx->key.passive && ehdr->tx_key == rx->key.passive) rx->timer2 = jiffies; skb_reset_network_header(*skb); skb_pull(*skb, tipc_ehdr_size(ehdr)); if (pskb_trim(*skb, (*skb)->len - aead->authsize)) goto free_skb; /* Validate TIPCv2 message */ if (unlikely(!tipc_msg_validate(skb))) { pr_err_ratelimited("Packet dropped after decryption!\n"); goto free_skb; } /* Ok, everything's fine, try to synch own keys according to peers' */ tipc_crypto_key_synch(rx, *skb); /* Re-fetch skb cb as skb might be changed in tipc_msg_validate */ skb_cb = TIPC_SKB_CB(*skb); /* Mark skb decrypted */ skb_cb->decrypted = 1; /* Clear clone cxt if any */ if (likely(!skb_cb->tx_clone_deferred)) goto exit; skb_cb->tx_clone_deferred = 0; memset(&skb_cb->tx_clone_ctx, 0, sizeof(skb_cb->tx_clone_ctx)); goto exit; free_skb: kfree_skb(*skb); *skb = NULL; exit: tipc_aead_put(aead); if (rx) tipc_node_put(rx->node); } static void tipc_crypto_do_cmd(struct net *net, int cmd) { struct tipc_net *tn = tipc_net(net); struct tipc_crypto *tx = tn->crypto_tx, *rx; struct list_head *p; unsigned int stat; int i, j, cpu; char buf[200]; /* Currently only one command is supported */ switch (cmd) { case 0xfff1: goto print_stats; default: return; } print_stats: /* Print a header */ pr_info("\n=============== TIPC Crypto Statistics ===============\n\n"); /* Print key status */ pr_info("Key status:\n"); pr_info("TX(%7.7s)\n%s", tipc_own_id_string(net), tipc_crypto_key_dump(tx, buf)); rcu_read_lock(); for (p = tn->node_list.next; p != &tn->node_list; p = p->next) { rx = tipc_node_crypto_rx_by_list(p); pr_info("RX(%7.7s)\n%s", tipc_node_get_id_str(rx->node), tipc_crypto_key_dump(rx, buf)); } rcu_read_unlock(); /* Print crypto statistics */ for (i = 0, j = 0; i < MAX_STATS; i++) j += scnprintf(buf + j, 200 - j, "|%11s ", hstats[i]); pr_info("Counter %s", buf); memset(buf, '-', 115); buf[115] = '\0'; pr_info("%s\n", buf); j = scnprintf(buf, 200, "TX(%7.7s) ", tipc_own_id_string(net)); for_each_possible_cpu(cpu) { for (i = 0; i < MAX_STATS; i++) { stat = per_cpu_ptr(tx->stats, cpu)->stat[i]; j += scnprintf(buf + j, 200 - j, "|%11d ", stat); } pr_info("%s", buf); j = scnprintf(buf, 200, "%12s", " "); } rcu_read_lock(); for (p = tn->node_list.next; p != &tn->node_list; p = p->next) { rx = tipc_node_crypto_rx_by_list(p); j = scnprintf(buf, 200, "RX(%7.7s) ", tipc_node_get_id_str(rx->node)); for_each_possible_cpu(cpu) { for (i = 0; i < MAX_STATS; i++) { stat = per_cpu_ptr(rx->stats, cpu)->stat[i]; j += scnprintf(buf + j, 200 - j, "|%11d ", stat); } pr_info("%s", buf); j = scnprintf(buf, 200, "%12s", " "); } } rcu_read_unlock(); pr_info("\n======================== Done ========================\n"); } static char *tipc_crypto_key_dump(struct tipc_crypto *c, char *buf) { struct tipc_key key = c->key; struct tipc_aead *aead; int k, i = 0; char *s; for (k = KEY_MIN; k <= KEY_MAX; k++) { if (k == KEY_MASTER) { if (is_rx(c)) continue; if (time_before(jiffies, c->timer2 + TIPC_TX_GRACE_PERIOD)) s = "ACT"; else s = "PAS"; } else { if (k == key.passive) s = "PAS"; else if (k == key.active) s = "ACT"; else if (k == key.pending) s = "PEN"; else s = "-"; } i += scnprintf(buf + i, 200 - i, "\tKey%d: %s", k, s); rcu_read_lock(); aead = rcu_dereference(c->aead[k]); if (aead) i += scnprintf(buf + i, 200 - i, "{\"0x...%s\", \"%s\"}/%d:%d", aead->hint, (aead->mode == CLUSTER_KEY) ? "c" : "p", atomic_read(&aead->users), refcount_read(&aead->refcnt)); rcu_read_unlock(); i += scnprintf(buf + i, 200 - i, "\n"); } if (is_rx(c)) i += scnprintf(buf + i, 200 - i, "\tPeer RX active: %d\n", atomic_read(&c->peer_rx_active)); return buf; } static char *tipc_key_change_dump(struct tipc_key old, struct tipc_key new, char *buf) { struct tipc_key *key = &old; int k, i = 0; char *s; /* Output format: "[%s %s %s] -> [%s %s %s]", max len = 32 */ again: i += scnprintf(buf + i, 32 - i, "["); for (k = KEY_1; k <= KEY_3; k++) { if (k == key->passive) s = "pas"; else if (k == key->active) s = "act"; else if (k == key->pending) s = "pen"; else s = "-"; i += scnprintf(buf + i, 32 - i, (k != KEY_3) ? "%s " : "%s", s); } if (key != &new) { i += scnprintf(buf + i, 32 - i, "] -> "); key = &new; goto again; } i += scnprintf(buf + i, 32 - i, "]"); return buf; } /** * tipc_crypto_msg_rcv - Common 'MSG_CRYPTO' processing point * @net: the struct net * @skb: the receiving message buffer */ void tipc_crypto_msg_rcv(struct net *net, struct sk_buff *skb) { struct tipc_crypto *rx; struct tipc_msg *hdr; if (unlikely(skb_linearize(skb))) goto exit; hdr = buf_msg(skb); rx = tipc_node_crypto_rx_by_addr(net, msg_prevnode(hdr)); if (unlikely(!rx)) goto exit; switch (msg_type(hdr)) { case KEY_DISTR_MSG: if (tipc_crypto_key_rcv(rx, hdr)) goto exit; break; default: break; } tipc_node_put(rx->node); exit: kfree_skb(skb); } /** * tipc_crypto_key_distr - Distribute a TX key * @tx: the TX crypto * @key: the key's index * @dest: the destination tipc node, = NULL if distributing to all nodes * * Return: 0 in case of success, otherwise < 0 */ int tipc_crypto_key_distr(struct tipc_crypto *tx, u8 key, struct tipc_node *dest) { struct tipc_aead *aead; u32 dnode = tipc_node_get_addr(dest); int rc = -ENOKEY; if (!sysctl_tipc_key_exchange_enabled) return 0; if (key) { rcu_read_lock(); aead = tipc_aead_get(tx->aead[key]); if (likely(aead)) { rc = tipc_crypto_key_xmit(tx->net, aead->key, aead->gen, aead->mode, dnode); tipc_aead_put(aead); } rcu_read_unlock(); } return rc; } /** * tipc_crypto_key_xmit - Send a session key * @net: the struct net * @skey: the session key to be sent * @gen: the key's generation * @mode: the key's mode * @dnode: the destination node address, = 0 if broadcasting to all nodes * * The session key 'skey' is packed in a TIPC v2 'MSG_CRYPTO/KEY_DISTR_MSG' * as its data section, then xmit-ed through the uc/bc link. * * Return: 0 in case of success, otherwise < 0 */ static int tipc_crypto_key_xmit(struct net *net, struct tipc_aead_key *skey, u16 gen, u8 mode, u32 dnode) { struct sk_buff_head pkts; struct tipc_msg *hdr; struct sk_buff *skb; u16 size, cong_link_cnt; u8 *data; int rc; size = tipc_aead_key_size(skey); skb = tipc_buf_acquire(INT_H_SIZE + size, GFP_ATOMIC); if (!skb) return -ENOMEM; hdr = buf_msg(skb); tipc_msg_init(tipc_own_addr(net), hdr, MSG_CRYPTO, KEY_DISTR_MSG, INT_H_SIZE, dnode); msg_set_size(hdr, INT_H_SIZE + size); msg_set_key_gen(hdr, gen); msg_set_key_mode(hdr, mode); data = msg_data(hdr); *((__be32 *)(data + TIPC_AEAD_ALG_NAME)) = htonl(skey->keylen); memcpy(data, skey->alg_name, TIPC_AEAD_ALG_NAME); memcpy(data + TIPC_AEAD_ALG_NAME + sizeof(__be32), skey->key, skey->keylen); __skb_queue_head_init(&pkts); __skb_queue_tail(&pkts, skb); if (dnode) rc = tipc_node_xmit(net, &pkts, dnode, 0); else rc = tipc_bcast_xmit(net, &pkts, &cong_link_cnt); return rc; } /** * tipc_crypto_key_rcv - Receive a session key * @rx: the RX crypto * @hdr: the TIPC v2 message incl. the receiving session key in its data * * This function retrieves the session key in the message from peer, then * schedules a RX work to attach the key to the corresponding RX crypto. * * Return: "true" if the key has been scheduled for attaching, otherwise * "false". */ static bool tipc_crypto_key_rcv(struct tipc_crypto *rx, struct tipc_msg *hdr) { struct tipc_crypto *tx = tipc_net(rx->net)->crypto_tx; struct tipc_aead_key *skey = NULL; u16 key_gen = msg_key_gen(hdr); u32 size = msg_data_sz(hdr); u8 *data = msg_data(hdr); unsigned int keylen; /* Verify whether the size can exist in the packet */ if (unlikely(size < sizeof(struct tipc_aead_key) + TIPC_AEAD_KEYLEN_MIN)) { pr_debug("%s: message data size is too small\n", rx->name); goto exit; } keylen = ntohl(*((__be32 *)(data + TIPC_AEAD_ALG_NAME))); /* Verify the supplied size values */ if (unlikely(size != keylen + sizeof(struct tipc_aead_key) || keylen > TIPC_AEAD_KEY_SIZE_MAX)) { pr_debug("%s: invalid MSG_CRYPTO key size\n", rx->name); goto exit; } spin_lock(&rx->lock); if (unlikely(rx->skey || (key_gen == rx->key_gen && rx->key.keys))) { pr_err("%s: key existed <%p>, gen %d vs %d\n", rx->name, rx->skey, key_gen, rx->key_gen); goto exit_unlock; } /* Allocate memory for the key */ skey = kmalloc(size, GFP_ATOMIC); if (unlikely(!skey)) { pr_err("%s: unable to allocate memory for skey\n", rx->name); goto exit_unlock; } /* Copy key from msg data */ skey->keylen = keylen; memcpy(skey->alg_name, data, TIPC_AEAD_ALG_NAME); memcpy(skey->key, data + TIPC_AEAD_ALG_NAME + sizeof(__be32), skey->keylen); rx->key_gen = key_gen; rx->skey_mode = msg_key_mode(hdr); rx->skey = skey; rx->nokey = 0; mb(); /* for nokey flag */ exit_unlock: spin_unlock(&rx->lock); exit: /* Schedule the key attaching on this crypto */ if (likely(skey && queue_delayed_work(tx->wq, &rx->work, 0))) return true; return false; } /** * tipc_crypto_work_rx - Scheduled RX works handler * @work: the struct RX work * * The function processes the previous scheduled works i.e. distributing TX key * or attaching a received session key on RX crypto. */ static void tipc_crypto_work_rx(struct work_struct *work) { struct delayed_work *dwork = to_delayed_work(work); struct tipc_crypto *rx = container_of(dwork, struct tipc_crypto, work); struct tipc_crypto *tx = tipc_net(rx->net)->crypto_tx; unsigned long delay = msecs_to_jiffies(5000); bool resched = false; u8 key; int rc; /* Case 1: Distribute TX key to peer if scheduled */ if (atomic_cmpxchg(&rx->key_distr, KEY_DISTR_SCHED, KEY_DISTR_COMPL) == KEY_DISTR_SCHED) { /* Always pick the newest one for distributing */ key = tx->key.pending ?: tx->key.active; rc = tipc_crypto_key_distr(tx, key, rx->node); if (unlikely(rc)) pr_warn("%s: unable to distr key[%d] to %s, err %d\n", tx->name, key, tipc_node_get_id_str(rx->node), rc); /* Sched for key_distr releasing */ resched = true; } else { atomic_cmpxchg(&rx->key_distr, KEY_DISTR_COMPL, 0); } /* Case 2: Attach a pending received session key from peer if any */ if (rx->skey) { rc = tipc_crypto_key_init(rx, rx->skey, rx->skey_mode, false); if (unlikely(rc < 0)) pr_warn("%s: unable to attach received skey, err %d\n", rx->name, rc); switch (rc) { case -EBUSY: case -ENOMEM: /* Resched the key attaching */ resched = true; break; default: synchronize_rcu(); kfree(rx->skey); rx->skey = NULL; break; } } if (resched && queue_delayed_work(tx->wq, &rx->work, delay)) return; tipc_node_put(rx->node); } /** * tipc_crypto_rekeying_sched - (Re)schedule rekeying w/o new interval * @tx: TX crypto * @changed: if the rekeying needs to be rescheduled with new interval * @new_intv: new rekeying interval (when "changed" = true) */ void tipc_crypto_rekeying_sched(struct tipc_crypto *tx, bool changed, u32 new_intv) { unsigned long delay; bool now = false; if (changed) { if (new_intv == TIPC_REKEYING_NOW) now = true; else tx->rekeying_intv = new_intv; cancel_delayed_work_sync(&tx->work); } if (tx->rekeying_intv || now) { delay = (now) ? 0 : tx->rekeying_intv * 60 * 1000; queue_delayed_work(tx->wq, &tx->work, msecs_to_jiffies(delay)); } } /** * tipc_crypto_work_tx - Scheduled TX works handler * @work: the struct TX work * * The function processes the previous scheduled work, i.e. key rekeying, by * generating a new session key based on current one, then attaching it to the * TX crypto and finally distributing it to peers. It also re-schedules the * rekeying if needed. */ static void tipc_crypto_work_tx(struct work_struct *work) { struct delayed_work *dwork = to_delayed_work(work); struct tipc_crypto *tx = container_of(dwork, struct tipc_crypto, work); struct tipc_aead_key *skey = NULL; struct tipc_key key = tx->key; struct tipc_aead *aead; int rc = -ENOMEM; if (unlikely(key.pending)) goto resched; /* Take current key as a template */ rcu_read_lock(); aead = rcu_dereference(tx->aead[key.active ?: KEY_MASTER]); if (unlikely(!aead)) { rcu_read_unlock(); /* At least one key should exist for securing */ return; } /* Lets duplicate it first */ skey = kmemdup(aead->key, tipc_aead_key_size(aead->key), GFP_ATOMIC); rcu_read_unlock(); /* Now, generate new key, initiate & distribute it */ if (likely(skey)) { rc = tipc_aead_key_generate(skey) ?: tipc_crypto_key_init(tx, skey, PER_NODE_KEY, false); if (likely(rc > 0)) rc = tipc_crypto_key_distr(tx, rc, NULL); kfree_sensitive(skey); } if (unlikely(rc)) pr_warn_ratelimited("%s: rekeying returns %d\n", tx->name, rc); resched: /* Re-schedule rekeying if any */ tipc_crypto_rekeying_sched(tx, false, 0); } |
| 7 7 10 7 5 7 6 6 10 5 6 5 5 5 5 5 5 14 23 3 11 2 2 18 2 4 2 10 6 4 10 13 11 35 22 13 201 3 197 3 204 195 8 1 201 2 1 1 6 191 1 3 3 189 4 4 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 | /* * memfd_create system call and file sealing support * * Code was originally included in shmem.c, and broken out to facilitate * use by hugetlbfs as well as tmpfs. * * This file is released under the GPL. */ #include <linux/fs.h> #include <linux/vfs.h> #include <linux/pagemap.h> #include <linux/file.h> #include <linux/mm.h> #include <linux/sched/signal.h> #include <linux/khugepaged.h> #include <linux/syscalls.h> #include <linux/hugetlb.h> #include <linux/shmem_fs.h> #include <linux/memfd.h> #include <linux/pid_namespace.h> #include <uapi/linux/memfd.h> /* * We need a tag: a new tag would expand every xa_node by 8 bytes, * so reuse a tag which we firmly believe is never set or cleared on tmpfs * or hugetlbfs because they are memory only filesystems. */ #define MEMFD_TAG_PINNED PAGECACHE_TAG_TOWRITE #define LAST_SCAN 4 /* about 150ms max */ static bool memfd_folio_has_extra_refs(struct folio *folio) { return folio_ref_count(folio) - folio_mapcount(folio) != folio_nr_pages(folio); } static void memfd_tag_pins(struct xa_state *xas) { struct folio *folio; int latency = 0; lru_add_drain(); xas_lock_irq(xas); xas_for_each(xas, folio, ULONG_MAX) { if (!xa_is_value(folio) && memfd_folio_has_extra_refs(folio)) xas_set_mark(xas, MEMFD_TAG_PINNED); if (++latency < XA_CHECK_SCHED) continue; latency = 0; xas_pause(xas); xas_unlock_irq(xas); cond_resched(); xas_lock_irq(xas); } xas_unlock_irq(xas); } /* * This is a helper function used by memfd_pin_user_pages() in GUP (gup.c). * It is mainly called to allocate a folio in a memfd when the caller * (memfd_pin_folios()) cannot find a folio in the page cache at a given * index in the mapping. */ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx) { #ifdef CONFIG_HUGETLB_PAGE struct folio *folio; gfp_t gfp_mask; int err; if (is_file_hugepages(memfd)) { /* * The folio would most likely be accessed by a DMA driver, * therefore, we have zone memory constraints where we can * alloc from. Also, the folio will be pinned for an indefinite * amount of time, so it is not expected to be migrated away. */ gfp_mask = htlb_alloc_mask(hstate_file(memfd)); gfp_mask &= ~(__GFP_HIGHMEM | __GFP_MOVABLE); folio = alloc_hugetlb_folio_nodemask(hstate_file(memfd), numa_node_id(), NULL, gfp_mask, false); if (folio && folio_try_get(folio)) { err = hugetlb_add_to_page_cache(folio, memfd->f_mapping, idx); if (err) { folio_put(folio); free_huge_folio(folio); return ERR_PTR(err); } return folio; } return ERR_PTR(-ENOMEM); } #endif return shmem_read_folio(memfd->f_mapping, idx); } /* * Setting SEAL_WRITE requires us to verify there's no pending writer. However, * via get_user_pages(), drivers might have some pending I/O without any active * user-space mappings (eg., direct-IO, AIO). Therefore, we look at all folios * and see whether it has an elevated ref-count. If so, we tag them and wait for * them to be dropped. * The caller must guarantee that no new user will acquire writable references * to those folios to avoid races. */ static int memfd_wait_for_pins(struct address_space *mapping) { XA_STATE(xas, &mapping->i_pages, 0); struct folio *folio; int error, scan; memfd_tag_pins(&xas); error = 0; for (scan = 0; scan <= LAST_SCAN; scan++) { int latency = 0; if (!xas_marked(&xas, MEMFD_TAG_PINNED)) break; if (!scan) lru_add_drain_all(); else if (schedule_timeout_killable((HZ << scan) / 200)) scan = LAST_SCAN; xas_set(&xas, 0); xas_lock_irq(&xas); xas_for_each_marked(&xas, folio, ULONG_MAX, MEMFD_TAG_PINNED) { bool clear = true; if (!xa_is_value(folio) && memfd_folio_has_extra_refs(folio)) { /* * On the last scan, we clean up all those tags * we inserted; but make a note that we still * found folios pinned. */ if (scan == LAST_SCAN) error = -EBUSY; else clear = false; } if (clear) xas_clear_mark(&xas, MEMFD_TAG_PINNED); if (++latency < XA_CHECK_SCHED) continue; latency = 0; xas_pause(&xas); xas_unlock_irq(&xas); cond_resched(); xas_lock_irq(&xas); } xas_unlock_irq(&xas); } return error; } static unsigned int *memfd_file_seals_ptr(struct file *file) { if (shmem_file(file)) return &SHMEM_I(file_inode(file))->seals; #ifdef CONFIG_HUGETLBFS if (is_file_hugepages(file)) return &HUGETLBFS_I(file_inode(file))->seals; #endif return NULL; } #define F_ALL_SEALS (F_SEAL_SEAL | \ F_SEAL_EXEC | \ F_SEAL_SHRINK | \ F_SEAL_GROW | \ F_SEAL_WRITE | \ F_SEAL_FUTURE_WRITE) static int memfd_add_seals(struct file *file, unsigned int seals) { struct inode *inode = file_inode(file); unsigned int *file_seals; int error; /* * SEALING * Sealing allows multiple parties to share a tmpfs or hugetlbfs file * but restrict access to a specific subset of file operations. Seals * can only be added, but never removed. This way, mutually untrusted * parties can share common memory regions with a well-defined policy. * A malicious peer can thus never perform unwanted operations on a * shared object. * * Seals are only supported on special tmpfs or hugetlbfs files and * always affect the whole underlying inode. Once a seal is set, it * may prevent some kinds of access to the file. Currently, the * following seals are defined: * SEAL_SEAL: Prevent further seals from being set on this file * SEAL_SHRINK: Prevent the file from shrinking * SEAL_GROW: Prevent the file from growing * SEAL_WRITE: Prevent write access to the file * SEAL_EXEC: Prevent modification of the exec bits in the file mode * * As we don't require any trust relationship between two parties, we * must prevent seals from being removed. Therefore, sealing a file * only adds a given set of seals to the file, it never touches * existing seals. Furthermore, the "setting seals"-operation can be * sealed itself, which basically prevents any further seal from being * added. * * Semantics of sealing are only defined on volatile files. Only * anonymous tmpfs and hugetlbfs files support sealing. More * importantly, seals are never written to disk. Therefore, there's * no plan to support it on other file types. */ if (!(file->f_mode & FMODE_WRITE)) return -EPERM; if (seals & ~(unsigned int)F_ALL_SEALS) return -EINVAL; inode_lock(inode); file_seals = memfd_file_seals_ptr(file); if (!file_seals) { error = -EINVAL; goto unlock; } if (*file_seals & F_SEAL_SEAL) { error = -EPERM; goto unlock; } if ((seals & F_SEAL_WRITE) && !(*file_seals & F_SEAL_WRITE)) { error = mapping_deny_writable(file->f_mapping); if (error) goto unlock; error = memfd_wait_for_pins(file->f_mapping); if (error) { mapping_allow_writable(file->f_mapping); goto unlock; } } /* * SEAL_EXEC implys SEAL_WRITE, making W^X from the start. */ if (seals & F_SEAL_EXEC && inode->i_mode & 0111) seals |= F_SEAL_SHRINK|F_SEAL_GROW|F_SEAL_WRITE|F_SEAL_FUTURE_WRITE; *file_seals |= seals; error = 0; unlock: inode_unlock(inode); return error; } static int memfd_get_seals(struct file *file) { unsigned int *seals = memfd_file_seals_ptr(file); return seals ? *seals : -EINVAL; } long memfd_fcntl(struct file *file, unsigned int cmd, unsigned int arg) { long error; switch (cmd) { case F_ADD_SEALS: error = memfd_add_seals(file, arg); break; case F_GET_SEALS: error = memfd_get_seals(file); break; default: error = -EINVAL; break; } return error; } #define MFD_NAME_PREFIX "memfd:" #define MFD_NAME_PREFIX_LEN (sizeof(MFD_NAME_PREFIX) - 1) #define MFD_NAME_MAX_LEN (NAME_MAX - MFD_NAME_PREFIX_LEN) #define MFD_ALL_FLAGS (MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB | MFD_NOEXEC_SEAL | MFD_EXEC) static int check_sysctl_memfd_noexec(unsigned int *flags) { #ifdef CONFIG_SYSCTL struct pid_namespace *ns = task_active_pid_ns(current); int sysctl = pidns_memfd_noexec_scope(ns); if (!(*flags & (MFD_EXEC | MFD_NOEXEC_SEAL))) { if (sysctl >= MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL) *flags |= MFD_NOEXEC_SEAL; else *flags |= MFD_EXEC; } if (!(*flags & MFD_NOEXEC_SEAL) && sysctl >= MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED) { pr_err_ratelimited( "%s[%d]: memfd_create() requires MFD_NOEXEC_SEAL with vm.memfd_noexec=%d\n", current->comm, task_pid_nr(current), sysctl); return -EACCES; } #endif return 0; } SYSCALL_DEFINE2(memfd_create, const char __user *, uname, unsigned int, flags) { unsigned int *file_seals; struct file *file; int fd, error; char *name; long len; if (!(flags & MFD_HUGETLB)) { if (flags & ~(unsigned int)MFD_ALL_FLAGS) return -EINVAL; } else { /* Allow huge page size encoding in flags. */ if (flags & ~(unsigned int)(MFD_ALL_FLAGS | (MFD_HUGE_MASK << MFD_HUGE_SHIFT))) return -EINVAL; } /* Invalid if both EXEC and NOEXEC_SEAL are set.*/ if ((flags & MFD_EXEC) && (flags & MFD_NOEXEC_SEAL)) return -EINVAL; error = check_sysctl_memfd_noexec(&flags); if (error < 0) return error; /* length includes terminating zero */ len = strnlen_user(uname, MFD_NAME_MAX_LEN + 1); if (len <= 0) return -EFAULT; if (len > MFD_NAME_MAX_LEN + 1) return -EINVAL; name = kmalloc(len + MFD_NAME_PREFIX_LEN, GFP_KERNEL); if (!name) return -ENOMEM; strcpy(name, MFD_NAME_PREFIX); if (copy_from_user(&name[MFD_NAME_PREFIX_LEN], uname, len)) { error = -EFAULT; goto err_name; } /* terminating-zero may have changed after strnlen_user() returned */ if (name[len + MFD_NAME_PREFIX_LEN - 1]) { error = -EFAULT; goto err_name; } fd = get_unused_fd_flags((flags & MFD_CLOEXEC) ? O_CLOEXEC : 0); if (fd < 0) { error = fd; goto err_name; } if (flags & MFD_HUGETLB) { file = hugetlb_file_setup(name, 0, VM_NORESERVE, HUGETLB_ANONHUGE_INODE, (flags >> MFD_HUGE_SHIFT) & MFD_HUGE_MASK); } else file = shmem_file_setup(name, 0, VM_NORESERVE); if (IS_ERR(file)) { error = PTR_ERR(file); goto err_fd; } file->f_mode |= FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE; file->f_flags |= O_LARGEFILE; if (flags & MFD_NOEXEC_SEAL) { struct inode *inode = file_inode(file); inode->i_mode &= ~0111; file_seals = memfd_file_seals_ptr(file); if (file_seals) { *file_seals &= ~F_SEAL_SEAL; *file_seals |= F_SEAL_EXEC; } } else if (flags & MFD_ALLOW_SEALING) { /* MFD_EXEC and MFD_ALLOW_SEALING are set */ file_seals = memfd_file_seals_ptr(file); if (file_seals) *file_seals &= ~F_SEAL_SEAL; } fd_install(fd, file); kfree(name); return fd; err_fd: put_unused_fd(fd); err_name: kfree(name); return error; } |
| 64 2 19 21 17 69 2 1 21 49 21 1 19 45 39 8 19 4 29 45 26 18 26 14 5 20 82 31 78 144 2 78 18 120 12 12 12 1 11 27 15 13 36 36 1 26 12 5 43 43 43 1 4 10 3 27 40 3 2 19 27 48 4 3 43 3 14 27 14 27 40 33 3 3 3 170 1 49 129 10 10 9 9 62 1 101 155 1 155 101 153 3 43 122 21 9 12 2 5 14 9 11 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 | // SPDX-License-Identifier: GPL-2.0 /* * linux/fs/ext4/file.c * * Copyright (C) 1992, 1993, 1994, 1995 * Remy Card (card@masi.ibp.fr) * Laboratoire MASI - Institut Blaise Pascal * Universite Pierre et Marie Curie (Paris VI) * * from * * linux/fs/minix/file.c * * Copyright (C) 1991, 1992 Linus Torvalds * * ext4 fs regular file handling primitives * * 64-bit file support on 64-bit platforms by Jakub Jelinek * (jj@sunsite.ms.mff.cuni.cz) */ #include <linux/time.h> #include <linux/fs.h> #include <linux/iomap.h> #include <linux/mount.h> #include <linux/path.h> #include <linux/dax.h> #include <linux/quotaops.h> #include <linux/pagevec.h> #include <linux/uio.h> #include <linux/mman.h> #include <linux/backing-dev.h> #include "ext4.h" #include "ext4_jbd2.h" #include "xattr.h" #include "acl.h" #include "truncate.h" /* * Returns %true if the given DIO request should be attempted with DIO, or * %false if it should fall back to buffered I/O. * * DIO isn't well specified; when it's unsupported (either due to the request * being misaligned, or due to the file not supporting DIO at all), filesystems * either fall back to buffered I/O or return EINVAL. For files that don't use * any special features like encryption or verity, ext4 has traditionally * returned EINVAL for misaligned DIO. iomap_dio_rw() uses this convention too. * In this case, we should attempt the DIO, *not* fall back to buffered I/O. * * In contrast, in cases where DIO is unsupported due to ext4 features, ext4 * traditionally falls back to buffered I/O. * * This function implements the traditional ext4 behavior in all these cases. */ static bool ext4_should_use_dio(struct kiocb *iocb, struct iov_iter *iter) { struct inode *inode = file_inode(iocb->ki_filp); u32 dio_align = ext4_dio_alignment(inode); if (dio_align == 0) return false; if (dio_align == 1) return true; return IS_ALIGNED(iocb->ki_pos | iov_iter_alignment(iter), dio_align); } static ssize_t ext4_dio_read_iter(struct kiocb *iocb, struct iov_iter *to) { ssize_t ret; struct inode *inode = file_inode(iocb->ki_filp); if (iocb->ki_flags & IOCB_NOWAIT) { if (!inode_trylock_shared(inode)) return -EAGAIN; } else { inode_lock_shared(inode); } if (!ext4_should_use_dio(iocb, to)) { inode_unlock_shared(inode); /* * Fallback to buffered I/O if the operation being performed on * the inode is not supported by direct I/O. The IOCB_DIRECT * flag needs to be cleared here in order to ensure that the * direct I/O path within generic_file_read_iter() is not * taken. */ iocb->ki_flags &= ~IOCB_DIRECT; return generic_file_read_iter(iocb, to); } ret = iomap_dio_rw(iocb, to, &ext4_iomap_ops, NULL, 0, NULL, 0); inode_unlock_shared(inode); file_accessed(iocb->ki_filp); return ret; } #ifdef CONFIG_FS_DAX static ssize_t ext4_dax_read_iter(struct kiocb *iocb, struct iov_iter *to) { struct inode *inode = file_inode(iocb->ki_filp); ssize_t ret; if (iocb->ki_flags & IOCB_NOWAIT) { if (!inode_trylock_shared(inode)) return -EAGAIN; } else { inode_lock_shared(inode); } /* * Recheck under inode lock - at this point we are sure it cannot * change anymore */ if (!IS_DAX(inode)) { inode_unlock_shared(inode); /* Fallback to buffered IO in case we cannot support DAX */ return generic_file_read_iter(iocb, to); } ret = dax_iomap_rw(iocb, to, &ext4_iomap_ops); inode_unlock_shared(inode); file_accessed(iocb->ki_filp); return ret; } #endif static ssize_t ext4_file_read_iter(struct kiocb *iocb, struct iov_iter *to) { struct inode *inode = file_inode(iocb->ki_filp); if (unlikely(ext4_forced_shutdown(inode->i_sb))) return -EIO; if (!iov_iter_count(to)) return 0; /* skip atime */ #ifdef CONFIG_FS_DAX if (IS_DAX(inode)) return ext4_dax_read_iter(iocb, to); #endif if (iocb->ki_flags & IOCB_DIRECT) return ext4_dio_read_iter(iocb, to); return generic_file_read_iter(iocb, to); } static ssize_t ext4_file_splice_read(struct file *in, loff_t *ppos, struct pipe_inode_info *pipe, size_t len, unsigned int flags) { struct inode *inode = file_inode(in); if (unlikely(ext4_forced_shutdown(inode->i_sb))) return -EIO; return filemap_splice_read(in, ppos, pipe, len, flags); } /* * Called when an inode is released. Note that this is different * from ext4_file_open: open gets called at every open, but release * gets called only when /all/ the files are closed. */ static int ext4_release_file(struct inode *inode, struct file *filp) { if (ext4_test_inode_state(inode, EXT4_STATE_DA_ALLOC_CLOSE)) { ext4_alloc_da_blocks(inode); ext4_clear_inode_state(inode, EXT4_STATE_DA_ALLOC_CLOSE); } /* if we are the last writer on the inode, drop the block reservation */ if ((filp->f_mode & FMODE_WRITE) && (atomic_read(&inode->i_writecount) == 1) && !EXT4_I(inode)->i_reserved_data_blocks) { down_write(&EXT4_I(inode)->i_data_sem); ext4_discard_preallocations(inode); up_write(&EXT4_I(inode)->i_data_sem); } if (is_dx(inode) && filp->private_data) ext4_htree_free_dir_info(filp->private_data); return 0; } /* * This tests whether the IO in question is block-aligned or not. * Ext4 utilizes unwritten extents when hole-filling during direct IO, and they * are converted to written only after the IO is complete. Until they are * mapped, these blocks appear as holes, so dio_zero_block() will assume that * it needs to zero out portions of the start and/or end block. If 2 AIO * threads are at work on the same unwritten block, they must be synchronized * or one thread will zero the other's data, causing corruption. */ static bool ext4_unaligned_io(struct inode *inode, struct iov_iter *from, loff_t pos) { struct super_block *sb = inode->i_sb; unsigned long blockmask = sb->s_blocksize - 1; if ((pos | iov_iter_alignment(from)) & blockmask) return true; return false; } static bool ext4_extending_io(struct inode *inode, loff_t offset, size_t len) { if (offset + len > i_size_read(inode) || offset + len > EXT4_I(inode)->i_disksize) return true; return false; } /* Is IO overwriting allocated or initialized blocks? */ static bool ext4_overwrite_io(struct inode *inode, loff_t pos, loff_t len, bool *unwritten) { struct ext4_map_blocks map; unsigned int blkbits = inode->i_blkbits; int err, blklen; if (pos + len > i_size_read(inode)) return false; map.m_lblk = pos >> blkbits; map.m_len = EXT4_MAX_BLOCKS(len, pos, blkbits); blklen = map.m_len; err = ext4_map_blocks(NULL, inode, &map, 0); if (err != blklen) return false; /* * 'err==len' means that all of the blocks have been preallocated, * regardless of whether they have been initialized or not. We need to * check m_flags to distinguish the unwritten extents. */ *unwritten = !(map.m_flags & EXT4_MAP_MAPPED); return true; } static ssize_t ext4_generic_write_checks(struct kiocb *iocb, struct iov_iter *from) { struct inode *inode = file_inode(iocb->ki_filp); ssize_t ret; if (unlikely(IS_IMMUTABLE(inode))) return -EPERM; ret = generic_write_checks(iocb, from); if (ret <= 0) return ret; /* * If we have encountered a bitmap-format file, the size limit * is smaller than s_maxbytes, which is for extent-mapped files. */ if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) { struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); if (iocb->ki_pos >= sbi->s_bitmap_maxbytes) return -EFBIG; iov_iter_truncate(from, sbi->s_bitmap_maxbytes - iocb->ki_pos); } return iov_iter_count(from); } static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from) { ssize_t ret, count; count = ext4_generic_write_checks(iocb, from); if (count <= 0) return count; ret = file_modified(iocb->ki_filp); if (ret) return ret; return count; } static ssize_t ext4_buffered_write_iter(struct kiocb *iocb, struct iov_iter *from) { ssize_t ret; struct inode *inode = file_inode(iocb->ki_filp); if (iocb->ki_flags & IOCB_NOWAIT) return -EOPNOTSUPP; inode_lock(inode); ret = ext4_write_checks(iocb, from); if (ret <= 0) goto out; ret = generic_perform_write(iocb, from); out: inode_unlock(inode); if (unlikely(ret <= 0)) return ret; return generic_write_sync(iocb, ret); } static ssize_t ext4_handle_inode_extension(struct inode *inode, loff_t offset, ssize_t count) { handle_t *handle; lockdep_assert_held_write(&inode->i_rwsem); handle = ext4_journal_start(inode, EXT4_HT_INODE, 2); if (IS_ERR(handle)) return PTR_ERR(handle); if (ext4_update_inode_size(inode, offset + count)) { int ret = ext4_mark_inode_dirty(handle, inode); if (unlikely(ret)) { ext4_journal_stop(handle); return ret; } } if (inode->i_nlink) ext4_orphan_del(handle, inode); ext4_journal_stop(handle); return count; } /* * Clean up the inode after DIO or DAX extending write has completed and the * inode size has been updated using ext4_handle_inode_extension(). */ static void ext4_inode_extension_cleanup(struct inode *inode, ssize_t count) { lockdep_assert_held_write(&inode->i_rwsem); if (count < 0) { ext4_truncate_failed_write(inode); /* * If the truncate operation failed early, then the inode may * still be on the orphan list. In that case, we need to try * remove the inode from the in-memory linked list. */ if (inode->i_nlink) ext4_orphan_del(NULL, inode); return; } /* * If i_disksize got extended either due to writeback of delalloc * blocks or extending truncate while the DIO was running we could fail * to cleanup the orphan list in ext4_handle_inode_extension(). Do it * now. */ if (!list_empty(&EXT4_I(inode)->i_orphan) && inode->i_nlink) { handle_t *handle = ext4_journal_start(inode, EXT4_HT_INODE, 2); if (IS_ERR(handle)) { /* * The write has successfully completed. Not much to * do with the error here so just cleanup the orphan * list and hope for the best. */ ext4_orphan_del(NULL, inode); return; } ext4_orphan_del(handle, inode); ext4_journal_stop(handle); } } static int ext4_dio_write_end_io(struct kiocb *iocb, ssize_t size, int error, unsigned int flags) { loff_t pos = iocb->ki_pos; struct inode *inode = file_inode(iocb->ki_filp); if (!error && size && flags & IOMAP_DIO_UNWRITTEN) error = ext4_convert_unwritten_extents(NULL, inode, pos, size); if (error) return error; /* * Note that EXT4_I(inode)->i_disksize can get extended up to * inode->i_size while the I/O was running due to writeback of delalloc * blocks. But the code in ext4_iomap_alloc() is careful to use * zeroed/unwritten extents if this is possible; thus we won't leave * uninitialized blocks in a file even if we didn't succeed in writing * as much as we intended. Also we can race with truncate or write * expanding the file so we have to be a bit careful here. */ if (pos + size <= READ_ONCE(EXT4_I(inode)->i_disksize) && pos + size <= i_size_read(inode)) return size; return ext4_handle_inode_extension(inode, pos, size); } static const struct iomap_dio_ops ext4_dio_write_ops = { .end_io = ext4_dio_write_end_io, }; /* * The intention here is to start with shared lock acquired then see if any * condition requires an exclusive inode lock. If yes, then we restart the * whole operation by releasing the shared lock and acquiring exclusive lock. * * - For unaligned_io we never take shared lock as it may cause data corruption * when two unaligned IO tries to modify the same block e.g. while zeroing. * * - For extending writes case we don't take the shared lock, since it requires * updating inode i_disksize and/or orphan handling with exclusive lock. * * - shared locking will only be true mostly with overwrites, including * initialized blocks and unwritten blocks. For overwrite unwritten blocks * we protect splitting extents by i_data_sem in ext4_inode_info, so we can * also release exclusive i_rwsem lock. * * - Otherwise we will switch to exclusive i_rwsem lock. */ static ssize_t ext4_dio_write_checks(struct kiocb *iocb, struct iov_iter *from, bool *ilock_shared, bool *extend, bool *unwritten, int *dio_flags) { struct file *file = iocb->ki_filp; struct inode *inode = file_inode(file); loff_t offset; size_t count; ssize_t ret; bool overwrite, unaligned_io; restart: ret = ext4_generic_write_checks(iocb, from); if (ret <= 0) goto out; offset = iocb->ki_pos; count = ret; unaligned_io = ext4_unaligned_io(inode, from, offset); *extend = ext4_extending_io(inode, offset, count); overwrite = ext4_overwrite_io(inode, offset, count, unwritten); /* * Determine whether we need to upgrade to an exclusive lock. This is * required to change security info in file_modified(), for extending * I/O, any form of non-overwrite I/O, and unaligned I/O to unwritten * extents (as partial block zeroing may be required). * * Note that unaligned writes are allowed under shared lock so long as * they are pure overwrites. Otherwise, concurrent unaligned writes risk * data corruption due to partial block zeroing in the dio layer, and so * the I/O must occur exclusively. */ if (*ilock_shared && ((!IS_NOSEC(inode) || *extend || !overwrite || (unaligned_io && *unwritten)))) { if (iocb->ki_flags & IOCB_NOWAIT) { ret = -EAGAIN; goto out; } inode_unlock_shared(inode); *ilock_shared = false; inode_lock(inode); goto restart; } /* * Now that locking is settled, determine dio flags and exclusivity * requirements. We don't use DIO_OVERWRITE_ONLY because we enforce * behavior already. The inode lock is already held exclusive if the * write is non-overwrite or extending, so drain all outstanding dio and * set the force wait dio flag. */ if (!*ilock_shared && (unaligned_io || *extend)) { if (iocb->ki_flags & IOCB_NOWAIT) { ret = -EAGAIN; goto out; } if (unaligned_io && (!overwrite || *unwritten)) inode_dio_wait(inode); *dio_flags = IOMAP_DIO_FORCE_WAIT; } ret = file_modified(file); if (ret < 0) goto out; return count; out: if (*ilock_shared) inode_unlock_shared(inode); else inode_unlock(inode); return ret; } static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from) { ssize_t ret; handle_t *handle; struct inode *inode = file_inode(iocb->ki_filp); loff_t offset = iocb->ki_pos; size_t count = iov_iter_count(from); const struct iomap_ops *iomap_ops = &ext4_iomap_ops; bool extend = false, unwritten = false; bool ilock_shared = true; int dio_flags = 0; /* * Quick check here without any i_rwsem lock to see if it is extending * IO. A more reliable check is done in ext4_dio_write_checks() with * proper locking in place. */ if (offset + count > i_size_read(inode)) ilock_shared = false; if (iocb->ki_flags & IOCB_NOWAIT) { if (ilock_shared) { if (!inode_trylock_shared(inode)) return -EAGAIN; } else { if (!inode_trylock(inode)) return -EAGAIN; } } else { if (ilock_shared) inode_lock_shared(inode); else inode_lock(inode); } /* Fallback to buffered I/O if the inode does not support direct I/O. */ if (!ext4_should_use_dio(iocb, from)) { if (ilock_shared) inode_unlock_shared(inode); else inode_unlock(inode); return ext4_buffered_write_iter(iocb, from); } /* * Prevent inline data from being created since we are going to allocate * blocks for DIO. We know the inode does not currently have inline data * because ext4_should_use_dio() checked for it, but we have to clear * the state flag before the write checks because a lock cycle could * introduce races with other writers. */ ext4_clear_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA); ret = ext4_dio_write_checks(iocb, from, &ilock_shared, &extend, &unwritten, &dio_flags); if (ret <= 0) return ret; offset = iocb->ki_pos; count = ret; if (extend) { handle = ext4_journal_start(inode, EXT4_HT_INODE, 2); if (IS_ERR(handle)) { ret = PTR_ERR(handle); goto out; } ret = ext4_orphan_add(handle, inode); if (ret) { ext4_journal_stop(handle); goto out; } ext4_journal_stop(handle); } if (ilock_shared && !unwritten) iomap_ops = &ext4_iomap_overwrite_ops; ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops, dio_flags, NULL, 0); if (ret == -ENOTBLK) ret = 0; if (extend) { /* * We always perform extending DIO write synchronously so by * now the IO is completed and ext4_handle_inode_extension() * was called. Cleanup the inode in case of error or race with * writeback of delalloc blocks. */ WARN_ON_ONCE(ret == -EIOCBQUEUED); ext4_inode_extension_cleanup(inode, ret); } out: if (ilock_shared) inode_unlock_shared(inode); else inode_unlock(inode); if (ret >= 0 && iov_iter_count(from)) { ssize_t err; loff_t endbyte; offset = iocb->ki_pos; err = ext4_buffered_write_iter(iocb, from); if (err < 0) return err; /* * We need to ensure that the pages within the page cache for * the range covered by this I/O are written to disk and * invalidated. This is in attempt to preserve the expected * direct I/O semantics in the case we fallback to buffered I/O * to complete off the I/O request. */ ret += err; endbyte = offset + err - 1; err = filemap_write_and_wait_range(iocb->ki_filp->f_mapping, offset, endbyte); if (!err) invalidate_mapping_pages(iocb->ki_filp->f_mapping, offset >> PAGE_SHIFT, endbyte >> PAGE_SHIFT); } return ret; } #ifdef CONFIG_FS_DAX static ssize_t ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) { ssize_t ret; size_t count; loff_t offset; handle_t *handle; bool extend = false; struct inode *inode = file_inode(iocb->ki_filp); if (iocb->ki_flags & IOCB_NOWAIT) { if (!inode_trylock(inode)) return -EAGAIN; } else { inode_lock(inode); } ret = ext4_write_checks(iocb, from); if (ret <= 0) goto out; offset = iocb->ki_pos; count = iov_iter_count(from); if (offset + count > EXT4_I(inode)->i_disksize) { handle = ext4_journal_start(inode, EXT4_HT_INODE, 2); if (IS_ERR(handle)) { ret = PTR_ERR(handle); goto out; } ret = ext4_orphan_add(handle, inode); if (ret) { ext4_journal_stop(handle); goto out; } extend = true; ext4_journal_stop(handle); } ret = dax_iomap_rw(iocb, from, &ext4_iomap_ops); if (extend) { ret = ext4_handle_inode_extension(inode, offset, ret); ext4_inode_extension_cleanup(inode, ret); } out: inode_unlock(inode); if (ret > 0) ret = generic_write_sync(iocb, ret); return ret; } #endif static ssize_t ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from) { struct inode *inode = file_inode(iocb->ki_filp); if (unlikely(ext4_forced_shutdown(inode->i_sb))) return -EIO; #ifdef CONFIG_FS_DAX if (IS_DAX(inode)) return ext4_dax_write_iter(iocb, from); #endif if (iocb->ki_flags & IOCB_DIRECT) return ext4_dio_write_iter(iocb, from); else return ext4_buffered_write_iter(iocb, from); } #ifdef CONFIG_FS_DAX static vm_fault_t ext4_dax_huge_fault(struct vm_fault *vmf, unsigned int order) { int error = 0; vm_fault_t result; int retries = 0; handle_t *handle = NULL; struct inode *inode = file_inode(vmf->vma->vm_file); struct super_block *sb = inode->i_sb; /* * We have to distinguish real writes from writes which will result in a * COW page; COW writes should *not* poke the journal (the file will not * be changed). Doing so would cause unintended failures when mounted * read-only. * * We check for VM_SHARED rather than vmf->cow_page since the latter is * unset for order != 0 (i.e. only in do_cow_fault); for * other sizes, dax_iomap_fault will handle splitting / fallback so that * we eventually come back with a COW page. */ bool write = (vmf->flags & FAULT_FLAG_WRITE) && (vmf->vma->vm_flags & VM_SHARED); struct address_space *mapping = vmf->vma->vm_file->f_mapping; pfn_t pfn; if (write) { sb_start_pagefault(sb); file_update_time(vmf->vma->vm_file); filemap_invalidate_lock_shared(mapping); retry: handle = ext4_journal_start_sb(sb, EXT4_HT_WRITE_PAGE, EXT4_DATA_TRANS_BLOCKS(sb)); if (IS_ERR(handle)) { filemap_invalidate_unlock_shared(mapping); sb_end_pagefault(sb); return VM_FAULT_SIGBUS; } } else { filemap_invalidate_lock_shared(mapping); } result = dax_iomap_fault(vmf, order, &pfn, &error, &ext4_iomap_ops); if (write) { ext4_journal_stop(handle); if ((result & VM_FAULT_ERROR) && error == -ENOSPC && ext4_should_retry_alloc(sb, &retries)) goto retry; /* Handling synchronous page fault? */ if (result & VM_FAULT_NEEDDSYNC) result = dax_finish_sync_fault(vmf, order, pfn); filemap_invalidate_unlock_shared(mapping); sb_end_pagefault(sb); } else { filemap_invalidate_unlock_shared(mapping); } return result; } static vm_fault_t ext4_dax_fault(struct vm_fault *vmf) { return ext4_dax_huge_fault(vmf, 0); } static const struct vm_operations_struct ext4_dax_vm_ops = { .fault = ext4_dax_fault, .huge_fault = ext4_dax_huge_fault, .page_mkwrite = ext4_dax_fault, .pfn_mkwrite = ext4_dax_fault, }; #else #define ext4_dax_vm_ops ext4_file_vm_ops #endif static const struct vm_operations_struct ext4_file_vm_ops = { .fault = filemap_fault, .map_pages = filemap_map_pages, .page_mkwrite = ext4_page_mkwrite, }; static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma) { struct inode *inode = file->f_mapping->host; struct dax_device *dax_dev = EXT4_SB(inode->i_sb)->s_daxdev; if (unlikely(ext4_forced_shutdown(inode->i_sb))) return -EIO; /* * We don't support synchronous mappings for non-DAX files and * for DAX files if underneath dax_device is not synchronous. */ if (!daxdev_mapping_supported(vma, dax_dev)) return -EOPNOTSUPP; file_accessed(file); if (IS_DAX(file_inode(file))) { vma->vm_ops = &ext4_dax_vm_ops; vm_flags_set(vma, VM_HUGEPAGE); } else { vma->vm_ops = &ext4_file_vm_ops; } return 0; } static int ext4_sample_last_mounted(struct super_block *sb, struct vfsmount *mnt) { struct ext4_sb_info *sbi = EXT4_SB(sb); struct path path; char buf[64], *cp; handle_t *handle; int err; if (likely(ext4_test_mount_flag(sb, EXT4_MF_MNTDIR_SAMPLED))) return 0; if (sb_rdonly(sb) || !sb_start_intwrite_trylock(sb)) return 0; ext4_set_mount_flag(sb, EXT4_MF_MNTDIR_SAMPLED); /* * Sample where the filesystem has been mounted and * store it in the superblock for sysadmin convenience * when trying to sort through large numbers of block * devices or filesystem images. */ memset(buf, 0, sizeof(buf)); path.mnt = mnt; path.dentry = mnt->mnt_root; cp = d_path(&path, buf, sizeof(buf)); err = 0; if (IS_ERR(cp)) goto out; handle = ext4_journal_start_sb(sb, EXT4_HT_MISC, 1); err = PTR_ERR(handle); if (IS_ERR(handle)) goto out; BUFFER_TRACE(sbi->s_sbh, "get_write_access"); err = ext4_journal_get_write_access(handle, sb, sbi->s_sbh, EXT4_JTR_NONE); if (err) goto out_journal; lock_buffer(sbi->s_sbh); strtomem_pad(sbi->s_es->s_last_mounted, cp, 0); ext4_superblock_csum_set(sb); unlock_buffer(sbi->s_sbh); ext4_handle_dirty_metadata(handle, NULL, sbi->s_sbh); out_journal: ext4_journal_stop(handle); out: sb_end_intwrite(sb); return err; } static int ext4_file_open(struct inode *inode, struct file *filp) { int ret; if (unlikely(ext4_forced_shutdown(inode->i_sb))) return -EIO; ret = ext4_sample_last_mounted(inode->i_sb, filp->f_path.mnt); if (ret) return ret; ret = fscrypt_file_open(inode, filp); if (ret) return ret; ret = fsverity_file_open(inode, filp); if (ret) return ret; /* * Set up the jbd2_inode if we are opening the inode for * writing and the journal is present */ if (filp->f_mode & FMODE_WRITE) { ret = ext4_inode_attach_jinode(inode); if (ret < 0) return ret; } filp->f_mode |= FMODE_NOWAIT | FMODE_CAN_ODIRECT; return dquot_file_open(inode, filp); } /* * ext4_llseek() handles both block-mapped and extent-mapped maxbytes values * by calling generic_file_llseek_size() with the appropriate maxbytes * value for each. */ loff_t ext4_llseek(struct file *file, loff_t offset, int whence) { struct inode *inode = file->f_mapping->host; loff_t maxbytes; if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) maxbytes = EXT4_SB(inode->i_sb)->s_bitmap_maxbytes; else maxbytes = inode->i_sb->s_maxbytes; switch (whence) { default: return generic_file_llseek_size(file, offset, whence, maxbytes, i_size_read(inode)); case SEEK_HOLE: inode_lock_shared(inode); offset = iomap_seek_hole(inode, offset, &ext4_iomap_report_ops); inode_unlock_shared(inode); break; case SEEK_DATA: inode_lock_shared(inode); offset = iomap_seek_data(inode, offset, &ext4_iomap_report_ops); inode_unlock_shared(inode); break; } if (offset < 0) return offset; return vfs_setpos(file, offset, maxbytes); } const struct file_operations ext4_file_operations = { .llseek = ext4_llseek, .read_iter = ext4_file_read_iter, .write_iter = ext4_file_write_iter, .iopoll = iocb_bio_iopoll, .unlocked_ioctl = ext4_ioctl, #ifdef CONFIG_COMPAT .compat_ioctl = ext4_compat_ioctl, #endif .mmap = ext4_file_mmap, .open = ext4_file_open, .release = ext4_release_file, .fsync = ext4_sync_file, .get_unmapped_area = thp_get_unmapped_area, .splice_read = ext4_file_splice_read, .splice_write = iter_file_splice_write, .fallocate = ext4_fallocate, .fop_flags = FOP_MMAP_SYNC | FOP_BUFFER_RASYNC | FOP_DIO_PARALLEL_WRITE, }; const struct inode_operations ext4_file_inode_operations = { .setattr = ext4_setattr, .getattr = ext4_file_getattr, .listxattr = ext4_listxattr, .get_inode_acl = ext4_get_acl, .set_acl = ext4_set_acl, .fiemap = ext4_fiemap, .fileattr_get = ext4_fileattr_get, .fileattr_set = ext4_fileattr_set, }; |
| 36 58 50 27 56 11 11 7 9 41 38 42 33 41 14 4 41 38 4 29 6 35 1 3 14 23 11 7 14 5 28 36 27 42 42 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 | // SPDX-License-Identifier: GPL-2.0 /* * linux/fs/isofs/dir.c * * (C) 1992, 1993, 1994 Eric Youngdale Modified for ISO 9660 filesystem. * * (C) 1991 Linus Torvalds - minix filesystem * * Steve Beynon : Missing last directory entries fixed * (stephen@askone.demon.co.uk) : 21st June 1996 * * isofs directory handling functions */ #include <linux/gfp.h> #include "isofs.h" int isofs_name_translate(struct iso_directory_record *de, char *new, struct inode *inode) { char * old = de->name; int len = de->name_len[0]; int i; for (i = 0; i < len; i++) { unsigned char c = old[i]; if (!c) break; if (c >= 'A' && c <= 'Z') c |= 0x20; /* lower case */ /* Drop trailing '.;1' (ISO 9660:1988 7.5.1 requires period) */ if (c == '.' && i == len - 3 && old[i + 1] == ';' && old[i + 2] == '1') break; /* Drop trailing ';1' */ if (c == ';' && i == len - 2 && old[i + 1] == '1') break; /* Convert remaining ';' to '.' */ /* Also '/' to '.' (broken Acorn-generated ISO9660 images) */ if (c == ';' || c == '/') c = '.'; new[i] = c; } return i; } /* Acorn extensions written by Matthew Wilcox <willy@infradead.org> 1998 */ int get_acorn_filename(struct iso_directory_record *de, char *retname, struct inode *inode) { int std; unsigned char *chr; int retnamlen = isofs_name_translate(de, retname, inode); if (retnamlen == 0) return 0; std = sizeof(struct iso_directory_record) + de->name_len[0]; if (std & 1) std++; if (de->length[0] - std != 32) return retnamlen; chr = ((unsigned char *) de) + std; if (strncmp(chr, "ARCHIMEDES", 10)) return retnamlen; if ((*retname == '_') && ((chr[19] & 1) == 1)) *retname = '!'; if (((de->flags[0] & 2) == 0) && (chr[13] == 0xff) && ((chr[12] & 0xf0) == 0xf0)) { retname[retnamlen] = ','; sprintf(retname+retnamlen+1, "%3.3x", ((chr[12] & 0xf) << 8) | chr[11]); retnamlen += 4; } return retnamlen; } /* * This should _really_ be cleaned up some day.. */ static int do_isofs_readdir(struct inode *inode, struct file *file, struct dir_context *ctx, char *tmpname, struct iso_directory_record *tmpde) { unsigned long bufsize = ISOFS_BUFFER_SIZE(inode); unsigned char bufbits = ISOFS_BUFFER_BITS(inode); unsigned long block, offset, block_saved, offset_saved; unsigned long inode_number = 0; /* Quiet GCC */ struct buffer_head *bh = NULL; int len; int map; int first_de = 1; char *p = NULL; /* Quiet GCC */ struct iso_directory_record *de; struct isofs_sb_info *sbi = ISOFS_SB(inode->i_sb); offset = ctx->pos & (bufsize - 1); block = ctx->pos >> bufbits; while (ctx->pos < inode->i_size) { int de_len; if (!bh) { bh = isofs_bread(inode, block); if (!bh) return 0; } de = (struct iso_directory_record *) (bh->b_data + offset); de_len = *(unsigned char *)de; /* * If the length byte is zero, we should move on to the next * CDROM sector. If we are at the end of the directory, we * kick out of the while loop. */ if (de_len == 0) { brelse(bh); bh = NULL; ctx->pos = (ctx->pos + ISOFS_BLOCK_SIZE) & ~(ISOFS_BLOCK_SIZE - 1); block = ctx->pos >> bufbits; offset = 0; continue; } block_saved = block; offset_saved = offset; offset += de_len; /* Make sure we have a full directory entry */ if (offset >= bufsize) { int slop = bufsize - offset + de_len; memcpy(tmpde, de, slop); offset &= bufsize - 1; block++; brelse(bh); bh = NULL; if (offset) { bh = isofs_bread(inode, block); if (!bh) return 0; memcpy((void *) tmpde + slop, bh->b_data, offset); } de = tmpde; } /* Basic sanity check, whether name doesn't exceed dir entry */ if (de_len < de->name_len[0] + sizeof(struct iso_directory_record)) { printk(KERN_NOTICE "iso9660: Corrupted directory entry" " in block %lu of inode %lu\n", block, inode->i_ino); brelse(bh); return -EIO; } if (first_de) { isofs_normalize_block_and_offset(de, &block_saved, &offset_saved); inode_number = isofs_get_ino(block_saved, offset_saved, bufbits); } if (de->flags[-sbi->s_high_sierra] & 0x80) { first_de = 0; ctx->pos += de_len; continue; } first_de = 1; /* Handle the case of the '.' directory */ if (de->name_len[0] == 1 && de->name[0] == 0) { if (!dir_emit_dot(file, ctx)) break; ctx->pos += de_len; continue; } len = 0; /* Handle the case of the '..' directory */ if (de->name_len[0] == 1 && de->name[0] == 1) { if (!dir_emit_dotdot(file, ctx)) break; ctx->pos += de_len; continue; } /* Handle everything else. Do name translation if there is no Rock Ridge NM field. */ /* * Do not report hidden files if so instructed, or associated * files unless instructed to do so */ if ((sbi->s_hide && (de->flags[-sbi->s_high_sierra] & 1)) || (!sbi->s_showassoc && (de->flags[-sbi->s_high_sierra] & 4))) { ctx->pos += de_len; continue; } map = 1; if (sbi->s_rock) { len = get_rock_ridge_filename(de, tmpname, inode); if (len != 0) { /* may be -1 */ p = tmpname; map = 0; } } if (map) { #ifdef CONFIG_JOLIET if (sbi->s_joliet_level) { len = get_joliet_filename(de, tmpname, inode); p = tmpname; } else #endif if (sbi->s_mapping == 'a') { len = get_acorn_filename(de, tmpname, inode); p = tmpname; } else if (sbi->s_mapping == 'n') { len = isofs_name_translate(de, tmpname, inode); p = tmpname; } else { p = de->name; len = de->name_len[0]; } } if (len > 0) { if (!dir_emit(ctx, p, len, inode_number, DT_UNKNOWN)) break; } ctx->pos += de_len; } if (bh) brelse(bh); return 0; } /* * Handle allocation of temporary space for name translation and * handling split directory entries.. The real work is done by * "do_isofs_readdir()". */ static int isofs_readdir(struct file *file, struct dir_context *ctx) { int result; char *tmpname; struct iso_directory_record *tmpde; struct inode *inode = file_inode(file); tmpname = (char *)__get_free_page(GFP_KERNEL); if (tmpname == NULL) return -ENOMEM; tmpde = (struct iso_directory_record *) (tmpname+1024); result = do_isofs_readdir(inode, file, ctx, tmpname, tmpde); free_page((unsigned long) tmpname); return result; } const struct file_operations isofs_dir_operations = { .llseek = generic_file_llseek, .read = generic_read_dir, .iterate_shared = isofs_readdir, }; /* * directories can handle most operations... */ const struct inode_operations isofs_dir_inode_operations = { .lookup = isofs_lookup, }; |
| 4 4 4 4 4 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 | // SPDX-License-Identifier: GPL-2.0+ #include <linux/errno.h> #include <linux/ioport.h> #include <linux/module.h> #include <linux/moduleparam.h> #include <linux/serial.h> #include <linux/serial_8250.h> #include "8250.h" #define PORT_RSA_MAX 4 static unsigned long probe_rsa[PORT_RSA_MAX]; static unsigned int probe_rsa_count; static int rsa8250_request_resource(struct uart_8250_port *up) { unsigned long start = UART_RSA_BASE << up->port.regshift; unsigned int size = 8 << up->port.regshift; struct uart_port *port = &up->port; int ret = -EINVAL; switch (port->iotype) { case UPIO_HUB6: case UPIO_PORT: start += port->iobase; if (request_region(start, size, "serial-rsa")) ret = 0; else ret = -EBUSY; break; } return ret; } static void rsa8250_release_resource(struct uart_8250_port *up) { unsigned long offset = UART_RSA_BASE << up->port.regshift; unsigned int size = 8 << up->port.regshift; struct uart_port *port = &up->port; switch (port->iotype) { case UPIO_HUB6: case UPIO_PORT: release_region(port->iobase + offset, size); break; } } static void univ8250_config_port(struct uart_port *port, int flags) { struct uart_8250_port *up = up_to_u8250p(port); unsigned int i; up->probe &= ~UART_PROBE_RSA; if (port->type == PORT_RSA) { if (rsa8250_request_resource(up) == 0) up->probe |= UART_PROBE_RSA; } else if (flags & UART_CONFIG_TYPE) { for (i = 0; i < probe_rsa_count; i++) { if (probe_rsa[i] == up->port.iobase) { if (rsa8250_request_resource(up) == 0) up->probe |= UART_PROBE_RSA; break; } } } univ8250_port_base_ops->config_port(port, flags); if (port->type != PORT_RSA && up->probe & UART_PROBE_RSA) rsa8250_release_resource(up); } static int univ8250_request_port(struct uart_port *port) { struct uart_8250_port *up = up_to_u8250p(port); int ret; ret = univ8250_port_base_ops->request_port(port); if (ret == 0 && port->type == PORT_RSA) { ret = rsa8250_request_resource(up); if (ret < 0) univ8250_port_base_ops->release_port(port); } return ret; } static void univ8250_release_port(struct uart_port *port) { struct uart_8250_port *up = up_to_u8250p(port); if (port->type == PORT_RSA) rsa8250_release_resource(up); univ8250_port_base_ops->release_port(port); } void univ8250_rsa_support(struct uart_ops *ops) { ops->config_port = univ8250_config_port; ops->request_port = univ8250_request_port; ops->release_port = univ8250_release_port; } module_param_hw_array(probe_rsa, ulong, ioport, &probe_rsa_count, 0444); MODULE_PARM_DESC(probe_rsa, "Probe I/O ports for RSA"); #ifdef CONFIG_SERIAL_8250_DEPRECATED_OPTIONS #ifndef MODULE /* * Keep the old "8250" name working as well for the module options so we don't * break people. We need to keep the names identical and the convenient macros * will happily refuse to let us do that by failing the build with redefinition * errors of global variables. So we stick them inside a dummy function to * avoid those conflicts. The options still get parsed, and the redefined * MODULE_PARAM_PREFIX lets us keep the "8250." syntax alive. * * This is hacky. I'm sorry. */ static void __used rsa8250_options(void) { #undef MODULE_PARAM_PREFIX #define MODULE_PARAM_PREFIX "8250_core." __module_param_call(MODULE_PARAM_PREFIX, probe_rsa, ¶m_array_ops, .arr = &__param_arr_probe_rsa, 0444, -1, 0); } #endif #endif |
| 67 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 | /* SPDX-License-Identifier: GPL-2.0-only */ /* * Media device node * * Copyright (C) 2010 Nokia Corporation * * Contacts: Laurent Pinchart <laurent.pinchart@ideasonboard.com> * Sakari Ailus <sakari.ailus@iki.fi> * * -- * * Common functions for media-related drivers to register and unregister media * device nodes. */ #ifndef _MEDIA_DEVNODE_H #define _MEDIA_DEVNODE_H #include <linux/poll.h> #include <linux/fs.h> #include <linux/device.h> #include <linux/cdev.h> struct media_device; /* * Flag to mark the media_devnode struct as registered. Drivers must not touch * this flag directly, it will be set and cleared by media_devnode_register and * media_devnode_unregister. */ #define MEDIA_FLAG_REGISTERED 0 /** * struct media_file_operations - Media device file operations * * @owner: should be filled with %THIS_MODULE * @read: pointer to the function that implements read() syscall * @write: pointer to the function that implements write() syscall * @poll: pointer to the function that implements poll() syscall * @ioctl: pointer to the function that implements ioctl() syscall * @compat_ioctl: pointer to the function that will handle 32 bits userspace * calls to the ioctl() syscall on a Kernel compiled with 64 bits. * @open: pointer to the function that implements open() syscall * @release: pointer to the function that will release the resources allocated * by the @open function. */ struct media_file_operations { struct module *owner; ssize_t (*read) (struct file *, char __user *, size_t, loff_t *); ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *); __poll_t (*poll) (struct file *, struct poll_table_struct *); long (*ioctl) (struct file *, unsigned int, unsigned long); long (*compat_ioctl) (struct file *, unsigned int, unsigned long); int (*open) (struct file *); int (*release) (struct file *); }; /** * struct media_devnode - Media device node * @media_dev: pointer to struct &media_device * @fops: pointer to struct &media_file_operations with media device ops * @dev: pointer to struct &device containing the media controller device * @cdev: struct cdev pointer character device * @parent: parent device * @minor: device node minor number * @flags: flags, combination of the ``MEDIA_FLAG_*`` constants * @release: release callback called at the end of ``media_devnode_release()`` * routine at media-device.c. * * This structure represents a media-related device node. * * The @parent is a physical device. It must be set by core or device drivers * before registering the node. */ struct media_devnode { struct media_device *media_dev; /* device ops */ const struct media_file_operations *fops; /* sysfs */ struct device dev; /* media device */ struct cdev cdev; /* character device */ struct device *parent; /* device parent */ /* device info */ int minor; unsigned long flags; /* Use bitops to access flags */ /* callbacks */ void (*release)(struct media_devnode *devnode); }; /* dev to media_devnode */ #define to_media_devnode(cd) container_of(cd, struct media_devnode, dev) /** * media_devnode_register - register a media device node * * @mdev: struct media_device we want to register a device node * @devnode: media device node structure we want to register * @owner: should be filled with %THIS_MODULE * * The registration code assigns minor numbers and registers the new device node * with the kernel. An error is returned if no free minor number can be found, * or if the registration of the device node fails. * * Zero is returned on success. * * Note that if the media_devnode_register call fails, the release() callback of * the media_devnode structure is *not* called, so the caller is responsible for * freeing any data. */ int __must_check media_devnode_register(struct media_device *mdev, struct media_devnode *devnode, struct module *owner); /** * media_devnode_unregister_prepare - clear the media device node register bit * @devnode: the device node to prepare for unregister * * This clears the passed device register bit. Future open calls will be met * with errors. Should be called before media_devnode_unregister() to avoid * races with unregister and device file open calls. * * This function can safely be called if the device node has never been * registered or has already been unregistered. */ void media_devnode_unregister_prepare(struct media_devnode *devnode); /** * media_devnode_unregister - unregister a media device node * @devnode: the device node to unregister * * This unregisters the passed device. Future open calls will be met with * errors. * * Should be called after media_devnode_unregister_prepare() */ void media_devnode_unregister(struct media_devnode *devnode); /** * media_devnode_data - returns a pointer to the &media_devnode * * @filp: pointer to struct &file */ static inline struct media_devnode *media_devnode_data(struct file *filp) { return filp->private_data; } /** * media_devnode_is_registered - returns true if &media_devnode is registered; * false otherwise. * * @devnode: pointer to struct &media_devnode. * * Note: If mdev is NULL, it also returns false. */ static inline int media_devnode_is_registered(struct media_devnode *devnode) { if (!devnode) return false; return test_bit(MEDIA_FLAG_REGISTERED, &devnode->flags); } #endif /* _MEDIA_DEVNODE_H */ |
| 2701 1900 803 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | /* SPDX-License-Identifier: GPL-2.0-only */ #ifndef __LICENSE_H #define __LICENSE_H static inline int license_is_gpl_compatible(const char *license) { return (strcmp(license, "GPL") == 0 || strcmp(license, "GPL v2") == 0 || strcmp(license, "GPL and additional rights") == 0 || strcmp(license, "Dual BSD/GPL") == 0 || strcmp(license, "Dual MIT/GPL") == 0 || strcmp(license, "Dual MPL/GPL") == 0); } #endif |
| 3530 3523 3526 748 3522 2 2 2 2 2 2 2 2 2 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 | // SPDX-License-Identifier: GPL-2.0-only /* * fs/kernfs/symlink.c - kernfs symlink implementation * * Copyright (c) 2001-3 Patrick Mochel * Copyright (c) 2007 SUSE Linux Products GmbH * Copyright (c) 2007, 2013 Tejun Heo <tj@kernel.org> */ #include <linux/fs.h> #include <linux/gfp.h> #include <linux/namei.h> #include "kernfs-internal.h" /** * kernfs_create_link - create a symlink * @parent: directory to create the symlink in * @name: name of the symlink * @target: target node for the symlink to point to * * Return: the created node on success, ERR_PTR() value on error. * Ownership of the link matches ownership of the target. */ struct kernfs_node *kernfs_create_link(struct kernfs_node *parent, const char *name, struct kernfs_node *target) { struct kernfs_node *kn; int error; kuid_t uid = GLOBAL_ROOT_UID; kgid_t gid = GLOBAL_ROOT_GID; if (target->iattr) { uid = target->iattr->ia_uid; gid = target->iattr->ia_gid; } kn = kernfs_new_node(parent, name, S_IFLNK|0777, uid, gid, KERNFS_LINK); if (!kn) return ERR_PTR(-ENOMEM); if (kernfs_ns_enabled(parent)) kn->ns = target->ns; kn->symlink.target_kn = target; kernfs_get(target); /* ref owned by symlink */ error = kernfs_add_one(kn); if (!error) return kn; kernfs_put(kn); return ERR_PTR(error); } static int kernfs_get_target_path(struct kernfs_node *parent, struct kernfs_node *target, char *path) { struct kernfs_node *base, *kn; char *s = path; int len = 0; /* go up to the root, stop at the base */ base = parent; while (base->parent) { kn = target->parent; while (kn->parent && base != kn) kn = kn->parent; if (base == kn) break; if ((s - path) + 3 >= PATH_MAX) return -ENAMETOOLONG; strcpy(s, "../"); s += 3; base = base->parent; } /* determine end of target string for reverse fillup */ kn = target; while (kn->parent && kn != base) { len += strlen(kn->name) + 1; kn = kn->parent; } /* check limits */ if (len < 2) return -EINVAL; len--; if ((s - path) + len >= PATH_MAX) return -ENAMETOOLONG; /* reverse fillup of target string from target to base */ kn = target; while (kn->parent && kn != base) { int slen = strlen(kn->name); len -= slen; memcpy(s + len, kn->name, slen); if (len) s[--len] = '/'; kn = kn->parent; } return 0; } static int kernfs_getlink(struct inode *inode, char *path) { struct kernfs_node *kn = inode->i_private; struct kernfs_node *parent = kn->parent; struct kernfs_node *target = kn->symlink.target_kn; struct kernfs_root *root = kernfs_root(parent); int error; down_read(&root->kernfs_rwsem); error = kernfs_get_target_path(parent, target, path); up_read(&root->kernfs_rwsem); return error; } static const char *kernfs_iop_get_link(struct dentry *dentry, struct inode *inode, struct delayed_call *done) { char *body; int error; if (!dentry) return ERR_PTR(-ECHILD); body = kzalloc(PAGE_SIZE, GFP_KERNEL); if (!body) return ERR_PTR(-ENOMEM); error = kernfs_getlink(inode, body); if (unlikely(error < 0)) { kfree(body); return ERR_PTR(error); } set_delayed_call(done, kfree_link, body); return body; } const struct inode_operations kernfs_symlink_iops = { .listxattr = kernfs_iop_listxattr, .get_link = kernfs_iop_get_link, .setattr = kernfs_iop_setattr, .getattr = kernfs_iop_getattr, .permission = kernfs_iop_permission, }; |
| 2397 2301 2435 16 1 11 2 1 1 1 16 16 22 35 41 2413 2419 2413 8 8 2415 16 2417 30 2281 2382 2383 2406 2383 2401 2401 2395 2400 2389 2 1 1 27 27 27 4 41 41 14 27 48 48 18 5 5 27 1 22 7 22 22 22 22 22 22 22 22 22 22 22 22 23 23 67 7 60 30 61 6 48 19 1 1 1 1 1 57 4 3 28 13 37 3 2 33 2 4 35 6 3 35 3 14 27 27 27 27 27 27 27 2403 2429 60 60 3 56 59 60 25 17 5 104 104 100 3 47 3 54 57 57 57 57 3 3 2 1 1 7 6 4 4 1 1 4 4 1 1 5 5 5 4 2380 4 2378 33 2392 2298 2298 2297 22 2234 67 2297 2416 10 32 69 2380 1 2372 30 2383 82 2413 2296 102 2383 2385 2296 11 2387 2299 2297 2296 22 2299 2295 2300 2292 2233 66 2297 2292 2294 2294 1212 1244 2378 2377 1095 2376 2 2374 1 2378 2379 2380 2378 2380 2383 2384 1286 1170 1211 1169 1213 2371 2384 2375 2380 2374 1 164 2420 10 2383 1 2379 2383 6 5 2381 2229 31 124 2381 2377 1 1 8 3 3 2387 2378 8 2378 1 1 2 2 2381 2380 1 8 2 2383 2386 5 5 5 1 4 5 5 2293 2294 2369 2374 69 68 69 69 69 69 69 5 5 1 4 5 5 5 5 5 5 5 5 1 4 1 4 2394 4 7 7 7 7 7 22 22 22 16 16 2304 2 2295 2296 2298 2376 1 2381 2377 2378 1 21 1 1 20 21 2 20 4 2278 2283 2281 2281 2379 2382 2380 2380 2375 2377 2381 2383 2 2380 6 2385 9 2225 124 31 2382 2380 9 2225 155 164 2371 1 2377 2381 2382 2 2376 4 2385 1 2377 1 1 2373 1031 1364 1364 2373 9 2379 2287 71 16 2377 2384 1 12 2 24 2384 2373 9 9 1 8 9 9 1 8 2384 2296 2382 2296 2378 6 3 2280 2292 3 2292 9 2291 5 2387 2385 8 2379 88 2296 2265 85 26 2382 1 2380 7 8 1 8 5 5 3 3 2 6 7 1 14 2393 2 2393 3 5 4 2399 3 3 2392 2281 2393 4 2394 3 3 2390 3 4 2388 1 2392 4 2394 3 2389 2394 2392 1 1 2300 2399 2 1 2401 1 2404 13 2308 2381 2399 2398 2401 10 1 10 3 10 1 8 6 2 1 8 8 8 7 6 6 5 5 1 6 5 1 2 4 26 26 26 22 27 3 19 19 9 9 24 24 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 2533 2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569 2570 2571 2572 2573 2574 2575 2576 2577 2578 2579 2580 2581 2582 2583 2584 2585 2586 2587 2588 2589 2590 2591 2592 2593 2594 2595 2596 2597 2598 2599 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 2701 2702 2703 2704 2705 2706 2707 2708 2709 2710 2711 2712 2713 2714 2715 2716 2717 2718 2719 2720 2721 2722 2723 2724 2725 2726 2727 2728 2729 2730 2731 2732 2733 2734 2735 2736 2737 2738 2739 2740 2741 2742 2743 2744 2745 2746 2747 2748 2749 2750 2751 2752 2753 2754 2755 2756 2757 2758 2759 2760 2761 2762 2763 2764 2765 2766 2767 2768 2769 2770 2771 2772 2773 2774 2775 2776 2777 2778 2779 2780 2781 2782 2783 2784 2785 2786 2787 2788 2789 2790 2791 2792 2793 2794 2795 2796 2797 2798 2799 2800 2801 2802 2803 2804 2805 2806 2807 2808 2809 2810 2811 2812 2813 2814 2815 2816 2817 2818 2819 2820 2821 2822 2823 2824 2825 2826 2827 2828 2829 2830 2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843 2844 2845 2846 2847 2848 2849 2850 2851 2852 2853 2854 2855 2856 2857 2858 2859 2860 2861 2862 2863 2864 2865 2866 2867 2868 2869 2870 2871 2872 2873 2874 2875 2876 2877 2878 2879 2880 2881 2882 2883 2884 2885 2886 2887 2888 2889 2890 2891 2892 2893 2894 2895 2896 2897 2898 2899 2900 2901 2902 2903 2904 2905 2906 2907 2908 2909 2910 2911 2912 2913 2914 2915 2916 2917 2918 2919 2920 2921 2922 2923 2924 2925 2926 2927 2928 2929 2930 2931 2932 2933 2934 2935 2936 2937 2938 2939 2940 2941 2942 2943 2944 2945 2946 2947 2948 2949 2950 2951 2952 2953 2954 2955 2956 2957 2958 2959 2960 2961 2962 2963 2964 2965 2966 2967 2968 2969 2970 2971 2972 2973 2974 2975 2976 2977 2978 2979 2980 2981 2982 2983 2984 2985 2986 2987 2988 2989 2990 2991 2992 2993 2994 2995 2996 2997 2998 2999 3000 3001 3002 3003 3004 3005 3006 3007 3008 3009 3010 3011 3012 3013 3014 3015 3016 3017 3018 3019 3020 3021 3022 3023 3024 3025 3026 3027 3028 3029 3030 3031 3032 3033 3034 3035 3036 3037 3038 3039 3040 3041 3042 3043 3044 3045 3046 3047 3048 3049 3050 3051 3052 3053 3054 3055 3056 3057 3058 3059 3060 3061 3062 3063 3064 3065 3066 3067 3068 3069 3070 3071 3072 3073 3074 3075 3076 3077 3078 3079 3080 3081 3082 3083 3084 3085 3086 3087 3088 3089 3090 3091 3092 3093 3094 3095 3096 3097 3098 3099 3100 3101 3102 3103 3104 3105 3106 3107 3108 3109 3110 3111 3112 3113 3114 3115 3116 3117 3118 3119 3120 3121 3122 3123 3124 3125 3126 3127 3128 3129 3130 3131 3132 3133 3134 3135 3136 3137 3138 3139 3140 3141 3142 3143 3144 3145 3146 3147 3148 3149 3150 3151 3152 3153 3154 3155 3156 3157 3158 3159 3160 3161 3162 3163 3164 3165 3166 3167 3168 3169 3170 3171 3172 3173 3174 3175 3176 3177 3178 3179 3180 3181 3182 3183 3184 3185 3186 3187 3188 3189 3190 3191 3192 3193 3194 3195 3196 3197 3198 3199 3200 3201 3202 3203 3204 3205 3206 3207 3208 3209 3210 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220 3221 3222 3223 3224 3225 3226 3227 3228 3229 3230 3231 3232 3233 3234 3235 3236 3237 3238 3239 3240 3241 3242 3243 3244 3245 3246 3247 3248 3249 3250 3251 3252 3253 3254 3255 3256 3257 3258 3259 3260 3261 3262 3263 3264 3265 3266 3267 3268 3269 3270 3271 3272 3273 3274 3275 3276 3277 3278 3279 3280 3281 3282 3283 3284 3285 3286 3287 3288 3289 3290 3291 3292 3293 3294 3295 3296 3297 3298 3299 3300 3301 3302 3303 3304 3305 3306 3307 3308 3309 3310 3311 3312 3313 3314 3315 3316 3317 3318 3319 3320 3321 3322 3323 3324 3325 3326 3327 3328 3329 3330 3331 3332 3333 3334 3335 3336 3337 3338 3339 3340 3341 3342 3343 3344 3345 3346 3347 3348 3349 3350 3351 3352 3353 3354 3355 3356 3357 3358 3359 3360 3361 3362 3363 3364 3365 3366 3367 3368 3369 3370 3371 3372 3373 3374 3375 3376 3377 3378 3379 3380 3381 3382 3383 3384 3385 3386 3387 3388 3389 3390 3391 3392 3393 3394 3395 3396 3397 3398 3399 3400 3401 3402 3403 3404 3405 3406 3407 3408 3409 3410 3411 3412 3413 3414 3415 3416 3417 3418 3419 3420 3421 3422 3423 3424 3425 3426 3427 3428 3429 3430 3431 3432 3433 3434 3435 3436 3437 3438 3439 3440 3441 3442 3443 3444 3445 3446 3447 3448 3449 3450 3451 3452 3453 3454 3455 3456 3457 3458 3459 3460 3461 3462 3463 3464 3465 3466 3467 3468 3469 3470 3471 3472 3473 3474 3475 3476 3477 3478 3479 3480 3481 3482 3483 3484 3485 3486 3487 3488 3489 3490 3491 3492 3493 3494 3495 3496 3497 3498 3499 3500 3501 3502 3503 3504 3505 3506 3507 3508 3509 3510 3511 3512 3513 3514 3515 3516 3517 3518 3519 3520 3521 3522 3523 3524 3525 3526 3527 3528 3529 3530 3531 3532 3533 3534 3535 3536 3537 3538 3539 3540 3541 3542 3543 3544 3545 3546 3547 3548 3549 3550 3551 3552 3553 3554 3555 3556 3557 3558 3559 3560 3561 3562 3563 3564 3565 3566 3567 3568 3569 3570 3571 3572 3573 3574 3575 3576 3577 3578 3579 3580 3581 3582 3583 3584 3585 3586 3587 3588 3589 3590 3591 3592 3593 3594 3595 3596 3597 3598 3599 3600 3601 3602 3603 3604 3605 3606 3607 3608 3609 3610 3611 3612 3613 3614 3615 3616 3617 3618 3619 3620 3621 3622 3623 3624 3625 3626 3627 3628 3629 3630 3631 3632 3633 3634 3635 3636 3637 3638 3639 3640 3641 3642 3643 3644 3645 3646 3647 3648 3649 3650 3651 3652 3653 3654 3655 3656 3657 3658 3659 3660 3661 3662 3663 3664 3665 3666 3667 3668 3669 3670 3671 3672 3673 3674 3675 3676 3677 3678 3679 3680 3681 3682 3683 3684 3685 3686 3687 3688 3689 3690 3691 3692 3693 3694 3695 3696 3697 3698 3699 3700 3701 3702 3703 3704 3705 3706 3707 3708 3709 3710 3711 3712 3713 3714 3715 3716 3717 3718 3719 3720 3721 3722 3723 3724 3725 3726 3727 3728 3729 3730 3731 3732 3733 3734 3735 3736 3737 3738 3739 3740 3741 3742 3743 3744 3745 3746 3747 3748 3749 3750 3751 3752 3753 3754 3755 3756 3757 3758 3759 3760 3761 3762 3763 3764 3765 3766 3767 3768 3769 3770 3771 3772 3773 3774 3775 3776 3777 3778 3779 3780 3781 3782 3783 3784 3785 3786 3787 3788 3789 3790 3791 3792 3793 3794 3795 3796 3797 3798 3799 3800 3801 3802 3803 3804 3805 3806 3807 3808 3809 3810 3811 3812 3813 3814 3815 3816 3817 3818 3819 3820 3821 3822 3823 3824 3825 3826 3827 3828 3829 3830 3831 3832 3833 3834 3835 3836 3837 3838 3839 3840 3841 3842 3843 3844 3845 3846 3847 3848 3849 3850 3851 3852 3853 3854 3855 3856 3857 3858 3859 3860 3861 3862 3863 3864 3865 3866 3867 3868 3869 3870 3871 3872 3873 3874 3875 3876 3877 3878 3879 3880 3881 3882 3883 3884 3885 3886 3887 3888 3889 3890 3891 3892 3893 3894 3895 3896 3897 3898 3899 3900 3901 3902 3903 3904 3905 3906 3907 3908 3909 3910 3911 3912 3913 3914 3915 3916 3917 3918 3919 3920 3921 3922 3923 3924 3925 3926 3927 3928 3929 3930 3931 3932 3933 3934 3935 3936 3937 3938 3939 3940 3941 3942 3943 3944 3945 3946 3947 3948 3949 3950 3951 3952 3953 3954 3955 3956 3957 3958 3959 3960 3961 3962 3963 3964 3965 3966 3967 3968 3969 3970 3971 3972 3973 3974 3975 3976 3977 3978 3979 3980 3981 3982 3983 3984 3985 3986 3987 3988 3989 3990 3991 3992 3993 3994 3995 3996 3997 3998 3999 4000 4001 4002 4003 4004 4005 4006 4007 4008 4009 4010 4011 4012 4013 4014 4015 4016 4017 4018 4019 4020 4021 4022 4023 4024 4025 4026 4027 4028 4029 4030 4031 4032 4033 4034 4035 4036 4037 4038 4039 4040 4041 4042 4043 4044 4045 4046 4047 4048 4049 4050 4051 4052 4053 4054 4055 4056 4057 4058 4059 4060 4061 4062 4063 4064 4065 4066 4067 4068 4069 4070 4071 4072 4073 4074 4075 4076 4077 4078 4079 4080 4081 4082 4083 4084 4085 4086 4087 4088 4089 4090 4091 4092 4093 4094 4095 4096 4097 4098 4099 4100 4101 4102 4103 4104 4105 4106 4107 4108 4109 4110 4111 4112 4113 4114 4115 4116 4117 4118 4119 4120 4121 4122 4123 4124 4125 4126 4127 4128 4129 4130 4131 4132 4133 4134 4135 4136 4137 4138 4139 4140 4141 4142 4143 4144 4145 4146 4147 4148 4149 4150 4151 4152 4153 4154 4155 4156 4157 4158 4159 4160 4161 4162 4163 4164 4165 4166 4167 4168 4169 4170 4171 4172 4173 4174 4175 4176 4177 4178 4179 4180 4181 4182 4183 4184 4185 4186 4187 4188 4189 4190 4191 4192 4193 4194 4195 4196 4197 4198 4199 4200 4201 4202 4203 4204 4205 4206 4207 4208 4209 4210 4211 4212 4213 4214 4215 4216 4217 4218 4219 4220 4221 4222 4223 4224 4225 4226 4227 4228 4229 4230 4231 4232 4233 4234 4235 4236 4237 4238 4239 4240 4241 4242 4243 4244 4245 4246 4247 4248 4249 4250 4251 4252 4253 4254 4255 4256 4257 4258 4259 4260 4261 4262 4263 4264 4265 4266 4267 4268 4269 4270 4271 4272 4273 4274 4275 4276 4277 4278 4279 4280 4281 4282 4283 4284 4285 4286 4287 4288 4289 4290 4291 4292 4293 4294 4295 4296 4297 4298 4299 4300 4301 4302 4303 4304 4305 4306 4307 4308 4309 4310 4311 4312 4313 4314 4315 4316 4317 4318 4319 4320 4321 4322 4323 4324 4325 4326 4327 4328 4329 4330 4331 4332 4333 4334 4335 4336 4337 4338 4339 4340 4341 4342 4343 4344 4345 4346 4347 4348 4349 4350 4351 4352 4353 4354 4355 4356 4357 4358 4359 4360 4361 4362 4363 4364 4365 4366 4367 4368 4369 4370 4371 4372 4373 4374 4375 4376 4377 4378 4379 4380 4381 4382 4383 4384 4385 4386 4387 4388 4389 4390 4391 4392 4393 4394 4395 4396 4397 4398 4399 4400 4401 4402 4403 4404 4405 4406 4407 4408 4409 4410 4411 4412 4413 4414 4415 4416 4417 4418 4419 4420 4421 4422 4423 4424 4425 4426 4427 4428 4429 4430 4431 4432 4433 4434 4435 4436 4437 4438 4439 4440 4441 4442 4443 4444 4445 4446 4447 4448 4449 4450 4451 4452 4453 4454 4455 4456 4457 4458 4459 4460 4461 4462 4463 4464 4465 4466 4467 4468 4469 4470 4471 4472 4473 4474 4475 4476 4477 4478 4479 4480 4481 4482 4483 4484 4485 4486 4487 4488 4489 4490 4491 4492 4493 4494 4495 4496 4497 4498 4499 4500 4501 4502 4503 4504 4505 4506 4507 4508 4509 4510 4511 4512 4513 4514 4515 4516 4517 4518 4519 4520 4521 4522 4523 4524 4525 4526 4527 4528 4529 4530 4531 4532 4533 4534 4535 4536 4537 4538 4539 4540 4541 4542 4543 4544 4545 4546 4547 4548 4549 4550 4551 4552 4553 4554 4555 4556 4557 4558 4559 4560 4561 4562 4563 4564 4565 4566 4567 4568 4569 4570 4571 4572 4573 4574 4575 4576 4577 4578 4579 4580 4581 4582 4583 4584 4585 4586 4587 4588 4589 4590 4591 4592 4593 4594 4595 4596 4597 4598 4599 4600 4601 4602 4603 4604 4605 4606 4607 4608 4609 4610 4611 4612 4613 4614 4615 4616 4617 4618 4619 4620 4621 4622 4623 4624 4625 4626 4627 4628 4629 4630 4631 4632 4633 4634 4635 4636 4637 4638 4639 4640 4641 4642 4643 4644 4645 4646 4647 4648 4649 4650 4651 4652 4653 4654 4655 4656 4657 4658 4659 4660 4661 4662 4663 4664 4665 4666 4667 4668 4669 4670 4671 4672 4673 4674 4675 4676 4677 4678 4679 4680 4681 4682 4683 4684 4685 4686 4687 4688 4689 4690 4691 4692 4693 4694 4695 4696 4697 4698 4699 4700 4701 4702 4703 4704 4705 4706 4707 4708 4709 4710 4711 4712 4713 4714 4715 4716 4717 4718 4719 4720 4721 4722 4723 4724 4725 4726 4727 4728 4729 4730 4731 4732 4733 4734 4735 4736 4737 4738 4739 4740 4741 4742 4743 4744 4745 4746 4747 4748 4749 4750 4751 4752 4753 4754 4755 4756 4757 4758 4759 4760 4761 4762 4763 4764 4765 4766 4767 4768 4769 4770 4771 4772 4773 4774 4775 4776 4777 4778 4779 4780 4781 4782 4783 4784 4785 4786 4787 4788 4789 4790 4791 4792 4793 4794 4795 4796 4797 4798 4799 4800 4801 4802 4803 4804 4805 4806 4807 4808 4809 4810 4811 4812 4813 4814 4815 4816 4817 4818 4819 4820 4821 4822 4823 4824 4825 4826 4827 4828 4829 4830 4831 4832 4833 4834 4835 4836 4837 4838 4839 4840 4841 4842 4843 4844 4845 4846 4847 4848 4849 4850 4851 4852 4853 4854 4855 4856 4857 4858 4859 4860 4861 4862 4863 4864 4865 4866 4867 4868 4869 4870 4871 4872 4873 4874 4875 4876 4877 4878 4879 4880 4881 4882 4883 4884 4885 4886 4887 4888 4889 4890 4891 4892 4893 4894 4895 4896 4897 4898 4899 4900 4901 4902 4903 4904 4905 4906 4907 4908 4909 4910 4911 4912 4913 4914 4915 4916 4917 4918 4919 4920 4921 4922 4923 4924 4925 4926 4927 4928 4929 4930 4931 4932 4933 4934 4935 4936 4937 4938 4939 4940 4941 4942 4943 4944 4945 4946 4947 4948 4949 4950 4951 4952 4953 4954 4955 4956 4957 4958 4959 4960 4961 4962 4963 4964 4965 4966 4967 4968 4969 4970 4971 4972 4973 4974 4975 4976 4977 4978 4979 4980 4981 4982 4983 4984 4985 4986 4987 4988 4989 4990 4991 4992 4993 4994 4995 4996 4997 4998 4999 5000 5001 5002 5003 5004 5005 5006 5007 5008 5009 5010 5011 5012 5013 5014 5015 5016 5017 5018 5019 5020 5021 5022 5023 5024 5025 5026 5027 5028 5029 5030 5031 5032 5033 5034 5035 5036 5037 5038 5039 5040 5041 5042 5043 5044 5045 5046 5047 5048 5049 5050 5051 5052 5053 5054 5055 5056 5057 5058 5059 5060 5061 5062 5063 5064 5065 5066 5067 5068 5069 5070 5071 5072 5073 5074 5075 5076 5077 5078 5079 5080 5081 5082 5083 5084 5085 5086 5087 5088 5089 5090 5091 5092 5093 5094 5095 5096 5097 5098 5099 5100 5101 5102 5103 5104 5105 5106 5107 5108 5109 5110 5111 5112 5113 5114 5115 5116 5117 5118 5119 5120 5121 5122 5123 5124 5125 5126 5127 5128 5129 5130 5131 5132 5133 5134 5135 5136 5137 5138 5139 5140 5141 5142 5143 5144 5145 5146 5147 5148 5149 5150 5151 5152 5153 5154 5155 5156 5157 5158 5159 5160 5161 5162 5163 5164 5165 5166 5167 5168 5169 5170 5171 5172 5173 5174 5175 5176 5177 5178 5179 5180 5181 5182 5183 5184 5185 5186 5187 5188 5189 5190 5191 5192 5193 5194 5195 5196 5197 5198 5199 5200 5201 5202 5203 5204 5205 5206 5207 5208 5209 5210 5211 5212 5213 5214 5215 5216 5217 5218 5219 5220 5221 5222 5223 5224 5225 5226 5227 5228 5229 5230 5231 5232 5233 5234 5235 5236 5237 5238 5239 5240 5241 5242 5243 5244 5245 5246 5247 5248 5249 5250 5251 5252 5253 5254 5255 5256 5257 5258 5259 5260 5261 5262 5263 5264 5265 5266 5267 5268 5269 5270 5271 5272 5273 5274 5275 5276 5277 5278 5279 5280 5281 5282 5283 5284 5285 5286 5287 5288 5289 5290 5291 5292 5293 5294 5295 5296 5297 5298 5299 5300 5301 5302 5303 5304 5305 5306 5307 5308 5309 5310 5311 5312 5313 5314 5315 5316 5317 5318 5319 5320 5321 5322 5323 5324 5325 5326 5327 5328 5329 5330 5331 5332 5333 5334 5335 5336 5337 5338 5339 5340 5341 5342 5343 5344 5345 5346 5347 5348 5349 5350 5351 5352 5353 5354 5355 5356 5357 5358 5359 5360 5361 5362 5363 5364 5365 5366 5367 5368 5369 5370 5371 5372 5373 5374 5375 5376 5377 5378 5379 5380 5381 5382 5383 5384 5385 5386 5387 5388 5389 5390 5391 5392 5393 5394 5395 5396 5397 5398 5399 5400 5401 5402 5403 5404 5405 5406 5407 5408 5409 5410 5411 5412 5413 5414 5415 5416 5417 5418 5419 5420 5421 5422 5423 5424 5425 5426 5427 5428 5429 5430 5431 5432 5433 5434 5435 5436 5437 5438 5439 5440 5441 5442 5443 5444 5445 5446 5447 5448 5449 5450 5451 5452 5453 5454 5455 5456 5457 5458 5459 5460 5461 5462 5463 5464 5465 5466 5467 5468 5469 5470 5471 5472 5473 5474 5475 5476 5477 5478 5479 5480 5481 5482 5483 5484 5485 5486 5487 5488 5489 5490 5491 5492 5493 5494 5495 5496 5497 5498 5499 5500 5501 5502 5503 5504 5505 5506 5507 5508 5509 5510 5511 5512 5513 5514 5515 5516 5517 5518 5519 5520 5521 5522 5523 5524 5525 5526 5527 5528 5529 5530 5531 5532 5533 5534 5535 5536 5537 5538 5539 5540 5541 5542 5543 5544 5545 5546 5547 5548 5549 5550 5551 5552 5553 5554 5555 5556 5557 5558 5559 5560 5561 5562 5563 5564 5565 5566 5567 5568 5569 5570 5571 5572 5573 5574 5575 5576 5577 5578 5579 5580 5581 5582 5583 5584 5585 5586 5587 5588 5589 5590 5591 5592 5593 5594 5595 5596 5597 5598 5599 5600 5601 5602 5603 5604 5605 5606 5607 5608 5609 5610 5611 5612 5613 5614 5615 5616 5617 5618 5619 5620 5621 5622 5623 5624 5625 5626 5627 5628 5629 5630 5631 5632 5633 5634 5635 5636 5637 5638 5639 5640 5641 5642 5643 5644 5645 5646 5647 5648 5649 5650 5651 5652 5653 5654 5655 5656 5657 5658 5659 5660 5661 5662 5663 5664 5665 5666 5667 5668 5669 5670 5671 5672 5673 5674 5675 5676 5677 5678 5679 5680 5681 5682 5683 5684 5685 5686 5687 5688 5689 5690 5691 5692 5693 5694 5695 5696 5697 5698 5699 5700 5701 5702 5703 5704 5705 5706 5707 5708 5709 5710 5711 5712 5713 5714 5715 5716 5717 5718 5719 5720 5721 5722 5723 5724 5725 5726 5727 5728 5729 5730 5731 5732 5733 5734 5735 5736 5737 5738 5739 5740 5741 5742 5743 5744 5745 5746 5747 5748 5749 5750 5751 5752 5753 5754 5755 5756 5757 5758 5759 5760 5761 5762 5763 5764 5765 5766 5767 5768 5769 5770 5771 5772 5773 5774 5775 5776 5777 5778 5779 5780 5781 5782 5783 5784 5785 5786 5787 5788 5789 5790 5791 5792 5793 5794 5795 5796 5797 5798 5799 5800 5801 5802 5803 5804 5805 5806 5807 5808 5809 5810 5811 5812 5813 5814 5815 5816 5817 5818 5819 5820 5821 5822 5823 5824 5825 5826 5827 5828 5829 5830 5831 5832 5833 5834 5835 5836 5837 5838 5839 5840 5841 5842 5843 5844 5845 5846 5847 5848 5849 5850 5851 5852 5853 5854 5855 5856 5857 5858 5859 5860 5861 5862 5863 5864 5865 5866 5867 5868 5869 5870 5871 5872 5873 5874 5875 5876 5877 5878 5879 5880 5881 5882 5883 5884 5885 5886 5887 5888 5889 5890 5891 5892 5893 5894 5895 5896 5897 5898 5899 5900 5901 5902 5903 5904 5905 5906 5907 5908 5909 5910 5911 5912 5913 5914 5915 5916 5917 5918 5919 5920 5921 5922 5923 5924 5925 5926 5927 5928 5929 5930 5931 5932 5933 5934 5935 5936 5937 5938 5939 5940 5941 5942 5943 5944 5945 5946 5947 5948 5949 5950 5951 5952 5953 5954 5955 5956 5957 5958 5959 5960 5961 5962 5963 5964 5965 5966 5967 5968 5969 5970 5971 5972 5973 5974 5975 5976 5977 5978 5979 5980 5981 5982 5983 5984 5985 5986 5987 5988 5989 5990 5991 5992 5993 5994 5995 5996 5997 5998 5999 6000 6001 6002 6003 6004 6005 6006 6007 6008 6009 6010 6011 6012 6013 6014 6015 6016 6017 6018 6019 6020 6021 6022 6023 6024 6025 6026 6027 6028 6029 6030 6031 6032 6033 6034 6035 6036 6037 6038 6039 6040 6041 6042 6043 6044 6045 6046 6047 6048 6049 6050 6051 6052 6053 6054 6055 6056 6057 6058 6059 6060 6061 6062 6063 6064 6065 6066 6067 6068 6069 6070 6071 6072 6073 6074 6075 6076 6077 6078 6079 6080 6081 6082 6083 6084 6085 6086 6087 6088 6089 6090 6091 6092 6093 6094 6095 6096 6097 6098 6099 6100 6101 6102 6103 6104 6105 6106 6107 6108 6109 6110 6111 6112 6113 6114 6115 6116 6117 6118 6119 6120 6121 6122 6123 6124 6125 6126 6127 6128 6129 6130 6131 6132 6133 6134 6135 6136 6137 6138 6139 6140 6141 6142 6143 6144 6145 6146 6147 6148 6149 6150 6151 6152 6153 6154 6155 6156 6157 6158 6159 6160 6161 6162 6163 6164 6165 6166 6167 6168 6169 6170 6171 6172 6173 6174 6175 6176 6177 6178 6179 6180 6181 6182 6183 6184 6185 6186 6187 6188 6189 6190 6191 6192 6193 6194 6195 6196 6197 6198 6199 6200 6201 6202 6203 6204 6205 6206 6207 6208 6209 6210 6211 6212 6213 6214 6215 6216 6217 6218 6219 6220 6221 6222 6223 6224 6225 6226 6227 6228 6229 6230 6231 6232 6233 6234 6235 6236 6237 6238 6239 6240 6241 6242 6243 6244 6245 6246 6247 6248 6249 6250 6251 6252 6253 6254 6255 6256 6257 6258 6259 6260 6261 6262 6263 6264 6265 6266 6267 6268 6269 6270 6271 6272 6273 6274 6275 6276 6277 6278 6279 6280 6281 6282 6283 6284 6285 6286 6287 6288 6289 6290 6291 6292 6293 6294 6295 6296 6297 6298 6299 6300 6301 6302 6303 6304 6305 6306 6307 6308 6309 6310 6311 6312 6313 6314 6315 6316 6317 6318 6319 6320 6321 6322 6323 6324 6325 6326 6327 6328 6329 6330 6331 6332 6333 6334 6335 6336 6337 6338 6339 6340 6341 6342 6343 6344 6345 6346 6347 6348 6349 6350 6351 6352 6353 6354 6355 6356 6357 6358 6359 6360 6361 6362 6363 6364 6365 6366 6367 6368 6369 6370 6371 6372 6373 6374 6375 6376 6377 6378 6379 6380 6381 6382 6383 6384 6385 6386 6387 6388 6389 6390 6391 6392 6393 6394 6395 6396 6397 6398 6399 6400 6401 6402 6403 6404 6405 6406 6407 6408 6409 6410 6411 6412 6413 6414 6415 6416 6417 6418 6419 6420 6421 6422 6423 6424 6425 6426 6427 6428 6429 6430 6431 6432 6433 6434 6435 6436 6437 6438 6439 6440 6441 6442 6443 6444 6445 6446 6447 6448 6449 6450 6451 6452 6453 6454 6455 6456 6457 6458 6459 6460 6461 6462 6463 6464 6465 6466 6467 6468 6469 6470 6471 6472 | // SPDX-License-Identifier: GPL-2.0 /* * USB hub driver. * * (C) Copyright 1999 Linus Torvalds * (C) Copyright 1999 Johannes Erdfelt * (C) Copyright 1999 Gregory P. Smith * (C) Copyright 2001 Brad Hards (bhards@bigpond.net.au) * * Released under the GPLv2 only. */ #include <linux/kernel.h> #include <linux/errno.h> #include <linux/module.h> #include <linux/moduleparam.h> #include <linux/completion.h> #include <linux/sched/mm.h> #include <linux/list.h> #include <linux/slab.h> #include <linux/kcov.h> #include <linux/ioctl.h> #include <linux/usb.h> #include <linux/usbdevice_fs.h> #include <linux/usb/hcd.h> #include <linux/usb/onboard_dev.h> #include <linux/usb/otg.h> #include <linux/usb/quirks.h> #include <linux/workqueue.h> #include <linux/mutex.h> #include <linux/random.h> #include <linux/pm_qos.h> #include <linux/kobject.h> #include <linux/bitfield.h> #include <linux/uaccess.h> #include <asm/byteorder.h> #include "hub.h" #include "phy.h" #include "otg_productlist.h" #define USB_VENDOR_GENESYS_LOGIC 0x05e3 #define USB_VENDOR_SMSC 0x0424 #define USB_PRODUCT_USB5534B 0x5534 #define USB_VENDOR_CYPRESS 0x04b4 #define USB_PRODUCT_CY7C65632 0x6570 #define USB_VENDOR_TEXAS_INSTRUMENTS 0x0451 #define USB_PRODUCT_TUSB8041_USB3 0x8140 #define USB_PRODUCT_TUSB8041_USB2 0x8142 #define USB_VENDOR_MICROCHIP 0x0424 #define USB_PRODUCT_USB4913 0x4913 #define USB_PRODUCT_USB4914 0x4914 #define USB_PRODUCT_USB4915 0x4915 #define HUB_QUIRK_CHECK_PORT_AUTOSUSPEND BIT(0) #define HUB_QUIRK_DISABLE_AUTOSUSPEND BIT(1) #define HUB_QUIRK_REDUCE_FRAME_INTR_BINTERVAL BIT(2) #define USB_TP_TRANSMISSION_DELAY 40 /* ns */ #define USB_TP_TRANSMISSION_DELAY_MAX 65535 /* ns */ #define USB_PING_RESPONSE_TIME 400 /* ns */ #define USB_REDUCE_FRAME_INTR_BINTERVAL 9 /* * The SET_ADDRESS request timeout will be 500 ms when * USB_QUIRK_SHORT_SET_ADDRESS_REQ_TIMEOUT quirk flag is set. */ #define USB_SHORT_SET_ADDRESS_REQ_TIMEOUT 500 /* ms */ /* Protect struct usb_device->state and ->children members * Note: Both are also protected by ->dev.sem, except that ->state can * change to USB_STATE_NOTATTACHED even when the semaphore isn't held. */ static DEFINE_SPINLOCK(device_state_lock); /* workqueue to process hub events */ static struct workqueue_struct *hub_wq; static void hub_event(struct work_struct *work); /* synchronize hub-port add/remove and peering operations */ DEFINE_MUTEX(usb_port_peer_mutex); /* cycle leds on hubs that aren't blinking for attention */ static bool blinkenlights; module_param(blinkenlights, bool, S_IRUGO); MODULE_PARM_DESC(blinkenlights, "true to cycle leds on hubs"); /* * Device SATA8000 FW1.0 from DATAST0R Technology Corp requires about * 10 seconds to send reply for the initial 64-byte descriptor request. */ /* define initial 64-byte descriptor request timeout in milliseconds */ static int initial_descriptor_timeout = USB_CTRL_GET_TIMEOUT; module_param(initial_descriptor_timeout, int, S_IRUGO|S_IWUSR); MODULE_PARM_DESC(initial_descriptor_timeout, "initial 64-byte descriptor request timeout in milliseconds " "(default 5000 - 5.0 seconds)"); /* * As of 2.6.10 we introduce a new USB device initialization scheme which * closely resembles the way Windows works. Hopefully it will be compatible * with a wider range of devices than the old scheme. However some previously * working devices may start giving rise to "device not accepting address" * errors; if that happens the user can try the old scheme by adjusting the * following module parameters. * * For maximum flexibility there are two boolean parameters to control the * hub driver's behavior. On the first initialization attempt, if the * "old_scheme_first" parameter is set then the old scheme will be used, * otherwise the new scheme is used. If that fails and "use_both_schemes" * is set, then the driver will make another attempt, using the other scheme. */ static bool old_scheme_first; module_param(old_scheme_first, bool, S_IRUGO | S_IWUSR); MODULE_PARM_DESC(old_scheme_first, "start with the old device initialization scheme"); static bool use_both_schemes = true; module_param(use_both_schemes, bool, S_IRUGO | S_IWUSR); MODULE_PARM_DESC(use_both_schemes, "try the other device initialization scheme if the " "first one fails"); /* Mutual exclusion for EHCI CF initialization. This interferes with * port reset on some companion controllers. */ DECLARE_RWSEM(ehci_cf_port_reset_rwsem); EXPORT_SYMBOL_GPL(ehci_cf_port_reset_rwsem); #define HUB_DEBOUNCE_TIMEOUT 2000 #define HUB_DEBOUNCE_STEP 25 #define HUB_DEBOUNCE_STABLE 100 static int usb_reset_and_verify_device(struct usb_device *udev); static int hub_port_disable(struct usb_hub *hub, int port1, int set_state); static bool hub_port_warm_reset_required(struct usb_hub *hub, int port1, u16 portstatus); static inline char *portspeed(struct usb_hub *hub, int portstatus) { if (hub_is_superspeedplus(hub->hdev)) return "10.0 Gb/s"; if (hub_is_superspeed(hub->hdev)) return "5.0 Gb/s"; if (portstatus & USB_PORT_STAT_HIGH_SPEED) return "480 Mb/s"; else if (portstatus & USB_PORT_STAT_LOW_SPEED) return "1.5 Mb/s"; else return "12 Mb/s"; } /* Note that hdev or one of its children must be locked! */ struct usb_hub *usb_hub_to_struct_hub(struct usb_device *hdev) { if (!hdev || !hdev->actconfig || !hdev->maxchild) return NULL; return usb_get_intfdata(hdev->actconfig->interface[0]); } int usb_device_supports_lpm(struct usb_device *udev) { /* Some devices have trouble with LPM */ if (udev->quirks & USB_QUIRK_NO_LPM) return 0; /* Skip if the device BOS descriptor couldn't be read */ if (!udev->bos) return 0; /* USB 2.1 (and greater) devices indicate LPM support through * their USB 2.0 Extended Capabilities BOS descriptor. */ if (udev->speed == USB_SPEED_HIGH || udev->speed == USB_SPEED_FULL) { if (udev->bos->ext_cap && (USB_LPM_SUPPORT & le32_to_cpu(udev->bos->ext_cap->bmAttributes))) return 1; return 0; } /* * According to the USB 3.0 spec, all USB 3.0 devices must support LPM. * However, there are some that don't, and they set the U1/U2 exit * latencies to zero. */ if (!udev->bos->ss_cap) { dev_info(&udev->dev, "No LPM exit latency info found, disabling LPM.\n"); return 0; } if (udev->bos->ss_cap->bU1devExitLat == 0 && udev->bos->ss_cap->bU2DevExitLat == 0) { if (udev->parent) dev_info(&udev->dev, "LPM exit latency is zeroed, disabling LPM.\n"); else dev_info(&udev->dev, "We don't know the algorithms for LPM for this host, disabling LPM.\n"); return 0; } if (!udev->parent || udev->parent->lpm_capable) return 1; return 0; } /* * Set the Maximum Exit Latency (MEL) for the host to wakup up the path from * U1/U2, send a PING to the device and receive a PING_RESPONSE. * See USB 3.1 section C.1.5.2 */ static void usb_set_lpm_mel(struct usb_device *udev, struct usb3_lpm_parameters *udev_lpm_params, unsigned int udev_exit_latency, struct usb_hub *hub, struct usb3_lpm_parameters *hub_lpm_params, unsigned int hub_exit_latency) { unsigned int total_mel; /* * tMEL1. time to transition path from host to device into U0. * MEL for parent already contains the delay up to parent, so only add * the exit latency for the last link (pick the slower exit latency), * and the hub header decode latency. See USB 3.1 section C 2.2.1 * Store MEL in nanoseconds */ total_mel = hub_lpm_params->mel + max(udev_exit_latency, hub_exit_latency) * 1000 + hub->descriptor->u.ss.bHubHdrDecLat * 100; /* * tMEL2. Time to submit PING packet. Sum of tTPTransmissionDelay for * each link + wHubDelay for each hub. Add only for last link. * tMEL4, the time for PING_RESPONSE to traverse upstream is similar. * Multiply by 2 to include it as well. */ total_mel += (__le16_to_cpu(hub->descriptor->u.ss.wHubDelay) + USB_TP_TRANSMISSION_DELAY) * 2; /* * tMEL3, tPingResponse. Time taken by device to generate PING_RESPONSE * after receiving PING. Also add 2100ns as stated in USB 3.1 C 1.5.2.4 * to cover the delay if the PING_RESPONSE is queued behind a Max Packet * Size DP. * Note these delays should be added only once for the entire path, so * add them to the MEL of the device connected to the roothub. */ if (!hub->hdev->parent) total_mel += USB_PING_RESPONSE_TIME + 2100; udev_lpm_params->mel = total_mel; } /* * Set the maximum Device to Host Exit Latency (PEL) for the device to initiate * a transition from either U1 or U2. */ static void usb_set_lpm_pel(struct usb_device *udev, struct usb3_lpm_parameters *udev_lpm_params, unsigned int udev_exit_latency, struct usb_hub *hub, struct usb3_lpm_parameters *hub_lpm_params, unsigned int hub_exit_latency, unsigned int port_to_port_exit_latency) { unsigned int first_link_pel; unsigned int hub_pel; /* * First, the device sends an LFPS to transition the link between the * device and the parent hub into U0. The exit latency is the bigger of * the device exit latency or the hub exit latency. */ if (udev_exit_latency > hub_exit_latency) first_link_pel = udev_exit_latency * 1000; else first_link_pel = hub_exit_latency * 1000; /* * When the hub starts to receive the LFPS, there is a slight delay for * it to figure out that one of the ports is sending an LFPS. Then it * will forward the LFPS to its upstream link. The exit latency is the * delay, plus the PEL that we calculated for this hub. */ hub_pel = port_to_port_exit_latency * 1000 + hub_lpm_params->pel; /* * According to figure C-7 in the USB 3.0 spec, the PEL for this device * is the greater of the two exit latencies. */ if (first_link_pel > hub_pel) udev_lpm_params->pel = first_link_pel; else udev_lpm_params->pel = hub_pel; } /* * Set the System Exit Latency (SEL) to indicate the total worst-case time from * when a device initiates a transition to U0, until when it will receive the * first packet from the host controller. * * Section C.1.5.1 describes the four components to this: * - t1: device PEL * - t2: time for the ERDY to make it from the device to the host. * - t3: a host-specific delay to process the ERDY. * - t4: time for the packet to make it from the host to the device. * * t3 is specific to both the xHCI host and the platform the host is integrated * into. The Intel HW folks have said it's negligible, FIXME if a different * vendor says otherwise. */ static void usb_set_lpm_sel(struct usb_device *udev, struct usb3_lpm_parameters *udev_lpm_params) { struct usb_device *parent; unsigned int num_hubs; unsigned int total_sel; /* t1 = device PEL */ total_sel = udev_lpm_params->pel; /* How many external hubs are in between the device & the root port. */ for (parent = udev->parent, num_hubs = 0; parent->parent; parent = parent->parent) num_hubs++; /* t2 = 2.1us + 250ns * (num_hubs - 1) */ if (num_hubs > 0) total_sel += 2100 + 250 * (num_hubs - 1); /* t4 = 250ns * num_hubs */ total_sel += 250 * num_hubs; udev_lpm_params->sel = total_sel; } static void usb_set_lpm_parameters(struct usb_device *udev) { struct usb_hub *hub; unsigned int port_to_port_delay; unsigned int udev_u1_del; unsigned int udev_u2_del; unsigned int hub_u1_del; unsigned int hub_u2_del; if (!udev->lpm_capable || udev->speed < USB_SPEED_SUPER) return; /* Skip if the device BOS descriptor couldn't be read */ if (!udev->bos) return; hub = usb_hub_to_struct_hub(udev->parent); /* It doesn't take time to transition the roothub into U0, since it * doesn't have an upstream link. */ if (!hub) return; udev_u1_del = udev->bos->ss_cap->bU1devExitLat; udev_u2_del = le16_to_cpu(udev->bos->ss_cap->bU2DevExitLat); hub_u1_del = udev->parent->bos->ss_cap->bU1devExitLat; hub_u2_del = le16_to_cpu(udev->parent->bos->ss_cap->bU2DevExitLat); usb_set_lpm_mel(udev, &udev->u1_params, udev_u1_del, hub, &udev->parent->u1_params, hub_u1_del); usb_set_lpm_mel(udev, &udev->u2_params, udev_u2_del, hub, &udev->parent->u2_params, hub_u2_del); /* * Appendix C, section C.2.2.2, says that there is a slight delay from * when the parent hub notices the downstream port is trying to * transition to U0 to when the hub initiates a U0 transition on its * upstream port. The section says the delays are tPort2PortU1EL and * tPort2PortU2EL, but it doesn't define what they are. * * The hub chapter, sections 10.4.2.4 and 10.4.2.5 seem to be talking * about the same delays. Use the maximum delay calculations from those * sections. For U1, it's tHubPort2PortExitLat, which is 1us max. For * U2, it's tHubPort2PortExitLat + U2DevExitLat - U1DevExitLat. I * assume the device exit latencies they are talking about are the hub * exit latencies. * * What do we do if the U2 exit latency is less than the U1 exit * latency? It's possible, although not likely... */ port_to_port_delay = 1; usb_set_lpm_pel(udev, &udev->u1_params, udev_u1_del, hub, &udev->parent->u1_params, hub_u1_del, port_to_port_delay); if (hub_u2_del > hub_u1_del) port_to_port_delay = 1 + hub_u2_del - hub_u1_del; else port_to_port_delay = 1 + hub_u1_del; usb_set_lpm_pel(udev, &udev->u2_params, udev_u2_del, hub, &udev->parent->u2_params, hub_u2_del, port_to_port_delay); /* Now that we've got PEL, calculate SEL. */ usb_set_lpm_sel(udev, &udev->u1_params); usb_set_lpm_sel(udev, &udev->u2_params); } /* USB 2.0 spec Section 11.24.4.5 */ static int get_hub_descriptor(struct usb_device *hdev, struct usb_hub_descriptor *desc) { int i, ret, size; unsigned dtype; if (hub_is_superspeed(hdev)) { dtype = USB_DT_SS_HUB; size = USB_DT_SS_HUB_SIZE; } else { dtype = USB_DT_HUB; size = sizeof(struct usb_hub_descriptor); } for (i = 0; i < 3; i++) { ret = usb_control_msg(hdev, usb_rcvctrlpipe(hdev, 0), USB_REQ_GET_DESCRIPTOR, USB_DIR_IN | USB_RT_HUB, dtype << 8, 0, desc, size, USB_CTRL_GET_TIMEOUT); if (hub_is_superspeed(hdev)) { if (ret == size) return ret; } else if (ret >= USB_DT_HUB_NONVAR_SIZE + 2) { /* Make sure we have the DeviceRemovable field. */ size = USB_DT_HUB_NONVAR_SIZE + desc->bNbrPorts / 8 + 1; if (ret < size) return -EMSGSIZE; return ret; } } return -EINVAL; } /* * USB 2.0 spec Section 11.24.2.1 */ static int clear_hub_feature(struct usb_device *hdev, int feature) { return usb_control_msg(hdev, usb_sndctrlpipe(hdev, 0), USB_REQ_CLEAR_FEATURE, USB_RT_HUB, feature, 0, NULL, 0, 1000); } /* * USB 2.0 spec Section 11.24.2.2 */ int usb_clear_port_feature(struct usb_device *hdev, int port1, int feature) { return usb_control_msg(hdev, usb_sndctrlpipe(hdev, 0), USB_REQ_CLEAR_FEATURE, USB_RT_PORT, feature, port1, NULL, 0, 1000); } /* * USB 2.0 spec Section 11.24.2.13 */ static int set_port_feature(struct usb_device *hdev, int port1, int feature) { return usb_control_msg(hdev, usb_sndctrlpipe(hdev, 0), USB_REQ_SET_FEATURE, USB_RT_PORT, feature, port1, NULL, 0, 1000); } static char *to_led_name(int selector) { switch (selector) { case HUB_LED_AMBER: return "amber"; case HUB_LED_GREEN: return "green"; case HUB_LED_OFF: return "off"; case HUB_LED_AUTO: return "auto"; default: return "??"; } } /* * USB 2.0 spec Section 11.24.2.7.1.10 and table 11-7 * for info about using port indicators */ static void set_port_led(struct usb_hub *hub, int port1, int selector) { struct usb_port *port_dev = hub->ports[port1 - 1]; int status; status = set_port_feature(hub->hdev, (selector << 8) | port1, USB_PORT_FEAT_INDICATOR); dev_dbg(&port_dev->dev, "indicator %s status %d\n", to_led_name(selector), status); } #define LED_CYCLE_PERIOD ((2*HZ)/3) static void led_work(struct work_struct *work) { struct usb_hub *hub = container_of(work, struct usb_hub, leds.work); struct usb_device *hdev = hub->hdev; unsigned i; unsigned changed = 0; int cursor = -1; if (hdev->state != USB_STATE_CONFIGURED || hub->quiescing) return; for (i = 0; i < hdev->maxchild; i++) { unsigned selector, mode; /* 30%-50% duty cycle */ switch (hub->indicator[i]) { /* cycle marker */ case INDICATOR_CYCLE: cursor = i; selector = HUB_LED_AUTO; mode = INDICATOR_AUTO; break; /* blinking green = sw attention */ case INDICATOR_GREEN_BLINK: selector = HUB_LED_GREEN; mode = INDICATOR_GREEN_BLINK_OFF; break; case INDICATOR_GREEN_BLINK_OFF: selector = HUB_LED_OFF; mode = INDICATOR_GREEN_BLINK; break; /* blinking amber = hw attention */ case INDICATOR_AMBER_BLINK: selector = HUB_LED_AMBER; mode = INDICATOR_AMBER_BLINK_OFF; break; case INDICATOR_AMBER_BLINK_OFF: selector = HUB_LED_OFF; mode = INDICATOR_AMBER_BLINK; break; /* blink green/amber = reserved */ case INDICATOR_ALT_BLINK: selector = HUB_LED_GREEN; mode = INDICATOR_ALT_BLINK_OFF; break; case INDICATOR_ALT_BLINK_OFF: selector = HUB_LED_AMBER; mode = INDICATOR_ALT_BLINK; break; default: continue; } if (selector != HUB_LED_AUTO) changed = 1; set_port_led(hub, i + 1, selector); hub->indicator[i] = mode; } if (!changed && blinkenlights) { cursor++; cursor %= hdev->maxchild; set_port_led(hub, cursor + 1, HUB_LED_GREEN); hub->indicator[cursor] = INDICATOR_CYCLE; changed++; } if (changed) queue_delayed_work(system_power_efficient_wq, &hub->leds, LED_CYCLE_PERIOD); } /* use a short timeout for hub/port status fetches */ #define USB_STS_TIMEOUT 1000 #define USB_STS_RETRIES 5 /* * USB 2.0 spec Section 11.24.2.6 */ static int get_hub_status(struct usb_device *hdev, struct usb_hub_status *data) { int i, status = -ETIMEDOUT; for (i = 0; i < USB_STS_RETRIES && (status == -ETIMEDOUT || status == -EPIPE); i++) { status = usb_control_msg(hdev, usb_rcvctrlpipe(hdev, 0), USB_REQ_GET_STATUS, USB_DIR_IN | USB_RT_HUB, 0, 0, data, sizeof(*data), USB_STS_TIMEOUT); } return status; } /* * USB 2.0 spec Section 11.24.2.7 * USB 3.1 takes into use the wValue and wLength fields, spec Section 10.16.2.6 */ static int get_port_status(struct usb_device *hdev, int port1, void *data, u16 value, u16 length) { int i, status = -ETIMEDOUT; for (i = 0; i < USB_STS_RETRIES && (status == -ETIMEDOUT || status == -EPIPE); i++) { status = usb_control_msg(hdev, usb_rcvctrlpipe(hdev, 0), USB_REQ_GET_STATUS, USB_DIR_IN | USB_RT_PORT, value, port1, data, length, USB_STS_TIMEOUT); } return status; } static int hub_ext_port_status(struct usb_hub *hub, int port1, int type, u16 *status, u16 *change, u32 *ext_status) { int ret; int len = 4; if (type != HUB_PORT_STATUS) len = 8; mutex_lock(&hub->status_mutex); ret = get_port_status(hub->hdev, port1, &hub->status->port, type, len); if (ret < len) { if (ret != -ENODEV) dev_err(hub->intfdev, "%s failed (err = %d)\n", __func__, ret); if (ret >= 0) ret = -EIO; } else { *status = le16_to_cpu(hub->status->port.wPortStatus); *change = le16_to_cpu(hub->status->port.wPortChange); if (type != HUB_PORT_STATUS && ext_status) *ext_status = le32_to_cpu( hub->status->port.dwExtPortStatus); ret = 0; } mutex_unlock(&hub->status_mutex); /* * There is no need to lock status_mutex here, because status_mutex * protects hub->status, and the phy driver only checks the port * status without changing the status. */ if (!ret) { struct usb_device *hdev = hub->hdev; /* * Only roothub will be notified of connection changes, * since the USB PHY only cares about changes at the next * level. */ if (is_root_hub(hdev)) { struct usb_hcd *hcd = bus_to_hcd(hdev->bus); bool connect; bool connect_change; connect_change = *change & USB_PORT_STAT_C_CONNECTION; connect = *status & USB_PORT_STAT_CONNECTION; if (connect_change && connect) usb_phy_roothub_notify_connect(hcd->phy_roothub, port1 - 1); else if (connect_change) usb_phy_roothub_notify_disconnect(hcd->phy_roothub, port1 - 1); } } return ret; } int usb_hub_port_status(struct usb_hub *hub, int port1, u16 *status, u16 *change) { return hub_ext_port_status(hub, port1, HUB_PORT_STATUS, status, change, NULL); } static void hub_resubmit_irq_urb(struct usb_hub *hub) { unsigned long flags; int status; spin_lock_irqsave(&hub->irq_urb_lock, flags); if (hub->quiescing) { spin_unlock_irqrestore(&hub->irq_urb_lock, flags); return; } status = usb_submit_urb(hub->urb, GFP_ATOMIC); if (status && status != -ENODEV && status != -EPERM && status != -ESHUTDOWN) { dev_err(hub->intfdev, "resubmit --> %d\n", status); mod_timer(&hub->irq_urb_retry, jiffies + HZ); } spin_unlock_irqrestore(&hub->irq_urb_lock, flags); } static void hub_retry_irq_urb(struct timer_list *t) { struct usb_hub *hub = from_timer(hub, t, irq_urb_retry); hub_resubmit_irq_urb(hub); } static void kick_hub_wq(struct usb_hub *hub) { struct usb_interface *intf; if (hub->disconnected || work_pending(&hub->events)) return; /* * Suppress autosuspend until the event is proceed. * * Be careful and make sure that the symmetric operation is * always called. We are here only when there is no pending * work for this hub. Therefore put the interface either when * the new work is called or when it is canceled. */ intf = to_usb_interface(hub->intfdev); usb_autopm_get_interface_no_resume(intf); hub_get(hub); if (queue_work(hub_wq, &hub->events)) return; /* the work has already been scheduled */ usb_autopm_put_interface_async(intf); hub_put(hub); } void usb_kick_hub_wq(struct usb_device *hdev) { struct usb_hub *hub = usb_hub_to_struct_hub(hdev); if (hub) kick_hub_wq(hub); } /* * Let the USB core know that a USB 3.0 device has sent a Function Wake Device * Notification, which indicates it had initiated remote wakeup. * * USB 3.0 hubs do not report the port link state change from U3 to U0 when the * device initiates resume, so the USB core will not receive notice of the * resume through the normal hub interrupt URB. */ void usb_wakeup_notification(struct usb_device *hdev, unsigned int portnum) { struct usb_hub *hub; struct usb_port *port_dev; if (!hdev) return; hub = usb_hub_to_struct_hub(hdev); if (hub) { port_dev = hub->ports[portnum - 1]; if (port_dev && port_dev->child) pm_wakeup_event(&port_dev->child->dev, 0); set_bit(portnum, hub->wakeup_bits); kick_hub_wq(hub); } } EXPORT_SYMBOL_GPL(usb_wakeup_notification); /* completion function, fires on port status changes and various faults */ static void hub_irq(struct urb *urb) { struct usb_hub *hub = urb->context; int status = urb->status; unsigned i; unsigned long bits; switch (status) { case -ENOENT: /* synchronous unlink */ case -ECONNRESET: /* async unlink */ case -ESHUTDOWN: /* hardware going away */ return; default: /* presumably an error */ /* Cause a hub reset after 10 consecutive errors */ dev_dbg(hub->intfdev, "transfer --> %d\n", status); if ((++hub->nerrors < 10) || hub->error) goto resubmit; hub->error = status; fallthrough; /* let hub_wq handle things */ case 0: /* we got data: port status changed */ bits = 0; for (i = 0; i < urb->actual_length; ++i) bits |= ((unsigned long) ((*hub->buffer)[i])) << (i*8); hub->event_bits[0] = bits; break; } hub->nerrors = 0; /* Something happened, let hub_wq figure it out */ kick_hub_wq(hub); resubmit: hub_resubmit_irq_urb(hub); } /* USB 2.0 spec Section 11.24.2.3 */ static inline int hub_clear_tt_buffer(struct usb_device *hdev, u16 devinfo, u16 tt) { /* Need to clear both directions for control ep */ if (((devinfo >> 11) & USB_ENDPOINT_XFERTYPE_MASK) == USB_ENDPOINT_XFER_CONTROL) { int status = usb_control_msg(hdev, usb_sndctrlpipe(hdev, 0), HUB_CLEAR_TT_BUFFER, USB_RT_PORT, devinfo ^ 0x8000, tt, NULL, 0, 1000); if (status) return status; } return usb_control_msg(hdev, usb_sndctrlpipe(hdev, 0), HUB_CLEAR_TT_BUFFER, USB_RT_PORT, devinfo, tt, NULL, 0, 1000); } /* * enumeration blocks hub_wq for a long time. we use keventd instead, since * long blocking there is the exception, not the rule. accordingly, HCDs * talking to TTs must queue control transfers (not just bulk and iso), so * both can talk to the same hub concurrently. */ static void hub_tt_work(struct work_struct *work) { struct usb_hub *hub = container_of(work, struct usb_hub, tt.clear_work); unsigned long flags; spin_lock_irqsave(&hub->tt.lock, flags); while (!list_empty(&hub->tt.clear_list)) { struct list_head *next; struct usb_tt_clear *clear; struct usb_device *hdev = hub->hdev; const struct hc_driver *drv; int status; next = hub->tt.clear_list.next; clear = list_entry(next, struct usb_tt_clear, clear_list); list_del(&clear->clear_list); /* drop lock so HCD can concurrently report other TT errors */ spin_unlock_irqrestore(&hub->tt.lock, flags); status = hub_clear_tt_buffer(hdev, clear->devinfo, clear->tt); if (status && status != -ENODEV) dev_err(&hdev->dev, "clear tt %d (%04x) error %d\n", clear->tt, clear->devinfo, status); /* Tell the HCD, even if the operation failed */ drv = clear->hcd->driver; if (drv->clear_tt_buffer_complete) (drv->clear_tt_buffer_complete)(clear->hcd, clear->ep); kfree(clear); spin_lock_irqsave(&hub->tt.lock, flags); } spin_unlock_irqrestore(&hub->tt.lock, flags); } /** * usb_hub_set_port_power - control hub port's power state * @hdev: USB device belonging to the usb hub * @hub: target hub * @port1: port index * @set: expected status * * call this function to control port's power via setting or * clearing the port's PORT_POWER feature. * * Return: 0 if successful. A negative error code otherwise. */ int usb_hub_set_port_power(struct usb_device *hdev, struct usb_hub *hub, int port1, bool set) { int ret; if (set) ret = set_port_feature(hdev, port1, USB_PORT_FEAT_POWER); else ret = usb_clear_port_feature(hdev, port1, USB_PORT_FEAT_POWER); if (ret) return ret; if (set) set_bit(port1, hub->power_bits); else clear_bit(port1, hub->power_bits); return 0; } /** * usb_hub_clear_tt_buffer - clear control/bulk TT state in high speed hub * @urb: an URB associated with the failed or incomplete split transaction * * High speed HCDs use this to tell the hub driver that some split control or * bulk transaction failed in a way that requires clearing internal state of * a transaction translator. This is normally detected (and reported) from * interrupt context. * * It may not be possible for that hub to handle additional full (or low) * speed transactions until that state is fully cleared out. * * Return: 0 if successful. A negative error code otherwise. */ int usb_hub_clear_tt_buffer(struct urb *urb) { struct usb_device *udev = urb->dev; int pipe = urb->pipe; struct usb_tt *tt = udev->tt; unsigned long flags; struct usb_tt_clear *clear; /* we've got to cope with an arbitrary number of pending TT clears, * since each TT has "at least two" buffers that can need it (and * there can be many TTs per hub). even if they're uncommon. */ clear = kmalloc(sizeof *clear, GFP_ATOMIC); if (clear == NULL) { dev_err(&udev->dev, "can't save CLEAR_TT_BUFFER state\n"); /* FIXME recover somehow ... RESET_TT? */ return -ENOMEM; } /* info that CLEAR_TT_BUFFER needs */ clear->tt = tt->multi ? udev->ttport : 1; clear->devinfo = usb_pipeendpoint (pipe); clear->devinfo |= ((u16)udev->devaddr) << 4; clear->devinfo |= usb_pipecontrol(pipe) ? (USB_ENDPOINT_XFER_CONTROL << 11) : (USB_ENDPOINT_XFER_BULK << 11); if (usb_pipein(pipe)) clear->devinfo |= 1 << 15; /* info for completion callback */ clear->hcd = bus_to_hcd(udev->bus); clear->ep = urb->ep; /* tell keventd to clear state for this TT */ spin_lock_irqsave(&tt->lock, flags); list_add_tail(&clear->clear_list, &tt->clear_list); schedule_work(&tt->clear_work); spin_unlock_irqrestore(&tt->lock, flags); return 0; } EXPORT_SYMBOL_GPL(usb_hub_clear_tt_buffer); static void hub_power_on(struct usb_hub *hub, bool do_delay) { int port1; /* Enable power on each port. Some hubs have reserved values * of LPSM (> 2) in their descriptors, even though they are * USB 2.0 hubs. Some hubs do not implement port-power switching * but only emulate it. In all cases, the ports won't work * unless we send these messages to the hub. */ if (hub_is_port_power_switchable(hub)) dev_dbg(hub->intfdev, "enabling power on all ports\n"); else dev_dbg(hub->intfdev, "trying to enable port power on " "non-switchable hub\n"); for (port1 = 1; port1 <= hub->hdev->maxchild; port1++) if (test_bit(port1, hub->power_bits)) set_port_feature(hub->hdev, port1, USB_PORT_FEAT_POWER); else usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_POWER); if (do_delay) msleep(hub_power_on_good_delay(hub)); } static int hub_hub_status(struct usb_hub *hub, u16 *status, u16 *change) { int ret; mutex_lock(&hub->status_mutex); ret = get_hub_status(hub->hdev, &hub->status->hub); if (ret < 0) { if (ret != -ENODEV) dev_err(hub->intfdev, "%s failed (err = %d)\n", __func__, ret); } else { *status = le16_to_cpu(hub->status->hub.wHubStatus); *change = le16_to_cpu(hub->status->hub.wHubChange); ret = 0; } mutex_unlock(&hub->status_mutex); return ret; } static int hub_set_port_link_state(struct usb_hub *hub, int port1, unsigned int link_status) { return set_port_feature(hub->hdev, port1 | (link_status << 3), USB_PORT_FEAT_LINK_STATE); } /* * Disable a port and mark a logical connect-change event, so that some * time later hub_wq will disconnect() any existing usb_device on the port * and will re-enumerate if there actually is a device attached. */ static void hub_port_logical_disconnect(struct usb_hub *hub, int port1) { dev_dbg(&hub->ports[port1 - 1]->dev, "logical disconnect\n"); hub_port_disable(hub, port1, 1); /* FIXME let caller ask to power down the port: * - some devices won't enumerate without a VBUS power cycle * - SRP saves power that way * - ... new call, TBD ... * That's easy if this hub can switch power per-port, and * hub_wq reactivates the port later (timer, SRP, etc). * Powerdown must be optional, because of reset/DFU. */ set_bit(port1, hub->change_bits); kick_hub_wq(hub); } /** * usb_remove_device - disable a device's port on its parent hub * @udev: device to be disabled and removed * Context: @udev locked, must be able to sleep. * * After @udev's port has been disabled, hub_wq is notified and it will * see that the device has been disconnected. When the device is * physically unplugged and something is plugged in, the events will * be received and processed normally. * * Return: 0 if successful. A negative error code otherwise. */ int usb_remove_device(struct usb_device *udev) { struct usb_hub *hub; struct usb_interface *intf; int ret; if (!udev->parent) /* Can't remove a root hub */ return -EINVAL; hub = usb_hub_to_struct_hub(udev->parent); intf = to_usb_interface(hub->intfdev); ret = usb_autopm_get_interface(intf); if (ret < 0) return ret; set_bit(udev->portnum, hub->removed_bits); hub_port_logical_disconnect(hub, udev->portnum); usb_autopm_put_interface(intf); return 0; } enum hub_activation_type { HUB_INIT, HUB_INIT2, HUB_INIT3, /* INITs must come first */ HUB_POST_RESET, HUB_RESUME, HUB_RESET_RESUME, }; static void hub_init_func2(struct work_struct *ws); static void hub_init_func3(struct work_struct *ws); static void hub_activate(struct usb_hub *hub, enum hub_activation_type type) { struct usb_device *hdev = hub->hdev; struct usb_hcd *hcd; int ret; int port1; int status; bool need_debounce_delay = false; unsigned delay; /* Continue a partial initialization */ if (type == HUB_INIT2 || type == HUB_INIT3) { device_lock(&hdev->dev); /* Was the hub disconnected while we were waiting? */ if (hub->disconnected) goto disconnected; if (type == HUB_INIT2) goto init2; goto init3; } hub_get(hub); /* The superspeed hub except for root hub has to use Hub Depth * value as an offset into the route string to locate the bits * it uses to determine the downstream port number. So hub driver * should send a set hub depth request to superspeed hub after * the superspeed hub is set configuration in initialization or * reset procedure. * * After a resume, port power should still be on. * For any other type of activation, turn it on. */ if (type != HUB_RESUME) { if (hdev->parent && hub_is_superspeed(hdev)) { ret = usb_control_msg(hdev, usb_sndctrlpipe(hdev, 0), HUB_SET_DEPTH, USB_RT_HUB, hdev->level - 1, 0, NULL, 0, USB_CTRL_SET_TIMEOUT); if (ret < 0) dev_err(hub->intfdev, "set hub depth failed\n"); } /* Speed up system boot by using a delayed_work for the * hub's initial power-up delays. This is pretty awkward * and the implementation looks like a home-brewed sort of * setjmp/longjmp, but it saves at least 100 ms for each * root hub (assuming usbcore is compiled into the kernel * rather than as a module). It adds up. * * This can't be done for HUB_RESUME or HUB_RESET_RESUME * because for those activation types the ports have to be * operational when we return. In theory this could be done * for HUB_POST_RESET, but it's easier not to. */ if (type == HUB_INIT) { delay = hub_power_on_good_delay(hub); hub_power_on(hub, false); INIT_DELAYED_WORK(&hub->init_work, hub_init_func2); queue_delayed_work(system_power_efficient_wq, &hub->init_work, msecs_to_jiffies(delay)); /* Suppress autosuspend until init is done */ usb_autopm_get_interface_no_resume( to_usb_interface(hub->intfdev)); return; /* Continues at init2: below */ } else if (type == HUB_RESET_RESUME) { /* The internal host controller state for the hub device * may be gone after a host power loss on system resume. * Update the device's info so the HW knows it's a hub. */ hcd = bus_to_hcd(hdev->bus); if (hcd->driver->update_hub_device) { ret = hcd->driver->update_hub_device(hcd, hdev, &hub->tt, GFP_NOIO); if (ret < 0) { dev_err(hub->intfdev, "Host not accepting hub info update\n"); dev_err(hub->intfdev, "LS/FS devices and hubs may not work under this hub\n"); } } hub_power_on(hub, true); } else { hub_power_on(hub, true); } /* Give some time on remote wakeup to let links to transit to U0 */ } else if (hub_is_superspeed(hub->hdev)) msleep(20); init2: /* * Check each port and set hub->change_bits to let hub_wq know * which ports need attention. */ for (port1 = 1; port1 <= hdev->maxchild; ++port1) { struct usb_port *port_dev = hub->ports[port1 - 1]; struct usb_device *udev = port_dev->child; u16 portstatus, portchange; portstatus = portchange = 0; status = usb_hub_port_status(hub, port1, &portstatus, &portchange); if (status) goto abort; if (udev || (portstatus & USB_PORT_STAT_CONNECTION)) dev_dbg(&port_dev->dev, "status %04x change %04x\n", portstatus, portchange); /* * After anything other than HUB_RESUME (i.e., initialization * or any sort of reset), every port should be disabled. * Unconnected ports should likewise be disabled (paranoia), * and so should ports for which we have no usb_device. */ if ((portstatus & USB_PORT_STAT_ENABLE) && ( type != HUB_RESUME || !(portstatus & USB_PORT_STAT_CONNECTION) || !udev || udev->state == USB_STATE_NOTATTACHED)) { /* * USB3 protocol ports will automatically transition * to Enabled state when detect an USB3.0 device attach. * Do not disable USB3 protocol ports, just pretend * power was lost */ portstatus &= ~USB_PORT_STAT_ENABLE; if (!hub_is_superspeed(hdev)) usb_clear_port_feature(hdev, port1, USB_PORT_FEAT_ENABLE); } /* Make sure a warm-reset request is handled by port_event */ if (type == HUB_RESUME && hub_port_warm_reset_required(hub, port1, portstatus)) set_bit(port1, hub->event_bits); /* * Add debounce if USB3 link is in polling/link training state. * Link will automatically transition to Enabled state after * link training completes. */ if (hub_is_superspeed(hdev) && ((portstatus & USB_PORT_STAT_LINK_STATE) == USB_SS_PORT_LS_POLLING)) need_debounce_delay = true; /* Clear status-change flags; we'll debounce later */ if (portchange & USB_PORT_STAT_C_CONNECTION) { need_debounce_delay = true; usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_C_CONNECTION); } if (portchange & USB_PORT_STAT_C_ENABLE) { need_debounce_delay = true; usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_C_ENABLE); } if (portchange & USB_PORT_STAT_C_RESET) { need_debounce_delay = true; usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_C_RESET); } if ((portchange & USB_PORT_STAT_C_BH_RESET) && hub_is_superspeed(hub->hdev)) { need_debounce_delay = true; usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_C_BH_PORT_RESET); } /* We can forget about a "removed" device when there's a * physical disconnect or the connect status changes. */ if (!(portstatus & USB_PORT_STAT_CONNECTION) || (portchange & USB_PORT_STAT_C_CONNECTION)) clear_bit(port1, hub->removed_bits); if (!udev || udev->state == USB_STATE_NOTATTACHED) { /* Tell hub_wq to disconnect the device or * check for a new connection or over current condition. * Based on USB2.0 Spec Section 11.12.5, * C_PORT_OVER_CURRENT could be set while * PORT_OVER_CURRENT is not. So check for any of them. */ if (udev || (portstatus & USB_PORT_STAT_CONNECTION) || (portchange & USB_PORT_STAT_C_CONNECTION) || (portstatus & USB_PORT_STAT_OVERCURRENT) || (portchange & USB_PORT_STAT_C_OVERCURRENT)) set_bit(port1, hub->change_bits); } else if (portstatus & USB_PORT_STAT_ENABLE) { bool port_resumed = (portstatus & USB_PORT_STAT_LINK_STATE) == USB_SS_PORT_LS_U0; /* The power session apparently survived the resume. * If there was an overcurrent or suspend change * (i.e., remote wakeup request), have hub_wq * take care of it. Look at the port link state * for USB 3.0 hubs, since they don't have a suspend * change bit, and they don't set the port link change * bit on device-initiated resume. */ if (portchange || (hub_is_superspeed(hub->hdev) && port_resumed)) set_bit(port1, hub->event_bits); } else if (udev->persist_enabled) { #ifdef CONFIG_PM udev->reset_resume = 1; #endif /* Don't set the change_bits when the device * was powered off. */ if (test_bit(port1, hub->power_bits)) set_bit(port1, hub->change_bits); } else { /* The power session is gone; tell hub_wq */ usb_set_device_state(udev, USB_STATE_NOTATTACHED); set_bit(port1, hub->change_bits); } } /* If no port-status-change flags were set, we don't need any * debouncing. If flags were set we can try to debounce the * ports all at once right now, instead of letting hub_wq do them * one at a time later on. * * If any port-status changes do occur during this delay, hub_wq * will see them later and handle them normally. */ if (need_debounce_delay) { delay = HUB_DEBOUNCE_STABLE; /* Don't do a long sleep inside a workqueue routine */ if (type == HUB_INIT2) { INIT_DELAYED_WORK(&hub->init_work, hub_init_func3); queue_delayed_work(system_power_efficient_wq, &hub->init_work, msecs_to_jiffies(delay)); device_unlock(&hdev->dev); return; /* Continues at init3: below */ } else { msleep(delay); } } init3: hub->quiescing = 0; status = usb_submit_urb(hub->urb, GFP_NOIO); if (status < 0) dev_err(hub->intfdev, "activate --> %d\n", status); if (hub->has_indicators && blinkenlights) queue_delayed_work(system_power_efficient_wq, &hub->leds, LED_CYCLE_PERIOD); /* Scan all ports that need attention */ kick_hub_wq(hub); abort: if (type == HUB_INIT2 || type == HUB_INIT3) { /* Allow autosuspend if it was suppressed */ disconnected: usb_autopm_put_interface_async(to_usb_interface(hub->intfdev)); device_unlock(&hdev->dev); } hub_put(hub); } /* Implement the continuations for the delays above */ static void hub_init_func2(struct work_struct *ws) { struct usb_hub *hub = container_of(ws, struct usb_hub, init_work.work); hub_activate(hub, HUB_INIT2); } static void hub_init_func3(struct work_struct *ws) { struct usb_hub *hub = container_of(ws, struct usb_hub, init_work.work); hub_activate(hub, HUB_INIT3); } enum hub_quiescing_type { HUB_DISCONNECT, HUB_PRE_RESET, HUB_SUSPEND }; static void hub_quiesce(struct usb_hub *hub, enum hub_quiescing_type type) { struct usb_device *hdev = hub->hdev; unsigned long flags; int i; /* hub_wq and related activity won't re-trigger */ spin_lock_irqsave(&hub->irq_urb_lock, flags); hub->quiescing = 1; spin_unlock_irqrestore(&hub->irq_urb_lock, flags); if (type != HUB_SUSPEND) { /* Disconnect all the children */ for (i = 0; i < hdev->maxchild; ++i) { if (hub->ports[i]->child) usb_disconnect(&hub->ports[i]->child); } } /* Stop hub_wq and related activity */ del_timer_sync(&hub->irq_urb_retry); usb_kill_urb(hub->urb); if (hub->has_indicators) cancel_delayed_work_sync(&hub->leds); if (hub->tt.hub) flush_work(&hub->tt.clear_work); } static void hub_pm_barrier_for_all_ports(struct usb_hub *hub) { int i; for (i = 0; i < hub->hdev->maxchild; ++i) pm_runtime_barrier(&hub->ports[i]->dev); } /* caller has locked the hub device */ static int hub_pre_reset(struct usb_interface *intf) { struct usb_hub *hub = usb_get_intfdata(intf); hub_quiesce(hub, HUB_PRE_RESET); hub->in_reset = 1; hub_pm_barrier_for_all_ports(hub); return 0; } /* caller has locked the hub device */ static int hub_post_reset(struct usb_interface *intf) { struct usb_hub *hub = usb_get_intfdata(intf); hub->in_reset = 0; hub_pm_barrier_for_all_ports(hub); hub_activate(hub, HUB_POST_RESET); return 0; } static int hub_configure(struct usb_hub *hub, struct usb_endpoint_descriptor *endpoint) { struct usb_hcd *hcd; struct usb_device *hdev = hub->hdev; struct device *hub_dev = hub->intfdev; u16 hubstatus, hubchange; u16 wHubCharacteristics; unsigned int pipe; int maxp, ret, i; char *message = "out of memory"; unsigned unit_load; unsigned full_load; unsigned maxchild; hub->buffer = kmalloc(sizeof(*hub->buffer), GFP_KERNEL); if (!hub->buffer) { ret = -ENOMEM; goto fail; } hub->status = kmalloc(sizeof(*hub->status), GFP_KERNEL); if (!hub->status) { ret = -ENOMEM; goto fail; } mutex_init(&hub->status_mutex); hub->descriptor = kzalloc(sizeof(*hub->descriptor), GFP_KERNEL); if (!hub->descriptor) { ret = -ENOMEM; goto fail; } /* Request the entire hub descriptor. * hub->descriptor can handle USB_MAXCHILDREN ports, * but a (non-SS) hub can/will return fewer bytes here. */ ret = get_hub_descriptor(hdev, hub->descriptor); if (ret < 0) { message = "can't read hub descriptor"; goto fail; } maxchild = USB_MAXCHILDREN; if (hub_is_superspeed(hdev)) maxchild = min_t(unsigned, maxchild, USB_SS_MAXPORTS); if (hub->descriptor->bNbrPorts > maxchild) { message = "hub has too many ports!"; ret = -ENODEV; goto fail; } else if (hub->descriptor->bNbrPorts == 0) { message = "hub doesn't have any ports!"; ret = -ENODEV; goto fail; } /* * Accumulate wHubDelay + 40ns for every hub in the tree of devices. * The resulting value will be used for SetIsochDelay() request. */ if (hub_is_superspeed(hdev) || hub_is_superspeedplus(hdev)) { u32 delay = __le16_to_cpu(hub->descriptor->u.ss.wHubDelay); if (hdev->parent) delay += hdev->parent->hub_delay; delay += USB_TP_TRANSMISSION_DELAY; hdev->hub_delay = min_t(u32, delay, USB_TP_TRANSMISSION_DELAY_MAX); } maxchild = hub->descriptor->bNbrPorts; dev_info(hub_dev, "%d port%s detected\n", maxchild, (maxchild == 1) ? "" : "s"); hub->ports = kcalloc(maxchild, sizeof(struct usb_port *), GFP_KERNEL); if (!hub->ports) { ret = -ENOMEM; goto fail; } wHubCharacteristics = le16_to_cpu(hub->descriptor->wHubCharacteristics); if (hub_is_superspeed(hdev)) { unit_load = 150; full_load = 900; } else { unit_load = 100; full_load = 500; } /* FIXME for USB 3.0, skip for now */ if ((wHubCharacteristics & HUB_CHAR_COMPOUND) && !(hub_is_superspeed(hdev))) { char portstr[USB_MAXCHILDREN + 1]; for (i = 0; i < maxchild; i++) portstr[i] = hub->descriptor->u.hs.DeviceRemovable [((i + 1) / 8)] & (1 << ((i + 1) % 8)) ? 'F' : 'R'; portstr[maxchild] = 0; dev_dbg(hub_dev, "compound device; port removable status: %s\n", portstr); } else dev_dbg(hub_dev, "standalone hub\n"); switch (wHubCharacteristics & HUB_CHAR_LPSM) { case HUB_CHAR_COMMON_LPSM: dev_dbg(hub_dev, "ganged power switching\n"); break; case HUB_CHAR_INDV_PORT_LPSM: dev_dbg(hub_dev, "individual port power switching\n"); break; case HUB_CHAR_NO_LPSM: case HUB_CHAR_LPSM: dev_dbg(hub_dev, "no power switching (usb 1.0)\n"); break; } switch (wHubCharacteristics & HUB_CHAR_OCPM) { case HUB_CHAR_COMMON_OCPM: dev_dbg(hub_dev, "global over-current protection\n"); break; case HUB_CHAR_INDV_PORT_OCPM: dev_dbg(hub_dev, "individual port over-current protection\n"); break; case HUB_CHAR_NO_OCPM: case HUB_CHAR_OCPM: dev_dbg(hub_dev, "no over-current protection\n"); break; } spin_lock_init(&hub->tt.lock); INIT_LIST_HEAD(&hub->tt.clear_list); INIT_WORK(&hub->tt.clear_work, hub_tt_work); switch (hdev->descriptor.bDeviceProtocol) { case USB_HUB_PR_FS: break; case USB_HUB_PR_HS_SINGLE_TT: dev_dbg(hub_dev, "Single TT\n"); hub->tt.hub = hdev; break; case USB_HUB_PR_HS_MULTI_TT: ret = usb_set_interface(hdev, 0, 1); if (ret == 0) { dev_dbg(hub_dev, "TT per port\n"); hub->tt.multi = 1; } else dev_err(hub_dev, "Using single TT (err %d)\n", ret); hub->tt.hub = hdev; break; case USB_HUB_PR_SS: /* USB 3.0 hubs don't have a TT */ break; default: dev_dbg(hub_dev, "Unrecognized hub protocol %d\n", hdev->descriptor.bDeviceProtocol); break; } /* Note 8 FS bit times == (8 bits / 12000000 bps) ~= 666ns */ switch (wHubCharacteristics & HUB_CHAR_TTTT) { case HUB_TTTT_8_BITS: if (hdev->descriptor.bDeviceProtocol != 0) { hub->tt.think_time = 666; dev_dbg(hub_dev, "TT requires at most %d " "FS bit times (%d ns)\n", 8, hub->tt.think_time); } break; case HUB_TTTT_16_BITS: hub->tt.think_time = 666 * 2; dev_dbg(hub_dev, "TT requires at most %d " "FS bit times (%d ns)\n", 16, hub->tt.think_time); break; case HUB_TTTT_24_BITS: hub->tt.think_time = 666 * 3; dev_dbg(hub_dev, "TT requires at most %d " "FS bit times (%d ns)\n", 24, hub->tt.think_time); break; case HUB_TTTT_32_BITS: hub->tt.think_time = 666 * 4; dev_dbg(hub_dev, "TT requires at most %d " "FS bit times (%d ns)\n", 32, hub->tt.think_time); break; } /* probe() zeroes hub->indicator[] */ if (wHubCharacteristics & HUB_CHAR_PORTIND) { hub->has_indicators = 1; dev_dbg(hub_dev, "Port indicators are supported\n"); } dev_dbg(hub_dev, "power on to power good time: %dms\n", hub->descriptor->bPwrOn2PwrGood * 2); /* power budgeting mostly matters with bus-powered hubs, * and battery-powered root hubs (may provide just 8 mA). */ ret = usb_get_std_status(hdev, USB_RECIP_DEVICE, 0, &hubstatus); if (ret) { message = "can't get hub status"; goto fail; } hcd = bus_to_hcd(hdev->bus); if (hdev == hdev->bus->root_hub) { if (hcd->power_budget > 0) hdev->bus_mA = hcd->power_budget; else hdev->bus_mA = full_load * maxchild; if (hdev->bus_mA >= full_load) hub->mA_per_port = full_load; else { hub->mA_per_port = hdev->bus_mA; hub->limited_power = 1; } } else if ((hubstatus & (1 << USB_DEVICE_SELF_POWERED)) == 0) { int remaining = hdev->bus_mA - hub->descriptor->bHubContrCurrent; dev_dbg(hub_dev, "hub controller current requirement: %dmA\n", hub->descriptor->bHubContrCurrent); hub->limited_power = 1; if (remaining < maxchild * unit_load) dev_warn(hub_dev, "insufficient power available " "to use all downstream ports\n"); hub->mA_per_port = unit_load; /* 7.2.1 */ } else { /* Self-powered external hub */ /* FIXME: What about battery-powered external hubs that * provide less current per port? */ hub->mA_per_port = full_load; } if (hub->mA_per_port < full_load) dev_dbg(hub_dev, "%umA bus power budget for each child\n", hub->mA_per_port); ret = hub_hub_status(hub, &hubstatus, &hubchange); if (ret < 0) { message = "can't get hub status"; goto fail; } /* local power status reports aren't always correct */ if (hdev->actconfig->desc.bmAttributes & USB_CONFIG_ATT_SELFPOWER) dev_dbg(hub_dev, "local power source is %s\n", (hubstatus & HUB_STATUS_LOCAL_POWER) ? "lost (inactive)" : "good"); if ((wHubCharacteristics & HUB_CHAR_OCPM) == 0) dev_dbg(hub_dev, "%sover-current condition exists\n", (hubstatus & HUB_STATUS_OVERCURRENT) ? "" : "no "); /* set up the interrupt endpoint * We use the EP's maxpacket size instead of (PORTS+1+7)/8 * bytes as USB2.0[11.12.3] says because some hubs are known * to send more data (and thus cause overflow). For root hubs, * maxpktsize is defined in hcd.c's fake endpoint descriptors * to be big enough for at least USB_MAXCHILDREN ports. */ pipe = usb_rcvintpipe(hdev, endpoint->bEndpointAddress); maxp = usb_maxpacket(hdev, pipe); if (maxp > sizeof(*hub->buffer)) maxp = sizeof(*hub->buffer); hub->urb = usb_alloc_urb(0, GFP_KERNEL); if (!hub->urb) { ret = -ENOMEM; goto fail; } usb_fill_int_urb(hub->urb, hdev, pipe, *hub->buffer, maxp, hub_irq, hub, endpoint->bInterval); /* maybe cycle the hub leds */ if (hub->has_indicators && blinkenlights) hub->indicator[0] = INDICATOR_CYCLE; mutex_lock(&usb_port_peer_mutex); for (i = 0; i < maxchild; i++) { ret = usb_hub_create_port_device(hub, i + 1); if (ret < 0) { dev_err(hub->intfdev, "couldn't create port%d device.\n", i + 1); break; } } hdev->maxchild = i; for (i = 0; i < hdev->maxchild; i++) { struct usb_port *port_dev = hub->ports[i]; pm_runtime_put(&port_dev->dev); } mutex_unlock(&usb_port_peer_mutex); if (ret < 0) goto fail; /* Update the HCD's internal representation of this hub before hub_wq * starts getting port status changes for devices under the hub. */ if (hcd->driver->update_hub_device) { ret = hcd->driver->update_hub_device(hcd, hdev, &hub->tt, GFP_KERNEL); if (ret < 0) { message = "can't update HCD hub info"; goto fail; } } usb_hub_adjust_deviceremovable(hdev, hub->descriptor); hub_activate(hub, HUB_INIT); return 0; fail: dev_err(hub_dev, "config failed, %s (err %d)\n", message, ret); /* hub_disconnect() frees urb and descriptor */ return ret; } static void hub_release(struct kref *kref) { struct usb_hub *hub = container_of(kref, struct usb_hub, kref); usb_put_dev(hub->hdev); usb_put_intf(to_usb_interface(hub->intfdev)); kfree(hub); } void hub_get(struct usb_hub *hub) { kref_get(&hub->kref); } void hub_put(struct usb_hub *hub) { kref_put(&hub->kref, hub_release); } static unsigned highspeed_hubs; static void hub_disconnect(struct usb_interface *intf) { struct usb_hub *hub = usb_get_intfdata(intf); struct usb_device *hdev = interface_to_usbdev(intf); int port1; /* * Stop adding new hub events. We do not want to block here and thus * will not try to remove any pending work item. */ hub->disconnected = 1; /* Disconnect all children and quiesce the hub */ hub->error = 0; hub_quiesce(hub, HUB_DISCONNECT); mutex_lock(&usb_port_peer_mutex); /* Avoid races with recursively_mark_NOTATTACHED() */ spin_lock_irq(&device_state_lock); port1 = hdev->maxchild; hdev->maxchild = 0; usb_set_intfdata(intf, NULL); spin_unlock_irq(&device_state_lock); for (; port1 > 0; --port1) usb_hub_remove_port_device(hub, port1); mutex_unlock(&usb_port_peer_mutex); if (hub->hdev->speed == USB_SPEED_HIGH) highspeed_hubs--; usb_free_urb(hub->urb); kfree(hub->ports); kfree(hub->descriptor); kfree(hub->status); kfree(hub->buffer); pm_suspend_ignore_children(&intf->dev, false); if (hub->quirk_disable_autosuspend) usb_autopm_put_interface(intf); onboard_dev_destroy_pdevs(&hub->onboard_devs); hub_put(hub); } static bool hub_descriptor_is_sane(struct usb_host_interface *desc) { /* Some hubs have a subclass of 1, which AFAICT according to the */ /* specs is not defined, but it works */ if (desc->desc.bInterfaceSubClass != 0 && desc->desc.bInterfaceSubClass != 1) return false; /* Multiple endpoints? What kind of mutant ninja-hub is this? */ if (desc->desc.bNumEndpoints != 1) return false; /* If the first endpoint is not interrupt IN, we'd better punt! */ if (!usb_endpoint_is_int_in(&desc->endpoint[0].desc)) return false; return true; } static int hub_probe(struct usb_interface *intf, const struct usb_device_id *id) { struct usb_host_interface *desc; struct usb_device *hdev; struct usb_hub *hub; desc = intf->cur_altsetting; hdev = interface_to_usbdev(intf); /* * Set default autosuspend delay as 0 to speedup bus suspend, * based on the below considerations: * * - Unlike other drivers, the hub driver does not rely on the * autosuspend delay to provide enough time to handle a wakeup * event, and the submitted status URB is just to check future * change on hub downstream ports, so it is safe to do it. * * - The patch might cause one or more auto supend/resume for * below very rare devices when they are plugged into hub * first time: * * devices having trouble initializing, and disconnect * themselves from the bus and then reconnect a second * or so later * * devices just for downloading firmware, and disconnects * themselves after completing it * * For these quite rare devices, their drivers may change the * autosuspend delay of their parent hub in the probe() to one * appropriate value to avoid the subtle problem if someone * does care it. * * - The patch may cause one or more auto suspend/resume on * hub during running 'lsusb', but it is probably too * infrequent to worry about. * * - Change autosuspend delay of hub can avoid unnecessary auto * suspend timer for hub, also may decrease power consumption * of USB bus. * * - If user has indicated to prevent autosuspend by passing * usbcore.autosuspend = -1 then keep autosuspend disabled. */ #ifdef CONFIG_PM if (hdev->dev.power.autosuspend_delay >= 0) pm_runtime_set_autosuspend_delay(&hdev->dev, 0); #endif /* * Hubs have proper suspend/resume support, except for root hubs * where the controller driver doesn't have bus_suspend and * bus_resume methods. */ if (hdev->parent) { /* normal device */ usb_enable_autosuspend(hdev); } else { /* root hub */ const struct hc_driver *drv = bus_to_hcd(hdev->bus)->driver; if (drv->bus_suspend && drv->bus_resume) usb_enable_autosuspend(hdev); } if (hdev->level == MAX_TOPO_LEVEL) { dev_err(&intf->dev, "Unsupported bus topology: hub nested too deep\n"); return -E2BIG; } #ifdef CONFIG_USB_OTG_DISABLE_EXTERNAL_HUB if (hdev->parent) { dev_warn(&intf->dev, "ignoring external hub\n"); return -ENODEV; } #endif if (!hub_descriptor_is_sane(desc)) { dev_err(&intf->dev, "bad descriptor, ignoring hub\n"); return -EIO; } /* We found a hub */ dev_info(&intf->dev, "USB hub found\n"); hub = kzalloc(sizeof(*hub), GFP_KERNEL); if (!hub) return -ENOMEM; kref_init(&hub->kref); hub->intfdev = &intf->dev; hub->hdev = hdev; INIT_DELAYED_WORK(&hub->leds, led_work); INIT_DELAYED_WORK(&hub->init_work, NULL); INIT_WORK(&hub->events, hub_event); INIT_LIST_HEAD(&hub->onboard_devs); spin_lock_init(&hub->irq_urb_lock); timer_setup(&hub->irq_urb_retry, hub_retry_irq_urb, 0); usb_get_intf(intf); usb_get_dev(hdev); usb_set_intfdata(intf, hub); intf->needs_remote_wakeup = 1; pm_suspend_ignore_children(&intf->dev, true); if (hdev->speed == USB_SPEED_HIGH) highspeed_hubs++; if (id->driver_info & HUB_QUIRK_CHECK_PORT_AUTOSUSPEND) hub->quirk_check_port_auto_suspend = 1; if (id->driver_info & HUB_QUIRK_DISABLE_AUTOSUSPEND) { hub->quirk_disable_autosuspend = 1; usb_autopm_get_interface_no_resume(intf); } if ((id->driver_info & HUB_QUIRK_REDUCE_FRAME_INTR_BINTERVAL) && desc->endpoint[0].desc.bInterval > USB_REDUCE_FRAME_INTR_BINTERVAL) { desc->endpoint[0].desc.bInterval = USB_REDUCE_FRAME_INTR_BINTERVAL; /* Tell the HCD about the interrupt ep's new bInterval */ usb_set_interface(hdev, 0, 0); } if (hub_configure(hub, &desc->endpoint[0].desc) >= 0) { onboard_dev_create_pdevs(hdev, &hub->onboard_devs); return 0; } hub_disconnect(intf); return -ENODEV; } static int hub_ioctl(struct usb_interface *intf, unsigned int code, void *user_data) { struct usb_device *hdev = interface_to_usbdev(intf); struct usb_hub *hub = usb_hub_to_struct_hub(hdev); /* assert ifno == 0 (part of hub spec) */ switch (code) { case USBDEVFS_HUB_PORTINFO: { struct usbdevfs_hub_portinfo *info = user_data; int i; spin_lock_irq(&device_state_lock); if (hdev->devnum <= 0) info->nports = 0; else { info->nports = hdev->maxchild; for (i = 0; i < info->nports; i++) { if (hub->ports[i]->child == NULL) info->port[i] = 0; else info->port[i] = hub->ports[i]->child->devnum; } } spin_unlock_irq(&device_state_lock); return info->nports + 1; } default: return -ENOSYS; } } /* * Allow user programs to claim ports on a hub. When a device is attached * to one of these "claimed" ports, the program will "own" the device. */ static int find_port_owner(struct usb_device *hdev, unsigned port1, struct usb_dev_state ***ppowner) { struct usb_hub *hub = usb_hub_to_struct_hub(hdev); if (hdev->state == USB_STATE_NOTATTACHED) return -ENODEV; if (port1 == 0 || port1 > hdev->maxchild) return -EINVAL; /* Devices not managed by the hub driver * will always have maxchild equal to 0. */ *ppowner = &(hub->ports[port1 - 1]->port_owner); return 0; } /* In the following three functions, the caller must hold hdev's lock */ int usb_hub_claim_port(struct usb_device *hdev, unsigned port1, struct usb_dev_state *owner) { int rc; struct usb_dev_state **powner; rc = find_port_owner(hdev, port1, &powner); if (rc) return rc; if (*powner) return -EBUSY; *powner = owner; return rc; } EXPORT_SYMBOL_GPL(usb_hub_claim_port); int usb_hub_release_port(struct usb_device *hdev, unsigned port1, struct usb_dev_state *owner) { int rc; struct usb_dev_state **powner; rc = find_port_owner(hdev, port1, &powner); if (rc) return rc; if (*powner != owner) return -ENOENT; *powner = NULL; return rc; } EXPORT_SYMBOL_GPL(usb_hub_release_port); void usb_hub_release_all_ports(struct usb_device *hdev, struct usb_dev_state *owner) { struct usb_hub *hub = usb_hub_to_struct_hub(hdev); int n; for (n = 0; n < hdev->maxchild; n++) { if (hub->ports[n]->port_owner == owner) hub->ports[n]->port_owner = NULL; } } /* The caller must hold udev's lock */ bool usb_device_is_owned(struct usb_device *udev) { struct usb_hub *hub; if (udev->state == USB_STATE_NOTATTACHED || !udev->parent) return false; hub = usb_hub_to_struct_hub(udev->parent); return !!hub->ports[udev->portnum - 1]->port_owner; } static void update_port_device_state(struct usb_device *udev) { struct usb_hub *hub; struct usb_port *port_dev; if (udev->parent) { hub = usb_hub_to_struct_hub(udev->parent); /* * The Link Layer Validation System Driver (lvstest) * has a test step to unbind the hub before running the * rest of the procedure. This triggers hub_disconnect * which will set the hub's maxchild to 0, further * resulting in usb_hub_to_struct_hub returning NULL. */ if (hub) { port_dev = hub->ports[udev->portnum - 1]; WRITE_ONCE(port_dev->state, udev->state); sysfs_notify_dirent(port_dev->state_kn); } } } static void recursively_mark_NOTATTACHED(struct usb_device *udev) { struct usb_hub *hub = usb_hub_to_struct_hub(udev); int i; for (i = 0; i < udev->maxchild; ++i) { if (hub->ports[i]->child) recursively_mark_NOTATTACHED(hub->ports[i]->child); } if (udev->state == USB_STATE_SUSPENDED) udev->active_duration -= jiffies; udev->state = USB_STATE_NOTATTACHED; update_port_device_state(udev); } /** * usb_set_device_state - change a device's current state (usbcore, hcds) * @udev: pointer to device whose state should be changed * @new_state: new state value to be stored * * udev->state is _not_ fully protected by the device lock. Although * most transitions are made only while holding the lock, the state can * can change to USB_STATE_NOTATTACHED at almost any time. This * is so that devices can be marked as disconnected as soon as possible, * without having to wait for any semaphores to be released. As a result, * all changes to any device's state must be protected by the * device_state_lock spinlock. * * Once a device has been added to the device tree, all changes to its state * should be made using this routine. The state should _not_ be set directly. * * If udev->state is already USB_STATE_NOTATTACHED then no change is made. * Otherwise udev->state is set to new_state, and if new_state is * USB_STATE_NOTATTACHED then all of udev's descendants' states are also set * to USB_STATE_NOTATTACHED. */ void usb_set_device_state(struct usb_device *udev, enum usb_device_state new_state) { unsigned long flags; int wakeup = -1; spin_lock_irqsave(&device_state_lock, flags); if (udev->state == USB_STATE_NOTATTACHED) ; /* do nothing */ else if (new_state != USB_STATE_NOTATTACHED) { /* root hub wakeup capabilities are managed out-of-band * and may involve silicon errata ... ignore them here. */ if (udev->parent) { if (udev->state == USB_STATE_SUSPENDED || new_state == USB_STATE_SUSPENDED) ; /* No change to wakeup settings */ else if (new_state == USB_STATE_CONFIGURED) wakeup = (udev->quirks & USB_QUIRK_IGNORE_REMOTE_WAKEUP) ? 0 : udev->actconfig->desc.bmAttributes & USB_CONFIG_ATT_WAKEUP; else wakeup = 0; } if (udev->state == USB_STATE_SUSPENDED && new_state != USB_STATE_SUSPENDED) udev->active_duration -= jiffies; else if (new_state == USB_STATE_SUSPENDED && udev->state != USB_STATE_SUSPENDED) udev->active_duration += jiffies; udev->state = new_state; update_port_device_state(udev); } else recursively_mark_NOTATTACHED(udev); spin_unlock_irqrestore(&device_state_lock, flags); if (wakeup >= 0) device_set_wakeup_capable(&udev->dev, wakeup); } EXPORT_SYMBOL_GPL(usb_set_device_state); /* * Choose a device number. * * Device numbers are used as filenames in usbfs. On USB-1.1 and * USB-2.0 buses they are also used as device addresses, however on * USB-3.0 buses the address is assigned by the controller hardware * and it usually is not the same as the device number. * * Devices connected under xHCI are not as simple. The host controller * supports virtualization, so the hardware assigns device addresses and * the HCD must setup data structures before issuing a set address * command to the hardware. */ static void choose_devnum(struct usb_device *udev) { int devnum; struct usb_bus *bus = udev->bus; /* be safe when more hub events are proceed in parallel */ mutex_lock(&bus->devnum_next_mutex); /* Try to allocate the next devnum beginning at bus->devnum_next. */ devnum = find_next_zero_bit(bus->devmap, 128, bus->devnum_next); if (devnum >= 128) devnum = find_next_zero_bit(bus->devmap, 128, 1); bus->devnum_next = (devnum >= 127 ? 1 : devnum + 1); if (devnum < 128) { set_bit(devnum, bus->devmap); udev->devnum = devnum; } mutex_unlock(&bus->devnum_next_mutex); } static void release_devnum(struct usb_device *udev) { if (udev->devnum > 0) { clear_bit(udev->devnum, udev->bus->devmap); udev->devnum = -1; } } static void update_devnum(struct usb_device *udev, int devnum) { udev->devnum = devnum; if (!udev->devaddr) udev->devaddr = (u8)devnum; } static void hub_free_dev(struct usb_device *udev) { struct usb_hcd *hcd = bus_to_hcd(udev->bus); /* Root hubs aren't real devices, so don't free HCD resources */ if (hcd->driver->free_dev && udev->parent) hcd->driver->free_dev(hcd, udev); } static void hub_disconnect_children(struct usb_device *udev) { struct usb_hub *hub = usb_hub_to_struct_hub(udev); int i; /* Free up all the children before we remove this device */ for (i = 0; i < udev->maxchild; i++) { if (hub->ports[i]->child) usb_disconnect(&hub->ports[i]->child); } } /** * usb_disconnect - disconnect a device (usbcore-internal) * @pdev: pointer to device being disconnected * * Context: task context, might sleep * * Something got disconnected. Get rid of it and all of its children. * * If *pdev is a normal device then the parent hub must already be locked. * If *pdev is a root hub then the caller must hold the usb_bus_idr_lock, * which protects the set of root hubs as well as the list of buses. * * Only hub drivers (including virtual root hub drivers for host * controllers) should ever call this. * * This call is synchronous, and may not be used in an interrupt context. */ void usb_disconnect(struct usb_device **pdev) { struct usb_port *port_dev = NULL; struct usb_device *udev = *pdev; struct usb_hub *hub = NULL; int port1 = 1; /* mark the device as inactive, so any further urb submissions for * this device (and any of its children) will fail immediately. * this quiesces everything except pending urbs. */ usb_set_device_state(udev, USB_STATE_NOTATTACHED); dev_info(&udev->dev, "USB disconnect, device number %d\n", udev->devnum); /* * Ensure that the pm runtime code knows that the USB device * is in the process of being disconnected. */ pm_runtime_barrier(&udev->dev); usb_lock_device(udev); hub_disconnect_children(udev); /* deallocate hcd/hardware state ... nuking all pending urbs and * cleaning up all state associated with the current configuration * so that the hardware is now fully quiesced. */ dev_dbg(&udev->dev, "unregistering device\n"); usb_disable_device(udev, 0); usb_hcd_synchronize_unlinks(udev); if (udev->parent) { port1 = udev->portnum; hub = usb_hub_to_struct_hub(udev->parent); port_dev = hub->ports[port1 - 1]; sysfs_remove_link(&udev->dev.kobj, "port"); sysfs_remove_link(&port_dev->dev.kobj, "device"); /* * As usb_port_runtime_resume() de-references udev, make * sure no resumes occur during removal */ if (!test_and_set_bit(port1, hub->child_usage_bits)) pm_runtime_get_sync(&port_dev->dev); typec_deattach(port_dev->connector, &udev->dev); } usb_remove_ep_devs(&udev->ep0); usb_unlock_device(udev); /* Unregister the device. The device driver is responsible * for de-configuring the device and invoking the remove-device * notifier chain (used by usbfs and possibly others). */ device_del(&udev->dev); /* Free the device number and delete the parent's children[] * (or root_hub) pointer. */ release_devnum(udev); /* Avoid races with recursively_mark_NOTATTACHED() */ spin_lock_irq(&device_state_lock); *pdev = NULL; spin_unlock_irq(&device_state_lock); if (port_dev && test_and_clear_bit(port1, hub->child_usage_bits)) pm_runtime_put(&port_dev->dev); hub_free_dev(udev); put_device(&udev->dev); } #ifdef CONFIG_USB_ANNOUNCE_NEW_DEVICES static void show_string(struct usb_device *udev, char *id, char *string) { if (!string) return; dev_info(&udev->dev, "%s: %s\n", id, string); } static void announce_device(struct usb_device *udev) { u16 bcdDevice = le16_to_cpu(udev->descriptor.bcdDevice); dev_info(&udev->dev, "New USB device found, idVendor=%04x, idProduct=%04x, bcdDevice=%2x.%02x\n", le16_to_cpu(udev->descriptor.idVendor), le16_to_cpu(udev->descriptor.idProduct), bcdDevice >> 8, bcdDevice & 0xff); dev_info(&udev->dev, "New USB device strings: Mfr=%d, Product=%d, SerialNumber=%d\n", udev->descriptor.iManufacturer, udev->descriptor.iProduct, udev->descriptor.iSerialNumber); show_string(udev, "Product", udev->product); show_string(udev, "Manufacturer", udev->manufacturer); show_string(udev, "SerialNumber", udev->serial); } #else static inline void announce_device(struct usb_device *udev) { } #endif /** * usb_enumerate_device_otg - FIXME (usbcore-internal) * @udev: newly addressed device (in ADDRESS state) * * Finish enumeration for On-The-Go devices * * Return: 0 if successful. A negative error code otherwise. */ static int usb_enumerate_device_otg(struct usb_device *udev) { int err = 0; #ifdef CONFIG_USB_OTG /* * OTG-aware devices on OTG-capable root hubs may be able to use SRP, * to wake us after we've powered off VBUS; and HNP, switching roles * "host" to "peripheral". The OTG descriptor helps figure this out. */ if (!udev->bus->is_b_host && udev->config && udev->parent == udev->bus->root_hub) { struct usb_otg_descriptor *desc = NULL; struct usb_bus *bus = udev->bus; unsigned port1 = udev->portnum; /* descriptor may appear anywhere in config */ err = __usb_get_extra_descriptor(udev->rawdescriptors[0], le16_to_cpu(udev->config[0].desc.wTotalLength), USB_DT_OTG, (void **) &desc, sizeof(*desc)); if (err || !(desc->bmAttributes & USB_OTG_HNP)) return 0; dev_info(&udev->dev, "Dual-Role OTG device on %sHNP port\n", (port1 == bus->otg_port) ? "" : "non-"); /* enable HNP before suspend, it's simpler */ if (port1 == bus->otg_port) { bus->b_hnp_enable = 1; err = usb_control_msg(udev, usb_sndctrlpipe(udev, 0), USB_REQ_SET_FEATURE, 0, USB_DEVICE_B_HNP_ENABLE, 0, NULL, 0, USB_CTRL_SET_TIMEOUT); if (err < 0) { /* * OTG MESSAGE: report errors here, * customize to match your product. */ dev_err(&udev->dev, "can't set HNP mode: %d\n", err); bus->b_hnp_enable = 0; } } else if (desc->bLength == sizeof (struct usb_otg_descriptor)) { /* * We are operating on a legacy OTP device * These should be told that they are operating * on the wrong port if we have another port that does * support HNP */ if (bus->otg_port != 0) { /* Set a_alt_hnp_support for legacy otg device */ err = usb_control_msg(udev, usb_sndctrlpipe(udev, 0), USB_REQ_SET_FEATURE, 0, USB_DEVICE_A_ALT_HNP_SUPPORT, 0, NULL, 0, USB_CTRL_SET_TIMEOUT); if (err < 0) dev_err(&udev->dev, "set a_alt_hnp_support failed: %d\n", err); } } } #endif return err; } /** * usb_enumerate_device - Read device configs/intfs/otg (usbcore-internal) * @udev: newly addressed device (in ADDRESS state) * * This is only called by usb_new_device() -- all comments that apply there * apply here wrt to environment. * * If the device is WUSB and not authorized, we don't attempt to read * the string descriptors, as they will be errored out by the device * until it has been authorized. * * Return: 0 if successful. A negative error code otherwise. */ static int usb_enumerate_device(struct usb_device *udev) { int err; struct usb_hcd *hcd = bus_to_hcd(udev->bus); if (udev->config == NULL) { err = usb_get_configuration(udev); if (err < 0) { if (err != -ENODEV) dev_err(&udev->dev, "can't read configurations, error %d\n", err); return err; } } /* read the standard strings and cache them if present */ udev->product = usb_cache_string(udev, udev->descriptor.iProduct); udev->manufacturer = usb_cache_string(udev, udev->descriptor.iManufacturer); udev->serial = usb_cache_string(udev, udev->descriptor.iSerialNumber); err = usb_enumerate_device_otg(udev); if (err < 0) return err; if (IS_ENABLED(CONFIG_USB_OTG_PRODUCTLIST) && hcd->tpl_support && !is_targeted(udev)) { /* Maybe it can talk to us, though we can't talk to it. * (Includes HNP test device.) */ if (IS_ENABLED(CONFIG_USB_OTG) && (udev->bus->b_hnp_enable || udev->bus->is_b_host)) { err = usb_port_suspend(udev, PMSG_AUTO_SUSPEND); if (err < 0) dev_dbg(&udev->dev, "HNP fail, %d\n", err); } return -ENOTSUPP; } usb_detect_interface_quirks(udev); return 0; } static void set_usb_port_removable(struct usb_device *udev) { struct usb_device *hdev = udev->parent; struct usb_hub *hub; u8 port = udev->portnum; u16 wHubCharacteristics; bool removable = true; dev_set_removable(&udev->dev, DEVICE_REMOVABLE_UNKNOWN); if (!hdev) return; hub = usb_hub_to_struct_hub(udev->parent); /* * If the platform firmware has provided information about a port, * use that to determine whether it's removable. */ switch (hub->ports[udev->portnum - 1]->connect_type) { case USB_PORT_CONNECT_TYPE_HOT_PLUG: dev_set_removable(&udev->dev, DEVICE_REMOVABLE); return; case USB_PORT_CONNECT_TYPE_HARD_WIRED: case USB_PORT_NOT_USED: dev_set_removable(&udev->dev, DEVICE_FIXED); return; default: break; } /* * Otherwise, check whether the hub knows whether a port is removable * or not */ wHubCharacteristics = le16_to_cpu(hub->descriptor->wHubCharacteristics); if (!(wHubCharacteristics & HUB_CHAR_COMPOUND)) return; if (hub_is_superspeed(hdev)) { if (le16_to_cpu(hub->descriptor->u.ss.DeviceRemovable) & (1 << port)) removable = false; } else { if (hub->descriptor->u.hs.DeviceRemovable[port / 8] & (1 << (port % 8))) removable = false; } if (removable) dev_set_removable(&udev->dev, DEVICE_REMOVABLE); else dev_set_removable(&udev->dev, DEVICE_FIXED); } /** * usb_new_device - perform initial device setup (usbcore-internal) * @udev: newly addressed device (in ADDRESS state) * * This is called with devices which have been detected but not fully * enumerated. The device descriptor is available, but not descriptors * for any device configuration. The caller must have locked either * the parent hub (if udev is a normal device) or else the * usb_bus_idr_lock (if udev is a root hub). The parent's pointer to * udev has already been installed, but udev is not yet visible through * sysfs or other filesystem code. * * This call is synchronous, and may not be used in an interrupt context. * * Only the hub driver or root-hub registrar should ever call this. * * Return: Whether the device is configured properly or not. Zero if the * interface was registered with the driver core; else a negative errno * value. * */ int usb_new_device(struct usb_device *udev) { int err; if (udev->parent) { /* Initialize non-root-hub device wakeup to disabled; * device (un)configuration controls wakeup capable * sysfs power/wakeup controls wakeup enabled/disabled */ device_init_wakeup(&udev->dev, 0); } /* Tell the runtime-PM framework the device is active */ pm_runtime_set_active(&udev->dev); pm_runtime_get_noresume(&udev->dev); pm_runtime_use_autosuspend(&udev->dev); pm_runtime_enable(&udev->dev); /* By default, forbid autosuspend for all devices. It will be * allowed for hubs during binding. */ usb_disable_autosuspend(udev); err = usb_enumerate_device(udev); /* Read descriptors */ if (err < 0) goto fail; dev_dbg(&udev->dev, "udev %d, busnum %d, minor = %d\n", udev->devnum, udev->bus->busnum, (((udev->bus->busnum-1) * 128) + (udev->devnum-1))); /* export the usbdev device-node for libusb */ udev->dev.devt = MKDEV(USB_DEVICE_MAJOR, (((udev->bus->busnum-1) * 128) + (udev->devnum-1))); /* Tell the world! */ announce_device(udev); if (udev->serial) add_device_randomness(udev->serial, strlen(udev->serial)); if (udev->product) add_device_randomness(udev->product, strlen(udev->product)); if (udev->manufacturer) add_device_randomness(udev->manufacturer, strlen(udev->manufacturer)); device_enable_async_suspend(&udev->dev); /* check whether the hub or firmware marks this port as non-removable */ set_usb_port_removable(udev); /* Register the device. The device driver is responsible * for configuring the device and invoking the add-device * notifier chain (used by usbfs and possibly others). */ err = device_add(&udev->dev); if (err) { dev_err(&udev->dev, "can't device_add, error %d\n", err); goto fail; } /* Create link files between child device and usb port device. */ if (udev->parent) { struct usb_hub *hub = usb_hub_to_struct_hub(udev->parent); int port1 = udev->portnum; struct usb_port *port_dev = hub->ports[port1 - 1]; err = sysfs_create_link(&udev->dev.kobj, &port_dev->dev.kobj, "port"); if (err) goto fail; err = sysfs_create_link(&port_dev->dev.kobj, &udev->dev.kobj, "device"); if (err) { sysfs_remove_link(&udev->dev.kobj, "port"); goto fail; } if (!test_and_set_bit(port1, hub->child_usage_bits)) pm_runtime_get_sync(&port_dev->dev); typec_attach(port_dev->connector, &udev->dev); } (void) usb_create_ep_devs(&udev->dev, &udev->ep0, udev); usb_mark_last_busy(udev); pm_runtime_put_sync_autosuspend(&udev->dev); return err; fail: usb_set_device_state(udev, USB_STATE_NOTATTACHED); pm_runtime_disable(&udev->dev); pm_runtime_set_suspended(&udev->dev); return err; } /** * usb_deauthorize_device - deauthorize a device (usbcore-internal) * @usb_dev: USB device * * Move the USB device to a very basic state where interfaces are disabled * and the device is in fact unconfigured and unusable. * * We share a lock (that we have) with device_del(), so we need to * defer its call. * * Return: 0. */ int usb_deauthorize_device(struct usb_device *usb_dev) { usb_lock_device(usb_dev); if (usb_dev->authorized == 0) goto out_unauthorized; usb_dev->authorized = 0; usb_set_configuration(usb_dev, -1); out_unauthorized: usb_unlock_device(usb_dev); return 0; } int usb_authorize_device(struct usb_device *usb_dev) { int result = 0, c; usb_lock_device(usb_dev); if (usb_dev->authorized == 1) goto out_authorized; result = usb_autoresume_device(usb_dev); if (result < 0) { dev_err(&usb_dev->dev, "can't autoresume for authorization: %d\n", result); goto error_autoresume; } usb_dev->authorized = 1; /* Choose and set the configuration. This registers the interfaces * with the driver core and lets interface drivers bind to them. */ c = usb_choose_configuration(usb_dev); if (c >= 0) { result = usb_set_configuration(usb_dev, c); if (result) { dev_err(&usb_dev->dev, "can't set config #%d, error %d\n", c, result); /* This need not be fatal. The user can try to * set other configurations. */ } } dev_info(&usb_dev->dev, "authorized to connect\n"); usb_autosuspend_device(usb_dev); error_autoresume: out_authorized: usb_unlock_device(usb_dev); /* complements locktree */ return result; } /** * get_port_ssp_rate - Match the extended port status to SSP rate * @hdev: The hub device * @ext_portstatus: extended port status * * Match the extended port status speed id to the SuperSpeed Plus sublink speed * capability attributes. Base on the number of connected lanes and speed, * return the corresponding enum usb_ssp_rate. */ static enum usb_ssp_rate get_port_ssp_rate(struct usb_device *hdev, u32 ext_portstatus) { struct usb_ssp_cap_descriptor *ssp_cap; u32 attr; u8 speed_id; u8 ssac; u8 lanes; int i; if (!hdev->bos) goto out; ssp_cap = hdev->bos->ssp_cap; if (!ssp_cap) goto out; speed_id = ext_portstatus & USB_EXT_PORT_STAT_RX_SPEED_ID; lanes = USB_EXT_PORT_RX_LANES(ext_portstatus) + 1; ssac = le32_to_cpu(ssp_cap->bmAttributes) & USB_SSP_SUBLINK_SPEED_ATTRIBS; for (i = 0; i <= ssac; i++) { u8 ssid; attr = le32_to_cpu(ssp_cap->bmSublinkSpeedAttr[i]); ssid = FIELD_GET(USB_SSP_SUBLINK_SPEED_SSID, attr); if (speed_id == ssid) { u16 mantissa; u8 lse; u8 type; /* * Note: currently asymmetric lane types are only * applicable for SSIC operate in SuperSpeed protocol */ type = FIELD_GET(USB_SSP_SUBLINK_SPEED_ST, attr); if (type == USB_SSP_SUBLINK_SPEED_ST_ASYM_RX || type == USB_SSP_SUBLINK_SPEED_ST_ASYM_TX) goto out; if (FIELD_GET(USB_SSP_SUBLINK_SPEED_LP, attr) != USB_SSP_SUBLINK_SPEED_LP_SSP) goto out; lse = FIELD_GET(USB_SSP_SUBLINK_SPEED_LSE, attr); mantissa = FIELD_GET(USB_SSP_SUBLINK_SPEED_LSM, attr); /* Convert to Gbps */ for (; lse < USB_SSP_SUBLINK_SPEED_LSE_GBPS; lse++) mantissa /= 1000; if (mantissa >= 10 && lanes == 1) return USB_SSP_GEN_2x1; if (mantissa >= 10 && lanes == 2) return USB_SSP_GEN_2x2; if (mantissa >= 5 && lanes == 2) return USB_SSP_GEN_1x2; goto out; } } out: return USB_SSP_GEN_UNKNOWN; } #ifdef CONFIG_USB_FEW_INIT_RETRIES #define PORT_RESET_TRIES 2 #define SET_ADDRESS_TRIES 1 #define GET_DESCRIPTOR_TRIES 1 #define GET_MAXPACKET0_TRIES 1 #define PORT_INIT_TRIES 4 #else #define PORT_RESET_TRIES 5 #define SET_ADDRESS_TRIES 2 #define GET_DESCRIPTOR_TRIES 2 #define GET_MAXPACKET0_TRIES 3 #define PORT_INIT_TRIES 4 #endif /* CONFIG_USB_FEW_INIT_RETRIES */ #define DETECT_DISCONNECT_TRIES 5 #define HUB_ROOT_RESET_TIME 60 /* times are in msec */ #define HUB_SHORT_RESET_TIME 10 #define HUB_BH_RESET_TIME 50 #define HUB_LONG_RESET_TIME 200 #define HUB_RESET_TIMEOUT 800 static bool use_new_scheme(struct usb_device *udev, int retry, struct usb_port *port_dev) { int old_scheme_first_port = (port_dev->quirks & USB_PORT_QUIRK_OLD_SCHEME) || old_scheme_first; /* * "New scheme" enumeration causes an extra state transition to be * exposed to an xhci host and causes USB3 devices to receive control * commands in the default state. This has been seen to cause * enumeration failures, so disable this enumeration scheme for USB3 * devices. */ if (udev->speed >= USB_SPEED_SUPER) return false; /* * If use_both_schemes is set, use the first scheme (whichever * it is) for the larger half of the retries, then use the other * scheme. Otherwise, use the first scheme for all the retries. */ if (use_both_schemes && retry >= (PORT_INIT_TRIES + 1) / 2) return old_scheme_first_port; /* Second half */ return !old_scheme_first_port; /* First half or all */ } /* Is a USB 3.0 port in the Inactive or Compliance Mode state? * Port warm reset is required to recover */ static bool hub_port_warm_reset_required(struct usb_hub *hub, int port1, u16 portstatus) { u16 link_state; if (!hub_is_superspeed(hub->hdev)) return false; if (test_bit(port1, hub->warm_reset_bits)) return true; link_state = portstatus & USB_PORT_STAT_LINK_STATE; return link_state == USB_SS_PORT_LS_SS_INACTIVE || link_state == USB_SS_PORT_LS_COMP_MOD; } static int hub_port_wait_reset(struct usb_hub *hub, int port1, struct usb_device *udev, unsigned int delay, bool warm) { int delay_time, ret; u16 portstatus; u16 portchange; u32 ext_portstatus = 0; for (delay_time = 0; delay_time < HUB_RESET_TIMEOUT; delay_time += delay) { /* wait to give the device a chance to reset */ msleep(delay); /* read and decode port status */ if (hub_is_superspeedplus(hub->hdev)) ret = hub_ext_port_status(hub, port1, HUB_EXT_PORT_STATUS, &portstatus, &portchange, &ext_portstatus); else ret = usb_hub_port_status(hub, port1, &portstatus, &portchange); if (ret < 0) return ret; /* * The port state is unknown until the reset completes. * * On top of that, some chips may require additional time * to re-establish a connection after the reset is complete, * so also wait for the connection to be re-established. */ if (!(portstatus & USB_PORT_STAT_RESET) && (portstatus & USB_PORT_STAT_CONNECTION)) break; /* switch to the long delay after two short delay failures */ if (delay_time >= 2 * HUB_SHORT_RESET_TIME) delay = HUB_LONG_RESET_TIME; dev_dbg(&hub->ports[port1 - 1]->dev, "not %sreset yet, waiting %dms\n", warm ? "warm " : "", delay); } if ((portstatus & USB_PORT_STAT_RESET)) return -EBUSY; if (hub_port_warm_reset_required(hub, port1, portstatus)) return -ENOTCONN; /* Device went away? */ if (!(portstatus & USB_PORT_STAT_CONNECTION)) return -ENOTCONN; /* Retry if connect change is set but status is still connected. * A USB 3.0 connection may bounce if multiple warm resets were issued, * but the device may have successfully re-connected. Ignore it. */ if (!hub_is_superspeed(hub->hdev) && (portchange & USB_PORT_STAT_C_CONNECTION)) { usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_C_CONNECTION); return -EAGAIN; } if (!(portstatus & USB_PORT_STAT_ENABLE)) return -EBUSY; if (!udev) return 0; if (hub_is_superspeedplus(hub->hdev)) { /* extended portstatus Rx and Tx lane count are zero based */ udev->rx_lanes = USB_EXT_PORT_RX_LANES(ext_portstatus) + 1; udev->tx_lanes = USB_EXT_PORT_TX_LANES(ext_portstatus) + 1; udev->ssp_rate = get_port_ssp_rate(hub->hdev, ext_portstatus); } else { udev->rx_lanes = 1; udev->tx_lanes = 1; udev->ssp_rate = USB_SSP_GEN_UNKNOWN; } if (udev->ssp_rate != USB_SSP_GEN_UNKNOWN) udev->speed = USB_SPEED_SUPER_PLUS; else if (hub_is_superspeed(hub->hdev)) udev->speed = USB_SPEED_SUPER; else if (portstatus & USB_PORT_STAT_HIGH_SPEED) udev->speed = USB_SPEED_HIGH; else if (portstatus & USB_PORT_STAT_LOW_SPEED) udev->speed = USB_SPEED_LOW; else udev->speed = USB_SPEED_FULL; return 0; } /* Handle port reset and port warm(BH) reset (for USB3 protocol ports) */ static int hub_port_reset(struct usb_hub *hub, int port1, struct usb_device *udev, unsigned int delay, bool warm) { int i, status; u16 portchange, portstatus; struct usb_port *port_dev = hub->ports[port1 - 1]; int reset_recovery_time; if (!hub_is_superspeed(hub->hdev)) { if (warm) { dev_err(hub->intfdev, "only USB3 hub support " "warm reset\n"); return -EINVAL; } /* Block EHCI CF initialization during the port reset. * Some companion controllers don't like it when they mix. */ down_read(&ehci_cf_port_reset_rwsem); } else if (!warm) { /* * If the caller hasn't explicitly requested a warm reset, * double check and see if one is needed. */ if (usb_hub_port_status(hub, port1, &portstatus, &portchange) == 0) if (hub_port_warm_reset_required(hub, port1, portstatus)) warm = true; } clear_bit(port1, hub->warm_reset_bits); /* Reset the port */ for (i = 0; i < PORT_RESET_TRIES; i++) { status = set_port_feature(hub->hdev, port1, (warm ? USB_PORT_FEAT_BH_PORT_RESET : USB_PORT_FEAT_RESET)); if (status == -ENODEV) { ; /* The hub is gone */ } else if (status) { dev_err(&port_dev->dev, "cannot %sreset (err = %d)\n", warm ? "warm " : "", status); } else { status = hub_port_wait_reset(hub, port1, udev, delay, warm); if (status && status != -ENOTCONN && status != -ENODEV) dev_dbg(hub->intfdev, "port_wait_reset: err = %d\n", status); } /* * Check for disconnect or reset, and bail out after several * reset attempts to avoid warm reset loop. */ if (status == 0 || status == -ENOTCONN || status == -ENODEV || (status == -EBUSY && i == PORT_RESET_TRIES - 1)) { usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_C_RESET); if (!hub_is_superspeed(hub->hdev)) goto done; usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_C_BH_PORT_RESET); usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_C_PORT_LINK_STATE); if (udev) usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_C_CONNECTION); /* * If a USB 3.0 device migrates from reset to an error * state, re-issue the warm reset. */ if (usb_hub_port_status(hub, port1, &portstatus, &portchange) < 0) goto done; if (!hub_port_warm_reset_required(hub, port1, portstatus)) goto done; /* * If the port is in SS.Inactive or Compliance Mode, the * hot or warm reset failed. Try another warm reset. */ if (!warm) { dev_dbg(&port_dev->dev, "hot reset failed, warm reset\n"); warm = true; } } dev_dbg(&port_dev->dev, "not enabled, trying %sreset again...\n", warm ? "warm " : ""); delay = HUB_LONG_RESET_TIME; } dev_err(&port_dev->dev, "Cannot enable. Maybe the USB cable is bad?\n"); done: if (status == 0) { if (port_dev->quirks & USB_PORT_QUIRK_FAST_ENUM) usleep_range(10000, 12000); else { /* TRSTRCY = 10 ms; plus some extra */ reset_recovery_time = 10 + 40; /* Hub needs extra delay after resetting its port. */ if (hub->hdev->quirks & USB_QUIRK_HUB_SLOW_RESET) reset_recovery_time += 100; msleep(reset_recovery_time); } if (udev) { struct usb_hcd *hcd = bus_to_hcd(udev->bus); update_devnum(udev, 0); /* The xHC may think the device is already reset, * so ignore the status. */ if (hcd->driver->reset_device) hcd->driver->reset_device(hcd, udev); usb_set_device_state(udev, USB_STATE_DEFAULT); } } else { if (udev) usb_set_device_state(udev, USB_STATE_NOTATTACHED); } if (!hub_is_superspeed(hub->hdev)) up_read(&ehci_cf_port_reset_rwsem); return status; } /* * hub_port_stop_enumerate - stop USB enumeration or ignore port events * @hub: target hub * @port1: port num of the port * @retries: port retries number of hub_port_init() * * Return: * true: ignore port actions/events or give up connection attempts. * false: keep original behavior. * * This function will be based on retries to check whether the port which is * marked with early_stop attribute would stop enumeration or ignore events. * * Note: * This function didn't change anything if early_stop is not set, and it will * prevent all connection attempts when early_stop is set and the attempts of * the port are more than 1. */ static bool hub_port_stop_enumerate(struct usb_hub *hub, int port1, int retries) { struct usb_port *port_dev = hub->ports[port1 - 1]; if (port_dev->early_stop) { if (port_dev->ignore_event) return true; /* * We want unsuccessful attempts to fail quickly. * Since some devices may need one failure during * port initialization, we allow two tries but no * more. */ if (retries < 2) return false; port_dev->ignore_event = 1; } else port_dev->ignore_event = 0; return port_dev->ignore_event; } /* Check if a port is power on */ int usb_port_is_power_on(struct usb_hub *hub, unsigned int portstatus) { int ret = 0; if (hub_is_superspeed(hub->hdev)) { if (portstatus & USB_SS_PORT_STAT_POWER) ret = 1; } else { if (portstatus & USB_PORT_STAT_POWER) ret = 1; } return ret; } static void usb_lock_port(struct usb_port *port_dev) __acquires(&port_dev->status_lock) { mutex_lock(&port_dev->status_lock); __acquire(&port_dev->status_lock); } static void usb_unlock_port(struct usb_port *port_dev) __releases(&port_dev->status_lock) { mutex_unlock(&port_dev->status_lock); __release(&port_dev->status_lock); } #ifdef CONFIG_PM /* Check if a port is suspended(USB2.0 port) or in U3 state(USB3.0 port) */ static int port_is_suspended(struct usb_hub *hub, unsigned portstatus) { int ret = 0; if (hub_is_superspeed(hub->hdev)) { if ((portstatus & USB_PORT_STAT_LINK_STATE) == USB_SS_PORT_LS_U3) ret = 1; } else { if (portstatus & USB_PORT_STAT_SUSPEND) ret = 1; } return ret; } /* Determine whether the device on a port is ready for a normal resume, * is ready for a reset-resume, or should be disconnected. */ static int check_port_resume_type(struct usb_device *udev, struct usb_hub *hub, int port1, int status, u16 portchange, u16 portstatus) { struct usb_port *port_dev = hub->ports[port1 - 1]; int retries = 3; retry: /* Is a warm reset needed to recover the connection? */ if (status == 0 && udev->reset_resume && hub_port_warm_reset_required(hub, port1, portstatus)) { /* pass */; } /* Is the device still present? */ else if (status || port_is_suspended(hub, portstatus) || !usb_port_is_power_on(hub, portstatus)) { if (status >= 0) status = -ENODEV; } else if (!(portstatus & USB_PORT_STAT_CONNECTION)) { if (retries--) { usleep_range(200, 300); status = usb_hub_port_status(hub, port1, &portstatus, &portchange); goto retry; } status = -ENODEV; } /* Can't do a normal resume if the port isn't enabled, * so try a reset-resume instead. */ else if (!(portstatus & USB_PORT_STAT_ENABLE) && !udev->reset_resume) { if (udev->persist_enabled) udev->reset_resume = 1; else status = -ENODEV; } if (status) { dev_dbg(&port_dev->dev, "status %04x.%04x after resume, %d\n", portchange, portstatus, status); } else if (udev->reset_resume) { /* Late port handoff can set status-change bits */ if (portchange & USB_PORT_STAT_C_CONNECTION) usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_C_CONNECTION); if (portchange & USB_PORT_STAT_C_ENABLE) usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_C_ENABLE); /* * Whatever made this reset-resume necessary may have * turned on the port1 bit in hub->change_bits. But after * a successful reset-resume we want the bit to be clear; * if it was on it would indicate that something happened * following the reset-resume. */ clear_bit(port1, hub->change_bits); } return status; } int usb_disable_ltm(struct usb_device *udev) { struct usb_hcd *hcd = bus_to_hcd(udev->bus); /* Check if the roothub and device supports LTM. */ if (!usb_device_supports_ltm(hcd->self.root_hub) || !usb_device_supports_ltm(udev)) return 0; /* Clear Feature LTM Enable can only be sent if the device is * configured. */ if (!udev->actconfig) return 0; return usb_control_msg(udev, usb_sndctrlpipe(udev, 0), USB_REQ_CLEAR_FEATURE, USB_RECIP_DEVICE, USB_DEVICE_LTM_ENABLE, 0, NULL, 0, USB_CTRL_SET_TIMEOUT); } EXPORT_SYMBOL_GPL(usb_disable_ltm); void usb_enable_ltm(struct usb_device *udev) { struct usb_hcd *hcd = bus_to_hcd(udev->bus); /* Check if the roothub and device supports LTM. */ if (!usb_device_supports_ltm(hcd->self.root_hub) || !usb_device_supports_ltm(udev)) return; /* Set Feature LTM Enable can only be sent if the device is * configured. */ if (!udev->actconfig) return; usb_control_msg(udev, usb_sndctrlpipe(udev, 0), USB_REQ_SET_FEATURE, USB_RECIP_DEVICE, USB_DEVICE_LTM_ENABLE, 0, NULL, 0, USB_CTRL_SET_TIMEOUT); } EXPORT_SYMBOL_GPL(usb_enable_ltm); /* * usb_enable_remote_wakeup - enable remote wakeup for a device * @udev: target device * * For USB-2 devices: Set the device's remote wakeup feature. * * For USB-3 devices: Assume there's only one function on the device and * enable remote wake for the first interface. FIXME if the interface * association descriptor shows there's more than one function. */ static int usb_enable_remote_wakeup(struct usb_device *udev) { if (udev->speed < USB_SPEED_SUPER) return usb_control_msg(udev, usb_sndctrlpipe(udev, 0), USB_REQ_SET_FEATURE, USB_RECIP_DEVICE, USB_DEVICE_REMOTE_WAKEUP, 0, NULL, 0, USB_CTRL_SET_TIMEOUT); else return usb_control_msg(udev, usb_sndctrlpipe(udev, 0), USB_REQ_SET_FEATURE, USB_RECIP_INTERFACE, USB_INTRF_FUNC_SUSPEND, USB_INTRF_FUNC_SUSPEND_RW | USB_INTRF_FUNC_SUSPEND_LP, NULL, 0, USB_CTRL_SET_TIMEOUT); } /* * usb_disable_remote_wakeup - disable remote wakeup for a device * @udev: target device * * For USB-2 devices: Clear the device's remote wakeup feature. * * For USB-3 devices: Assume there's only one function on the device and * disable remote wake for the first interface. FIXME if the interface * association descriptor shows there's more than one function. */ static int usb_disable_remote_wakeup(struct usb_device *udev) { if (udev->speed < USB_SPEED_SUPER) return usb_control_msg(udev, usb_sndctrlpipe(udev, 0), USB_REQ_CLEAR_FEATURE, USB_RECIP_DEVICE, USB_DEVICE_REMOTE_WAKEUP, 0, NULL, 0, USB_CTRL_SET_TIMEOUT); else return usb_control_msg(udev, usb_sndctrlpipe(udev, 0), USB_REQ_SET_FEATURE, USB_RECIP_INTERFACE, USB_INTRF_FUNC_SUSPEND, 0, NULL, 0, USB_CTRL_SET_TIMEOUT); } /* Count of wakeup-enabled devices at or below udev */ unsigned usb_wakeup_enabled_descendants(struct usb_device *udev) { struct usb_hub *hub = usb_hub_to_struct_hub(udev); return udev->do_remote_wakeup + (hub ? hub->wakeup_enabled_descendants : 0); } EXPORT_SYMBOL_GPL(usb_wakeup_enabled_descendants); /* * usb_port_suspend - suspend a usb device's upstream port * @udev: device that's no longer in active use, not a root hub * Context: must be able to sleep; device not locked; pm locks held * * Suspends a USB device that isn't in active use, conserving power. * Devices may wake out of a suspend, if anything important happens, * using the remote wakeup mechanism. They may also be taken out of * suspend by the host, using usb_port_resume(). It's also routine * to disconnect devices while they are suspended. * * This only affects the USB hardware for a device; its interfaces * (and, for hubs, child devices) must already have been suspended. * * Selective port suspend reduces power; most suspended devices draw * less than 500 uA. It's also used in OTG, along with remote wakeup. * All devices below the suspended port are also suspended. * * Devices leave suspend state when the host wakes them up. Some devices * also support "remote wakeup", where the device can activate the USB * tree above them to deliver data, such as a keypress or packet. In * some cases, this wakes the USB host. * * Suspending OTG devices may trigger HNP, if that's been enabled * between a pair of dual-role devices. That will change roles, such * as from A-Host to A-Peripheral or from B-Host back to B-Peripheral. * * Devices on USB hub ports have only one "suspend" state, corresponding * to ACPI D2, "may cause the device to lose some context". * State transitions include: * * - suspend, resume ... when the VBUS power link stays live * - suspend, disconnect ... VBUS lost * * Once VBUS drop breaks the circuit, the port it's using has to go through * normal re-enumeration procedures, starting with enabling VBUS power. * Other than re-initializing the hub (plug/unplug, except for root hubs), * Linux (2.6) currently has NO mechanisms to initiate that: no hub_wq * timer, no SRP, no requests through sysfs. * * If Runtime PM isn't enabled or used, non-SuperSpeed devices may not get * suspended until their bus goes into global suspend (i.e., the root * hub is suspended). Nevertheless, we change @udev->state to * USB_STATE_SUSPENDED as this is the device's "logical" state. The actual * upstream port setting is stored in @udev->port_is_suspended. * * Returns 0 on success, else negative errno. */ int usb_port_suspend(struct usb_device *udev, pm_message_t msg) { struct usb_hub *hub = usb_hub_to_struct_hub(udev->parent); struct usb_port *port_dev = hub->ports[udev->portnum - 1]; int port1 = udev->portnum; int status; bool really_suspend = true; usb_lock_port(port_dev); /* enable remote wakeup when appropriate; this lets the device * wake up the upstream hub (including maybe the root hub). * * NOTE: OTG devices may issue remote wakeup (or SRP) even when * we don't explicitly enable it here. */ if (udev->do_remote_wakeup) { status = usb_enable_remote_wakeup(udev); if (status) { dev_dbg(&udev->dev, "won't remote wakeup, status %d\n", status); /* bail if autosuspend is requested */ if (PMSG_IS_AUTO(msg)) goto err_wakeup; } } /* disable USB2 hardware LPM */ usb_disable_usb2_hardware_lpm(udev); if (usb_disable_ltm(udev)) { dev_err(&udev->dev, "Failed to disable LTM before suspend\n"); status = -ENOMEM; if (PMSG_IS_AUTO(msg)) goto err_ltm; } /* see 7.1.7.6 */ if (hub_is_superspeed(hub->hdev)) status = hub_set_port_link_state(hub, port1, USB_SS_PORT_LS_U3); /* * For system suspend, we do not need to enable the suspend feature * on individual USB-2 ports. The devices will automatically go * into suspend a few ms after the root hub stops sending packets. * The USB 2.0 spec calls this "global suspend". * * However, many USB hubs have a bug: They don't relay wakeup requests * from a downstream port if the port's suspend feature isn't on. * Therefore we will turn on the suspend feature if udev or any of its * descendants is enabled for remote wakeup. */ else if (PMSG_IS_AUTO(msg) || usb_wakeup_enabled_descendants(udev) > 0) status = set_port_feature(hub->hdev, port1, USB_PORT_FEAT_SUSPEND); else { really_suspend = false; status = 0; } if (status) { /* Check if the port has been suspended for the timeout case * to prevent the suspended port from incorrect handling. */ if (status == -ETIMEDOUT) { int ret; u16 portstatus, portchange; portstatus = portchange = 0; ret = usb_hub_port_status(hub, port1, &portstatus, &portchange); dev_dbg(&port_dev->dev, "suspend timeout, status %04x\n", portstatus); if (ret == 0 && port_is_suspended(hub, portstatus)) { status = 0; goto suspend_done; } } dev_dbg(&port_dev->dev, "can't suspend, status %d\n", status); /* Try to enable USB3 LTM again */ usb_enable_ltm(udev); err_ltm: /* Try to enable USB2 hardware LPM again */ usb_enable_usb2_hardware_lpm(udev); if (udev->do_remote_wakeup) (void) usb_disable_remote_wakeup(udev); err_wakeup: /* System sleep transitions should never fail */ if (!PMSG_IS_AUTO(msg)) status = 0; } else { suspend_done: dev_dbg(&udev->dev, "usb %ssuspend, wakeup %d\n", (PMSG_IS_AUTO(msg) ? "auto-" : ""), udev->do_remote_wakeup); if (really_suspend) { udev->port_is_suspended = 1; /* device has up to 10 msec to fully suspend */ msleep(10); } usb_set_device_state(udev, USB_STATE_SUSPENDED); } if (status == 0 && !udev->do_remote_wakeup && udev->persist_enabled && test_and_clear_bit(port1, hub->child_usage_bits)) pm_runtime_put_sync(&port_dev->dev); usb_mark_last_busy(hub->hdev); usb_unlock_port(port_dev); return status; } /* * If the USB "suspend" state is in use (rather than "global suspend"), * many devices will be individually taken out of suspend state using * special "resume" signaling. This routine kicks in shortly after * hardware resume signaling is finished, either because of selective * resume (by host) or remote wakeup (by device) ... now see what changed * in the tree that's rooted at this device. * * If @udev->reset_resume is set then the device is reset before the * status check is done. */ static int finish_port_resume(struct usb_device *udev) { int status = 0; u16 devstatus = 0; /* caller owns the udev device lock */ dev_dbg(&udev->dev, "%s\n", udev->reset_resume ? "finish reset-resume" : "finish resume"); /* usb ch9 identifies four variants of SUSPENDED, based on what * state the device resumes to. Linux currently won't see the * first two on the host side; they'd be inside hub_port_init() * during many timeouts, but hub_wq can't suspend until later. */ usb_set_device_state(udev, udev->actconfig ? USB_STATE_CONFIGURED : USB_STATE_ADDRESS); /* 10.5.4.5 says not to reset a suspended port if the attached * device is enabled for remote wakeup. Hence the reset * operation is carried out here, after the port has been * resumed. */ if (udev->reset_resume) { /* * If the device morphs or switches modes when it is reset, * we don't want to perform a reset-resume. We'll fail the * resume, which will cause a logical disconnect, and then * the device will be rediscovered. */ retry_reset_resume: if (udev->quirks & USB_QUIRK_RESET) status = -ENODEV; else status = usb_reset_and_verify_device(udev); } /* 10.5.4.5 says be sure devices in the tree are still there. * For now let's assume the device didn't go crazy on resume, * and device drivers will know about any resume quirks. */ if (status == 0) { devstatus = 0; status = usb_get_std_status(udev, USB_RECIP_DEVICE, 0, &devstatus); /* If a normal resume failed, try doing a reset-resume */ if (status && !udev->reset_resume && udev->persist_enabled) { dev_dbg(&udev->dev, "retry with reset-resume\n"); udev->reset_resume = 1; goto retry_reset_resume; } } if (status) { dev_dbg(&udev->dev, "gone after usb resume? status %d\n", status); /* * There are a few quirky devices which violate the standard * by claiming to have remote wakeup enabled after a reset, * which crash if the feature is cleared, hence check for * udev->reset_resume */ } else if (udev->actconfig && !udev->reset_resume) { if (udev->speed < USB_SPEED_SUPER) { if (devstatus & (1 << USB_DEVICE_REMOTE_WAKEUP)) status = usb_disable_remote_wakeup(udev); } else { status = usb_get_std_status(udev, USB_RECIP_INTERFACE, 0, &devstatus); if (!status && devstatus & (USB_INTRF_STAT_FUNC_RW_CAP | USB_INTRF_STAT_FUNC_RW)) status = usb_disable_remote_wakeup(udev); } if (status) dev_dbg(&udev->dev, "disable remote wakeup, status %d\n", status); status = 0; } return status; } /* * There are some SS USB devices which take longer time for link training. * XHCI specs 4.19.4 says that when Link training is successful, port * sets CCS bit to 1. So if SW reads port status before successful link * training, then it will not find device to be present. * USB Analyzer log with such buggy devices show that in some cases * device switch on the RX termination after long delay of host enabling * the VBUS. In few other cases it has been seen that device fails to * negotiate link training in first attempt. It has been * reported till now that few devices take as long as 2000 ms to train * the link after host enabling its VBUS and termination. Following * routine implements a 2000 ms timeout for link training. If in a case * link trains before timeout, loop will exit earlier. * * There are also some 2.0 hard drive based devices and 3.0 thumb * drives that, when plugged into a 2.0 only port, take a long * time to set CCS after VBUS enable. * * FIXME: If a device was connected before suspend, but was removed * while system was asleep, then the loop in the following routine will * only exit at timeout. * * This routine should only be called when persist is enabled. */ static int wait_for_connected(struct usb_device *udev, struct usb_hub *hub, int port1, u16 *portchange, u16 *portstatus) { int status = 0, delay_ms = 0; while (delay_ms < 2000) { if (status || *portstatus & USB_PORT_STAT_CONNECTION) break; if (!usb_port_is_power_on(hub, *portstatus)) { status = -ENODEV; break; } msleep(20); delay_ms += 20; status = usb_hub_port_status(hub, port1, portstatus, portchange); } dev_dbg(&udev->dev, "Waited %dms for CONNECT\n", delay_ms); return status; } /* * usb_port_resume - re-activate a suspended usb device's upstream port * @udev: device to re-activate, not a root hub * Context: must be able to sleep; device not locked; pm locks held * * This will re-activate the suspended device, increasing power usage * while letting drivers communicate again with its endpoints. * USB resume explicitly guarantees that the power session between * the host and the device is the same as it was when the device * suspended. * * If @udev->reset_resume is set then this routine won't check that the * port is still enabled. Furthermore, finish_port_resume() above will * reset @udev. The end result is that a broken power session can be * recovered and @udev will appear to persist across a loss of VBUS power. * * For example, if a host controller doesn't maintain VBUS suspend current * during a system sleep or is reset when the system wakes up, all the USB * power sessions below it will be broken. This is especially troublesome * for mass-storage devices containing mounted filesystems, since the * device will appear to have disconnected and all the memory mappings * to it will be lost. Using the USB_PERSIST facility, the device can be * made to appear as if it had not disconnected. * * This facility can be dangerous. Although usb_reset_and_verify_device() makes * every effort to insure that the same device is present after the * reset as before, it cannot provide a 100% guarantee. Furthermore it's * quite possible for a device to remain unaltered but its media to be * changed. If the user replaces a flash memory card while the system is * asleep, he will have only himself to blame when the filesystem on the * new card is corrupted and the system crashes. * * Returns 0 on success, else negative errno. */ int usb_port_resume(struct usb_device *udev, pm_message_t msg) { struct usb_hub *hub = usb_hub_to_struct_hub(udev->parent); struct usb_port *port_dev = hub->ports[udev->portnum - 1]; int port1 = udev->portnum; int status; u16 portchange, portstatus; if (!test_and_set_bit(port1, hub->child_usage_bits)) { status = pm_runtime_resume_and_get(&port_dev->dev); if (status < 0) { dev_dbg(&udev->dev, "can't resume usb port, status %d\n", status); return status; } } usb_lock_port(port_dev); /* Skip the initial Clear-Suspend step for a remote wakeup */ status = usb_hub_port_status(hub, port1, &portstatus, &portchange); if (status == 0 && !port_is_suspended(hub, portstatus)) { if (portchange & USB_PORT_STAT_C_SUSPEND) pm_wakeup_event(&udev->dev, 0); goto SuspendCleared; } /* see 7.1.7.7; affects power usage, but not budgeting */ if (hub_is_superspeed(hub->hdev)) status = hub_set_port_link_state(hub, port1, USB_SS_PORT_LS_U0); else status = usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_SUSPEND); if (status) { dev_dbg(&port_dev->dev, "can't resume, status %d\n", status); } else { /* drive resume for USB_RESUME_TIMEOUT msec */ dev_dbg(&udev->dev, "usb %sresume\n", (PMSG_IS_AUTO(msg) ? "auto-" : "")); msleep(USB_RESUME_TIMEOUT); /* Virtual root hubs can trigger on GET_PORT_STATUS to * stop resume signaling. Then finish the resume * sequence. */ status = usb_hub_port_status(hub, port1, &portstatus, &portchange); } SuspendCleared: if (status == 0) { udev->port_is_suspended = 0; if (hub_is_superspeed(hub->hdev)) { if (portchange & USB_PORT_STAT_C_LINK_STATE) usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_C_PORT_LINK_STATE); } else { if (portchange & USB_PORT_STAT_C_SUSPEND) usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_C_SUSPEND); } /* TRSMRCY = 10 msec */ msleep(10); } if (udev->persist_enabled) status = wait_for_connected(udev, hub, port1, &portchange, &portstatus); status = check_port_resume_type(udev, hub, port1, status, portchange, portstatus); if (status == 0) status = finish_port_resume(udev); if (status < 0) { dev_dbg(&udev->dev, "can't resume, status %d\n", status); hub_port_logical_disconnect(hub, port1); } else { /* Try to enable USB2 hardware LPM */ usb_enable_usb2_hardware_lpm(udev); /* Try to enable USB3 LTM */ usb_enable_ltm(udev); } usb_unlock_port(port_dev); return status; } int usb_remote_wakeup(struct usb_device *udev) { int status = 0; usb_lock_device(udev); if (udev->state == USB_STATE_SUSPENDED) { dev_dbg(&udev->dev, "usb %sresume\n", "wakeup-"); status = usb_autoresume_device(udev); if (status == 0) { /* Let the drivers do their thing, then... */ usb_autosuspend_device(udev); } } usb_unlock_device(udev); return status; } /* Returns 1 if there was a remote wakeup and a connect status change. */ static int hub_handle_remote_wakeup(struct usb_hub *hub, unsigned int port, u16 portstatus, u16 portchange) __must_hold(&port_dev->status_lock) { struct usb_port *port_dev = hub->ports[port - 1]; struct usb_device *hdev; struct usb_device *udev; int connect_change = 0; u16 link_state; int ret; hdev = hub->hdev; udev = port_dev->child; if (!hub_is_superspeed(hdev)) { if (!(portchange & USB_PORT_STAT_C_SUSPEND)) return 0; usb_clear_port_feature(hdev, port, USB_PORT_FEAT_C_SUSPEND); } else { link_state = portstatus & USB_PORT_STAT_LINK_STATE; if (!udev || udev->state != USB_STATE_SUSPENDED || (link_state != USB_SS_PORT_LS_U0 && link_state != USB_SS_PORT_LS_U1 && link_state != USB_SS_PORT_LS_U2)) return 0; } if (udev) { /* TRSMRCY = 10 msec */ msleep(10); usb_unlock_port(port_dev); ret = usb_remote_wakeup(udev); usb_lock_port(port_dev); if (ret < 0) connect_change = 1; } else { ret = -ENODEV; hub_port_disable(hub, port, 1); } dev_dbg(&port_dev->dev, "resume, status %d\n", ret); return connect_change; } static int check_ports_changed(struct usb_hub *hub) { int port1; for (port1 = 1; port1 <= hub->hdev->maxchild; ++port1) { u16 portstatus, portchange; int status; status = usb_hub_port_status(hub, port1, &portstatus, &portchange); if (!status && portchange) return 1; } return 0; } static int hub_suspend(struct usb_interface *intf, pm_message_t msg) { struct usb_hub *hub = usb_get_intfdata(intf); struct usb_device *hdev = hub->hdev; unsigned port1; /* * Warn if children aren't already suspended. * Also, add up the number of wakeup-enabled descendants. */ hub->wakeup_enabled_descendants = 0; for (port1 = 1; port1 <= hdev->maxchild; port1++) { struct usb_port *port_dev = hub->ports[port1 - 1]; struct usb_device *udev = port_dev->child; if (udev && udev->can_submit) { dev_warn(&port_dev->dev, "device %s not suspended yet\n", dev_name(&udev->dev)); if (PMSG_IS_AUTO(msg)) return -EBUSY; } if (udev) hub->wakeup_enabled_descendants += usb_wakeup_enabled_descendants(udev); } if (hdev->do_remote_wakeup && hub->quirk_check_port_auto_suspend) { /* check if there are changes pending on hub ports */ if (check_ports_changed(hub)) { if (PMSG_IS_AUTO(msg)) return -EBUSY; pm_wakeup_event(&hdev->dev, 2000); } } if (hub_is_superspeed(hdev) && hdev->do_remote_wakeup) { /* Enable hub to send remote wakeup for all ports. */ for (port1 = 1; port1 <= hdev->maxchild; port1++) { set_port_feature(hdev, port1 | USB_PORT_FEAT_REMOTE_WAKE_CONNECT | USB_PORT_FEAT_REMOTE_WAKE_DISCONNECT | USB_PORT_FEAT_REMOTE_WAKE_OVER_CURRENT, USB_PORT_FEAT_REMOTE_WAKE_MASK); } } dev_dbg(&intf->dev, "%s\n", __func__); /* stop hub_wq and related activity */ hub_quiesce(hub, HUB_SUSPEND); return 0; } /* Report wakeup requests from the ports of a resuming root hub */ static void report_wakeup_requests(struct usb_hub *hub) { struct usb_device *hdev = hub->hdev; struct usb_device *udev; struct usb_hcd *hcd; unsigned long resuming_ports; int i; if (hdev->parent) return; /* Not a root hub */ hcd = bus_to_hcd(hdev->bus); if (hcd->driver->get_resuming_ports) { /* * The get_resuming_ports() method returns a bitmap (origin 0) * of ports which have started wakeup signaling but have not * yet finished resuming. During system resume we will * resume all the enabled ports, regardless of any wakeup * signals, which means the wakeup requests would be lost. * To prevent this, report them to the PM core here. */ resuming_ports = hcd->driver->get_resuming_ports(hcd); for (i = 0; i < hdev->maxchild; ++i) { if (test_bit(i, &resuming_ports)) { udev = hub->ports[i]->child; if (udev) pm_wakeup_event(&udev->dev, 0); } } } } static int hub_resume(struct usb_interface *intf) { struct usb_hub *hub = usb_get_intfdata(intf); dev_dbg(&intf->dev, "%s\n", __func__); hub_activate(hub, HUB_RESUME); /* * This should be called only for system resume, not runtime resume. * We can't tell the difference here, so some wakeup requests will be * reported at the wrong time or more than once. This shouldn't * matter much, so long as they do get reported. */ report_wakeup_requests(hub); return 0; } static int hub_reset_resume(struct usb_interface *intf) { struct usb_hub *hub = usb_get_intfdata(intf); dev_dbg(&intf->dev, "%s\n", __func__); hub_activate(hub, HUB_RESET_RESUME); return 0; } /** * usb_root_hub_lost_power - called by HCD if the root hub lost Vbus power * @rhdev: struct usb_device for the root hub * * The USB host controller driver calls this function when its root hub * is resumed and Vbus power has been interrupted or the controller * has been reset. The routine marks @rhdev as having lost power. * When the hub driver is resumed it will take notice and carry out * power-session recovery for all the "USB-PERSIST"-enabled child devices; * the others will be disconnected. */ void usb_root_hub_lost_power(struct usb_device *rhdev) { dev_notice(&rhdev->dev, "root hub lost power or was reset\n"); rhdev->reset_resume = 1; } EXPORT_SYMBOL_GPL(usb_root_hub_lost_power); static const char * const usb3_lpm_names[] = { "U0", "U1", "U2", "U3", }; /* * Send a Set SEL control transfer to the device, prior to enabling * device-initiated U1 or U2. This lets the device know the exit latencies from * the time the device initiates a U1 or U2 exit, to the time it will receive a * packet from the host. * * This function will fail if the SEL or PEL values for udev are greater than * the maximum allowed values for the link state to be enabled. */ static int usb_req_set_sel(struct usb_device *udev) { struct usb_set_sel_req *sel_values; unsigned long long u1_sel; unsigned long long u1_pel; unsigned long long u2_sel; unsigned long long u2_pel; int ret; if (!udev->parent || udev->speed < USB_SPEED_SUPER || !udev->lpm_capable) return 0; /* Convert SEL and PEL stored in ns to us */ u1_sel = DIV_ROUND_UP(udev->u1_params.sel, 1000); u1_pel = DIV_ROUND_UP(udev->u1_params.pel, 1000); u2_sel = DIV_ROUND_UP(udev->u2_params.sel, 1000); u2_pel = DIV_ROUND_UP(udev->u2_params.pel, 1000); /* * Make sure that the calculated SEL and PEL values for the link * state we're enabling aren't bigger than the max SEL/PEL * value that will fit in the SET SEL control transfer. * Otherwise the device would get an incorrect idea of the exit * latency for the link state, and could start a device-initiated * U1/U2 when the exit latencies are too high. */ if (u1_sel > USB3_LPM_MAX_U1_SEL_PEL || u1_pel > USB3_LPM_MAX_U1_SEL_PEL || u2_sel > USB3_LPM_MAX_U2_SEL_PEL || u2_pel > USB3_LPM_MAX_U2_SEL_PEL) { dev_dbg(&udev->dev, "Device-initiated U1/U2 disabled due to long SEL or PEL\n"); return -EINVAL; } /* * usb_enable_lpm() can be called as part of a failed device reset, * which may be initiated by an error path of a mass storage driver. * Therefore, use GFP_NOIO. */ sel_values = kmalloc(sizeof *(sel_values), GFP_NOIO); if (!sel_values) return -ENOMEM; sel_values->u1_sel = u1_sel; sel_values->u1_pel = u1_pel; sel_values->u2_sel = cpu_to_le16(u2_sel); sel_values->u2_pel = cpu_to_le16(u2_pel); ret = usb_control_msg(udev, usb_sndctrlpipe(udev, 0), USB_REQ_SET_SEL, USB_RECIP_DEVICE, 0, 0, sel_values, sizeof *(sel_values), USB_CTRL_SET_TIMEOUT); kfree(sel_values); if (ret > 0) udev->lpm_devinit_allow = 1; return ret; } /* * Enable or disable device-initiated U1 or U2 transitions. */ static int usb_set_device_initiated_lpm(struct usb_device *udev, enum usb3_link_state state, bool enable) { int ret; int feature; switch (state) { case USB3_LPM_U1: feature = USB_DEVICE_U1_ENABLE; break; case USB3_LPM_U2: feature = USB_DEVICE_U2_ENABLE; break; default: dev_warn(&udev->dev, "%s: Can't %s non-U1 or U2 state.\n", __func__, enable ? "enable" : "disable"); return -EINVAL; } if (udev->state != USB_STATE_CONFIGURED) { dev_dbg(&udev->dev, "%s: Can't %s %s state " "for unconfigured device.\n", __func__, enable ? "enable" : "disable", usb3_lpm_names[state]); return 0; } if (enable) { /* * Now send the control transfer to enable device-initiated LPM * for either U1 or U2. */ ret = usb_control_msg(udev, usb_sndctrlpipe(udev, 0), USB_REQ_SET_FEATURE, USB_RECIP_DEVICE, feature, 0, NULL, 0, USB_CTRL_SET_TIMEOUT); } else { ret = usb_control_msg(udev, usb_sndctrlpipe(udev, 0), USB_REQ_CLEAR_FEATURE, USB_RECIP_DEVICE, feature, 0, NULL, 0, USB_CTRL_SET_TIMEOUT); } if (ret < 0) { dev_warn(&udev->dev, "%s of device-initiated %s failed.\n", enable ? "Enable" : "Disable", usb3_lpm_names[state]); return -EBUSY; } return 0; } static int usb_set_lpm_timeout(struct usb_device *udev, enum usb3_link_state state, int timeout) { int ret; int feature; switch (state) { case USB3_LPM_U1: feature = USB_PORT_FEAT_U1_TIMEOUT; break; case USB3_LPM_U2: feature = USB_PORT_FEAT_U2_TIMEOUT; break; default: dev_warn(&udev->dev, "%s: Can't set timeout for non-U1 or U2 state.\n", __func__); return -EINVAL; } if (state == USB3_LPM_U1 && timeout > USB3_LPM_U1_MAX_TIMEOUT && timeout != USB3_LPM_DEVICE_INITIATED) { dev_warn(&udev->dev, "Failed to set %s timeout to 0x%x, " "which is a reserved value.\n", usb3_lpm_names[state], timeout); return -EINVAL; } ret = set_port_feature(udev->parent, USB_PORT_LPM_TIMEOUT(timeout) | udev->portnum, feature); if (ret < 0) { dev_warn(&udev->dev, "Failed to set %s timeout to 0x%x," "error code %i\n", usb3_lpm_names[state], timeout, ret); return -EBUSY; } if (state == USB3_LPM_U1) udev->u1_params.timeout = timeout; else udev->u2_params.timeout = timeout; return 0; } /* * Don't allow device intiated U1/U2 if the system exit latency + one bus * interval is greater than the minimum service interval of any active * periodic endpoint. See USB 3.2 section 9.4.9 */ static bool usb_device_may_initiate_lpm(struct usb_device *udev, enum usb3_link_state state) { unsigned int sel; /* us */ int i, j; if (!udev->lpm_devinit_allow) return false; if (state == USB3_LPM_U1) sel = DIV_ROUND_UP(udev->u1_params.sel, 1000); else if (state == USB3_LPM_U2) sel = DIV_ROUND_UP(udev->u2_params.sel, 1000); else return false; for (i = 0; i < udev->actconfig->desc.bNumInterfaces; i++) { struct usb_interface *intf; struct usb_endpoint_descriptor *desc; unsigned int interval; intf = udev->actconfig->interface[i]; if (!intf) continue; for (j = 0; j < intf->cur_altsetting->desc.bNumEndpoints; j++) { desc = &intf->cur_altsetting->endpoint[j].desc; if (usb_endpoint_xfer_int(desc) || usb_endpoint_xfer_isoc(desc)) { interval = (1 << (desc->bInterval - 1)) * 125; if (sel + 125 > interval) return false; } } } return true; } /* * Enable the hub-initiated U1/U2 idle timeouts, and enable device-initiated * U1/U2 entry. * * We will attempt to enable U1 or U2, but there are no guarantees that the * control transfers to set the hub timeout or enable device-initiated U1/U2 * will be successful. * * If the control transfer to enable device-initiated U1/U2 entry fails, then * hub-initiated U1/U2 will be disabled. * * If we cannot set the parent hub U1/U2 timeout, we attempt to let the xHCI * driver know about it. If that call fails, it should be harmless, and just * take up more slightly more bus bandwidth for unnecessary U1/U2 exit latency. */ static void usb_enable_link_state(struct usb_hcd *hcd, struct usb_device *udev, enum usb3_link_state state) { int timeout; __u8 u1_mel; __le16 u2_mel; /* Skip if the device BOS descriptor couldn't be read */ if (!udev->bos) return; u1_mel = udev->bos->ss_cap->bU1devExitLat; u2_mel = udev->bos->ss_cap->bU2DevExitLat; /* If the device says it doesn't have *any* exit latency to come out of * U1 or U2, it's probably lying. Assume it doesn't implement that link * state. */ if ((state == USB3_LPM_U1 && u1_mel == 0) || (state == USB3_LPM_U2 && u2_mel == 0)) return; /* We allow the host controller to set the U1/U2 timeout internally * first, so that it can change its schedule to account for the * additional latency to send data to a device in a lower power * link state. */ timeout = hcd->driver->enable_usb3_lpm_timeout(hcd, udev, state); /* xHCI host controller doesn't want to enable this LPM state. */ if (timeout == 0) return; if (timeout < 0) { dev_warn(&udev->dev, "Could not enable %s link state, " "xHCI error %i.\n", usb3_lpm_names[state], timeout); return; } if (usb_set_lpm_timeout(udev, state, timeout)) { /* If we can't set the parent hub U1/U2 timeout, * device-initiated LPM won't be allowed either, so let the xHCI * host know that this link state won't be enabled. */ hcd->driver->disable_usb3_lpm_timeout(hcd, udev, state); return; } /* Only a configured device will accept the Set Feature * U1/U2_ENABLE */ if (udev->actconfig && usb_device_may_initiate_lpm(udev, state)) { if (usb_set_device_initiated_lpm(udev, state, true)) { /* * Request to enable device initiated U1/U2 failed, * better to turn off lpm in this case. */ usb_set_lpm_timeout(udev, state, 0); hcd->driver->disable_usb3_lpm_timeout(hcd, udev, state); return; } } if (state == USB3_LPM_U1) udev->usb3_lpm_u1_enabled = 1; else if (state == USB3_LPM_U2) udev->usb3_lpm_u2_enabled = 1; } /* * Disable the hub-initiated U1/U2 idle timeouts, and disable device-initiated * U1/U2 entry. * * If this function returns -EBUSY, the parent hub will still allow U1/U2 entry. * If zero is returned, the parent will not allow the link to go into U1/U2. * * If zero is returned, device-initiated U1/U2 entry may still be enabled, but * it won't have an effect on the bus link state because the parent hub will * still disallow device-initiated U1/U2 entry. * * If zero is returned, the xHCI host controller may still think U1/U2 entry is * possible. The result will be slightly more bus bandwidth will be taken up * (to account for U1/U2 exit latency), but it should be harmless. */ static int usb_disable_link_state(struct usb_hcd *hcd, struct usb_device *udev, enum usb3_link_state state) { switch (state) { case USB3_LPM_U1: case USB3_LPM_U2: break; default: dev_warn(&udev->dev, "%s: Can't disable non-U1 or U2 state.\n", __func__); return -EINVAL; } if (usb_set_lpm_timeout(udev, state, 0)) return -EBUSY; usb_set_device_initiated_lpm(udev, state, false); if (hcd->driver->disable_usb3_lpm_timeout(hcd, udev, state)) dev_warn(&udev->dev, "Could not disable xHCI %s timeout, " "bus schedule bandwidth may be impacted.\n", usb3_lpm_names[state]); /* As soon as usb_set_lpm_timeout(0) return 0, hub initiated LPM * is disabled. Hub will disallows link to enter U1/U2 as well, * even device is initiating LPM. Hence LPM is disabled if hub LPM * timeout set to 0, no matter device-initiated LPM is disabled or * not. */ if (state == USB3_LPM_U1) udev->usb3_lpm_u1_enabled = 0; else if (state == USB3_LPM_U2) udev->usb3_lpm_u2_enabled = 0; return 0; } /* * Disable hub-initiated and device-initiated U1 and U2 entry. * Caller must own the bandwidth_mutex. * * This will call usb_enable_lpm() on failure, which will decrement * lpm_disable_count, and will re-enable LPM if lpm_disable_count reaches zero. */ int usb_disable_lpm(struct usb_device *udev) { struct usb_hcd *hcd; if (!udev || !udev->parent || udev->speed < USB_SPEED_SUPER || !udev->lpm_capable || udev->state < USB_STATE_CONFIGURED) return 0; hcd = bus_to_hcd(udev->bus); if (!hcd || !hcd->driver->disable_usb3_lpm_timeout) return 0; udev->lpm_disable_count++; if ((udev->u1_params.timeout == 0 && udev->u2_params.timeout == 0)) return 0; /* If LPM is enabled, attempt to disable it. */ if (usb_disable_link_state(hcd, udev, USB3_LPM_U1)) goto enable_lpm; if (usb_disable_link_state(hcd, udev, USB3_LPM_U2)) goto enable_lpm; return 0; enable_lpm: usb_enable_lpm(udev); return -EBUSY; } EXPORT_SYMBOL_GPL(usb_disable_lpm); /* Grab the bandwidth_mutex before calling usb_disable_lpm() */ int usb_unlocked_disable_lpm(struct usb_device *udev) { struct usb_hcd *hcd = bus_to_hcd(udev->bus); int ret; if (!hcd) return -EINVAL; mutex_lock(hcd->bandwidth_mutex); ret = usb_disable_lpm(udev); mutex_unlock(hcd->bandwidth_mutex); return ret; } EXPORT_SYMBOL_GPL(usb_unlocked_disable_lpm); /* * Attempt to enable device-initiated and hub-initiated U1 and U2 entry. The * xHCI host policy may prevent U1 or U2 from being enabled. * * Other callers may have disabled link PM, so U1 and U2 entry will be disabled * until the lpm_disable_count drops to zero. Caller must own the * bandwidth_mutex. */ void usb_enable_lpm(struct usb_device *udev) { struct usb_hcd *hcd; struct usb_hub *hub; struct usb_port *port_dev; if (!udev || !udev->parent || udev->speed < USB_SPEED_SUPER || !udev->lpm_capable || udev->state < USB_STATE_CONFIGURED) return; udev->lpm_disable_count--; hcd = bus_to_hcd(udev->bus); /* Double check that we can both enable and disable LPM. * Device must be configured to accept set feature U1/U2 timeout. */ if (!hcd || !hcd->driver->enable_usb3_lpm_timeout || !hcd->driver->disable_usb3_lpm_timeout) return; if (udev->lpm_disable_count > 0) return; hub = usb_hub_to_struct_hub(udev->parent); if (!hub) return; port_dev = hub->ports[udev->portnum - 1]; if (port_dev->usb3_lpm_u1_permit) usb_enable_link_state(hcd, udev, USB3_LPM_U1); if (port_dev->usb3_lpm_u2_permit) usb_enable_link_state(hcd, udev, USB3_LPM_U2); } EXPORT_SYMBOL_GPL(usb_enable_lpm); /* Grab the bandwidth_mutex before calling usb_enable_lpm() */ void usb_unlocked_enable_lpm(struct usb_device *udev) { struct usb_hcd *hcd = bus_to_hcd(udev->bus); if (!hcd) return; mutex_lock(hcd->bandwidth_mutex); usb_enable_lpm(udev); mutex_unlock(hcd->bandwidth_mutex); } EXPORT_SYMBOL_GPL(usb_unlocked_enable_lpm); /* usb3 devices use U3 for disabled, make sure remote wakeup is disabled */ static void hub_usb3_port_prepare_disable(struct usb_hub *hub, struct usb_port *port_dev) { struct usb_device *udev = port_dev->child; int ret; if (udev && udev->port_is_suspended && udev->do_remote_wakeup) { ret = hub_set_port_link_state(hub, port_dev->portnum, USB_SS_PORT_LS_U0); if (!ret) { msleep(USB_RESUME_TIMEOUT); ret = usb_disable_remote_wakeup(udev); } if (ret) dev_warn(&udev->dev, "Port disable: can't disable remote wake\n"); udev->do_remote_wakeup = 0; } } #else /* CONFIG_PM */ #define hub_suspend NULL #define hub_resume NULL #define hub_reset_resume NULL static inline void hub_usb3_port_prepare_disable(struct usb_hub *hub, struct usb_port *port_dev) { } int usb_disable_lpm(struct usb_device *udev) { return 0; } EXPORT_SYMBOL_GPL(usb_disable_lpm); void usb_enable_lpm(struct usb_device *udev) { } EXPORT_SYMBOL_GPL(usb_enable_lpm); int usb_unlocked_disable_lpm(struct usb_device *udev) { return 0; } EXPORT_SYMBOL_GPL(usb_unlocked_disable_lpm); void usb_unlocked_enable_lpm(struct usb_device *udev) { } EXPORT_SYMBOL_GPL(usb_unlocked_enable_lpm); int usb_disable_ltm(struct usb_device *udev) { return 0; } EXPORT_SYMBOL_GPL(usb_disable_ltm); void usb_enable_ltm(struct usb_device *udev) { } EXPORT_SYMBOL_GPL(usb_enable_ltm); static int hub_handle_remote_wakeup(struct usb_hub *hub, unsigned int port, u16 portstatus, u16 portchange) { return 0; } static int usb_req_set_sel(struct usb_device *udev) { return 0; } #endif /* CONFIG_PM */ /* * USB-3 does not have a similar link state as USB-2 that will avoid negotiating * a connection with a plugged-in cable but will signal the host when the cable * is unplugged. Disable remote wake and set link state to U3 for USB-3 devices */ static int hub_port_disable(struct usb_hub *hub, int port1, int set_state) { struct usb_port *port_dev = hub->ports[port1 - 1]; struct usb_device *hdev = hub->hdev; int ret = 0; if (!hub->error) { if (hub_is_superspeed(hub->hdev)) { hub_usb3_port_prepare_disable(hub, port_dev); ret = hub_set_port_link_state(hub, port_dev->portnum, USB_SS_PORT_LS_U3); } else { ret = usb_clear_port_feature(hdev, port1, USB_PORT_FEAT_ENABLE); } } if (port_dev->child && set_state) usb_set_device_state(port_dev->child, USB_STATE_NOTATTACHED); if (ret && ret != -ENODEV) dev_err(&port_dev->dev, "cannot disable (err = %d)\n", ret); return ret; } /* * usb_port_disable - disable a usb device's upstream port * @udev: device to disable * Context: @udev locked, must be able to sleep. * * Disables a USB device that isn't in active use. */ int usb_port_disable(struct usb_device *udev) { struct usb_hub *hub = usb_hub_to_struct_hub(udev->parent); return hub_port_disable(hub, udev->portnum, 0); } /* USB 2.0 spec, 7.1.7.3 / fig 7-29: * * Between connect detection and reset signaling there must be a delay * of 100ms at least for debounce and power-settling. The corresponding * timer shall restart whenever the downstream port detects a disconnect. * * Apparently there are some bluetooth and irda-dongles and a number of * low-speed devices for which this debounce period may last over a second. * Not covered by the spec - but easy to deal with. * * This implementation uses a 1500ms total debounce timeout; if the * connection isn't stable by then it returns -ETIMEDOUT. It checks * every 25ms for transient disconnects. When the port status has been * unchanged for 100ms it returns the port status. */ int hub_port_debounce(struct usb_hub *hub, int port1, bool must_be_connected) { int ret; u16 portchange, portstatus; unsigned connection = 0xffff; int total_time, stable_time = 0; struct usb_port *port_dev = hub->ports[port1 - 1]; for (total_time = 0; ; total_time += HUB_DEBOUNCE_STEP) { ret = usb_hub_port_status(hub, port1, &portstatus, &portchange); if (ret < 0) return ret; if (!(portchange & USB_PORT_STAT_C_CONNECTION) && (portstatus & USB_PORT_STAT_CONNECTION) == connection) { if (!must_be_connected || (connection == USB_PORT_STAT_CONNECTION)) stable_time += HUB_DEBOUNCE_STEP; if (stable_time >= HUB_DEBOUNCE_STABLE) break; } else { stable_time = 0; connection = portstatus & USB_PORT_STAT_CONNECTION; } if (portchange & USB_PORT_STAT_C_CONNECTION) { usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_C_CONNECTION); } if (total_time >= HUB_DEBOUNCE_TIMEOUT) break; msleep(HUB_DEBOUNCE_STEP); } dev_dbg(&port_dev->dev, "debounce total %dms stable %dms status 0x%x\n", total_time, stable_time, portstatus); if (stable_time < HUB_DEBOUNCE_STABLE) return -ETIMEDOUT; return portstatus; } void usb_ep0_reinit(struct usb_device *udev) { usb_disable_endpoint(udev, 0 + USB_DIR_IN, true); usb_disable_endpoint(udev, 0 + USB_DIR_OUT, true); usb_enable_endpoint(udev, &udev->ep0, true); } EXPORT_SYMBOL_GPL(usb_ep0_reinit); #define usb_sndaddr0pipe() (PIPE_CONTROL << 30) #define usb_rcvaddr0pipe() ((PIPE_CONTROL << 30) | USB_DIR_IN) static int hub_set_address(struct usb_device *udev, int devnum) { int retval; unsigned int timeout_ms = USB_CTRL_SET_TIMEOUT; struct usb_hcd *hcd = bus_to_hcd(udev->bus); struct usb_hub *hub = usb_hub_to_struct_hub(udev->parent); if (hub->hdev->quirks & USB_QUIRK_SHORT_SET_ADDRESS_REQ_TIMEOUT) timeout_ms = USB_SHORT_SET_ADDRESS_REQ_TIMEOUT; /* * The host controller will choose the device address, * instead of the core having chosen it earlier */ if (!hcd->driver->address_device && devnum <= 1) return -EINVAL; if (udev->state == USB_STATE_ADDRESS) return 0; if (udev->state != USB_STATE_DEFAULT) return -EINVAL; if (hcd->driver->address_device) retval = hcd->driver->address_device(hcd, udev, timeout_ms); else retval = usb_control_msg(udev, usb_sndaddr0pipe(), USB_REQ_SET_ADDRESS, 0, devnum, 0, NULL, 0, timeout_ms); if (retval == 0) { update_devnum(udev, devnum); /* Device now using proper address. */ usb_set_device_state(udev, USB_STATE_ADDRESS); usb_ep0_reinit(udev); } return retval; } /* * There are reports of USB 3.0 devices that say they support USB 2.0 Link PM * when they're plugged into a USB 2.0 port, but they don't work when LPM is * enabled. * * Only enable USB 2.0 Link PM if the port is internal (hardwired), or the * device says it supports the new USB 2.0 Link PM errata by setting the BESL * support bit in the BOS descriptor. */ static void hub_set_initial_usb2_lpm_policy(struct usb_device *udev) { struct usb_hub *hub = usb_hub_to_struct_hub(udev->parent); int connect_type = USB_PORT_CONNECT_TYPE_UNKNOWN; if (!udev->usb2_hw_lpm_capable || !udev->bos) return; if (hub) connect_type = hub->ports[udev->portnum - 1]->connect_type; if ((udev->bos->ext_cap->bmAttributes & cpu_to_le32(USB_BESL_SUPPORT)) || connect_type == USB_PORT_CONNECT_TYPE_HARD_WIRED) { udev->usb2_hw_lpm_allowed = 1; usb_enable_usb2_hardware_lpm(udev); } } static int hub_enable_device(struct usb_device *udev) { struct usb_hcd *hcd = bus_to_hcd(udev->bus); if (!hcd->driver->enable_device) return 0; if (udev->state == USB_STATE_ADDRESS) return 0; if (udev->state != USB_STATE_DEFAULT) return -EINVAL; return hcd->driver->enable_device(hcd, udev); } /* * Get the bMaxPacketSize0 value during initialization by reading the * device's device descriptor. Since we don't already know this value, * the transfer is unsafe and it ignores I/O errors, only testing for * reasonable received values. * * For "old scheme" initialization, size will be 8 so we read just the * start of the device descriptor, which should work okay regardless of * the actual bMaxPacketSize0 value. For "new scheme" initialization, * size will be 64 (and buf will point to a sufficiently large buffer), * which might not be kosher according to the USB spec but it's what * Windows does and what many devices expect. * * Returns: bMaxPacketSize0 or a negative error code. */ static int get_bMaxPacketSize0(struct usb_device *udev, struct usb_device_descriptor *buf, int size, bool first_time) { int i, rc; /* * Retry on all errors; some devices are flakey. * 255 is for WUSB devices, we actually need to use * 512 (WUSB1.0[4.8.1]). */ for (i = 0; i < GET_MAXPACKET0_TRIES; ++i) { /* Start with invalid values in case the transfer fails */ buf->bDescriptorType = buf->bMaxPacketSize0 = 0; rc = usb_control_msg(udev, usb_rcvaddr0pipe(), USB_REQ_GET_DESCRIPTOR, USB_DIR_IN, USB_DT_DEVICE << 8, 0, buf, size, initial_descriptor_timeout); switch (buf->bMaxPacketSize0) { case 8: case 16: case 32: case 64: case 9: if (buf->bDescriptorType == USB_DT_DEVICE) { rc = buf->bMaxPacketSize0; break; } fallthrough; default: if (rc >= 0) rc = -EPROTO; break; } /* * Some devices time out if they are powered on * when already connected. They need a second * reset, so return early. But only on the first * attempt, lest we get into a time-out/reset loop. */ if (rc > 0 || (rc == -ETIMEDOUT && first_time && udev->speed > USB_SPEED_FULL)) break; } return rc; } #define GET_DESCRIPTOR_BUFSIZE 64 /* Reset device, (re)assign address, get device descriptor. * Device connection must be stable, no more debouncing needed. * Returns device in USB_STATE_ADDRESS, except on error. * * If this is called for an already-existing device (as part of * usb_reset_and_verify_device), the caller must own the device lock and * the port lock. For a newly detected device that is not accessible * through any global pointers, it's not necessary to lock the device, * but it is still necessary to lock the port. * * For a newly detected device, @dev_descr must be NULL. The device * descriptor retrieved from the device will then be stored in * @udev->descriptor. For an already existing device, @dev_descr * must be non-NULL. The device descriptor will be stored there, * not in @udev->descriptor, because descriptors for registered * devices are meant to be immutable. */ static int hub_port_init(struct usb_hub *hub, struct usb_device *udev, int port1, int retry_counter, struct usb_device_descriptor *dev_descr) { struct usb_device *hdev = hub->hdev; struct usb_hcd *hcd = bus_to_hcd(hdev->bus); struct usb_port *port_dev = hub->ports[port1 - 1]; int retries, operations, retval, i; unsigned delay = HUB_SHORT_RESET_TIME; enum usb_device_speed oldspeed = udev->speed; const char *speed; int devnum = udev->devnum; const char *driver_name; bool do_new_scheme; const bool initial = !dev_descr; int maxp0; struct usb_device_descriptor *buf, *descr; buf = kmalloc(GET_DESCRIPTOR_BUFSIZE, GFP_NOIO); if (!buf) return -ENOMEM; /* root hub ports have a slightly longer reset period * (from USB 2.0 spec, section 7.1.7.5) */ if (!hdev->parent) { delay = HUB_ROOT_RESET_TIME; if (port1 == hdev->bus->otg_port) hdev->bus->b_hnp_enable = 0; } /* Some low speed devices have problems with the quick delay, so */ /* be a bit pessimistic with those devices. RHbug #23670 */ if (oldspeed == USB_SPEED_LOW) delay = HUB_LONG_RESET_TIME; /* Reset the device; full speed may morph to high speed */ /* FIXME a USB 2.0 device may morph into SuperSpeed on reset. */ retval = hub_port_reset(hub, port1, udev, delay, false); if (retval < 0) /* error or disconnect */ goto fail; /* success, speed is known */ retval = -ENODEV; /* Don't allow speed changes at reset, except usb 3.0 to faster */ if (oldspeed != USB_SPEED_UNKNOWN && oldspeed != udev->speed && !(oldspeed == USB_SPEED_SUPER && udev->speed > oldspeed)) { dev_dbg(&udev->dev, "device reset changed speed!\n"); goto fail; } oldspeed = udev->speed; if (initial) { /* USB 2.0 section 5.5.3 talks about ep0 maxpacket ... * it's fixed size except for full speed devices. */ switch (udev->speed) { case USB_SPEED_SUPER_PLUS: case USB_SPEED_SUPER: udev->ep0.desc.wMaxPacketSize = cpu_to_le16(512); break; case USB_SPEED_HIGH: /* fixed at 64 */ udev->ep0.desc.wMaxPacketSize = cpu_to_le16(64); break; case USB_SPEED_FULL: /* 8, 16, 32, or 64 */ /* to determine the ep0 maxpacket size, try to read * the device descriptor to get bMaxPacketSize0 and * then correct our initial guess. */ udev->ep0.desc.wMaxPacketSize = cpu_to_le16(64); break; case USB_SPEED_LOW: /* fixed at 8 */ udev->ep0.desc.wMaxPacketSize = cpu_to_le16(8); break; default: goto fail; } } speed = usb_speed_string(udev->speed); /* * The controller driver may be NULL if the controller device * is the middle device between platform device and roothub. * This middle device may not need a device driver due to * all hardware control can be at platform device driver, this * platform device is usually a dual-role USB controller device. */ if (udev->bus->controller->driver) driver_name = udev->bus->controller->driver->name; else driver_name = udev->bus->sysdev->driver->name; if (udev->speed < USB_SPEED_SUPER) dev_info(&udev->dev, "%s %s USB device number %d using %s\n", (initial ? "new" : "reset"), speed, devnum, driver_name); if (initial) { /* Set up TT records, if needed */ if (hdev->tt) { udev->tt = hdev->tt; udev->ttport = hdev->ttport; } else if (udev->speed != USB_SPEED_HIGH && hdev->speed == USB_SPEED_HIGH) { if (!hub->tt.hub) { dev_err(&udev->dev, "parent hub has no TT\n"); retval = -EINVAL; goto fail; } udev->tt = &hub->tt; udev->ttport = port1; } } /* Why interleave GET_DESCRIPTOR and SET_ADDRESS this way? * Because device hardware and firmware is sometimes buggy in * this area, and this is how Linux has done it for ages. * Change it cautiously. * * NOTE: If use_new_scheme() is true we will start by issuing * a 64-byte GET_DESCRIPTOR request. This is what Windows does, * so it may help with some non-standards-compliant devices. * Otherwise we start with SET_ADDRESS and then try to read the * first 8 bytes of the device descriptor to get the ep0 maxpacket * value. */ do_new_scheme = use_new_scheme(udev, retry_counter, port_dev); for (retries = 0; retries < GET_DESCRIPTOR_TRIES; (++retries, msleep(100))) { if (hub_port_stop_enumerate(hub, port1, retries)) { retval = -ENODEV; break; } if (do_new_scheme) { retval = hub_enable_device(udev); if (retval < 0) { dev_err(&udev->dev, "hub failed to enable device, error %d\n", retval); goto fail; } maxp0 = get_bMaxPacketSize0(udev, buf, GET_DESCRIPTOR_BUFSIZE, retries == 0); if (maxp0 > 0 && !initial && maxp0 != udev->descriptor.bMaxPacketSize0) { dev_err(&udev->dev, "device reset changed ep0 maxpacket size!\n"); retval = -ENODEV; goto fail; } retval = hub_port_reset(hub, port1, udev, delay, false); if (retval < 0) /* error or disconnect */ goto fail; if (oldspeed != udev->speed) { dev_dbg(&udev->dev, "device reset changed speed!\n"); retval = -ENODEV; goto fail; } if (maxp0 < 0) { if (maxp0 != -ENODEV) dev_err(&udev->dev, "device descriptor read/64, error %d\n", maxp0); retval = maxp0; continue; } } for (operations = 0; operations < SET_ADDRESS_TRIES; ++operations) { retval = hub_set_address(udev, devnum); if (retval >= 0) break; msleep(200); } if (retval < 0) { if (retval != -ENODEV) dev_err(&udev->dev, "device not accepting address %d, error %d\n", devnum, retval); goto fail; } if (udev->speed >= USB_SPEED_SUPER) { devnum = udev->devnum; dev_info(&udev->dev, "%s SuperSpeed%s%s USB device number %d using %s\n", (udev->config) ? "reset" : "new", (udev->speed == USB_SPEED_SUPER_PLUS) ? " Plus" : "", (udev->ssp_rate == USB_SSP_GEN_2x2) ? " Gen 2x2" : (udev->ssp_rate == USB_SSP_GEN_2x1) ? " Gen 2x1" : (udev->ssp_rate == USB_SSP_GEN_1x2) ? " Gen 1x2" : "", devnum, driver_name); } /* * cope with hardware quirkiness: * - let SET_ADDRESS settle, some device hardware wants it * - read ep0 maxpacket even for high and low speed, */ msleep(10); if (do_new_scheme) break; maxp0 = get_bMaxPacketSize0(udev, buf, 8, retries == 0); if (maxp0 < 0) { retval = maxp0; if (retval != -ENODEV) dev_err(&udev->dev, "device descriptor read/8, error %d\n", retval); } else { u32 delay; if (!initial && maxp0 != udev->descriptor.bMaxPacketSize0) { dev_err(&udev->dev, "device reset changed ep0 maxpacket size!\n"); retval = -ENODEV; goto fail; } delay = udev->parent->hub_delay; udev->hub_delay = min_t(u32, delay, USB_TP_TRANSMISSION_DELAY_MAX); retval = usb_set_isoch_delay(udev); if (retval) { dev_dbg(&udev->dev, "Failed set isoch delay, error %d\n", retval); retval = 0; } break; } } if (retval) goto fail; /* * Check the ep0 maxpacket guess and correct it if necessary. * maxp0 is the value stored in the device descriptor; * i is the value it encodes (logarithmic for SuperSpeed or greater). */ i = maxp0; if (udev->speed >= USB_SPEED_SUPER) { if (maxp0 <= 16) i = 1 << maxp0; else i = 0; /* Invalid */ } if (usb_endpoint_maxp(&udev->ep0.desc) == i) { ; /* Initial ep0 maxpacket guess is right */ } else if (((udev->speed == USB_SPEED_FULL || udev->speed == USB_SPEED_HIGH) && (i == 8 || i == 16 || i == 32 || i == 64)) || (udev->speed >= USB_SPEED_SUPER && i > 0)) { /* Initial guess is wrong; use the descriptor's value */ if (udev->speed == USB_SPEED_FULL) dev_dbg(&udev->dev, "ep0 maxpacket = %d\n", i); else dev_warn(&udev->dev, "Using ep0 maxpacket: %d\n", i); udev->ep0.desc.wMaxPacketSize = cpu_to_le16(i); usb_ep0_reinit(udev); } else { /* Initial guess is wrong and descriptor's value is invalid */ dev_err(&udev->dev, "Invalid ep0 maxpacket: %d\n", maxp0); retval = -EMSGSIZE; goto fail; } descr = usb_get_device_descriptor(udev); if (IS_ERR(descr)) { retval = PTR_ERR(descr); if (retval != -ENODEV) dev_err(&udev->dev, "device descriptor read/all, error %d\n", retval); goto fail; } if (initial) udev->descriptor = *descr; else *dev_descr = *descr; kfree(descr); /* * Some superspeed devices have finished the link training process * and attached to a superspeed hub port, but the device descriptor * got from those devices show they aren't superspeed devices. Warm * reset the port attached by the devices can fix them. */ if ((udev->speed >= USB_SPEED_SUPER) && (le16_to_cpu(udev->descriptor.bcdUSB) < 0x0300)) { dev_err(&udev->dev, "got a wrong device descriptor, warm reset device\n"); hub_port_reset(hub, port1, udev, HUB_BH_RESET_TIME, true); retval = -EINVAL; goto fail; } usb_detect_quirks(udev); if (le16_to_cpu(udev->descriptor.bcdUSB) >= 0x0201) { retval = usb_get_bos_descriptor(udev); if (!retval) { udev->lpm_capable = usb_device_supports_lpm(udev); udev->lpm_disable_count = 1; usb_set_lpm_parameters(udev); usb_req_set_sel(udev); } } retval = 0; /* notify HCD that we have a device connected and addressed */ if (hcd->driver->update_device) hcd->driver->update_device(hcd, udev); hub_set_initial_usb2_lpm_policy(udev); fail: if (retval) { hub_port_disable(hub, port1, 0); update_devnum(udev, devnum); /* for disconnect processing */ } kfree(buf); return retval; } static void check_highspeed(struct usb_hub *hub, struct usb_device *udev, int port1) { struct usb_qualifier_descriptor *qual; int status; if (udev->quirks & USB_QUIRK_DEVICE_QUALIFIER) return; qual = kmalloc(sizeof *qual, GFP_KERNEL); if (qual == NULL) return; status = usb_get_descriptor(udev, USB_DT_DEVICE_QUALIFIER, 0, qual, sizeof *qual); if (status == sizeof *qual) { dev_info(&udev->dev, "not running at top speed; " "connect to a high speed hub\n"); /* hub LEDs are probably harder to miss than syslog */ if (hub->has_indicators) { hub->indicator[port1-1] = INDICATOR_GREEN_BLINK; queue_delayed_work(system_power_efficient_wq, &hub->leds, 0); } } kfree(qual); } static unsigned hub_power_remaining(struct usb_hub *hub) { struct usb_device *hdev = hub->hdev; int remaining; int port1; if (!hub->limited_power) return 0; remaining = hdev->bus_mA - hub->descriptor->bHubContrCurrent; for (port1 = 1; port1 <= hdev->maxchild; ++port1) { struct usb_port *port_dev = hub->ports[port1 - 1]; struct usb_device *udev = port_dev->child; unsigned unit_load; int delta; if (!udev) continue; if (hub_is_superspeed(udev)) unit_load = 150; else unit_load = 100; /* * Unconfigured devices may not use more than one unit load, * or 8mA for OTG ports */ if (udev->actconfig) delta = usb_get_max_power(udev, udev->actconfig); else if (port1 != udev->bus->otg_port || hdev->parent) delta = unit_load; else delta = 8; if (delta > hub->mA_per_port) dev_warn(&port_dev->dev, "%dmA is over %umA budget!\n", delta, hub->mA_per_port); remaining -= delta; } if (remaining < 0) { dev_warn(hub->intfdev, "%dmA over power budget!\n", -remaining); remaining = 0; } return remaining; } static int descriptors_changed(struct usb_device *udev, struct usb_device_descriptor *new_device_descriptor, struct usb_host_bos *old_bos) { int changed = 0; unsigned index; unsigned serial_len = 0; unsigned len; unsigned old_length; int length; char *buf; if (memcmp(&udev->descriptor, new_device_descriptor, sizeof(*new_device_descriptor)) != 0) return 1; if ((old_bos && !udev->bos) || (!old_bos && udev->bos)) return 1; if (udev->bos) { len = le16_to_cpu(udev->bos->desc->wTotalLength); if (len != le16_to_cpu(old_bos->desc->wTotalLength)) return 1; if (memcmp(udev->bos->desc, old_bos->desc, len)) return 1; } /* Since the idVendor, idProduct, and bcdDevice values in the * device descriptor haven't changed, we will assume the * Manufacturer and Product strings haven't changed either. * But the SerialNumber string could be different (e.g., a * different flash card of the same brand). */ if (udev->serial) serial_len = strlen(udev->serial) + 1; len = serial_len; for (index = 0; index < udev->descriptor.bNumConfigurations; index++) { old_length = le16_to_cpu(udev->config[index].desc.wTotalLength); len = max(len, old_length); } buf = kmalloc(len, GFP_NOIO); if (!buf) /* assume the worst */ return 1; for (index = 0; index < udev->descriptor.bNumConfigurations; index++) { old_length = le16_to_cpu(udev->config[index].desc.wTotalLength); length = usb_get_descriptor(udev, USB_DT_CONFIG, index, buf, old_length); if (length != old_length) { dev_dbg(&udev->dev, "config index %d, error %d\n", index, length); changed = 1; break; } if (memcmp(buf, udev->rawdescriptors[index], old_length) != 0) { dev_dbg(&udev->dev, "config index %d changed (#%d)\n", index, ((struct usb_config_descriptor *) buf)-> bConfigurationValue); changed = 1; break; } } if (!changed && serial_len) { length = usb_string(udev, udev->descriptor.iSerialNumber, buf, serial_len); if (length + 1 != serial_len) { dev_dbg(&udev->dev, "serial string error %d\n", length); changed = 1; } else if (memcmp(buf, udev->serial, length) != 0) { dev_dbg(&udev->dev, "serial string changed\n"); changed = 1; } } kfree(buf); return changed; } static void hub_port_connect(struct usb_hub *hub, int port1, u16 portstatus, u16 portchange) { int status = -ENODEV; int i; unsigned unit_load; struct usb_device *hdev = hub->hdev; struct usb_hcd *hcd = bus_to_hcd(hdev->bus); struct usb_port *port_dev = hub->ports[port1 - 1]; struct usb_device *udev = port_dev->child; static int unreliable_port = -1; bool retry_locked; /* Disconnect any existing devices under this port */ if (udev) { if (hcd->usb_phy && !hdev->parent) usb_phy_notify_disconnect(hcd->usb_phy, udev->speed); usb_disconnect(&port_dev->child); } /* We can forget about a "removed" device when there's a physical * disconnect or the connect status changes. */ if (!(portstatus & USB_PORT_STAT_CONNECTION) || (portchange & USB_PORT_STAT_C_CONNECTION)) clear_bit(port1, hub->removed_bits); if (portchange & (USB_PORT_STAT_C_CONNECTION | USB_PORT_STAT_C_ENABLE)) { status = hub_port_debounce_be_stable(hub, port1); if (status < 0) { if (status != -ENODEV && port1 != unreliable_port && printk_ratelimit()) dev_err(&port_dev->dev, "connect-debounce failed\n"); portstatus &= ~USB_PORT_STAT_CONNECTION; unreliable_port = port1; } else { portstatus = status; } } /* Return now if debouncing failed or nothing is connected or * the device was "removed". */ if (!(portstatus & USB_PORT_STAT_CONNECTION) || test_bit(port1, hub->removed_bits)) { /* * maybe switch power back on (e.g. root hub was reset) * but only if the port isn't owned by someone else. */ if (hub_is_port_power_switchable(hub) && !usb_port_is_power_on(hub, portstatus) && !port_dev->port_owner) set_port_feature(hdev, port1, USB_PORT_FEAT_POWER); if (portstatus & USB_PORT_STAT_ENABLE) goto done; return; } if (hub_is_superspeed(hub->hdev)) unit_load = 150; else unit_load = 100; status = 0; for (i = 0; i < PORT_INIT_TRIES; i++) { if (hub_port_stop_enumerate(hub, port1, i)) { status = -ENODEV; break; } usb_lock_port(port_dev); mutex_lock(hcd->address0_mutex); retry_locked = true; /* reallocate for each attempt, since references * to the previous one can escape in various ways */ udev = usb_alloc_dev(hdev, hdev->bus, port1); if (!udev) { dev_err(&port_dev->dev, "couldn't allocate usb_device\n"); mutex_unlock(hcd->address0_mutex); usb_unlock_port(port_dev); goto done; } usb_set_device_state(udev, USB_STATE_POWERED); udev->bus_mA = hub->mA_per_port; udev->level = hdev->level + 1; /* Devices connected to SuperSpeed hubs are USB 3.0 or later */ if (hub_is_superspeed(hub->hdev)) udev->speed = USB_SPEED_SUPER; else udev->speed = USB_SPEED_UNKNOWN; choose_devnum(udev); if (udev->devnum <= 0) { status = -ENOTCONN; /* Don't retry */ goto loop; } /* reset (non-USB 3.0 devices) and get descriptor */ status = hub_port_init(hub, udev, port1, i, NULL); if (status < 0) goto loop; mutex_unlock(hcd->address0_mutex); usb_unlock_port(port_dev); retry_locked = false; if (udev->quirks & USB_QUIRK_DELAY_INIT) msleep(2000); /* consecutive bus-powered hubs aren't reliable; they can * violate the voltage drop budget. if the new child has * a "powered" LED, users should notice we didn't enable it * (without reading syslog), even without per-port LEDs * on the parent. */ if (udev->descriptor.bDeviceClass == USB_CLASS_HUB && udev->bus_mA <= unit_load) { u16 devstat; status = usb_get_std_status(udev, USB_RECIP_DEVICE, 0, &devstat); if (status) { dev_dbg(&udev->dev, "get status %d ?\n", status); goto loop_disable; } if ((devstat & (1 << USB_DEVICE_SELF_POWERED)) == 0) { dev_err(&udev->dev, "can't connect bus-powered hub " "to this port\n"); if (hub->has_indicators) { hub->indicator[port1-1] = INDICATOR_AMBER_BLINK; queue_delayed_work( system_power_efficient_wq, &hub->leds, 0); } status = -ENOTCONN; /* Don't retry */ goto loop_disable; } } /* check for devices running slower than they could */ if (le16_to_cpu(udev->descriptor.bcdUSB) >= 0x0200 && udev->speed == USB_SPEED_FULL && highspeed_hubs != 0) check_highspeed(hub, udev, port1); /* Store the parent's children[] pointer. At this point * udev becomes globally accessible, although presumably * no one will look at it until hdev is unlocked. */ status = 0; mutex_lock(&usb_port_peer_mutex); /* We mustn't add new devices if the parent hub has * been disconnected; we would race with the * recursively_mark_NOTATTACHED() routine. */ spin_lock_irq(&device_state_lock); if (hdev->state == USB_STATE_NOTATTACHED) status = -ENOTCONN; else port_dev->child = udev; spin_unlock_irq(&device_state_lock); mutex_unlock(&usb_port_peer_mutex); /* Run it through the hoops (find a driver, etc) */ if (!status) { status = usb_new_device(udev); if (status) { mutex_lock(&usb_port_peer_mutex); spin_lock_irq(&device_state_lock); port_dev->child = NULL; spin_unlock_irq(&device_state_lock); mutex_unlock(&usb_port_peer_mutex); } else { if (hcd->usb_phy && !hdev->parent) usb_phy_notify_connect(hcd->usb_phy, udev->speed); } } if (status) goto loop_disable; status = hub_power_remaining(hub); if (status) dev_dbg(hub->intfdev, "%dmA power budget left\n", status); return; loop_disable: hub_port_disable(hub, port1, 1); loop: usb_ep0_reinit(udev); release_devnum(udev); hub_free_dev(udev); if (retry_locked) { mutex_unlock(hcd->address0_mutex); usb_unlock_port(port_dev); } usb_put_dev(udev); if ((status == -ENOTCONN) || (status == -ENOTSUPP)) break; /* When halfway through our retry count, power-cycle the port */ if (i == (PORT_INIT_TRIES - 1) / 2) { dev_info(&port_dev->dev, "attempt power cycle\n"); usb_hub_set_port_power(hdev, hub, port1, false); msleep(2 * hub_power_on_good_delay(hub)); usb_hub_set_port_power(hdev, hub, port1, true); msleep(hub_power_on_good_delay(hub)); } } if (hub->hdev->parent || !hcd->driver->port_handed_over || !(hcd->driver->port_handed_over)(hcd, port1)) { if (status != -ENOTCONN && status != -ENODEV) dev_err(&port_dev->dev, "unable to enumerate USB device\n"); } done: hub_port_disable(hub, port1, 1); if (hcd->driver->relinquish_port && !hub->hdev->parent) { if (status != -ENOTCONN && status != -ENODEV) hcd->driver->relinquish_port(hcd, port1); } } /* Handle physical or logical connection change events. * This routine is called when: * a port connection-change occurs; * a port enable-change occurs (often caused by EMI); * usb_reset_and_verify_device() encounters changed descriptors (as from * a firmware download) * caller already locked the hub */ static void hub_port_connect_change(struct usb_hub *hub, int port1, u16 portstatus, u16 portchange) __must_hold(&port_dev->status_lock) { struct usb_port *port_dev = hub->ports[port1 - 1]; struct usb_device *udev = port_dev->child; struct usb_device_descriptor *descr; int status = -ENODEV; dev_dbg(&port_dev->dev, "status %04x, change %04x, %s\n", portstatus, portchange, portspeed(hub, portstatus)); if (hub->has_indicators) { set_port_led(hub, port1, HUB_LED_AUTO); hub->indicator[port1-1] = INDICATOR_AUTO; } #ifdef CONFIG_USB_OTG /* during HNP, don't repeat the debounce */ if (hub->hdev->bus->is_b_host) portchange &= ~(USB_PORT_STAT_C_CONNECTION | USB_PORT_STAT_C_ENABLE); #endif /* Try to resuscitate an existing device */ if ((portstatus & USB_PORT_STAT_CONNECTION) && udev && udev->state != USB_STATE_NOTATTACHED) { if (portstatus & USB_PORT_STAT_ENABLE) { /* * USB-3 connections are initialized automatically by * the hostcontroller hardware. Therefore check for * changed device descriptors before resuscitating the * device. */ descr = usb_get_device_descriptor(udev); if (IS_ERR(descr)) { dev_dbg(&udev->dev, "can't read device descriptor %ld\n", PTR_ERR(descr)); } else { if (descriptors_changed(udev, descr, udev->bos)) { dev_dbg(&udev->dev, "device descriptor has changed\n"); } else { status = 0; /* Nothing to do */ } kfree(descr); } #ifdef CONFIG_PM } else if (udev->state == USB_STATE_SUSPENDED && udev->persist_enabled) { /* For a suspended device, treat this as a * remote wakeup event. */ usb_unlock_port(port_dev); status = usb_remote_wakeup(udev); usb_lock_port(port_dev); #endif } else { /* Don't resuscitate */; } } clear_bit(port1, hub->change_bits); /* successfully revalidated the connection */ if (status == 0) return; usb_unlock_port(port_dev); hub_port_connect(hub, port1, portstatus, portchange); usb_lock_port(port_dev); } /* Handle notifying userspace about hub over-current events */ static void port_over_current_notify(struct usb_port *port_dev) { char *envp[3] = { NULL, NULL, NULL }; struct device *hub_dev; char *port_dev_path; sysfs_notify(&port_dev->dev.kobj, NULL, "over_current_count"); hub_dev = port_dev->dev.parent; if (!hub_dev) return; port_dev_path = kobject_get_path(&port_dev->dev.kobj, GFP_KERNEL); if (!port_dev_path) return; envp[0] = kasprintf(GFP_KERNEL, "OVER_CURRENT_PORT=%s", port_dev_path); if (!envp[0]) goto exit; envp[1] = kasprintf(GFP_KERNEL, "OVER_CURRENT_COUNT=%u", port_dev->over_current_count); if (!envp[1]) goto exit; kobject_uevent_env(&hub_dev->kobj, KOBJ_CHANGE, envp); exit: kfree(envp[1]); kfree(envp[0]); kfree(port_dev_path); } static void port_event(struct usb_hub *hub, int port1) __must_hold(&port_dev->status_lock) { int connect_change; struct usb_port *port_dev = hub->ports[port1 - 1]; struct usb_device *udev = port_dev->child; struct usb_device *hdev = hub->hdev; u16 portstatus, portchange; int i = 0; connect_change = test_bit(port1, hub->change_bits); clear_bit(port1, hub->event_bits); clear_bit(port1, hub->wakeup_bits); if (usb_hub_port_status(hub, port1, &portstatus, &portchange) < 0) return; if (portchange & USB_PORT_STAT_C_CONNECTION) { usb_clear_port_feature(hdev, port1, USB_PORT_FEAT_C_CONNECTION); connect_change = 1; } if (portchange & USB_PORT_STAT_C_ENABLE) { if (!connect_change) dev_dbg(&port_dev->dev, "enable change, status %08x\n", portstatus); usb_clear_port_feature(hdev, port1, USB_PORT_FEAT_C_ENABLE); /* * EM interference sometimes causes badly shielded USB devices * to be shutdown by the hub, this hack enables them again. * Works at least with mouse driver. */ if (!(portstatus & USB_PORT_STAT_ENABLE) && !connect_change && udev) { dev_err(&port_dev->dev, "disabled by hub (EMI?), re-enabling...\n"); connect_change = 1; } } if (portchange & USB_PORT_STAT_C_OVERCURRENT) { u16 status = 0, unused; port_dev->over_current_count++; port_over_current_notify(port_dev); dev_dbg(&port_dev->dev, "over-current change #%u\n", port_dev->over_current_count); usb_clear_port_feature(hdev, port1, USB_PORT_FEAT_C_OVER_CURRENT); msleep(100); /* Cool down */ hub_power_on(hub, true); usb_hub_port_status(hub, port1, &status, &unused); if (status & USB_PORT_STAT_OVERCURRENT) dev_err(&port_dev->dev, "over-current condition\n"); } if (portchange & USB_PORT_STAT_C_RESET) { dev_dbg(&port_dev->dev, "reset change\n"); usb_clear_port_feature(hdev, port1, USB_PORT_FEAT_C_RESET); } if ((portchange & USB_PORT_STAT_C_BH_RESET) && hub_is_superspeed(hdev)) { dev_dbg(&port_dev->dev, "warm reset change\n"); usb_clear_port_feature(hdev, port1, USB_PORT_FEAT_C_BH_PORT_RESET); } if (portchange & USB_PORT_STAT_C_LINK_STATE) { dev_dbg(&port_dev->dev, "link state change\n"); usb_clear_port_feature(hdev, port1, USB_PORT_FEAT_C_PORT_LINK_STATE); } if (portchange & USB_PORT_STAT_C_CONFIG_ERROR) { dev_warn(&port_dev->dev, "config error\n"); usb_clear_port_feature(hdev, port1, USB_PORT_FEAT_C_PORT_CONFIG_ERROR); } /* skip port actions that require the port to be powered on */ if (!pm_runtime_active(&port_dev->dev)) return; /* skip port actions if ignore_event and early_stop are true */ if (port_dev->ignore_event && port_dev->early_stop) return; if (hub_handle_remote_wakeup(hub, port1, portstatus, portchange)) connect_change = 1; /* * Avoid trying to recover a USB3 SS.Inactive port with a warm reset if * the device was disconnected. A 12ms disconnect detect timer in * SS.Inactive state transitions the port to RxDetect automatically. * SS.Inactive link error state is common during device disconnect. */ while (hub_port_warm_reset_required(hub, port1, portstatus)) { if ((i++ < DETECT_DISCONNECT_TRIES) && udev) { u16 unused; msleep(20); usb_hub_port_status(hub, port1, &portstatus, &unused); dev_dbg(&port_dev->dev, "Wait for inactive link disconnect detect\n"); continue; } else if (!udev || !(portstatus & USB_PORT_STAT_CONNECTION) || udev->state == USB_STATE_NOTATTACHED) { dev_dbg(&port_dev->dev, "do warm reset, port only\n"); if (hub_port_reset(hub, port1, NULL, HUB_BH_RESET_TIME, true) < 0) hub_port_disable(hub, port1, 1); } else { dev_dbg(&port_dev->dev, "do warm reset, full device\n"); usb_unlock_port(port_dev); usb_lock_device(udev); usb_reset_device(udev); usb_unlock_device(udev); usb_lock_port(port_dev); connect_change = 0; } break; } if (connect_change) hub_port_connect_change(hub, port1, portstatus, portchange); } static void hub_event(struct work_struct *work) { struct usb_device *hdev; struct usb_interface *intf; struct usb_hub *hub; struct device *hub_dev; u16 hubstatus; u16 hubchange; int i, ret; hub = container_of(work, struct usb_hub, events); hdev = hub->hdev; hub_dev = hub->intfdev; intf = to_usb_interface(hub_dev); kcov_remote_start_usb((u64)hdev->bus->busnum); dev_dbg(hub_dev, "state %d ports %d chg %04x evt %04x\n", hdev->state, hdev->maxchild, /* NOTE: expects max 15 ports... */ (u16) hub->change_bits[0], (u16) hub->event_bits[0]); /* Lock the device, then check to see if we were * disconnected while waiting for the lock to succeed. */ usb_lock_device(hdev); if (unlikely(hub->disconnected)) goto out_hdev_lock; /* If the hub has died, clean up after it */ if (hdev->state == USB_STATE_NOTATTACHED) { hub->error = -ENODEV; hub_quiesce(hub, HUB_DISCONNECT); goto out_hdev_lock; } /* Autoresume */ ret = usb_autopm_get_interface(intf); if (ret) { dev_dbg(hub_dev, "Can't autoresume: %d\n", ret); goto out_hdev_lock; } /* If this is an inactive hub, do nothing */ if (hub->quiescing) goto out_autopm; if (hub->error) { dev_dbg(hub_dev, "resetting for error %d\n", hub->error); ret = usb_reset_device(hdev); if (ret) { dev_dbg(hub_dev, "error resetting hub: %d\n", ret); goto out_autopm; } hub->nerrors = 0; hub->error = 0; } /* deal with port status changes */ for (i = 1; i <= hdev->maxchild; i++) { struct usb_port *port_dev = hub->ports[i - 1]; if (test_bit(i, hub->event_bits) || test_bit(i, hub->change_bits) || test_bit(i, hub->wakeup_bits)) { /* * The get_noresume and barrier ensure that if * the port was in the process of resuming, we * flush that work and keep the port active for * the duration of the port_event(). However, * if the port is runtime pm suspended * (powered-off), we leave it in that state, run * an abbreviated port_event(), and move on. */ pm_runtime_get_noresume(&port_dev->dev); pm_runtime_barrier(&port_dev->dev); usb_lock_port(port_dev); port_event(hub, i); usb_unlock_port(port_dev); pm_runtime_put_sync(&port_dev->dev); } } /* deal with hub status changes */ if (test_and_clear_bit(0, hub->event_bits) == 0) ; /* do nothing */ else if (hub_hub_status(hub, &hubstatus, &hubchange) < 0) dev_err(hub_dev, "get_hub_status failed\n"); else { if (hubchange & HUB_CHANGE_LOCAL_POWER) { dev_dbg(hub_dev, "power change\n"); clear_hub_feature(hdev, C_HUB_LOCAL_POWER); if (hubstatus & HUB_STATUS_LOCAL_POWER) /* FIXME: Is this always true? */ hub->limited_power = 1; else hub->limited_power = 0; } if (hubchange & HUB_CHANGE_OVERCURRENT) { u16 status = 0; u16 unused; dev_dbg(hub_dev, "over-current change\n"); clear_hub_feature(hdev, C_HUB_OVER_CURRENT); msleep(500); /* Cool down */ hub_power_on(hub, true); hub_hub_status(hub, &status, &unused); if (status & HUB_STATUS_OVERCURRENT) dev_err(hub_dev, "over-current condition\n"); } } out_autopm: /* Balance the usb_autopm_get_interface() above */ usb_autopm_put_interface_no_suspend(intf); out_hdev_lock: usb_unlock_device(hdev); /* Balance the stuff in kick_hub_wq() and allow autosuspend */ usb_autopm_put_interface(intf); hub_put(hub); kcov_remote_stop(); } static const struct usb_device_id hub_id_table[] = { { .match_flags = USB_DEVICE_ID_MATCH_VENDOR | USB_DEVICE_ID_MATCH_PRODUCT | USB_DEVICE_ID_MATCH_INT_CLASS, .idVendor = USB_VENDOR_SMSC, .idProduct = USB_PRODUCT_USB5534B, .bInterfaceClass = USB_CLASS_HUB, .driver_info = HUB_QUIRK_DISABLE_AUTOSUSPEND}, { .match_flags = USB_DEVICE_ID_MATCH_VENDOR | USB_DEVICE_ID_MATCH_PRODUCT, .idVendor = USB_VENDOR_CYPRESS, .idProduct = USB_PRODUCT_CY7C65632, .driver_info = HUB_QUIRK_DISABLE_AUTOSUSPEND}, { .match_flags = USB_DEVICE_ID_MATCH_VENDOR | USB_DEVICE_ID_MATCH_INT_CLASS, .idVendor = USB_VENDOR_GENESYS_LOGIC, .bInterfaceClass = USB_CLASS_HUB, .driver_info = HUB_QUIRK_CHECK_PORT_AUTOSUSPEND}, { .match_flags = USB_DEVICE_ID_MATCH_VENDOR | USB_DEVICE_ID_MATCH_PRODUCT, .idVendor = USB_VENDOR_TEXAS_INSTRUMENTS, .idProduct = USB_PRODUCT_TUSB8041_USB2, .driver_info = HUB_QUIRK_DISABLE_AUTOSUSPEND}, { .match_flags = USB_DEVICE_ID_MATCH_VENDOR | USB_DEVICE_ID_MATCH_PRODUCT, .idVendor = USB_VENDOR_TEXAS_INSTRUMENTS, .idProduct = USB_PRODUCT_TUSB8041_USB3, .driver_info = HUB_QUIRK_DISABLE_AUTOSUSPEND}, { .match_flags = USB_DEVICE_ID_MATCH_VENDOR | USB_DEVICE_ID_MATCH_PRODUCT, .idVendor = USB_VENDOR_MICROCHIP, .idProduct = USB_PRODUCT_USB4913, .driver_info = HUB_QUIRK_REDUCE_FRAME_INTR_BINTERVAL}, { .match_flags = USB_DEVICE_ID_MATCH_VENDOR | USB_DEVICE_ID_MATCH_PRODUCT, .idVendor = USB_VENDOR_MICROCHIP, .idProduct = USB_PRODUCT_USB4914, .driver_info = HUB_QUIRK_REDUCE_FRAME_INTR_BINTERVAL}, { .match_flags = USB_DEVICE_ID_MATCH_VENDOR | USB_DEVICE_ID_MATCH_PRODUCT, .idVendor = USB_VENDOR_MICROCHIP, .idProduct = USB_PRODUCT_USB4915, .driver_info = HUB_QUIRK_REDUCE_FRAME_INTR_BINTERVAL}, { .match_flags = USB_DEVICE_ID_MATCH_DEV_CLASS, .bDeviceClass = USB_CLASS_HUB}, { .match_flags = USB_DEVICE_ID_MATCH_INT_CLASS, .bInterfaceClass = USB_CLASS_HUB}, { } /* Terminating entry */ }; MODULE_DEVICE_TABLE(usb, hub_id_table); static struct usb_driver hub_driver = { .name = "hub", .probe = hub_probe, .disconnect = hub_disconnect, .suspend = hub_suspend, .resume = hub_resume, .reset_resume = hub_reset_resume, .pre_reset = hub_pre_reset, .post_reset = hub_post_reset, .unlocked_ioctl = hub_ioctl, .id_table = hub_id_table, .supports_autosuspend = 1, }; int usb_hub_init(void) { if (usb_register(&hub_driver) < 0) { printk(KERN_ERR "%s: can't register hub driver\n", usbcore_name); return -1; } /* * The workqueue needs to be freezable to avoid interfering with * USB-PERSIST port handover. Otherwise it might see that a full-speed * device was gone before the EHCI controller had handed its port * over to the companion full-speed controller. */ hub_wq = alloc_workqueue("usb_hub_wq", WQ_FREEZABLE, 0); if (hub_wq) return 0; /* Fall through if kernel_thread failed */ usb_deregister(&hub_driver); pr_err("%s: can't allocate workqueue for usb hub\n", usbcore_name); return -1; } void usb_hub_cleanup(void) { destroy_workqueue(hub_wq); /* * Hub resources are freed for us by usb_deregister. It calls * usb_driver_purge on every device which in turn calls that * devices disconnect function if it is using this driver. * The hub_disconnect function takes care of releasing the * individual hub resources. -greg */ usb_deregister(&hub_driver); } /* usb_hub_cleanup() */ /** * usb_reset_and_verify_device - perform a USB port reset to reinitialize a device * @udev: device to reset (not in SUSPENDED or NOTATTACHED state) * * WARNING - don't use this routine to reset a composite device * (one with multiple interfaces owned by separate drivers)! * Use usb_reset_device() instead. * * Do a port reset, reassign the device's address, and establish its * former operating configuration. If the reset fails, or the device's * descriptors change from their values before the reset, or the original * configuration and altsettings cannot be restored, a flag will be set * telling hub_wq to pretend the device has been disconnected and then * re-connected. All drivers will be unbound, and the device will be * re-enumerated and probed all over again. * * Return: 0 if the reset succeeded, -ENODEV if the device has been * flagged for logical disconnection, or some other negative error code * if the reset wasn't even attempted. * * Note: * The caller must own the device lock and the port lock, the latter is * taken by usb_reset_device(). For example, it's safe to use * usb_reset_device() from a driver probe() routine after downloading * new firmware. For calls that might not occur during probe(), drivers * should lock the device using usb_lock_device_for_reset(). * * Locking exception: This routine may also be called from within an * autoresume handler. Such usage won't conflict with other tasks * holding the device lock because these tasks should always call * usb_autopm_resume_device(), thereby preventing any unwanted * autoresume. The autoresume handler is expected to have already * acquired the port lock before calling this routine. */ static int usb_reset_and_verify_device(struct usb_device *udev) { struct usb_device *parent_hdev = udev->parent; struct usb_hub *parent_hub; struct usb_hcd *hcd = bus_to_hcd(udev->bus); struct usb_device_descriptor descriptor; struct usb_host_bos *bos; int i, j, ret = 0; int port1 = udev->portnum; if (udev->state == USB_STATE_NOTATTACHED || udev->state == USB_STATE_SUSPENDED) { dev_dbg(&udev->dev, "device reset not allowed in state %d\n", udev->state); return -EINVAL; } if (!parent_hdev) return -EISDIR; parent_hub = usb_hub_to_struct_hub(parent_hdev); /* Disable USB2 hardware LPM. * It will be re-enabled by the enumeration process. */ usb_disable_usb2_hardware_lpm(udev); bos = udev->bos; udev->bos = NULL; mutex_lock(hcd->address0_mutex); for (i = 0; i < PORT_INIT_TRIES; ++i) { if (hub_port_stop_enumerate(parent_hub, port1, i)) { ret = -ENODEV; break; } /* ep0 maxpacket size may change; let the HCD know about it. * Other endpoints will be handled by re-enumeration. */ usb_ep0_reinit(udev); ret = hub_port_init(parent_hub, udev, port1, i, &descriptor); if (ret >= 0 || ret == -ENOTCONN || ret == -ENODEV) break; } mutex_unlock(hcd->address0_mutex); if (ret < 0) goto re_enumerate; /* Device might have changed firmware (DFU or similar) */ if (descriptors_changed(udev, &descriptor, bos)) { dev_info(&udev->dev, "device firmware changed\n"); goto re_enumerate; } /* Restore the device's previous configuration */ if (!udev->actconfig) goto done; mutex_lock(hcd->bandwidth_mutex); ret = usb_hcd_alloc_bandwidth(udev, udev->actconfig, NULL, NULL); if (ret < 0) { dev_warn(&udev->dev, "Busted HC? Not enough HCD resources for " "old configuration.\n"); mutex_unlock(hcd->bandwidth_mutex); goto re_enumerate; } ret = usb_control_msg(udev, usb_sndctrlpipe(udev, 0), USB_REQ_SET_CONFIGURATION, 0, udev->actconfig->desc.bConfigurationValue, 0, NULL, 0, USB_CTRL_SET_TIMEOUT); if (ret < 0) { dev_err(&udev->dev, "can't restore configuration #%d (error=%d)\n", udev->actconfig->desc.bConfigurationValue, ret); mutex_unlock(hcd->bandwidth_mutex); goto re_enumerate; } mutex_unlock(hcd->bandwidth_mutex); usb_set_device_state(udev, USB_STATE_CONFIGURED); /* Put interfaces back into the same altsettings as before. * Don't bother to send the Set-Interface request for interfaces * that were already in altsetting 0; besides being unnecessary, * many devices can't handle it. Instead just reset the host-side * endpoint state. */ for (i = 0; i < udev->actconfig->desc.bNumInterfaces; i++) { struct usb_host_config *config = udev->actconfig; struct usb_interface *intf = config->interface[i]; struct usb_interface_descriptor *desc; desc = &intf->cur_altsetting->desc; if (desc->bAlternateSetting == 0) { usb_disable_interface(udev, intf, true); usb_enable_interface(udev, intf, true); ret = 0; } else { /* Let the bandwidth allocation function know that this * device has been reset, and it will have to use * alternate setting 0 as the current alternate setting. */ intf->resetting_device = 1; ret = usb_set_interface(udev, desc->bInterfaceNumber, desc->bAlternateSetting); intf->resetting_device = 0; } if (ret < 0) { dev_err(&udev->dev, "failed to restore interface %d " "altsetting %d (error=%d)\n", desc->bInterfaceNumber, desc->bAlternateSetting, ret); goto re_enumerate; } /* Resetting also frees any allocated streams */ for (j = 0; j < intf->cur_altsetting->desc.bNumEndpoints; j++) intf->cur_altsetting->endpoint[j].streams = 0; } done: /* Now that the alt settings are re-installed, enable LTM and LPM. */ usb_enable_usb2_hardware_lpm(udev); usb_unlocked_enable_lpm(udev); usb_enable_ltm(udev); usb_release_bos_descriptor(udev); udev->bos = bos; return 0; re_enumerate: usb_release_bos_descriptor(udev); udev->bos = bos; hub_port_logical_disconnect(parent_hub, port1); return -ENODEV; } /** * usb_reset_device - warn interface drivers and perform a USB port reset * @udev: device to reset (not in NOTATTACHED state) * * Warns all drivers bound to registered interfaces (using their pre_reset * method), performs the port reset, and then lets the drivers know that * the reset is over (using their post_reset method). * * Return: The same as for usb_reset_and_verify_device(). * However, if a reset is already in progress (for instance, if a * driver doesn't have pre_reset() or post_reset() callbacks, and while * being unbound or re-bound during the ongoing reset its disconnect() * or probe() routine tries to perform a second, nested reset), the * routine returns -EINPROGRESS. * * Note: * The caller must own the device lock. For example, it's safe to use * this from a driver probe() routine after downloading new firmware. * For calls that might not occur during probe(), drivers should lock * the device using usb_lock_device_for_reset(). * * If an interface is currently being probed or disconnected, we assume * its driver knows how to handle resets. For all other interfaces, * if the driver doesn't have pre_reset and post_reset methods then * we attempt to unbind it and rebind afterward. */ int usb_reset_device(struct usb_device *udev) { int ret; int i; unsigned int noio_flag; struct usb_port *port_dev; struct usb_host_config *config = udev->actconfig; struct usb_hub *hub = usb_hub_to_struct_hub(udev->parent); if (udev->state == USB_STATE_NOTATTACHED) { dev_dbg(&udev->dev, "device reset not allowed in state %d\n", udev->state); return -EINVAL; } if (!udev->parent) { /* this requires hcd-specific logic; see ohci_restart() */ dev_dbg(&udev->dev, "%s for root hub!\n", __func__); return -EISDIR; } if (udev->reset_in_progress) return -EINPROGRESS; udev->reset_in_progress = 1; port_dev = hub->ports[udev->portnum - 1]; /* * Don't allocate memory with GFP_KERNEL in current * context to avoid possible deadlock if usb mass * storage interface or usbnet interface(iSCSI case) * is included in current configuration. The easist * approach is to do it for every device reset, * because the device 'memalloc_noio' flag may have * not been set before reseting the usb device. */ noio_flag = memalloc_noio_save(); /* Prevent autosuspend during the reset */ usb_autoresume_device(udev); if (config) { for (i = 0; i < config->desc.bNumInterfaces; ++i) { struct usb_interface *cintf = config->interface[i]; struct usb_driver *drv; int unbind = 0; if (cintf->dev.driver) { drv = to_usb_driver(cintf->dev.driver); if (drv->pre_reset && drv->post_reset) unbind = (drv->pre_reset)(cintf); else if (cintf->condition == USB_INTERFACE_BOUND) unbind = 1; if (unbind) usb_forced_unbind_intf(cintf); } } } usb_lock_port(port_dev); ret = usb_reset_and_verify_device(udev); usb_unlock_port(port_dev); if (config) { for (i = config->desc.bNumInterfaces - 1; i >= 0; --i) { struct usb_interface *cintf = config->interface[i]; struct usb_driver *drv; int rebind = cintf->needs_binding; if (!rebind && cintf->dev.driver) { drv = to_usb_driver(cintf->dev.driver); if (drv->post_reset) rebind = (drv->post_reset)(cintf); else if (cintf->condition == USB_INTERFACE_BOUND) rebind = 1; if (rebind) cintf->needs_binding = 1; } } /* If the reset failed, hub_wq will unbind drivers later */ if (ret == 0) usb_unbind_and_rebind_marked_interfaces(udev); } usb_autosuspend_device(udev); memalloc_noio_restore(noio_flag); udev->reset_in_progress = 0; return ret; } EXPORT_SYMBOL_GPL(usb_reset_device); /** * usb_queue_reset_device - Reset a USB device from an atomic context * @iface: USB interface belonging to the device to reset * * This function can be used to reset a USB device from an atomic * context, where usb_reset_device() won't work (as it blocks). * * Doing a reset via this method is functionally equivalent to calling * usb_reset_device(), except for the fact that it is delayed to a * workqueue. This means that any drivers bound to other interfaces * might be unbound, as well as users from usbfs in user space. * * Corner cases: * * - Scheduling two resets at the same time from two different drivers * attached to two different interfaces of the same device is * possible; depending on how the driver attached to each interface * handles ->pre_reset(), the second reset might happen or not. * * - If the reset is delayed so long that the interface is unbound from * its driver, the reset will be skipped. * * - This function can be called during .probe(). It can also be called * during .disconnect(), but doing so is pointless because the reset * will not occur. If you really want to reset the device during * .disconnect(), call usb_reset_device() directly -- but watch out * for nested unbinding issues! */ void usb_queue_reset_device(struct usb_interface *iface) { if (schedule_work(&iface->reset_ws)) usb_get_intf(iface); } EXPORT_SYMBOL_GPL(usb_queue_reset_device); /** * usb_hub_find_child - Get the pointer of child device * attached to the port which is specified by @port1. * @hdev: USB device belonging to the usb hub * @port1: port num to indicate which port the child device * is attached to. * * USB drivers call this function to get hub's child device * pointer. * * Return: %NULL if input param is invalid and * child's usb_device pointer if non-NULL. */ struct usb_device *usb_hub_find_child(struct usb_device *hdev, int port1) { struct usb_hub *hub = usb_hub_to_struct_hub(hdev); if (port1 < 1 || port1 > hdev->maxchild) return NULL; return hub->ports[port1 - 1]->child; } EXPORT_SYMBOL_GPL(usb_hub_find_child); void usb_hub_adjust_deviceremovable(struct usb_device *hdev, struct usb_hub_descriptor *desc) { struct usb_hub *hub = usb_hub_to_struct_hub(hdev); enum usb_port_connect_type connect_type; int i; if (!hub) return; if (!hub_is_superspeed(hdev)) { for (i = 1; i <= hdev->maxchild; i++) { struct usb_port *port_dev = hub->ports[i - 1]; connect_type = port_dev->connect_type; if (connect_type == USB_PORT_CONNECT_TYPE_HARD_WIRED) { u8 mask = 1 << (i%8); if (!(desc->u.hs.DeviceRemovable[i/8] & mask)) { dev_dbg(&port_dev->dev, "DeviceRemovable is changed to 1 according to platform information.\n"); desc->u.hs.DeviceRemovable[i/8] |= mask; } } } } else { u16 port_removable = le16_to_cpu(desc->u.ss.DeviceRemovable); for (i = 1; i <= hdev->maxchild; i++) { struct usb_port *port_dev = hub->ports[i - 1]; connect_type = port_dev->connect_type; if (connect_type == USB_PORT_CONNECT_TYPE_HARD_WIRED) { u16 mask = 1 << i; if (!(port_removable & mask)) { dev_dbg(&port_dev->dev, "DeviceRemovable is changed to 1 according to platform information.\n"); port_removable |= mask; } } } desc->u.ss.DeviceRemovable = cpu_to_le16(port_removable); } } #ifdef CONFIG_ACPI /** * usb_get_hub_port_acpi_handle - Get the usb port's acpi handle * @hdev: USB device belonging to the usb hub * @port1: port num of the port * * Return: Port's acpi handle if successful, %NULL if params are * invalid. */ acpi_handle usb_get_hub_port_acpi_handle(struct usb_device *hdev, int port1) { struct usb_hub *hub = usb_hub_to_struct_hub(hdev); if (!hub) return NULL; return ACPI_HANDLE(&hub->ports[port1 - 1]->dev); } #endif |
| 2 1 1 2 77 73 22 69 1 6 76 66 1 1 55 60 1 67 66 28 1 2 66 27 5 10 10 9 1 27 27 102 101 102 17 18 55 55 55 56 56 52 52 5 5 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 | // SPDX-License-Identifier: GPL-2.0-or-later /* * ALSA sequencer Memory Manager * Copyright (c) 1998 by Frank van de Pol <fvdpol@coil.demon.nl> * Jaroslav Kysela <perex@perex.cz> * 2000 by Takashi Iwai <tiwai@suse.de> */ #include <linux/init.h> #include <linux/export.h> #include <linux/slab.h> #include <linux/sched/signal.h> #include <linux/mm.h> #include <sound/core.h> #include <sound/seq_kernel.h> #include "seq_memory.h" #include "seq_queue.h" #include "seq_info.h" #include "seq_lock.h" static inline int snd_seq_pool_available(struct snd_seq_pool *pool) { return pool->total_elements - atomic_read(&pool->counter); } static inline int snd_seq_output_ok(struct snd_seq_pool *pool) { return snd_seq_pool_available(pool) >= pool->room; } /* * Variable length event: * The event like sysex uses variable length type. * The external data may be stored in three different formats. * 1) kernel space * This is the normal case. * ext.data.len = length * ext.data.ptr = buffer pointer * 2) user space * When an event is generated via read(), the external data is * kept in user space until expanded. * ext.data.len = length | SNDRV_SEQ_EXT_USRPTR * ext.data.ptr = userspace pointer * 3) chained cells * When the variable length event is enqueued (in prioq or fifo), * the external data is decomposed to several cells. * ext.data.len = length | SNDRV_SEQ_EXT_CHAINED * ext.data.ptr = the additiona cell head * -> cell.next -> cell.next -> .. */ /* * exported: * call dump function to expand external data. */ static int get_var_len(const struct snd_seq_event *event) { if ((event->flags & SNDRV_SEQ_EVENT_LENGTH_MASK) != SNDRV_SEQ_EVENT_LENGTH_VARIABLE) return -EINVAL; return event->data.ext.len & ~SNDRV_SEQ_EXT_MASK; } static int dump_var_event(const struct snd_seq_event *event, snd_seq_dump_func_t func, void *private_data, int offset, int maxlen) { int len, err; struct snd_seq_event_cell *cell; len = get_var_len(event); if (len <= 0) return len; if (len <= offset) return 0; if (maxlen && len > offset + maxlen) len = offset + maxlen; if (event->data.ext.len & SNDRV_SEQ_EXT_USRPTR) { char buf[32]; char __user *curptr = (char __force __user *)event->data.ext.ptr; curptr += offset; len -= offset; while (len > 0) { int size = sizeof(buf); if (len < size) size = len; if (copy_from_user(buf, curptr, size)) return -EFAULT; err = func(private_data, buf, size); if (err < 0) return err; curptr += size; len -= size; } return 0; } if (!(event->data.ext.len & SNDRV_SEQ_EXT_CHAINED)) return func(private_data, event->data.ext.ptr + offset, len - offset); cell = (struct snd_seq_event_cell *)event->data.ext.ptr; for (; len > 0 && cell; cell = cell->next) { int size = sizeof(struct snd_seq_event); char *curptr = (char *)&cell->event; if (offset >= size) { offset -= size; len -= size; continue; } if (len < size) size = len; err = func(private_data, curptr + offset, size - offset); if (err < 0) return err; offset = 0; len -= size; } return 0; } int snd_seq_dump_var_event(const struct snd_seq_event *event, snd_seq_dump_func_t func, void *private_data) { return dump_var_event(event, func, private_data, 0, 0); } EXPORT_SYMBOL(snd_seq_dump_var_event); /* * exported: * expand the variable length event to linear buffer space. */ static int seq_copy_in_kernel(void *ptr, void *src, int size) { char **bufptr = ptr; memcpy(*bufptr, src, size); *bufptr += size; return 0; } static int seq_copy_in_user(void *ptr, void *src, int size) { char __user **bufptr = ptr; if (copy_to_user(*bufptr, src, size)) return -EFAULT; *bufptr += size; return 0; } static int expand_var_event(const struct snd_seq_event *event, int offset, int size, char *buf, bool in_kernel) { if (event->data.ext.len & SNDRV_SEQ_EXT_USRPTR) { if (! in_kernel) return -EINVAL; if (copy_from_user(buf, (char __force __user *)event->data.ext.ptr + offset, size)) return -EFAULT; return 0; } return dump_var_event(event, in_kernel ? seq_copy_in_kernel : seq_copy_in_user, &buf, offset, size); } int snd_seq_expand_var_event(const struct snd_seq_event *event, int count, char *buf, int in_kernel, int size_aligned) { int len, newlen, err; len = get_var_len(event); if (len < 0) return len; newlen = len; if (size_aligned > 0) newlen = roundup(len, size_aligned); if (count < newlen) return -EAGAIN; err = expand_var_event(event, 0, len, buf, in_kernel); if (err < 0) return err; if (len != newlen) { if (in_kernel) memset(buf + len, 0, newlen - len); else if (clear_user((__force void __user *)buf + len, newlen - len)) return -EFAULT; } return newlen; } EXPORT_SYMBOL(snd_seq_expand_var_event); int snd_seq_expand_var_event_at(const struct snd_seq_event *event, int count, char *buf, int offset) { int len, err; len = get_var_len(event); if (len < 0) return len; if (len <= offset) return 0; len -= offset; if (len > count) len = count; err = expand_var_event(event, offset, count, buf, true); if (err < 0) return err; return len; } EXPORT_SYMBOL_GPL(snd_seq_expand_var_event_at); /* * release this cell, free extended data if available */ static inline void free_cell(struct snd_seq_pool *pool, struct snd_seq_event_cell *cell) { cell->next = pool->free; pool->free = cell; atomic_dec(&pool->counter); } void snd_seq_cell_free(struct snd_seq_event_cell * cell) { struct snd_seq_pool *pool; if (snd_BUG_ON(!cell)) return; pool = cell->pool; if (snd_BUG_ON(!pool)) return; guard(spinlock_irqsave)(&pool->lock); free_cell(pool, cell); if (snd_seq_ev_is_variable(&cell->event)) { if (cell->event.data.ext.len & SNDRV_SEQ_EXT_CHAINED) { struct snd_seq_event_cell *curp, *nextptr; curp = cell->event.data.ext.ptr; for (; curp; curp = nextptr) { nextptr = curp->next; curp->next = pool->free; free_cell(pool, curp); } } } if (waitqueue_active(&pool->output_sleep)) { /* has enough space now? */ if (snd_seq_output_ok(pool)) wake_up(&pool->output_sleep); } } /* * allocate an event cell. */ static int snd_seq_cell_alloc(struct snd_seq_pool *pool, struct snd_seq_event_cell **cellp, int nonblock, struct file *file, struct mutex *mutexp) { struct snd_seq_event_cell *cell; unsigned long flags; int err = -EAGAIN; wait_queue_entry_t wait; if (pool == NULL) return -EINVAL; *cellp = NULL; init_waitqueue_entry(&wait, current); spin_lock_irqsave(&pool->lock, flags); if (pool->ptr == NULL) { /* not initialized */ pr_debug("ALSA: seq: pool is not initialized\n"); err = -EINVAL; goto __error; } while (pool->free == NULL && ! nonblock && ! pool->closing) { set_current_state(TASK_INTERRUPTIBLE); add_wait_queue(&pool->output_sleep, &wait); spin_unlock_irqrestore(&pool->lock, flags); if (mutexp) mutex_unlock(mutexp); schedule(); if (mutexp) mutex_lock(mutexp); spin_lock_irqsave(&pool->lock, flags); remove_wait_queue(&pool->output_sleep, &wait); /* interrupted? */ if (signal_pending(current)) { err = -ERESTARTSYS; goto __error; } } if (pool->closing) { /* closing.. */ err = -ENOMEM; goto __error; } cell = pool->free; if (cell) { int used; pool->free = cell->next; atomic_inc(&pool->counter); used = atomic_read(&pool->counter); if (pool->max_used < used) pool->max_used = used; pool->event_alloc_success++; /* clear cell pointers */ cell->next = NULL; err = 0; } else pool->event_alloc_failures++; *cellp = cell; __error: spin_unlock_irqrestore(&pool->lock, flags); return err; } /* * duplicate the event to a cell. * if the event has external data, the data is decomposed to additional * cells. */ int snd_seq_event_dup(struct snd_seq_pool *pool, struct snd_seq_event *event, struct snd_seq_event_cell **cellp, int nonblock, struct file *file, struct mutex *mutexp) { int ncells, err; unsigned int extlen; struct snd_seq_event_cell *cell; int size; *cellp = NULL; ncells = 0; extlen = 0; if (snd_seq_ev_is_variable(event)) { extlen = event->data.ext.len & ~SNDRV_SEQ_EXT_MASK; ncells = DIV_ROUND_UP(extlen, sizeof(struct snd_seq_event)); } if (ncells >= pool->total_elements) return -ENOMEM; err = snd_seq_cell_alloc(pool, &cell, nonblock, file, mutexp); if (err < 0) return err; /* copy the event */ size = snd_seq_event_packet_size(event); memcpy(&cell->ump, event, size); #if IS_ENABLED(CONFIG_SND_SEQ_UMP) if (size < sizeof(cell->event)) cell->ump.raw.extra = 0; #endif /* decompose */ if (snd_seq_ev_is_variable(event)) { int len = extlen; int is_chained = event->data.ext.len & SNDRV_SEQ_EXT_CHAINED; int is_usrptr = event->data.ext.len & SNDRV_SEQ_EXT_USRPTR; struct snd_seq_event_cell *src, *tmp, *tail; char *buf; cell->event.data.ext.len = extlen | SNDRV_SEQ_EXT_CHAINED; cell->event.data.ext.ptr = NULL; src = (struct snd_seq_event_cell *)event->data.ext.ptr; buf = (char *)event->data.ext.ptr; tail = NULL; while (ncells-- > 0) { size = sizeof(struct snd_seq_event); if (len < size) size = len; err = snd_seq_cell_alloc(pool, &tmp, nonblock, file, mutexp); if (err < 0) goto __error; if (cell->event.data.ext.ptr == NULL) cell->event.data.ext.ptr = tmp; if (tail) tail->next = tmp; tail = tmp; /* copy chunk */ if (is_chained && src) { tmp->event = src->event; src = src->next; } else if (is_usrptr) { if (copy_from_user(&tmp->event, (char __force __user *)buf, size)) { err = -EFAULT; goto __error; } } else { memcpy(&tmp->event, buf, size); } buf += size; len -= size; } } *cellp = cell; return 0; __error: snd_seq_cell_free(cell); return err; } /* poll wait */ int snd_seq_pool_poll_wait(struct snd_seq_pool *pool, struct file *file, poll_table *wait) { poll_wait(file, &pool->output_sleep, wait); return snd_seq_output_ok(pool); } /* allocate room specified number of events */ int snd_seq_pool_init(struct snd_seq_pool *pool) { int cell; struct snd_seq_event_cell *cellptr; if (snd_BUG_ON(!pool)) return -EINVAL; cellptr = kvmalloc_array(pool->size, sizeof(struct snd_seq_event_cell), GFP_KERNEL); if (!cellptr) return -ENOMEM; /* add new cells to the free cell list */ guard(spinlock_irq)(&pool->lock); if (pool->ptr) { kvfree(cellptr); return 0; } pool->ptr = cellptr; pool->free = NULL; for (cell = 0; cell < pool->size; cell++) { cellptr = pool->ptr + cell; cellptr->pool = pool; cellptr->next = pool->free; pool->free = cellptr; } pool->room = (pool->size + 1) / 2; /* init statistics */ pool->max_used = 0; pool->total_elements = pool->size; return 0; } /* refuse the further insertion to the pool */ void snd_seq_pool_mark_closing(struct snd_seq_pool *pool) { if (snd_BUG_ON(!pool)) return; guard(spinlock_irqsave)(&pool->lock); pool->closing = 1; } /* remove events */ int snd_seq_pool_done(struct snd_seq_pool *pool) { struct snd_seq_event_cell *ptr; if (snd_BUG_ON(!pool)) return -EINVAL; /* wait for closing all threads */ if (waitqueue_active(&pool->output_sleep)) wake_up(&pool->output_sleep); while (atomic_read(&pool->counter) > 0) schedule_timeout_uninterruptible(1); /* release all resources */ scoped_guard(spinlock_irq, &pool->lock) { ptr = pool->ptr; pool->ptr = NULL; pool->free = NULL; pool->total_elements = 0; } kvfree(ptr); guard(spinlock_irq)(&pool->lock); pool->closing = 0; return 0; } /* init new memory pool */ struct snd_seq_pool *snd_seq_pool_new(int poolsize) { struct snd_seq_pool *pool; /* create pool block */ pool = kzalloc(sizeof(*pool), GFP_KERNEL); if (!pool) return NULL; spin_lock_init(&pool->lock); pool->ptr = NULL; pool->free = NULL; pool->total_elements = 0; atomic_set(&pool->counter, 0); pool->closing = 0; init_waitqueue_head(&pool->output_sleep); pool->size = poolsize; /* init statistics */ pool->max_used = 0; return pool; } /* remove memory pool */ int snd_seq_pool_delete(struct snd_seq_pool **ppool) { struct snd_seq_pool *pool = *ppool; *ppool = NULL; if (pool == NULL) return 0; snd_seq_pool_mark_closing(pool); snd_seq_pool_done(pool); kfree(pool); return 0; } /* exported to seq_clientmgr.c */ void snd_seq_info_pool(struct snd_info_buffer *buffer, struct snd_seq_pool *pool, char *space) { if (pool == NULL) return; snd_iprintf(buffer, "%sPool size : %d\n", space, pool->total_elements); snd_iprintf(buffer, "%sCells in use : %d\n", space, atomic_read(&pool->counter)); snd_iprintf(buffer, "%sPeak cells in use : %d\n", space, pool->max_used); snd_iprintf(buffer, "%sAlloc success : %d\n", space, pool->event_alloc_success); snd_iprintf(buffer, "%sAlloc failures : %d\n", space, pool->event_alloc_failures); } |
| 54 10 10 10 3 3 3 177 178 1 1 1 1 1 1 3 5 6 3 3 3 2 5 6 6 6 1 6 6 5 4 6 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 | /* * cgroup_freezer.c - control group freezer subsystem * * Copyright IBM Corporation, 2007 * * Author : Cedric Le Goater <clg@fr.ibm.com> * * This program is free software; you can redistribute it and/or modify it * under the terms of version 2.1 of the GNU Lesser General Public License * as published by the Free Software Foundation. * * This program is distributed in the hope that it would be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. */ #include <linux/export.h> #include <linux/slab.h> #include <linux/cgroup.h> #include <linux/fs.h> #include <linux/uaccess.h> #include <linux/freezer.h> #include <linux/seq_file.h> #include <linux/mutex.h> #include <linux/cpu.h> /* * A cgroup is freezing if any FREEZING flags are set. FREEZING_SELF is * set if "FROZEN" is written to freezer.state cgroupfs file, and cleared * for "THAWED". FREEZING_PARENT is set if the parent freezer is FREEZING * for whatever reason. IOW, a cgroup has FREEZING_PARENT set if one of * its ancestors has FREEZING_SELF set. */ enum freezer_state_flags { CGROUP_FREEZER_ONLINE = (1 << 0), /* freezer is fully online */ CGROUP_FREEZING_SELF = (1 << 1), /* this freezer is freezing */ CGROUP_FREEZING_PARENT = (1 << 2), /* the parent freezer is freezing */ CGROUP_FROZEN = (1 << 3), /* this and its descendants frozen */ /* mask for all FREEZING flags */ CGROUP_FREEZING = CGROUP_FREEZING_SELF | CGROUP_FREEZING_PARENT, }; struct freezer { struct cgroup_subsys_state css; unsigned int state; }; static DEFINE_MUTEX(freezer_mutex); static inline struct freezer *css_freezer(struct cgroup_subsys_state *css) { return css ? container_of(css, struct freezer, css) : NULL; } static inline struct freezer *task_freezer(struct task_struct *task) { return css_freezer(task_css(task, freezer_cgrp_id)); } static struct freezer *parent_freezer(struct freezer *freezer) { return css_freezer(freezer->css.parent); } bool cgroup_freezing(struct task_struct *task) { bool ret; unsigned int state; rcu_read_lock(); /* Check if the cgroup is still FREEZING, but not FROZEN. The extra * !FROZEN check is required, because the FREEZING bit is not cleared * when the state FROZEN is reached. */ state = task_freezer(task)->state; ret = (state & CGROUP_FREEZING) && !(state & CGROUP_FROZEN); rcu_read_unlock(); return ret; } static const char *freezer_state_strs(unsigned int state) { if (state & CGROUP_FROZEN) return "FROZEN"; if (state & CGROUP_FREEZING) return "FREEZING"; return "THAWED"; }; static struct cgroup_subsys_state * freezer_css_alloc(struct cgroup_subsys_state *parent_css) { struct freezer *freezer; freezer = kzalloc(sizeof(struct freezer), GFP_KERNEL); if (!freezer) return ERR_PTR(-ENOMEM); return &freezer->css; } /** * freezer_css_online - commit creation of a freezer css * @css: css being created * * We're committing to creation of @css. Mark it online and inherit * parent's freezing state while holding cpus read lock and freezer_mutex. */ static int freezer_css_online(struct cgroup_subsys_state *css) { struct freezer *freezer = css_freezer(css); struct freezer *parent = parent_freezer(freezer); cpus_read_lock(); mutex_lock(&freezer_mutex); freezer->state |= CGROUP_FREEZER_ONLINE; if (parent && (parent->state & CGROUP_FREEZING)) { freezer->state |= CGROUP_FREEZING_PARENT | CGROUP_FROZEN; static_branch_inc_cpuslocked(&freezer_active); } mutex_unlock(&freezer_mutex); cpus_read_unlock(); return 0; } /** * freezer_css_offline - initiate destruction of a freezer css * @css: css being destroyed * * @css is going away. Mark it dead and decrement freezer_active if * it was holding one. */ static void freezer_css_offline(struct cgroup_subsys_state *css) { struct freezer *freezer = css_freezer(css); cpus_read_lock(); mutex_lock(&freezer_mutex); if (freezer->state & CGROUP_FREEZING) static_branch_dec_cpuslocked(&freezer_active); freezer->state = 0; mutex_unlock(&freezer_mutex); cpus_read_unlock(); } static void freezer_css_free(struct cgroup_subsys_state *css) { kfree(css_freezer(css)); } /* * Tasks can be migrated into a different freezer anytime regardless of its * current state. freezer_attach() is responsible for making new tasks * conform to the current state. * * Freezer state changes and task migration are synchronized via * @freezer->lock. freezer_attach() makes the new tasks conform to the * current state and all following state changes can see the new tasks. */ static void freezer_attach(struct cgroup_taskset *tset) { struct task_struct *task; struct cgroup_subsys_state *new_css; mutex_lock(&freezer_mutex); /* * Make the new tasks conform to the current state of @new_css. * For simplicity, when migrating any task to a FROZEN cgroup, we * revert it to FREEZING and let update_if_frozen() determine the * correct state later. * * Tasks in @tset are on @new_css but may not conform to its * current state before executing the following - !frozen tasks may * be visible in a FROZEN cgroup and frozen tasks in a THAWED one. */ cgroup_taskset_for_each(task, new_css, tset) { struct freezer *freezer = css_freezer(new_css); if (!(freezer->state & CGROUP_FREEZING)) { __thaw_task(task); } else { freeze_task(task); /* clear FROZEN and propagate upwards */ while (freezer && (freezer->state & CGROUP_FROZEN)) { freezer->state &= ~CGROUP_FROZEN; freezer = parent_freezer(freezer); } } } mutex_unlock(&freezer_mutex); } /** * freezer_fork - cgroup post fork callback * @task: a task which has just been forked * * @task has just been created and should conform to the current state of * the cgroup_freezer it belongs to. This function may race against * freezer_attach(). Losing to freezer_attach() means that we don't have * to do anything as freezer_attach() will put @task into the appropriate * state. */ static void freezer_fork(struct task_struct *task) { struct freezer *freezer; /* * The root cgroup is non-freezable, so we can skip locking the * freezer. This is safe regardless of race with task migration. * If we didn't race or won, skipping is obviously the right thing * to do. If we lost and root is the new cgroup, noop is still the * right thing to do. */ if (task_css_is_root(task, freezer_cgrp_id)) return; mutex_lock(&freezer_mutex); rcu_read_lock(); freezer = task_freezer(task); if (freezer->state & CGROUP_FREEZING) freeze_task(task); rcu_read_unlock(); mutex_unlock(&freezer_mutex); } /** * update_if_frozen - update whether a cgroup finished freezing * @css: css of interest * * Once FREEZING is initiated, transition to FROZEN is lazily updated by * calling this function. If the current state is FREEZING but not FROZEN, * this function checks whether all tasks of this cgroup and the descendant * cgroups finished freezing and, if so, sets FROZEN. * * The caller is responsible for grabbing RCU read lock and calling * update_if_frozen() on all descendants prior to invoking this function. * * Task states and freezer state might disagree while tasks are being * migrated into or out of @css, so we can't verify task states against * @freezer state here. See freezer_attach() for details. */ static void update_if_frozen(struct cgroup_subsys_state *css) { struct freezer *freezer = css_freezer(css); struct cgroup_subsys_state *pos; struct css_task_iter it; struct task_struct *task; lockdep_assert_held(&freezer_mutex); if (!(freezer->state & CGROUP_FREEZING) || (freezer->state & CGROUP_FROZEN)) return; /* are all (live) children frozen? */ rcu_read_lock(); css_for_each_child(pos, css) { struct freezer *child = css_freezer(pos); if ((child->state & CGROUP_FREEZER_ONLINE) && !(child->state & CGROUP_FROZEN)) { rcu_read_unlock(); return; } } rcu_read_unlock(); /* are all tasks frozen? */ css_task_iter_start(css, 0, &it); while ((task = css_task_iter_next(&it))) { if (freezing(task) && !frozen(task)) goto out_iter_end; } freezer->state |= CGROUP_FROZEN; out_iter_end: css_task_iter_end(&it); } static int freezer_read(struct seq_file *m, void *v) { struct cgroup_subsys_state *css = seq_css(m), *pos; mutex_lock(&freezer_mutex); rcu_read_lock(); /* update states bottom-up */ css_for_each_descendant_post(pos, css) { if (!css_tryget_online(pos)) continue; rcu_read_unlock(); update_if_frozen(pos); rcu_read_lock(); css_put(pos); } rcu_read_unlock(); mutex_unlock(&freezer_mutex); seq_puts(m, freezer_state_strs(css_freezer(css)->state)); seq_putc(m, '\n'); return 0; } static void freeze_cgroup(struct freezer *freezer) { struct css_task_iter it; struct task_struct *task; css_task_iter_start(&freezer->css, 0, &it); while ((task = css_task_iter_next(&it))) freeze_task(task); css_task_iter_end(&it); } static void unfreeze_cgroup(struct freezer *freezer) { struct css_task_iter it; struct task_struct *task; css_task_iter_start(&freezer->css, 0, &it); while ((task = css_task_iter_next(&it))) __thaw_task(task); css_task_iter_end(&it); } /** * freezer_apply_state - apply state change to a single cgroup_freezer * @freezer: freezer to apply state change to * @freeze: whether to freeze or unfreeze * @state: CGROUP_FREEZING_* flag to set or clear * * Set or clear @state on @cgroup according to @freeze, and perform * freezing or thawing as necessary. */ static void freezer_apply_state(struct freezer *freezer, bool freeze, unsigned int state) { /* also synchronizes against task migration, see freezer_attach() */ lockdep_assert_held(&freezer_mutex); if (!(freezer->state & CGROUP_FREEZER_ONLINE)) return; if (freeze) { if (!(freezer->state & CGROUP_FREEZING)) static_branch_inc_cpuslocked(&freezer_active); freezer->state |= state; freeze_cgroup(freezer); } else { bool was_freezing = freezer->state & CGROUP_FREEZING; freezer->state &= ~state; if (!(freezer->state & CGROUP_FREEZING)) { freezer->state &= ~CGROUP_FROZEN; if (was_freezing) static_branch_dec_cpuslocked(&freezer_active); unfreeze_cgroup(freezer); } } } /** * freezer_change_state - change the freezing state of a cgroup_freezer * @freezer: freezer of interest * @freeze: whether to freeze or thaw * * Freeze or thaw @freezer according to @freeze. The operations are * recursive - all descendants of @freezer will be affected. */ static void freezer_change_state(struct freezer *freezer, bool freeze) { struct cgroup_subsys_state *pos; cpus_read_lock(); /* * Update all its descendants in pre-order traversal. Each * descendant will try to inherit its parent's FREEZING state as * CGROUP_FREEZING_PARENT. */ mutex_lock(&freezer_mutex); rcu_read_lock(); css_for_each_descendant_pre(pos, &freezer->css) { struct freezer *pos_f = css_freezer(pos); struct freezer *parent = parent_freezer(pos_f); if (!css_tryget_online(pos)) continue; rcu_read_unlock(); if (pos_f == freezer) freezer_apply_state(pos_f, freeze, CGROUP_FREEZING_SELF); else freezer_apply_state(pos_f, parent->state & CGROUP_FREEZING, CGROUP_FREEZING_PARENT); rcu_read_lock(); css_put(pos); } rcu_read_unlock(); mutex_unlock(&freezer_mutex); cpus_read_unlock(); } static ssize_t freezer_write(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off) { bool freeze; buf = strstrip(buf); if (strcmp(buf, freezer_state_strs(0)) == 0) freeze = false; else if (strcmp(buf, freezer_state_strs(CGROUP_FROZEN)) == 0) freeze = true; else return -EINVAL; freezer_change_state(css_freezer(of_css(of)), freeze); return nbytes; } static u64 freezer_self_freezing_read(struct cgroup_subsys_state *css, struct cftype *cft) { struct freezer *freezer = css_freezer(css); return (bool)(freezer->state & CGROUP_FREEZING_SELF); } static u64 freezer_parent_freezing_read(struct cgroup_subsys_state *css, struct cftype *cft) { struct freezer *freezer = css_freezer(css); return (bool)(freezer->state & CGROUP_FREEZING_PARENT); } static struct cftype files[] = { { .name = "state", .flags = CFTYPE_NOT_ON_ROOT, .seq_show = freezer_read, .write = freezer_write, }, { .name = "self_freezing", .flags = CFTYPE_NOT_ON_ROOT, .read_u64 = freezer_self_freezing_read, }, { .name = "parent_freezing", .flags = CFTYPE_NOT_ON_ROOT, .read_u64 = freezer_parent_freezing_read, }, { } /* terminate */ }; struct cgroup_subsys freezer_cgrp_subsys = { .css_alloc = freezer_css_alloc, .css_online = freezer_css_online, .css_offline = freezer_css_offline, .css_free = freezer_css_free, .attach = freezer_attach, .fork = freezer_fork, .legacy_cftypes = files, }; |
| 3 2 1 1 4 1 4 1 35 22 6 7 5 5 5 5 5 5 6 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 | /* * Copyright (c) 2006, 2018 Oracle and/or its affiliates. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU * General Public License (GPL) Version 2, available from the file * COPYING in the main directory of this source tree, or the * OpenIB.org BSD license below: * * Redistribution and use in source and binary forms, with or * without modification, are permitted provided that the following * conditions are met: * * - Redistributions of source code must retain the above * copyright notice, this list of conditions and the following * disclaimer. * * - Redistributions in binary form must reproduce the above * copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials * provided with the distribution. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. * */ #include <linux/kernel.h> #include <linux/slab.h> #include <linux/in.h> #include <linux/module.h> #include <net/tcp.h> #include <net/net_namespace.h> #include <net/netns/generic.h> #include <net/addrconf.h> #include "rds.h" #include "tcp.h" /* only for info exporting */ static DEFINE_SPINLOCK(rds_tcp_tc_list_lock); static LIST_HEAD(rds_tcp_tc_list); /* rds_tcp_tc_count counts only IPv4 connections. * rds6_tcp_tc_count counts both IPv4 and IPv6 connections. */ static unsigned int rds_tcp_tc_count; #if IS_ENABLED(CONFIG_IPV6) static unsigned int rds6_tcp_tc_count; #endif /* Track rds_tcp_connection structs so they can be cleaned up */ static DEFINE_SPINLOCK(rds_tcp_conn_lock); static LIST_HEAD(rds_tcp_conn_list); static atomic_t rds_tcp_unloading = ATOMIC_INIT(0); static struct kmem_cache *rds_tcp_conn_slab; static int rds_tcp_skbuf_handler(const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *fpos); static int rds_tcp_min_sndbuf = SOCK_MIN_SNDBUF; static int rds_tcp_min_rcvbuf = SOCK_MIN_RCVBUF; static struct ctl_table rds_tcp_sysctl_table[] = { #define RDS_TCP_SNDBUF 0 { .procname = "rds_tcp_sndbuf", /* data is per-net pointer */ .maxlen = sizeof(int), .mode = 0644, .proc_handler = rds_tcp_skbuf_handler, .extra1 = &rds_tcp_min_sndbuf, }, #define RDS_TCP_RCVBUF 1 { .procname = "rds_tcp_rcvbuf", /* data is per-net pointer */ .maxlen = sizeof(int), .mode = 0644, .proc_handler = rds_tcp_skbuf_handler, .extra1 = &rds_tcp_min_rcvbuf, }, }; u32 rds_tcp_write_seq(struct rds_tcp_connection *tc) { /* seq# of the last byte of data in tcp send buffer */ return tcp_sk(tc->t_sock->sk)->write_seq; } u32 rds_tcp_snd_una(struct rds_tcp_connection *tc) { return tcp_sk(tc->t_sock->sk)->snd_una; } void rds_tcp_restore_callbacks(struct socket *sock, struct rds_tcp_connection *tc) { rdsdebug("restoring sock %p callbacks from tc %p\n", sock, tc); write_lock_bh(&sock->sk->sk_callback_lock); /* done under the callback_lock to serialize with write_space */ spin_lock(&rds_tcp_tc_list_lock); list_del_init(&tc->t_list_item); #if IS_ENABLED(CONFIG_IPV6) rds6_tcp_tc_count--; #endif if (!tc->t_cpath->cp_conn->c_isv6) rds_tcp_tc_count--; spin_unlock(&rds_tcp_tc_list_lock); tc->t_sock = NULL; sock->sk->sk_write_space = tc->t_orig_write_space; sock->sk->sk_data_ready = tc->t_orig_data_ready; sock->sk->sk_state_change = tc->t_orig_state_change; sock->sk->sk_user_data = NULL; write_unlock_bh(&sock->sk->sk_callback_lock); } /* * rds_tcp_reset_callbacks() switches the to the new sock and * returns the existing tc->t_sock. * * The only functions that set tc->t_sock are rds_tcp_set_callbacks * and rds_tcp_reset_callbacks. Send and receive trust that * it is set. The absence of RDS_CONN_UP bit protects those paths * from being called while it isn't set. */ void rds_tcp_reset_callbacks(struct socket *sock, struct rds_conn_path *cp) { struct rds_tcp_connection *tc = cp->cp_transport_data; struct socket *osock = tc->t_sock; if (!osock) goto newsock; /* Need to resolve a duelling SYN between peers. * We have an outstanding SYN to this peer, which may * potentially have transitioned to the RDS_CONN_UP state, * so we must quiesce any send threads before resetting * cp_transport_data. We quiesce these threads by setting * cp_state to something other than RDS_CONN_UP, and then * waiting for any existing threads in rds_send_xmit to * complete release_in_xmit(). (Subsequent threads entering * rds_send_xmit() will bail on !rds_conn_up(). * * However an incoming syn-ack at this point would end up * marking the conn as RDS_CONN_UP, and would again permit * rds_send_xmi() threads through, so ideally we would * synchronize on RDS_CONN_UP after lock_sock(), but cannot * do that: waiting on !RDS_IN_XMIT after lock_sock() may * end up deadlocking with tcp_sendmsg(), and the RDS_IN_XMIT * would not get set. As a result, we set c_state to * RDS_CONN_RESETTTING, to ensure that rds_tcp_state_change * cannot mark rds_conn_path_up() in the window before lock_sock() */ atomic_set(&cp->cp_state, RDS_CONN_RESETTING); wait_event(cp->cp_waitq, !test_bit(RDS_IN_XMIT, &cp->cp_flags)); /* reset receive side state for rds_tcp_data_recv() for osock */ cancel_delayed_work_sync(&cp->cp_send_w); cancel_delayed_work_sync(&cp->cp_recv_w); lock_sock(osock->sk); if (tc->t_tinc) { rds_inc_put(&tc->t_tinc->ti_inc); tc->t_tinc = NULL; } tc->t_tinc_hdr_rem = sizeof(struct rds_header); tc->t_tinc_data_rem = 0; rds_tcp_restore_callbacks(osock, tc); release_sock(osock->sk); sock_release(osock); newsock: rds_send_path_reset(cp); lock_sock(sock->sk); rds_tcp_set_callbacks(sock, cp); release_sock(sock->sk); } /* Add tc to rds_tcp_tc_list and set tc->t_sock. See comments * above rds_tcp_reset_callbacks for notes about synchronization * with data path */ void rds_tcp_set_callbacks(struct socket *sock, struct rds_conn_path *cp) { struct rds_tcp_connection *tc = cp->cp_transport_data; rdsdebug("setting sock %p callbacks to tc %p\n", sock, tc); write_lock_bh(&sock->sk->sk_callback_lock); /* done under the callback_lock to serialize with write_space */ spin_lock(&rds_tcp_tc_list_lock); list_add_tail(&tc->t_list_item, &rds_tcp_tc_list); #if IS_ENABLED(CONFIG_IPV6) rds6_tcp_tc_count++; #endif if (!tc->t_cpath->cp_conn->c_isv6) rds_tcp_tc_count++; spin_unlock(&rds_tcp_tc_list_lock); /* accepted sockets need our listen data ready undone */ if (sock->sk->sk_data_ready == rds_tcp_listen_data_ready) sock->sk->sk_data_ready = sock->sk->sk_user_data; tc->t_sock = sock; tc->t_cpath = cp; tc->t_orig_data_ready = sock->sk->sk_data_ready; tc->t_orig_write_space = sock->sk->sk_write_space; tc->t_orig_state_change = sock->sk->sk_state_change; sock->sk->sk_user_data = cp; sock->sk->sk_data_ready = rds_tcp_data_ready; sock->sk->sk_write_space = rds_tcp_write_space; sock->sk->sk_state_change = rds_tcp_state_change; write_unlock_bh(&sock->sk->sk_callback_lock); } /* Handle RDS_INFO_TCP_SOCKETS socket option. It only returns IPv4 * connections for backward compatibility. */ static void rds_tcp_tc_info(struct socket *rds_sock, unsigned int len, struct rds_info_iterator *iter, struct rds_info_lengths *lens) { struct rds_info_tcp_socket tsinfo; struct rds_tcp_connection *tc; unsigned long flags; spin_lock_irqsave(&rds_tcp_tc_list_lock, flags); if (len / sizeof(tsinfo) < rds_tcp_tc_count) goto out; list_for_each_entry(tc, &rds_tcp_tc_list, t_list_item) { struct inet_sock *inet = inet_sk(tc->t_sock->sk); if (tc->t_cpath->cp_conn->c_isv6) continue; tsinfo.local_addr = inet->inet_saddr; tsinfo.local_port = inet->inet_sport; tsinfo.peer_addr = inet->inet_daddr; tsinfo.peer_port = inet->inet_dport; tsinfo.hdr_rem = tc->t_tinc_hdr_rem; tsinfo.data_rem = tc->t_tinc_data_rem; tsinfo.last_sent_nxt = tc->t_last_sent_nxt; tsinfo.last_expected_una = tc->t_last_expected_una; tsinfo.last_seen_una = tc->t_last_seen_una; tsinfo.tos = tc->t_cpath->cp_conn->c_tos; rds_info_copy(iter, &tsinfo, sizeof(tsinfo)); } out: lens->nr = rds_tcp_tc_count; lens->each = sizeof(tsinfo); spin_unlock_irqrestore(&rds_tcp_tc_list_lock, flags); } #if IS_ENABLED(CONFIG_IPV6) /* Handle RDS6_INFO_TCP_SOCKETS socket option. It returns both IPv4 and * IPv6 connections. IPv4 connection address is returned in an IPv4 mapped * address. */ static void rds6_tcp_tc_info(struct socket *sock, unsigned int len, struct rds_info_iterator *iter, struct rds_info_lengths *lens) { struct rds6_info_tcp_socket tsinfo6; struct rds_tcp_connection *tc; unsigned long flags; spin_lock_irqsave(&rds_tcp_tc_list_lock, flags); if (len / sizeof(tsinfo6) < rds6_tcp_tc_count) goto out; list_for_each_entry(tc, &rds_tcp_tc_list, t_list_item) { struct sock *sk = tc->t_sock->sk; struct inet_sock *inet = inet_sk(sk); tsinfo6.local_addr = sk->sk_v6_rcv_saddr; tsinfo6.local_port = inet->inet_sport; tsinfo6.peer_addr = sk->sk_v6_daddr; tsinfo6.peer_port = inet->inet_dport; tsinfo6.hdr_rem = tc->t_tinc_hdr_rem; tsinfo6.data_rem = tc->t_tinc_data_rem; tsinfo6.last_sent_nxt = tc->t_last_sent_nxt; tsinfo6.last_expected_una = tc->t_last_expected_una; tsinfo6.last_seen_una = tc->t_last_seen_una; rds_info_copy(iter, &tsinfo6, sizeof(tsinfo6)); } out: lens->nr = rds6_tcp_tc_count; lens->each = sizeof(tsinfo6); spin_unlock_irqrestore(&rds_tcp_tc_list_lock, flags); } #endif int rds_tcp_laddr_check(struct net *net, const struct in6_addr *addr, __u32 scope_id) { struct net_device *dev = NULL; #if IS_ENABLED(CONFIG_IPV6) int ret; #endif if (ipv6_addr_v4mapped(addr)) { if (inet_addr_type(net, addr->s6_addr32[3]) == RTN_LOCAL) return 0; return -EADDRNOTAVAIL; } /* If the scope_id is specified, check only those addresses * hosted on the specified interface. */ if (scope_id != 0) { rcu_read_lock(); dev = dev_get_by_index_rcu(net, scope_id); /* scope_id is not valid... */ if (!dev) { rcu_read_unlock(); return -EADDRNOTAVAIL; } rcu_read_unlock(); } #if IS_ENABLED(CONFIG_IPV6) ret = ipv6_chk_addr(net, addr, dev, 0); if (ret) return 0; #endif return -EADDRNOTAVAIL; } static void rds_tcp_conn_free(void *arg) { struct rds_tcp_connection *tc = arg; unsigned long flags; rdsdebug("freeing tc %p\n", tc); spin_lock_irqsave(&rds_tcp_conn_lock, flags); if (!tc->t_tcp_node_detached) list_del(&tc->t_tcp_node); spin_unlock_irqrestore(&rds_tcp_conn_lock, flags); kmem_cache_free(rds_tcp_conn_slab, tc); } static int rds_tcp_conn_alloc(struct rds_connection *conn, gfp_t gfp) { struct rds_tcp_connection *tc; int i, j; int ret = 0; for (i = 0; i < RDS_MPATH_WORKERS; i++) { tc = kmem_cache_alloc(rds_tcp_conn_slab, gfp); if (!tc) { ret = -ENOMEM; goto fail; } mutex_init(&tc->t_conn_path_lock); tc->t_sock = NULL; tc->t_tinc = NULL; tc->t_tinc_hdr_rem = sizeof(struct rds_header); tc->t_tinc_data_rem = 0; conn->c_path[i].cp_transport_data = tc; tc->t_cpath = &conn->c_path[i]; tc->t_tcp_node_detached = true; rdsdebug("rds_conn_path [%d] tc %p\n", i, conn->c_path[i].cp_transport_data); } spin_lock_irq(&rds_tcp_conn_lock); for (i = 0; i < RDS_MPATH_WORKERS; i++) { tc = conn->c_path[i].cp_transport_data; tc->t_tcp_node_detached = false; list_add_tail(&tc->t_tcp_node, &rds_tcp_conn_list); } spin_unlock_irq(&rds_tcp_conn_lock); fail: if (ret) { for (j = 0; j < i; j++) rds_tcp_conn_free(conn->c_path[j].cp_transport_data); } return ret; } static bool list_has_conn(struct list_head *list, struct rds_connection *conn) { struct rds_tcp_connection *tc, *_tc; list_for_each_entry_safe(tc, _tc, list, t_tcp_node) { if (tc->t_cpath->cp_conn == conn) return true; } return false; } static void rds_tcp_set_unloading(void) { atomic_set(&rds_tcp_unloading, 1); } static bool rds_tcp_is_unloading(struct rds_connection *conn) { return atomic_read(&rds_tcp_unloading) != 0; } static void rds_tcp_destroy_conns(void) { struct rds_tcp_connection *tc, *_tc; LIST_HEAD(tmp_list); /* avoid calling conn_destroy with irqs off */ spin_lock_irq(&rds_tcp_conn_lock); list_for_each_entry_safe(tc, _tc, &rds_tcp_conn_list, t_tcp_node) { if (!list_has_conn(&tmp_list, tc->t_cpath->cp_conn)) list_move_tail(&tc->t_tcp_node, &tmp_list); } spin_unlock_irq(&rds_tcp_conn_lock); list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node) rds_conn_destroy(tc->t_cpath->cp_conn); } static void rds_tcp_exit(void); static u8 rds_tcp_get_tos_map(u8 tos) { /* all user tos mapped to default 0 for TCP transport */ return 0; } struct rds_transport rds_tcp_transport = { .laddr_check = rds_tcp_laddr_check, .xmit_path_prepare = rds_tcp_xmit_path_prepare, .xmit_path_complete = rds_tcp_xmit_path_complete, .xmit = rds_tcp_xmit, .recv_path = rds_tcp_recv_path, .conn_alloc = rds_tcp_conn_alloc, .conn_free = rds_tcp_conn_free, .conn_path_connect = rds_tcp_conn_path_connect, .conn_path_shutdown = rds_tcp_conn_path_shutdown, .inc_copy_to_user = rds_tcp_inc_copy_to_user, .inc_free = rds_tcp_inc_free, .stats_info_copy = rds_tcp_stats_info_copy, .exit = rds_tcp_exit, .get_tos_map = rds_tcp_get_tos_map, .t_owner = THIS_MODULE, .t_name = "tcp", .t_type = RDS_TRANS_TCP, .t_prefer_loopback = 1, .t_mp_capable = 1, .t_unloading = rds_tcp_is_unloading, }; static unsigned int rds_tcp_netid; /* per-network namespace private data for this module */ struct rds_tcp_net { struct socket *rds_tcp_listen_sock; struct work_struct rds_tcp_accept_w; struct ctl_table_header *rds_tcp_sysctl; struct ctl_table *ctl_table; int sndbuf_size; int rcvbuf_size; }; /* All module specific customizations to the RDS-TCP socket should be done in * rds_tcp_tune() and applied after socket creation. */ bool rds_tcp_tune(struct socket *sock) { struct sock *sk = sock->sk; struct net *net = sock_net(sk); struct rds_tcp_net *rtn; tcp_sock_set_nodelay(sock->sk); lock_sock(sk); /* TCP timer functions might access net namespace even after * a process which created this net namespace terminated. */ if (!sk->sk_net_refcnt) { if (!maybe_get_net(net)) { release_sock(sk); return false; } /* Update ns_tracker to current stack trace and refcounted tracker */ __netns_tracker_free(net, &sk->ns_tracker, false); sk->sk_net_refcnt = 1; netns_tracker_alloc(net, &sk->ns_tracker, GFP_KERNEL); sock_inuse_add(net, 1); } rtn = net_generic(net, rds_tcp_netid); if (rtn->sndbuf_size > 0) { sk->sk_sndbuf = rtn->sndbuf_size; sk->sk_userlocks |= SOCK_SNDBUF_LOCK; } if (rtn->rcvbuf_size > 0) { sk->sk_rcvbuf = rtn->rcvbuf_size; sk->sk_userlocks |= SOCK_RCVBUF_LOCK; } release_sock(sk); return true; } static void rds_tcp_accept_worker(struct work_struct *work) { struct rds_tcp_net *rtn = container_of(work, struct rds_tcp_net, rds_tcp_accept_w); while (rds_tcp_accept_one(rtn->rds_tcp_listen_sock) == 0) cond_resched(); } void rds_tcp_accept_work(struct sock *sk) { struct net *net = sock_net(sk); struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid); queue_work(rds_wq, &rtn->rds_tcp_accept_w); } static __net_init int rds_tcp_init_net(struct net *net) { struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid); struct ctl_table *tbl; int err = 0; memset(rtn, 0, sizeof(*rtn)); /* {snd, rcv}buf_size default to 0, which implies we let the * stack pick the value, and permit auto-tuning of buffer size. */ if (net == &init_net) { tbl = rds_tcp_sysctl_table; } else { tbl = kmemdup(rds_tcp_sysctl_table, sizeof(rds_tcp_sysctl_table), GFP_KERNEL); if (!tbl) { pr_warn("could not set allocate sysctl table\n"); return -ENOMEM; } rtn->ctl_table = tbl; } tbl[RDS_TCP_SNDBUF].data = &rtn->sndbuf_size; tbl[RDS_TCP_RCVBUF].data = &rtn->rcvbuf_size; rtn->rds_tcp_sysctl = register_net_sysctl_sz(net, "net/rds/tcp", tbl, ARRAY_SIZE(rds_tcp_sysctl_table)); if (!rtn->rds_tcp_sysctl) { pr_warn("could not register sysctl\n"); err = -ENOMEM; goto fail; } #if IS_ENABLED(CONFIG_IPV6) rtn->rds_tcp_listen_sock = rds_tcp_listen_init(net, true); #else rtn->rds_tcp_listen_sock = rds_tcp_listen_init(net, false); #endif if (!rtn->rds_tcp_listen_sock) { pr_warn("could not set up IPv6 listen sock\n"); #if IS_ENABLED(CONFIG_IPV6) /* Try IPv4 as some systems disable IPv6 */ rtn->rds_tcp_listen_sock = rds_tcp_listen_init(net, false); if (!rtn->rds_tcp_listen_sock) { #endif unregister_net_sysctl_table(rtn->rds_tcp_sysctl); rtn->rds_tcp_sysctl = NULL; err = -EAFNOSUPPORT; goto fail; #if IS_ENABLED(CONFIG_IPV6) } #endif } INIT_WORK(&rtn->rds_tcp_accept_w, rds_tcp_accept_worker); return 0; fail: if (net != &init_net) kfree(tbl); return err; } static void rds_tcp_kill_sock(struct net *net) { struct rds_tcp_connection *tc, *_tc; LIST_HEAD(tmp_list); struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid); struct socket *lsock = rtn->rds_tcp_listen_sock; rtn->rds_tcp_listen_sock = NULL; rds_tcp_listen_stop(lsock, &rtn->rds_tcp_accept_w); spin_lock_irq(&rds_tcp_conn_lock); list_for_each_entry_safe(tc, _tc, &rds_tcp_conn_list, t_tcp_node) { struct net *c_net = read_pnet(&tc->t_cpath->cp_conn->c_net); if (net != c_net) continue; if (!list_has_conn(&tmp_list, tc->t_cpath->cp_conn)) { list_move_tail(&tc->t_tcp_node, &tmp_list); } else { list_del(&tc->t_tcp_node); tc->t_tcp_node_detached = true; } } spin_unlock_irq(&rds_tcp_conn_lock); list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node) rds_conn_destroy(tc->t_cpath->cp_conn); } static void __net_exit rds_tcp_exit_net(struct net *net) { struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid); rds_tcp_kill_sock(net); if (rtn->rds_tcp_sysctl) unregister_net_sysctl_table(rtn->rds_tcp_sysctl); if (net != &init_net) kfree(rtn->ctl_table); } static struct pernet_operations rds_tcp_net_ops = { .init = rds_tcp_init_net, .exit = rds_tcp_exit_net, .id = &rds_tcp_netid, .size = sizeof(struct rds_tcp_net), }; void *rds_tcp_listen_sock_def_readable(struct net *net) { struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid); struct socket *lsock = rtn->rds_tcp_listen_sock; if (!lsock) return NULL; return lsock->sk->sk_user_data; } /* when sysctl is used to modify some kernel socket parameters,this * function resets the RDS connections in that netns so that we can * restart with new parameters. The assumption is that such reset * events are few and far-between. */ static void rds_tcp_sysctl_reset(struct net *net) { struct rds_tcp_connection *tc, *_tc; spin_lock_irq(&rds_tcp_conn_lock); list_for_each_entry_safe(tc, _tc, &rds_tcp_conn_list, t_tcp_node) { struct net *c_net = read_pnet(&tc->t_cpath->cp_conn->c_net); if (net != c_net || !tc->t_sock) continue; /* reconnect with new parameters */ rds_conn_path_drop(tc->t_cpath, false); } spin_unlock_irq(&rds_tcp_conn_lock); } static int rds_tcp_skbuf_handler(const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *fpos) { struct net *net = current->nsproxy->net_ns; int err; err = proc_dointvec_minmax(ctl, write, buffer, lenp, fpos); if (err < 0) { pr_warn("Invalid input. Must be >= %d\n", *(int *)(ctl->extra1)); return err; } if (write) rds_tcp_sysctl_reset(net); return 0; } static void rds_tcp_exit(void) { rds_tcp_set_unloading(); synchronize_rcu(); rds_info_deregister_func(RDS_INFO_TCP_SOCKETS, rds_tcp_tc_info); #if IS_ENABLED(CONFIG_IPV6) rds_info_deregister_func(RDS6_INFO_TCP_SOCKETS, rds6_tcp_tc_info); #endif unregister_pernet_device(&rds_tcp_net_ops); rds_tcp_destroy_conns(); rds_trans_unregister(&rds_tcp_transport); rds_tcp_recv_exit(); kmem_cache_destroy(rds_tcp_conn_slab); } module_exit(rds_tcp_exit); static int __init rds_tcp_init(void) { int ret; rds_tcp_conn_slab = KMEM_CACHE(rds_tcp_connection, 0); if (!rds_tcp_conn_slab) { ret = -ENOMEM; goto out; } ret = rds_tcp_recv_init(); if (ret) goto out_slab; ret = register_pernet_device(&rds_tcp_net_ops); if (ret) goto out_recv; rds_trans_register(&rds_tcp_transport); rds_info_register_func(RDS_INFO_TCP_SOCKETS, rds_tcp_tc_info); #if IS_ENABLED(CONFIG_IPV6) rds_info_register_func(RDS6_INFO_TCP_SOCKETS, rds6_tcp_tc_info); #endif goto out; out_recv: rds_tcp_recv_exit(); out_slab: kmem_cache_destroy(rds_tcp_conn_slab); out: return ret; } module_init(rds_tcp_init); MODULE_AUTHOR("Oracle Corporation <rds-devel@oss.oracle.com>"); MODULE_DESCRIPTION("RDS: TCP transport"); MODULE_LICENSE("Dual BSD/GPL"); |
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 | /* SPDX-License-Identifier: GPL-2.0 */ #ifndef __KVM_FPU_H_ #define __KVM_FPU_H_ #include <asm/fpu/api.h> typedef u32 __attribute__((vector_size(16))) sse128_t; #define __sse128_u union { sse128_t vec; u64 as_u64[2]; u32 as_u32[4]; } #define sse128_lo(x) ({ __sse128_u t; t.vec = x; t.as_u64[0]; }) #define sse128_hi(x) ({ __sse128_u t; t.vec = x; t.as_u64[1]; }) #define sse128_l0(x) ({ __sse128_u t; t.vec = x; t.as_u32[0]; }) #define sse128_l1(x) ({ __sse128_u t; t.vec = x; t.as_u32[1]; }) #define sse128_l2(x) ({ __sse128_u t; t.vec = x; t.as_u32[2]; }) #define sse128_l3(x) ({ __sse128_u t; t.vec = x; t.as_u32[3]; }) #define sse128(lo, hi) ({ __sse128_u t; t.as_u64[0] = lo; t.as_u64[1] = hi; t.vec; }) static inline void _kvm_read_sse_reg(int reg, sse128_t *data) { switch (reg) { case 0: asm("movdqa %%xmm0, %0" : "=m"(*data)); break; case 1: asm("movdqa %%xmm1, %0" : "=m"(*data)); break; case 2: asm("movdqa %%xmm2, %0" : "=m"(*data)); break; case 3: asm("movdqa %%xmm3, %0" : "=m"(*data)); break; case 4: asm("movdqa %%xmm4, %0" : "=m"(*data)); break; case 5: asm("movdqa %%xmm5, %0" : "=m"(*data)); break; case 6: asm("movdqa %%xmm6, %0" : "=m"(*data)); break; case 7: asm("movdqa %%xmm7, %0" : "=m"(*data)); break; #ifdef CONFIG_X86_64 case 8: asm("movdqa %%xmm8, %0" : "=m"(*data)); break; case 9: asm("movdqa %%xmm9, %0" : "=m"(*data)); break; case 10: asm("movdqa %%xmm10, %0" : "=m"(*data)); break; case 11: asm("movdqa %%xmm11, %0" : "=m"(*data)); break; case 12: asm("movdqa %%xmm12, %0" : "=m"(*data)); break; case 13: asm("movdqa %%xmm13, %0" : "=m"(*data)); break; case 14: asm("movdqa %%xmm14, %0" : "=m"(*data)); break; case 15: asm("movdqa %%xmm15, %0" : "=m"(*data)); break; #endif default: BUG(); } } static inline void _kvm_write_sse_reg(int reg, const sse128_t *data) { switch (reg) { case 0: asm("movdqa %0, %%xmm0" : : "m"(*data)); break; case 1: asm("movdqa %0, %%xmm1" : : "m"(*data)); break; case 2: asm("movdqa %0, %%xmm2" : : "m"(*data)); break; case 3: asm("movdqa %0, %%xmm3" : : "m"(*data)); break; case 4: asm("movdqa %0, %%xmm4" : : "m"(*data)); break; case 5: asm("movdqa %0, %%xmm5" : : "m"(*data)); break; case 6: asm("movdqa %0, %%xmm6" : : "m"(*data)); break; case 7: asm("movdqa %0, %%xmm7" : : "m"(*data)); break; #ifdef CONFIG_X86_64 case 8: asm("movdqa %0, %%xmm8" : : "m"(*data)); break; case 9: asm("movdqa %0, %%xmm9" : : "m"(*data)); break; case 10: asm("movdqa %0, %%xmm10" : : "m"(*data)); break; case 11: asm("movdqa %0, %%xmm11" : : "m"(*data)); break; case 12: asm("movdqa %0, %%xmm12" : : "m"(*data)); break; case 13: asm("movdqa %0, %%xmm13" : : "m"(*data)); break; case 14: asm("movdqa %0, %%xmm14" : : "m"(*data)); break; case 15: asm("movdqa %0, %%xmm15" : : "m"(*data)); break; #endif default: BUG(); } } static inline void _kvm_read_mmx_reg(int reg, u64 *data) { switch (reg) { case 0: asm("movq %%mm0, %0" : "=m"(*data)); break; case 1: asm("movq %%mm1, %0" : "=m"(*data)); break; case 2: asm("movq %%mm2, %0" : "=m"(*data)); break; case 3: asm("movq %%mm3, %0" : "=m"(*data)); break; case 4: asm("movq %%mm4, %0" : "=m"(*data)); break; case 5: asm("movq %%mm5, %0" : "=m"(*data)); break; case 6: asm("movq %%mm6, %0" : "=m"(*data)); break; case 7: asm("movq %%mm7, %0" : "=m"(*data)); break; default: BUG(); } } static inline void _kvm_write_mmx_reg(int reg, const u64 *data) { switch (reg) { case 0: asm("movq %0, %%mm0" : : "m"(*data)); break; case 1: asm("movq %0, %%mm1" : : "m"(*data)); break; case 2: asm("movq %0, %%mm2" : : "m"(*data)); break; case 3: asm("movq %0, %%mm3" : : "m"(*data)); break; case 4: asm("movq %0, %%mm4" : : "m"(*data)); break; case 5: asm("movq %0, %%mm5" : : "m"(*data)); break; case 6: asm("movq %0, %%mm6" : : "m"(*data)); break; case 7: asm("movq %0, %%mm7" : : "m"(*data)); break; default: BUG(); } } static inline void kvm_fpu_get(void) { fpregs_lock(); fpregs_assert_state_consistent(); if (test_thread_flag(TIF_NEED_FPU_LOAD)) switch_fpu_return(); } static inline void kvm_fpu_put(void) { fpregs_unlock(); } static inline void kvm_read_sse_reg(int reg, sse128_t *data) { kvm_fpu_get(); _kvm_read_sse_reg(reg, data); kvm_fpu_put(); } static inline void kvm_write_sse_reg(int reg, const sse128_t *data) { kvm_fpu_get(); _kvm_write_sse_reg(reg, data); kvm_fpu_put(); } static inline void kvm_read_mmx_reg(int reg, u64 *data) { kvm_fpu_get(); _kvm_read_mmx_reg(reg, data); kvm_fpu_put(); } static inline void kvm_write_mmx_reg(int reg, const u64 *data) { kvm_fpu_get(); _kvm_write_mmx_reg(reg, data); kvm_fpu_put(); } #endif |
| 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 | // SPDX-License-Identifier: GPL-2.0 /* * Multipath support for RPC * * Copyright (c) 2015, 2016, Primary Data, Inc. All rights reserved. * * Trond Myklebust <trond.myklebust@primarydata.com> * */ #include <linux/atomic.h> #include <linux/types.h> #include <linux/kref.h> #include <linux/list.h> #include <linux/rcupdate.h> #include <linux/rculist.h> #include <linux/slab.h> #include <linux/spinlock.h> #include <linux/sunrpc/xprt.h> #include <linux/sunrpc/addr.h> #include <linux/sunrpc/xprtmultipath.h> #include "sysfs.h" typedef struct rpc_xprt *(*xprt_switch_find_xprt_t)(struct rpc_xprt_switch *xps, const struct rpc_xprt *cur); static const struct rpc_xprt_iter_ops rpc_xprt_iter_singular; static const struct rpc_xprt_iter_ops rpc_xprt_iter_roundrobin; static const struct rpc_xprt_iter_ops rpc_xprt_iter_listall; static const struct rpc_xprt_iter_ops rpc_xprt_iter_listoffline; static void xprt_switch_add_xprt_locked(struct rpc_xprt_switch *xps, struct rpc_xprt *xprt) { if (unlikely(xprt_get(xprt) == NULL)) return; list_add_tail_rcu(&xprt->xprt_switch, &xps->xps_xprt_list); smp_wmb(); if (xps->xps_nxprts == 0) xps->xps_net = xprt->xprt_net; xps->xps_nxprts++; xps->xps_nactive++; } /** * rpc_xprt_switch_add_xprt - Add a new rpc_xprt to an rpc_xprt_switch * @xps: pointer to struct rpc_xprt_switch * @xprt: pointer to struct rpc_xprt * * Adds xprt to the end of the list of struct rpc_xprt in xps. */ void rpc_xprt_switch_add_xprt(struct rpc_xprt_switch *xps, struct rpc_xprt *xprt) { if (xprt == NULL) return; spin_lock(&xps->xps_lock); if (xps->xps_net == xprt->xprt_net || xps->xps_net == NULL) xprt_switch_add_xprt_locked(xps, xprt); spin_unlock(&xps->xps_lock); rpc_sysfs_xprt_setup(xps, xprt, GFP_KERNEL); } static void xprt_switch_remove_xprt_locked(struct rpc_xprt_switch *xps, struct rpc_xprt *xprt, bool offline) { if (unlikely(xprt == NULL)) return; if (!test_bit(XPRT_OFFLINE, &xprt->state) && offline) xps->xps_nactive--; xps->xps_nxprts--; if (xps->xps_nxprts == 0) xps->xps_net = NULL; smp_wmb(); list_del_rcu(&xprt->xprt_switch); } /** * rpc_xprt_switch_remove_xprt - Removes an rpc_xprt from a rpc_xprt_switch * @xps: pointer to struct rpc_xprt_switch * @xprt: pointer to struct rpc_xprt * @offline: indicates if the xprt that's being removed is in an offline state * * Removes xprt from the list of struct rpc_xprt in xps. */ void rpc_xprt_switch_remove_xprt(struct rpc_xprt_switch *xps, struct rpc_xprt *xprt, bool offline) { spin_lock(&xps->xps_lock); xprt_switch_remove_xprt_locked(xps, xprt, offline); spin_unlock(&xps->xps_lock); xprt_put(xprt); } static DEFINE_IDA(rpc_xprtswitch_ids); void xprt_multipath_cleanup_ids(void) { ida_destroy(&rpc_xprtswitch_ids); } static int xprt_switch_alloc_id(struct rpc_xprt_switch *xps, gfp_t gfp_flags) { int id; id = ida_alloc(&rpc_xprtswitch_ids, gfp_flags); if (id < 0) return id; xps->xps_id = id; return 0; } static void xprt_switch_free_id(struct rpc_xprt_switch *xps) { ida_free(&rpc_xprtswitch_ids, xps->xps_id); } /** * xprt_switch_alloc - Allocate a new struct rpc_xprt_switch * @xprt: pointer to struct rpc_xprt * @gfp_flags: allocation flags * * On success, returns an initialised struct rpc_xprt_switch, containing * the entry xprt. Returns NULL on failure. */ struct rpc_xprt_switch *xprt_switch_alloc(struct rpc_xprt *xprt, gfp_t gfp_flags) { struct rpc_xprt_switch *xps; xps = kmalloc(sizeof(*xps), gfp_flags); if (xps != NULL) { spin_lock_init(&xps->xps_lock); kref_init(&xps->xps_kref); xprt_switch_alloc_id(xps, gfp_flags); xps->xps_nxprts = xps->xps_nactive = 0; atomic_long_set(&xps->xps_queuelen, 0); xps->xps_net = NULL; INIT_LIST_HEAD(&xps->xps_xprt_list); xps->xps_iter_ops = &rpc_xprt_iter_singular; rpc_sysfs_xprt_switch_setup(xps, xprt, gfp_flags); xprt_switch_add_xprt_locked(xps, xprt); xps->xps_nunique_destaddr_xprts = 1; rpc_sysfs_xprt_setup(xps, xprt, gfp_flags); } return xps; } static void xprt_switch_free_entries(struct rpc_xprt_switch *xps) { spin_lock(&xps->xps_lock); while (!list_empty(&xps->xps_xprt_list)) { struct rpc_xprt *xprt; xprt = list_first_entry(&xps->xps_xprt_list, struct rpc_xprt, xprt_switch); xprt_switch_remove_xprt_locked(xps, xprt, true); spin_unlock(&xps->xps_lock); xprt_put(xprt); spin_lock(&xps->xps_lock); } spin_unlock(&xps->xps_lock); } static void xprt_switch_free(struct kref *kref) { struct rpc_xprt_switch *xps = container_of(kref, struct rpc_xprt_switch, xps_kref); xprt_switch_free_entries(xps); rpc_sysfs_xprt_switch_destroy(xps); xprt_switch_free_id(xps); kfree_rcu(xps, xps_rcu); } /** * xprt_switch_get - Return a reference to a rpc_xprt_switch * @xps: pointer to struct rpc_xprt_switch * * Returns a reference to xps unless the refcount is already zero. */ struct rpc_xprt_switch *xprt_switch_get(struct rpc_xprt_switch *xps) { if (xps != NULL && kref_get_unless_zero(&xps->xps_kref)) return xps; return NULL; } /** * xprt_switch_put - Release a reference to a rpc_xprt_switch * @xps: pointer to struct rpc_xprt_switch * * Release the reference to xps, and free it once the refcount is zero. */ void xprt_switch_put(struct rpc_xprt_switch *xps) { if (xps != NULL) kref_put(&xps->xps_kref, xprt_switch_free); } /** * rpc_xprt_switch_set_roundrobin - Set a round-robin policy on rpc_xprt_switch * @xps: pointer to struct rpc_xprt_switch * * Sets a round-robin default policy for iterators acting on xps. */ void rpc_xprt_switch_set_roundrobin(struct rpc_xprt_switch *xps) { if (READ_ONCE(xps->xps_iter_ops) != &rpc_xprt_iter_roundrobin) WRITE_ONCE(xps->xps_iter_ops, &rpc_xprt_iter_roundrobin); } static const struct rpc_xprt_iter_ops *xprt_iter_ops(const struct rpc_xprt_iter *xpi) { if (xpi->xpi_ops != NULL) return xpi->xpi_ops; return rcu_dereference(xpi->xpi_xpswitch)->xps_iter_ops; } static void xprt_iter_no_rewind(struct rpc_xprt_iter *xpi) { } static void xprt_iter_default_rewind(struct rpc_xprt_iter *xpi) { WRITE_ONCE(xpi->xpi_cursor, NULL); } static bool xprt_is_active(const struct rpc_xprt *xprt) { return (kref_read(&xprt->kref) != 0 && !test_bit(XPRT_OFFLINE, &xprt->state)); } static struct rpc_xprt *xprt_switch_find_first_entry(struct list_head *head) { struct rpc_xprt *pos; list_for_each_entry_rcu(pos, head, xprt_switch) { if (xprt_is_active(pos)) return pos; } return NULL; } static struct rpc_xprt *xprt_switch_find_first_entry_offline(struct list_head *head) { struct rpc_xprt *pos; list_for_each_entry_rcu(pos, head, xprt_switch) { if (!xprt_is_active(pos)) return pos; } return NULL; } static struct rpc_xprt *xprt_iter_first_entry(struct rpc_xprt_iter *xpi) { struct rpc_xprt_switch *xps = rcu_dereference(xpi->xpi_xpswitch); if (xps == NULL) return NULL; return xprt_switch_find_first_entry(&xps->xps_xprt_list); } static struct rpc_xprt *_xprt_switch_find_current_entry(struct list_head *head, const struct rpc_xprt *cur, bool find_active) { struct rpc_xprt *pos; bool found = false; list_for_each_entry_rcu(pos, head, xprt_switch) { if (cur == pos) found = true; if (found && ((find_active && xprt_is_active(pos)) || (!find_active && !xprt_is_active(pos)))) return pos; } return NULL; } static struct rpc_xprt *xprt_switch_find_current_entry(struct list_head *head, const struct rpc_xprt *cur) { return _xprt_switch_find_current_entry(head, cur, true); } static struct rpc_xprt * _xprt_iter_current_entry(struct rpc_xprt_iter *xpi, struct rpc_xprt *first_entry(struct list_head *head), struct rpc_xprt *current_entry(struct list_head *head, const struct rpc_xprt *cur)) { struct rpc_xprt_switch *xps = rcu_dereference(xpi->xpi_xpswitch); struct list_head *head; if (xps == NULL) return NULL; head = &xps->xps_xprt_list; if (xpi->xpi_cursor == NULL || xps->xps_nxprts < 2) return first_entry(head); return current_entry(head, xpi->xpi_cursor); } static struct rpc_xprt *xprt_iter_current_entry(struct rpc_xprt_iter *xpi) { return _xprt_iter_current_entry(xpi, xprt_switch_find_first_entry, xprt_switch_find_current_entry); } static struct rpc_xprt *xprt_switch_find_current_entry_offline(struct list_head *head, const struct rpc_xprt *cur) { return _xprt_switch_find_current_entry(head, cur, false); } static struct rpc_xprt *xprt_iter_current_entry_offline(struct rpc_xprt_iter *xpi) { return _xprt_iter_current_entry(xpi, xprt_switch_find_first_entry_offline, xprt_switch_find_current_entry_offline); } static bool __rpc_xprt_switch_has_addr(struct rpc_xprt_switch *xps, const struct sockaddr *sap) { struct list_head *head; struct rpc_xprt *pos; if (xps == NULL || sap == NULL) return false; head = &xps->xps_xprt_list; list_for_each_entry_rcu(pos, head, xprt_switch) { if (rpc_cmp_addr_port(sap, (struct sockaddr *)&pos->addr)) { pr_info("RPC: addr %s already in xprt switch\n", pos->address_strings[RPC_DISPLAY_ADDR]); return true; } } return false; } bool rpc_xprt_switch_has_addr(struct rpc_xprt_switch *xps, const struct sockaddr *sap) { bool res; rcu_read_lock(); res = __rpc_xprt_switch_has_addr(xps, sap); rcu_read_unlock(); return res; } static struct rpc_xprt *xprt_switch_find_next_entry(struct list_head *head, const struct rpc_xprt *cur, bool check_active) { struct rpc_xprt *pos, *prev = NULL; bool found = false; list_for_each_entry_rcu(pos, head, xprt_switch) { if (cur == prev) found = true; /* for request to return active transports return only * active, for request to return offline transports * return only offline */ if (found && ((check_active && xprt_is_active(pos)) || (!check_active && !xprt_is_active(pos)))) return pos; prev = pos; } return NULL; } static struct rpc_xprt *xprt_switch_set_next_cursor(struct rpc_xprt_switch *xps, struct rpc_xprt **cursor, xprt_switch_find_xprt_t find_next) { struct rpc_xprt *pos, *old; old = smp_load_acquire(cursor); pos = find_next(xps, old); smp_store_release(cursor, pos); return pos; } static struct rpc_xprt *xprt_iter_next_entry_multiple(struct rpc_xprt_iter *xpi, xprt_switch_find_xprt_t find_next) { struct rpc_xprt_switch *xps = rcu_dereference(xpi->xpi_xpswitch); if (xps == NULL) return NULL; return xprt_switch_set_next_cursor(xps, &xpi->xpi_cursor, find_next); } static struct rpc_xprt *__xprt_switch_find_next_entry_roundrobin(struct list_head *head, const struct rpc_xprt *cur) { struct rpc_xprt *ret; ret = xprt_switch_find_next_entry(head, cur, true); if (ret != NULL) return ret; return xprt_switch_find_first_entry(head); } static struct rpc_xprt *xprt_switch_find_next_entry_roundrobin(struct rpc_xprt_switch *xps, const struct rpc_xprt *cur) { struct list_head *head = &xps->xps_xprt_list; struct rpc_xprt *xprt; unsigned int nactive; for (;;) { unsigned long xprt_queuelen, xps_queuelen; xprt = __xprt_switch_find_next_entry_roundrobin(head, cur); if (!xprt) break; xprt_queuelen = atomic_long_read(&xprt->queuelen); xps_queuelen = atomic_long_read(&xps->xps_queuelen); nactive = READ_ONCE(xps->xps_nactive); /* Exit loop if xprt_queuelen <= average queue length */ if (xprt_queuelen * nactive <= xps_queuelen) break; cur = xprt; } return xprt; } static struct rpc_xprt *xprt_iter_next_entry_roundrobin(struct rpc_xprt_iter *xpi) { return xprt_iter_next_entry_multiple(xpi, xprt_switch_find_next_entry_roundrobin); } static struct rpc_xprt *xprt_switch_find_next_entry_all(struct rpc_xprt_switch *xps, const struct rpc_xprt *cur) { return xprt_switch_find_next_entry(&xps->xps_xprt_list, cur, true); } static struct rpc_xprt *xprt_switch_find_next_entry_offline(struct rpc_xprt_switch *xps, const struct rpc_xprt *cur) { return xprt_switch_find_next_entry(&xps->xps_xprt_list, cur, false); } static struct rpc_xprt *xprt_iter_next_entry_all(struct rpc_xprt_iter *xpi) { return xprt_iter_next_entry_multiple(xpi, xprt_switch_find_next_entry_all); } static struct rpc_xprt *xprt_iter_next_entry_offline(struct rpc_xprt_iter *xpi) { return xprt_iter_next_entry_multiple(xpi, xprt_switch_find_next_entry_offline); } /* * xprt_iter_rewind - Resets the xprt iterator * @xpi: pointer to rpc_xprt_iter * * Resets xpi to ensure that it points to the first entry in the list * of transports. */ void xprt_iter_rewind(struct rpc_xprt_iter *xpi) { rcu_read_lock(); xprt_iter_ops(xpi)->xpi_rewind(xpi); rcu_read_unlock(); } static void __xprt_iter_init(struct rpc_xprt_iter *xpi, struct rpc_xprt_switch *xps, const struct rpc_xprt_iter_ops *ops) { rcu_assign_pointer(xpi->xpi_xpswitch, xprt_switch_get(xps)); xpi->xpi_cursor = NULL; xpi->xpi_ops = ops; } /** * xprt_iter_init - Initialise an xprt iterator * @xpi: pointer to rpc_xprt_iter * @xps: pointer to rpc_xprt_switch * * Initialises the iterator to use the default iterator ops * as set in xps. This function is mainly intended for internal * use in the rpc_client. */ void xprt_iter_init(struct rpc_xprt_iter *xpi, struct rpc_xprt_switch *xps) { __xprt_iter_init(xpi, xps, NULL); } /** * xprt_iter_init_listall - Initialise an xprt iterator * @xpi: pointer to rpc_xprt_iter * @xps: pointer to rpc_xprt_switch * * Initialises the iterator to iterate once through the entire list * of entries in xps. */ void xprt_iter_init_listall(struct rpc_xprt_iter *xpi, struct rpc_xprt_switch *xps) { __xprt_iter_init(xpi, xps, &rpc_xprt_iter_listall); } void xprt_iter_init_listoffline(struct rpc_xprt_iter *xpi, struct rpc_xprt_switch *xps) { __xprt_iter_init(xpi, xps, &rpc_xprt_iter_listoffline); } /** * xprt_iter_xchg_switch - Atomically swap out the rpc_xprt_switch * @xpi: pointer to rpc_xprt_iter * @newswitch: pointer to a new rpc_xprt_switch or NULL * * Swaps out the existing xpi->xpi_xpswitch with a new value. */ struct rpc_xprt_switch *xprt_iter_xchg_switch(struct rpc_xprt_iter *xpi, struct rpc_xprt_switch *newswitch) { struct rpc_xprt_switch __rcu *oldswitch; /* Atomically swap out the old xpswitch */ oldswitch = xchg(&xpi->xpi_xpswitch, RCU_INITIALIZER(newswitch)); if (newswitch != NULL) xprt_iter_rewind(xpi); return rcu_dereference_protected(oldswitch, true); } /** * xprt_iter_destroy - Destroys the xprt iterator * @xpi: pointer to rpc_xprt_iter */ void xprt_iter_destroy(struct rpc_xprt_iter *xpi) { xprt_switch_put(xprt_iter_xchg_switch(xpi, NULL)); } /** * xprt_iter_xprt - Returns the rpc_xprt pointed to by the cursor * @xpi: pointer to rpc_xprt_iter * * Returns a pointer to the struct rpc_xprt that is currently * pointed to by the cursor. * Caller must be holding rcu_read_lock(). */ struct rpc_xprt *xprt_iter_xprt(struct rpc_xprt_iter *xpi) { WARN_ON_ONCE(!rcu_read_lock_held()); return xprt_iter_ops(xpi)->xpi_xprt(xpi); } static struct rpc_xprt *xprt_iter_get_helper(struct rpc_xprt_iter *xpi, struct rpc_xprt *(*fn)(struct rpc_xprt_iter *)) { struct rpc_xprt *ret; do { ret = fn(xpi); if (ret == NULL) break; ret = xprt_get(ret); } while (ret == NULL); return ret; } /** * xprt_iter_get_xprt - Returns the rpc_xprt pointed to by the cursor * @xpi: pointer to rpc_xprt_iter * * Returns a reference to the struct rpc_xprt that is currently * pointed to by the cursor. */ struct rpc_xprt *xprt_iter_get_xprt(struct rpc_xprt_iter *xpi) { struct rpc_xprt *xprt; rcu_read_lock(); xprt = xprt_iter_get_helper(xpi, xprt_iter_ops(xpi)->xpi_xprt); rcu_read_unlock(); return xprt; } /** * xprt_iter_get_next - Returns the next rpc_xprt following the cursor * @xpi: pointer to rpc_xprt_iter * * Returns a reference to the struct rpc_xprt that immediately follows the * entry pointed to by the cursor. */ struct rpc_xprt *xprt_iter_get_next(struct rpc_xprt_iter *xpi) { struct rpc_xprt *xprt; rcu_read_lock(); xprt = xprt_iter_get_helper(xpi, xprt_iter_ops(xpi)->xpi_next); rcu_read_unlock(); return xprt; } /* Policy for always returning the first entry in the rpc_xprt_switch */ static const struct rpc_xprt_iter_ops rpc_xprt_iter_singular = { .xpi_rewind = xprt_iter_no_rewind, .xpi_xprt = xprt_iter_first_entry, .xpi_next = xprt_iter_first_entry, }; /* Policy for round-robin iteration of entries in the rpc_xprt_switch */ static const struct rpc_xprt_iter_ops rpc_xprt_iter_roundrobin = { .xpi_rewind = xprt_iter_default_rewind, .xpi_xprt = xprt_iter_current_entry, .xpi_next = xprt_iter_next_entry_roundrobin, }; /* Policy for once-through iteration of entries in the rpc_xprt_switch */ static const struct rpc_xprt_iter_ops rpc_xprt_iter_listall = { .xpi_rewind = xprt_iter_default_rewind, .xpi_xprt = xprt_iter_current_entry, .xpi_next = xprt_iter_next_entry_all, }; static const struct rpc_xprt_iter_ops rpc_xprt_iter_listoffline = { .xpi_rewind = xprt_iter_default_rewind, .xpi_xprt = xprt_iter_current_entry_offline, .xpi_next = xprt_iter_next_entry_offline, }; |
| 11 11 4 7 7 10 4 2 2 2 3 2 2 2 1 1 2 4 1 1 3 1 1 2 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 | // SPDX-License-Identifier: GPL-2.0 /* * cfg80211 wext compat for managed mode. * * Copyright 2009 Johannes Berg <johannes@sipsolutions.net> * Copyright (C) 2009, 2020-2023 Intel Corporation */ #include <linux/export.h> #include <linux/etherdevice.h> #include <linux/if_arp.h> #include <linux/slab.h> #include <net/cfg80211.h> #include <net/cfg80211-wext.h> #include "wext-compat.h" #include "nl80211.h" int cfg80211_mgd_wext_connect(struct cfg80211_registered_device *rdev, struct wireless_dev *wdev) { struct cfg80211_cached_keys *ck = NULL; const u8 *prev_bssid = NULL; int err, i; ASSERT_RTNL(); lockdep_assert_wiphy(wdev->wiphy); if (!netif_running(wdev->netdev)) return 0; wdev->wext.connect.ie = wdev->wext.ie; wdev->wext.connect.ie_len = wdev->wext.ie_len; /* Use default background scan period */ wdev->wext.connect.bg_scan_period = -1; if (wdev->wext.keys) { wdev->wext.keys->def = wdev->wext.default_key; if (wdev->wext.default_key != -1) wdev->wext.connect.privacy = true; } if (!wdev->wext.connect.ssid_len) return 0; if (wdev->wext.keys && wdev->wext.keys->def != -1) { ck = kmemdup(wdev->wext.keys, sizeof(*ck), GFP_KERNEL); if (!ck) return -ENOMEM; for (i = 0; i < 4; i++) ck->params[i].key = ck->data[i]; } if (wdev->wext.prev_bssid_valid) prev_bssid = wdev->wext.prev_bssid; err = cfg80211_connect(rdev, wdev->netdev, &wdev->wext.connect, ck, prev_bssid); if (err) kfree_sensitive(ck); return err; } int cfg80211_mgd_wext_siwfreq(struct net_device *dev, struct iw_request_info *info, struct iw_freq *wextfreq, char *extra) { struct wireless_dev *wdev = dev->ieee80211_ptr; struct cfg80211_registered_device *rdev = wiphy_to_rdev(wdev->wiphy); struct ieee80211_channel *chan = NULL; int err, freq; /* call only for station! */ if (WARN_ON(wdev->iftype != NL80211_IFTYPE_STATION)) return -EINVAL; freq = cfg80211_wext_freq(wextfreq); if (freq < 0) return freq; if (freq) { chan = ieee80211_get_channel(wdev->wiphy, freq); if (!chan) return -EINVAL; if (chan->flags & IEEE80211_CHAN_DISABLED) return -EINVAL; } if (wdev->conn) { bool event = true; if (wdev->wext.connect.channel == chan) return 0; /* if SSID set, we'll try right again, avoid event */ if (wdev->wext.connect.ssid_len) event = false; err = cfg80211_disconnect(rdev, dev, WLAN_REASON_DEAUTH_LEAVING, event); if (err) return err; } wdev->wext.connect.channel = chan; return cfg80211_mgd_wext_connect(rdev, wdev); } int cfg80211_mgd_wext_giwfreq(struct net_device *dev, struct iw_request_info *info, struct iw_freq *freq, char *extra) { struct wireless_dev *wdev = dev->ieee80211_ptr; struct ieee80211_channel *chan = NULL; /* call only for station! */ if (WARN_ON(wdev->iftype != NL80211_IFTYPE_STATION)) return -EINVAL; if (wdev->valid_links) return -EOPNOTSUPP; if (wdev->links[0].client.current_bss) chan = wdev->links[0].client.current_bss->pub.channel; else if (wdev->wext.connect.channel) chan = wdev->wext.connect.channel; if (chan) { freq->m = chan->center_freq; freq->e = 6; return 0; } /* no channel if not joining */ return -EINVAL; } int cfg80211_mgd_wext_siwessid(struct net_device *dev, struct iw_request_info *info, struct iw_point *data, char *ssid) { struct wireless_dev *wdev = dev->ieee80211_ptr; struct cfg80211_registered_device *rdev = wiphy_to_rdev(wdev->wiphy); size_t len = data->length; int err; /* call only for station! */ if (WARN_ON(wdev->iftype != NL80211_IFTYPE_STATION)) return -EINVAL; if (!data->flags) len = 0; /* iwconfig uses nul termination in SSID.. */ if (len > 0 && ssid[len - 1] == '\0') len--; if (wdev->conn) { bool event = true; if (wdev->wext.connect.ssid && len && len == wdev->wext.connect.ssid_len && memcmp(wdev->wext.connect.ssid, ssid, len) == 0) return 0; /* if SSID set now, we'll try to connect, avoid event */ if (len) event = false; err = cfg80211_disconnect(rdev, dev, WLAN_REASON_DEAUTH_LEAVING, event); if (err) return err; } wdev->wext.prev_bssid_valid = false; wdev->wext.connect.ssid = wdev->wext.ssid; memcpy(wdev->wext.ssid, ssid, len); wdev->wext.connect.ssid_len = len; wdev->wext.connect.crypto.control_port = false; wdev->wext.connect.crypto.control_port_ethertype = cpu_to_be16(ETH_P_PAE); return cfg80211_mgd_wext_connect(rdev, wdev); } int cfg80211_mgd_wext_giwessid(struct net_device *dev, struct iw_request_info *info, struct iw_point *data, char *ssid) { struct wireless_dev *wdev = dev->ieee80211_ptr; int ret = 0; /* call only for station! */ if (WARN_ON(wdev->iftype != NL80211_IFTYPE_STATION)) return -EINVAL; if (wdev->valid_links) return -EINVAL; data->flags = 0; if (wdev->links[0].client.current_bss) { const struct element *ssid_elem; rcu_read_lock(); ssid_elem = ieee80211_bss_get_elem( &wdev->links[0].client.current_bss->pub, WLAN_EID_SSID); if (ssid_elem) { data->flags = 1; data->length = ssid_elem->datalen; if (data->length > IW_ESSID_MAX_SIZE) ret = -EINVAL; else memcpy(ssid, ssid_elem->data, data->length); } rcu_read_unlock(); } else if (wdev->wext.connect.ssid && wdev->wext.connect.ssid_len) { data->flags = 1; data->length = wdev->wext.connect.ssid_len; memcpy(ssid, wdev->wext.connect.ssid, data->length); } return ret; } int cfg80211_mgd_wext_siwap(struct net_device *dev, struct iw_request_info *info, struct sockaddr *ap_addr, char *extra) { struct wireless_dev *wdev = dev->ieee80211_ptr; struct cfg80211_registered_device *rdev = wiphy_to_rdev(wdev->wiphy); u8 *bssid = ap_addr->sa_data; int err; /* call only for station! */ if (WARN_ON(wdev->iftype != NL80211_IFTYPE_STATION)) return -EINVAL; if (ap_addr->sa_family != ARPHRD_ETHER) return -EINVAL; /* automatic mode */ if (is_zero_ether_addr(bssid) || is_broadcast_ether_addr(bssid)) bssid = NULL; if (wdev->conn) { /* both automatic */ if (!bssid && !wdev->wext.connect.bssid) return 0; /* fixed already - and no change */ if (wdev->wext.connect.bssid && bssid && ether_addr_equal(bssid, wdev->wext.connect.bssid)) return 0; err = cfg80211_disconnect(rdev, dev, WLAN_REASON_DEAUTH_LEAVING, false); if (err) return err; } if (bssid) { memcpy(wdev->wext.bssid, bssid, ETH_ALEN); wdev->wext.connect.bssid = wdev->wext.bssid; } else wdev->wext.connect.bssid = NULL; return cfg80211_mgd_wext_connect(rdev, wdev); } int cfg80211_mgd_wext_giwap(struct net_device *dev, struct iw_request_info *info, struct sockaddr *ap_addr, char *extra) { struct wireless_dev *wdev = dev->ieee80211_ptr; /* call only for station! */ if (WARN_ON(wdev->iftype != NL80211_IFTYPE_STATION)) return -EINVAL; ap_addr->sa_family = ARPHRD_ETHER; if (wdev->valid_links) return -EOPNOTSUPP; if (wdev->links[0].client.current_bss) memcpy(ap_addr->sa_data, wdev->links[0].client.current_bss->pub.bssid, ETH_ALEN); else eth_zero_addr(ap_addr->sa_data); return 0; } int cfg80211_wext_siwgenie(struct net_device *dev, struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { struct iw_point *data = &wrqu->data; struct wireless_dev *wdev = dev->ieee80211_ptr; struct cfg80211_registered_device *rdev = wiphy_to_rdev(wdev->wiphy); u8 *ie = extra; int ie_len = data->length, err; if (wdev->iftype != NL80211_IFTYPE_STATION) return -EOPNOTSUPP; if (!ie_len) ie = NULL; wiphy_lock(wdev->wiphy); /* no change */ err = 0; if (wdev->wext.ie_len == ie_len && memcmp(wdev->wext.ie, ie, ie_len) == 0) goto out; if (ie_len) { ie = kmemdup(extra, ie_len, GFP_KERNEL); if (!ie) { err = -ENOMEM; goto out; } } else ie = NULL; kfree(wdev->wext.ie); wdev->wext.ie = ie; wdev->wext.ie_len = ie_len; if (wdev->conn) { err = cfg80211_disconnect(rdev, dev, WLAN_REASON_DEAUTH_LEAVING, false); if (err) goto out; } /* userspace better not think we'll reconnect */ err = 0; out: wiphy_unlock(wdev->wiphy); return err; } int cfg80211_wext_siwmlme(struct net_device *dev, struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { struct wireless_dev *wdev = dev->ieee80211_ptr; struct iw_mlme *mlme = (struct iw_mlme *)extra; struct cfg80211_registered_device *rdev; int err; if (!wdev) return -EOPNOTSUPP; rdev = wiphy_to_rdev(wdev->wiphy); if (wdev->iftype != NL80211_IFTYPE_STATION) return -EINVAL; if (mlme->addr.sa_family != ARPHRD_ETHER) return -EINVAL; wiphy_lock(&rdev->wiphy); switch (mlme->cmd) { case IW_MLME_DEAUTH: case IW_MLME_DISASSOC: err = cfg80211_disconnect(rdev, dev, mlme->reason_code, true); break; default: err = -EOPNOTSUPP; break; } wiphy_unlock(&rdev->wiphy); return err; } |
| 3 21 3 25 21 9 21 9 1 7 4 1 1 1 1 6 3 3 33 30 3 1 1 27 1 24 1 1 1 8 6 1 1 2 1 1 5 5 1 1 1 1 1 26 3 26 26 3 3 26 1 9 26 25 1 25 1 25 1 1 24 24 1 1 17 8 16 1 15 1 16 16 16 16 16 6 10 26 11 9 6 9 9 9 9 9 8 1 1 1 9 9 9 9 9 9 9 9 3 3 3 3 33 7 26 17 6 3 3 3 3 9 35 1 1 37 37 46 45 1 1 32 2 30 11 20 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 2533 2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569 2570 2571 2572 2573 2574 2575 2576 2577 2578 2579 2580 2581 2582 2583 2584 2585 2586 2587 2588 2589 2590 2591 2592 2593 2594 2595 2596 2597 2598 2599 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 2701 2702 2703 2704 2705 2706 2707 2708 2709 2710 2711 2712 2713 2714 2715 2716 2717 2718 2719 2720 2721 2722 2723 2724 2725 2726 2727 2728 2729 2730 2731 2732 2733 2734 2735 2736 2737 2738 2739 2740 2741 2742 2743 2744 2745 2746 2747 2748 2749 2750 2751 2752 2753 2754 2755 2756 2757 2758 2759 2760 2761 2762 2763 2764 2765 2766 2767 2768 2769 2770 2771 2772 2773 2774 2775 2776 2777 2778 2779 2780 2781 2782 2783 2784 2785 2786 2787 2788 2789 2790 2791 2792 2793 2794 2795 2796 2797 2798 2799 2800 2801 2802 2803 2804 2805 2806 2807 2808 2809 2810 2811 2812 2813 2814 2815 2816 2817 2818 2819 2820 2821 2822 2823 2824 2825 2826 2827 2828 2829 2830 2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843 2844 2845 2846 2847 2848 2849 2850 2851 2852 2853 2854 2855 2856 2857 2858 2859 2860 2861 2862 2863 2864 2865 2866 2867 2868 2869 2870 2871 2872 2873 2874 2875 2876 2877 2878 2879 2880 2881 2882 2883 2884 2885 2886 2887 2888 2889 2890 2891 2892 2893 2894 2895 2896 2897 2898 2899 2900 2901 2902 2903 2904 2905 2906 2907 2908 2909 2910 2911 2912 2913 2914 2915 2916 2917 2918 2919 2920 2921 2922 2923 2924 2925 2926 2927 2928 2929 2930 2931 2932 2933 2934 2935 2936 2937 2938 2939 2940 2941 2942 2943 2944 2945 2946 2947 2948 2949 2950 2951 2952 2953 2954 2955 2956 2957 2958 2959 2960 2961 2962 2963 2964 2965 2966 2967 2968 2969 2970 2971 2972 2973 2974 2975 2976 2977 2978 2979 2980 2981 2982 2983 2984 2985 2986 2987 2988 2989 2990 2991 2992 2993 2994 2995 2996 2997 2998 2999 3000 3001 3002 3003 3004 3005 3006 3007 3008 3009 3010 3011 3012 3013 3014 3015 3016 3017 3018 3019 3020 3021 3022 3023 3024 3025 3026 3027 3028 3029 3030 3031 3032 3033 3034 3035 3036 3037 3038 3039 3040 3041 3042 3043 3044 3045 3046 3047 3048 3049 3050 3051 3052 3053 3054 3055 3056 3057 3058 3059 3060 3061 3062 3063 3064 3065 3066 3067 3068 3069 3070 3071 3072 3073 3074 3075 3076 3077 3078 3079 3080 3081 3082 3083 3084 3085 3086 3087 3088 3089 3090 3091 3092 3093 3094 3095 3096 3097 3098 3099 3100 3101 3102 3103 3104 3105 3106 3107 3108 3109 3110 3111 3112 3113 3114 3115 3116 3117 3118 3119 3120 3121 3122 3123 3124 3125 3126 3127 3128 3129 3130 3131 3132 3133 3134 3135 3136 3137 3138 3139 3140 3141 3142 3143 3144 3145 3146 3147 3148 3149 3150 3151 3152 3153 3154 3155 3156 3157 3158 3159 3160 3161 3162 3163 3164 3165 3166 3167 3168 3169 3170 3171 3172 3173 3174 3175 3176 3177 3178 3179 3180 3181 3182 3183 3184 3185 3186 3187 3188 3189 3190 3191 3192 3193 3194 3195 3196 3197 3198 3199 3200 3201 3202 3203 3204 3205 3206 3207 3208 3209 3210 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220 3221 3222 3223 3224 3225 3226 3227 3228 3229 3230 3231 3232 3233 3234 3235 3236 3237 3238 3239 3240 3241 3242 3243 3244 3245 3246 3247 3248 3249 3250 3251 3252 3253 3254 3255 3256 3257 3258 3259 3260 3261 3262 3263 3264 3265 3266 3267 3268 3269 3270 3271 3272 3273 3274 3275 3276 3277 3278 3279 3280 3281 3282 3283 3284 3285 3286 3287 3288 3289 3290 3291 3292 3293 3294 3295 3296 3297 3298 3299 3300 3301 3302 3303 3304 3305 3306 3307 3308 3309 3310 3311 3312 3313 3314 3315 3316 3317 3318 3319 3320 3321 3322 3323 3324 3325 3326 3327 3328 3329 3330 3331 3332 3333 3334 3335 3336 3337 3338 3339 3340 3341 3342 3343 3344 3345 3346 3347 3348 3349 3350 3351 3352 3353 3354 3355 3356 3357 3358 3359 3360 3361 3362 3363 3364 3365 3366 3367 3368 3369 3370 3371 3372 3373 3374 3375 3376 3377 3378 3379 3380 3381 3382 3383 3384 3385 3386 3387 3388 3389 3390 3391 3392 3393 3394 3395 3396 3397 3398 3399 3400 3401 3402 3403 3404 3405 3406 3407 3408 3409 3410 3411 3412 3413 3414 3415 3416 3417 3418 3419 3420 3421 3422 3423 3424 3425 3426 3427 3428 3429 3430 3431 3432 3433 3434 3435 3436 3437 3438 3439 3440 3441 3442 3443 3444 3445 3446 3447 3448 3449 3450 3451 3452 3453 3454 3455 3456 3457 3458 3459 3460 3461 3462 3463 3464 3465 3466 3467 3468 3469 3470 3471 3472 3473 3474 3475 3476 3477 3478 3479 3480 3481 3482 3483 3484 3485 3486 3487 3488 3489 3490 3491 3492 3493 3494 3495 3496 3497 3498 3499 3500 3501 3502 3503 3504 3505 3506 3507 3508 3509 3510 3511 3512 3513 3514 3515 3516 3517 3518 3519 3520 3521 3522 3523 3524 3525 3526 3527 3528 3529 3530 3531 3532 3533 3534 3535 3536 3537 3538 3539 3540 3541 3542 3543 3544 3545 3546 3547 3548 3549 3550 3551 3552 3553 3554 3555 3556 3557 3558 3559 3560 3561 3562 3563 3564 3565 3566 3567 3568 3569 3570 3571 3572 3573 3574 3575 3576 3577 3578 3579 3580 3581 3582 3583 3584 3585 3586 3587 3588 3589 3590 3591 3592 3593 3594 3595 3596 3597 3598 3599 3600 3601 3602 3603 3604 3605 3606 3607 3608 3609 3610 3611 3612 3613 3614 3615 3616 3617 3618 3619 3620 3621 3622 3623 3624 3625 3626 3627 3628 3629 3630 3631 3632 3633 3634 3635 3636 3637 3638 3639 3640 3641 3642 3643 3644 3645 3646 3647 3648 3649 3650 3651 3652 3653 3654 3655 3656 3657 3658 3659 3660 3661 3662 3663 3664 3665 3666 3667 3668 3669 3670 3671 3672 3673 3674 3675 3676 3677 3678 3679 3680 3681 3682 3683 3684 3685 3686 3687 3688 3689 3690 3691 3692 3693 3694 3695 3696 3697 3698 3699 3700 3701 3702 3703 3704 3705 3706 3707 3708 3709 3710 3711 3712 3713 3714 3715 3716 3717 3718 3719 3720 3721 3722 3723 3724 3725 3726 3727 3728 3729 3730 3731 3732 3733 3734 3735 3736 3737 3738 3739 3740 3741 3742 3743 3744 3745 3746 3747 3748 3749 3750 3751 3752 3753 3754 3755 3756 3757 3758 3759 3760 3761 3762 3763 3764 3765 3766 3767 3768 3769 3770 3771 3772 3773 3774 3775 3776 3777 3778 3779 3780 3781 3782 3783 3784 3785 3786 3787 3788 3789 3790 3791 3792 3793 3794 3795 3796 3797 3798 3799 3800 3801 3802 3803 3804 3805 3806 3807 3808 3809 3810 3811 3812 3813 3814 3815 3816 3817 3818 3819 3820 3821 3822 3823 3824 3825 3826 3827 3828 3829 3830 3831 3832 3833 3834 3835 3836 3837 3838 3839 3840 3841 3842 3843 3844 3845 3846 3847 3848 3849 3850 3851 3852 3853 | // SPDX-License-Identifier: GPL-2.0-only /* * Copyright (c) 2007-2017 Nicira, Inc. */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include "flow.h" #include "datapath.h" #include <linux/uaccess.h> #include <linux/netdevice.h> #include <linux/etherdevice.h> #include <linux/if_ether.h> #include <linux/if_vlan.h> #include <net/llc_pdu.h> #include <linux/kernel.h> #include <linux/jhash.h> #include <linux/jiffies.h> #include <linux/llc.h> #include <linux/module.h> #include <linux/in.h> #include <linux/rcupdate.h> #include <linux/if_arp.h> #include <linux/ip.h> #include <linux/ipv6.h> #include <linux/sctp.h> #include <linux/tcp.h> #include <linux/udp.h> #include <linux/icmp.h> #include <linux/icmpv6.h> #include <linux/rculist.h> #include <net/geneve.h> #include <net/ip.h> #include <net/ipv6.h> #include <net/ndisc.h> #include <net/mpls.h> #include <net/vxlan.h> #include <net/tun_proto.h> #include <net/erspan.h> #include "drop.h" #include "flow_netlink.h" struct ovs_len_tbl { int len; const struct ovs_len_tbl *next; }; #define OVS_ATTR_NESTED -1 #define OVS_ATTR_VARIABLE -2 #define OVS_COPY_ACTIONS_MAX_DEPTH 16 static bool actions_may_change_flow(const struct nlattr *actions) { struct nlattr *nla; int rem; nla_for_each_nested(nla, actions, rem) { u16 action = nla_type(nla); switch (action) { case OVS_ACTION_ATTR_OUTPUT: case OVS_ACTION_ATTR_RECIRC: case OVS_ACTION_ATTR_TRUNC: case OVS_ACTION_ATTR_USERSPACE: case OVS_ACTION_ATTR_DROP: case OVS_ACTION_ATTR_PSAMPLE: break; case OVS_ACTION_ATTR_CT: case OVS_ACTION_ATTR_CT_CLEAR: case OVS_ACTION_ATTR_HASH: case OVS_ACTION_ATTR_POP_ETH: case OVS_ACTION_ATTR_POP_MPLS: case OVS_ACTION_ATTR_POP_NSH: case OVS_ACTION_ATTR_POP_VLAN: case OVS_ACTION_ATTR_PUSH_ETH: case OVS_ACTION_ATTR_PUSH_MPLS: case OVS_ACTION_ATTR_PUSH_NSH: case OVS_ACTION_ATTR_PUSH_VLAN: case OVS_ACTION_ATTR_SAMPLE: case OVS_ACTION_ATTR_SET: case OVS_ACTION_ATTR_SET_MASKED: case OVS_ACTION_ATTR_METER: case OVS_ACTION_ATTR_CHECK_PKT_LEN: case OVS_ACTION_ATTR_ADD_MPLS: case OVS_ACTION_ATTR_DEC_TTL: default: return true; } } return false; } static void update_range(struct sw_flow_match *match, size_t offset, size_t size, bool is_mask) { struct sw_flow_key_range *range; size_t start = rounddown(offset, sizeof(long)); size_t end = roundup(offset + size, sizeof(long)); if (!is_mask) range = &match->range; else range = &match->mask->range; if (range->start == range->end) { range->start = start; range->end = end; return; } if (range->start > start) range->start = start; if (range->end < end) range->end = end; } #define SW_FLOW_KEY_PUT(match, field, value, is_mask) \ do { \ update_range(match, offsetof(struct sw_flow_key, field), \ sizeof((match)->key->field), is_mask); \ if (is_mask) \ (match)->mask->key.field = value; \ else \ (match)->key->field = value; \ } while (0) #define SW_FLOW_KEY_MEMCPY_OFFSET(match, offset, value_p, len, is_mask) \ do { \ update_range(match, offset, len, is_mask); \ if (is_mask) \ memcpy((u8 *)&(match)->mask->key + offset, value_p, \ len); \ else \ memcpy((u8 *)(match)->key + offset, value_p, len); \ } while (0) #define SW_FLOW_KEY_MEMCPY(match, field, value_p, len, is_mask) \ SW_FLOW_KEY_MEMCPY_OFFSET(match, offsetof(struct sw_flow_key, field), \ value_p, len, is_mask) #define SW_FLOW_KEY_MEMSET_FIELD(match, field, value, is_mask) \ do { \ update_range(match, offsetof(struct sw_flow_key, field), \ sizeof((match)->key->field), is_mask); \ if (is_mask) \ memset((u8 *)&(match)->mask->key.field, value, \ sizeof((match)->mask->key.field)); \ else \ memset((u8 *)&(match)->key->field, value, \ sizeof((match)->key->field)); \ } while (0) #define SW_FLOW_KEY_BITMAP_COPY(match, field, value_p, nbits, is_mask) ({ \ update_range(match, offsetof(struct sw_flow_key, field), \ bitmap_size(nbits), is_mask); \ bitmap_copy(is_mask ? (match)->mask->key.field : (match)->key->field, \ value_p, nbits); \ }) static bool match_validate(const struct sw_flow_match *match, u64 key_attrs, u64 mask_attrs, bool log) { u64 key_expected = 0; u64 mask_allowed = key_attrs; /* At most allow all key attributes */ /* The following mask attributes allowed only if they * pass the validation tests. */ mask_allowed &= ~((1 << OVS_KEY_ATTR_IPV4) | (1 << OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4) | (1 << OVS_KEY_ATTR_IPV6) | (1 << OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6) | (1 << OVS_KEY_ATTR_TCP) | (1 << OVS_KEY_ATTR_TCP_FLAGS) | (1 << OVS_KEY_ATTR_UDP) | (1 << OVS_KEY_ATTR_SCTP) | (1 << OVS_KEY_ATTR_ICMP) | (1 << OVS_KEY_ATTR_ICMPV6) | (1 << OVS_KEY_ATTR_ARP) | (1 << OVS_KEY_ATTR_ND) | (1 << OVS_KEY_ATTR_MPLS) | (1 << OVS_KEY_ATTR_NSH)); /* Always allowed mask fields. */ mask_allowed |= ((1 << OVS_KEY_ATTR_TUNNEL) | (1 << OVS_KEY_ATTR_IN_PORT) | (1 << OVS_KEY_ATTR_ETHERTYPE)); /* Check key attributes. */ if (match->key->eth.type == htons(ETH_P_ARP) || match->key->eth.type == htons(ETH_P_RARP)) { key_expected |= 1 << OVS_KEY_ATTR_ARP; if (match->mask && (match->mask->key.eth.type == htons(0xffff))) mask_allowed |= 1 << OVS_KEY_ATTR_ARP; } if (eth_p_mpls(match->key->eth.type)) { key_expected |= 1 << OVS_KEY_ATTR_MPLS; if (match->mask && (match->mask->key.eth.type == htons(0xffff))) mask_allowed |= 1 << OVS_KEY_ATTR_MPLS; } if (match->key->eth.type == htons(ETH_P_IP)) { key_expected |= 1 << OVS_KEY_ATTR_IPV4; if (match->mask && match->mask->key.eth.type == htons(0xffff)) { mask_allowed |= 1 << OVS_KEY_ATTR_IPV4; mask_allowed |= 1 << OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4; } if (match->key->ip.frag != OVS_FRAG_TYPE_LATER) { if (match->key->ip.proto == IPPROTO_UDP) { key_expected |= 1 << OVS_KEY_ATTR_UDP; if (match->mask && (match->mask->key.ip.proto == 0xff)) mask_allowed |= 1 << OVS_KEY_ATTR_UDP; } if (match->key->ip.proto == IPPROTO_SCTP) { key_expected |= 1 << OVS_KEY_ATTR_SCTP; if (match->mask && (match->mask->key.ip.proto == 0xff)) mask_allowed |= 1 << OVS_KEY_ATTR_SCTP; } if (match->key->ip.proto == IPPROTO_TCP) { key_expected |= 1 << OVS_KEY_ATTR_TCP; key_expected |= 1 << OVS_KEY_ATTR_TCP_FLAGS; if (match->mask && (match->mask->key.ip.proto == 0xff)) { mask_allowed |= 1 << OVS_KEY_ATTR_TCP; mask_allowed |= 1 << OVS_KEY_ATTR_TCP_FLAGS; } } if (match->key->ip.proto == IPPROTO_ICMP) { key_expected |= 1 << OVS_KEY_ATTR_ICMP; if (match->mask && (match->mask->key.ip.proto == 0xff)) mask_allowed |= 1 << OVS_KEY_ATTR_ICMP; } } } if (match->key->eth.type == htons(ETH_P_IPV6)) { key_expected |= 1 << OVS_KEY_ATTR_IPV6; if (match->mask && match->mask->key.eth.type == htons(0xffff)) { mask_allowed |= 1 << OVS_KEY_ATTR_IPV6; mask_allowed |= 1 << OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6; } if (match->key->ip.frag != OVS_FRAG_TYPE_LATER) { if (match->key->ip.proto == IPPROTO_UDP) { key_expected |= 1 << OVS_KEY_ATTR_UDP; if (match->mask && (match->mask->key.ip.proto == 0xff)) mask_allowed |= 1 << OVS_KEY_ATTR_UDP; } if (match->key->ip.proto == IPPROTO_SCTP) { key_expected |= 1 << OVS_KEY_ATTR_SCTP; if (match->mask && (match->mask->key.ip.proto == 0xff)) mask_allowed |= 1 << OVS_KEY_ATTR_SCTP; } if (match->key->ip.proto == IPPROTO_TCP) { key_expected |= 1 << OVS_KEY_ATTR_TCP; key_expected |= 1 << OVS_KEY_ATTR_TCP_FLAGS; if (match->mask && (match->mask->key.ip.proto == 0xff)) { mask_allowed |= 1 << OVS_KEY_ATTR_TCP; mask_allowed |= 1 << OVS_KEY_ATTR_TCP_FLAGS; } } if (match->key->ip.proto == IPPROTO_ICMPV6) { key_expected |= 1 << OVS_KEY_ATTR_ICMPV6; if (match->mask && (match->mask->key.ip.proto == 0xff)) mask_allowed |= 1 << OVS_KEY_ATTR_ICMPV6; if (match->key->tp.src == htons(NDISC_NEIGHBOUR_SOLICITATION) || match->key->tp.src == htons(NDISC_NEIGHBOUR_ADVERTISEMENT)) { key_expected |= 1 << OVS_KEY_ATTR_ND; /* Original direction conntrack tuple * uses the same space as the ND fields * in the key, so both are not allowed * at the same time. */ mask_allowed &= ~(1ULL << OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6); if (match->mask && (match->mask->key.tp.src == htons(0xff))) mask_allowed |= 1 << OVS_KEY_ATTR_ND; } } } } if (match->key->eth.type == htons(ETH_P_NSH)) { key_expected |= 1 << OVS_KEY_ATTR_NSH; if (match->mask && match->mask->key.eth.type == htons(0xffff)) { mask_allowed |= 1 << OVS_KEY_ATTR_NSH; } } if ((key_attrs & key_expected) != key_expected) { /* Key attributes check failed. */ OVS_NLERR(log, "Missing key (keys=%llx, expected=%llx)", (unsigned long long)key_attrs, (unsigned long long)key_expected); return false; } if ((mask_attrs & mask_allowed) != mask_attrs) { /* Mask attributes check failed. */ OVS_NLERR(log, "Unexpected mask (mask=%llx, allowed=%llx)", (unsigned long long)mask_attrs, (unsigned long long)mask_allowed); return false; } return true; } size_t ovs_tun_key_attr_size(void) { /* Whenever adding new OVS_TUNNEL_KEY_ FIELDS, we should consider * updating this function. */ return nla_total_size_64bit(8) /* OVS_TUNNEL_KEY_ATTR_ID */ + nla_total_size(16) /* OVS_TUNNEL_KEY_ATTR_IPV[46]_SRC */ + nla_total_size(16) /* OVS_TUNNEL_KEY_ATTR_IPV[46]_DST */ + nla_total_size(1) /* OVS_TUNNEL_KEY_ATTR_TOS */ + nla_total_size(1) /* OVS_TUNNEL_KEY_ATTR_TTL */ + nla_total_size(0) /* OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT */ + nla_total_size(0) /* OVS_TUNNEL_KEY_ATTR_CSUM */ + nla_total_size(0) /* OVS_TUNNEL_KEY_ATTR_OAM */ + nla_total_size(256) /* OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS */ /* OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS and * OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS is mutually exclusive with * OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS and covered by it. */ + nla_total_size(2) /* OVS_TUNNEL_KEY_ATTR_TP_SRC */ + nla_total_size(2); /* OVS_TUNNEL_KEY_ATTR_TP_DST */ } static size_t ovs_nsh_key_attr_size(void) { /* Whenever adding new OVS_NSH_KEY_ FIELDS, we should consider * updating this function. */ return nla_total_size(NSH_BASE_HDR_LEN) /* OVS_NSH_KEY_ATTR_BASE */ /* OVS_NSH_KEY_ATTR_MD1 and OVS_NSH_KEY_ATTR_MD2 are * mutually exclusive, so the bigger one can cover * the small one. */ + nla_total_size(NSH_CTX_HDRS_MAX_LEN); } size_t ovs_key_attr_size(void) { /* Whenever adding new OVS_KEY_ FIELDS, we should consider * updating this function. */ BUILD_BUG_ON(OVS_KEY_ATTR_MAX != 32); return nla_total_size(4) /* OVS_KEY_ATTR_PRIORITY */ + nla_total_size(0) /* OVS_KEY_ATTR_TUNNEL */ + ovs_tun_key_attr_size() + nla_total_size(4) /* OVS_KEY_ATTR_IN_PORT */ + nla_total_size(4) /* OVS_KEY_ATTR_SKB_MARK */ + nla_total_size(4) /* OVS_KEY_ATTR_DP_HASH */ + nla_total_size(4) /* OVS_KEY_ATTR_RECIRC_ID */ + nla_total_size(4) /* OVS_KEY_ATTR_CT_STATE */ + nla_total_size(2) /* OVS_KEY_ATTR_CT_ZONE */ + nla_total_size(4) /* OVS_KEY_ATTR_CT_MARK */ + nla_total_size(16) /* OVS_KEY_ATTR_CT_LABELS */ + nla_total_size(40) /* OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6 */ + nla_total_size(0) /* OVS_KEY_ATTR_NSH */ + ovs_nsh_key_attr_size() + nla_total_size(12) /* OVS_KEY_ATTR_ETHERNET */ + nla_total_size(2) /* OVS_KEY_ATTR_ETHERTYPE */ + nla_total_size(4) /* OVS_KEY_ATTR_VLAN */ + nla_total_size(0) /* OVS_KEY_ATTR_ENCAP */ + nla_total_size(2) /* OVS_KEY_ATTR_ETHERTYPE */ + nla_total_size(40) /* OVS_KEY_ATTR_IPV6 */ + nla_total_size(2) /* OVS_KEY_ATTR_ICMPV6 */ + nla_total_size(28) /* OVS_KEY_ATTR_ND */ + nla_total_size(2); /* OVS_KEY_ATTR_IPV6_EXTHDRS */ } static const struct ovs_len_tbl ovs_vxlan_ext_key_lens[OVS_VXLAN_EXT_MAX + 1] = { [OVS_VXLAN_EXT_GBP] = { .len = sizeof(u32) }, }; static const struct ovs_len_tbl ovs_tunnel_key_lens[OVS_TUNNEL_KEY_ATTR_MAX + 1] = { [OVS_TUNNEL_KEY_ATTR_ID] = { .len = sizeof(u64) }, [OVS_TUNNEL_KEY_ATTR_IPV4_SRC] = { .len = sizeof(u32) }, [OVS_TUNNEL_KEY_ATTR_IPV4_DST] = { .len = sizeof(u32) }, [OVS_TUNNEL_KEY_ATTR_TOS] = { .len = 1 }, [OVS_TUNNEL_KEY_ATTR_TTL] = { .len = 1 }, [OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT] = { .len = 0 }, [OVS_TUNNEL_KEY_ATTR_CSUM] = { .len = 0 }, [OVS_TUNNEL_KEY_ATTR_TP_SRC] = { .len = sizeof(u16) }, [OVS_TUNNEL_KEY_ATTR_TP_DST] = { .len = sizeof(u16) }, [OVS_TUNNEL_KEY_ATTR_OAM] = { .len = 0 }, [OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS] = { .len = OVS_ATTR_VARIABLE }, [OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS] = { .len = OVS_ATTR_NESTED, .next = ovs_vxlan_ext_key_lens }, [OVS_TUNNEL_KEY_ATTR_IPV6_SRC] = { .len = sizeof(struct in6_addr) }, [OVS_TUNNEL_KEY_ATTR_IPV6_DST] = { .len = sizeof(struct in6_addr) }, [OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS] = { .len = OVS_ATTR_VARIABLE }, [OVS_TUNNEL_KEY_ATTR_IPV4_INFO_BRIDGE] = { .len = 0 }, }; static const struct ovs_len_tbl ovs_nsh_key_attr_lens[OVS_NSH_KEY_ATTR_MAX + 1] = { [OVS_NSH_KEY_ATTR_BASE] = { .len = sizeof(struct ovs_nsh_key_base) }, [OVS_NSH_KEY_ATTR_MD1] = { .len = sizeof(struct ovs_nsh_key_md1) }, [OVS_NSH_KEY_ATTR_MD2] = { .len = OVS_ATTR_VARIABLE }, }; /* The size of the argument for each %OVS_KEY_ATTR_* Netlink attribute. */ static const struct ovs_len_tbl ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = { [OVS_KEY_ATTR_ENCAP] = { .len = OVS_ATTR_NESTED }, [OVS_KEY_ATTR_PRIORITY] = { .len = sizeof(u32) }, [OVS_KEY_ATTR_IN_PORT] = { .len = sizeof(u32) }, [OVS_KEY_ATTR_SKB_MARK] = { .len = sizeof(u32) }, [OVS_KEY_ATTR_ETHERNET] = { .len = sizeof(struct ovs_key_ethernet) }, [OVS_KEY_ATTR_VLAN] = { .len = sizeof(__be16) }, [OVS_KEY_ATTR_ETHERTYPE] = { .len = sizeof(__be16) }, [OVS_KEY_ATTR_IPV4] = { .len = sizeof(struct ovs_key_ipv4) }, [OVS_KEY_ATTR_IPV6] = { .len = sizeof(struct ovs_key_ipv6) }, [OVS_KEY_ATTR_TCP] = { .len = sizeof(struct ovs_key_tcp) }, [OVS_KEY_ATTR_TCP_FLAGS] = { .len = sizeof(__be16) }, [OVS_KEY_ATTR_UDP] = { .len = sizeof(struct ovs_key_udp) }, [OVS_KEY_ATTR_SCTP] = { .len = sizeof(struct ovs_key_sctp) }, [OVS_KEY_ATTR_ICMP] = { .len = sizeof(struct ovs_key_icmp) }, [OVS_KEY_ATTR_ICMPV6] = { .len = sizeof(struct ovs_key_icmpv6) }, [OVS_KEY_ATTR_ARP] = { .len = sizeof(struct ovs_key_arp) }, [OVS_KEY_ATTR_ND] = { .len = sizeof(struct ovs_key_nd) }, [OVS_KEY_ATTR_RECIRC_ID] = { .len = sizeof(u32) }, [OVS_KEY_ATTR_DP_HASH] = { .len = sizeof(u32) }, [OVS_KEY_ATTR_TUNNEL] = { .len = OVS_ATTR_NESTED, .next = ovs_tunnel_key_lens, }, [OVS_KEY_ATTR_MPLS] = { .len = OVS_ATTR_VARIABLE }, [OVS_KEY_ATTR_CT_STATE] = { .len = sizeof(u32) }, [OVS_KEY_ATTR_CT_ZONE] = { .len = sizeof(u16) }, [OVS_KEY_ATTR_CT_MARK] = { .len = sizeof(u32) }, [OVS_KEY_ATTR_CT_LABELS] = { .len = sizeof(struct ovs_key_ct_labels) }, [OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4] = { .len = sizeof(struct ovs_key_ct_tuple_ipv4) }, [OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6] = { .len = sizeof(struct ovs_key_ct_tuple_ipv6) }, [OVS_KEY_ATTR_NSH] = { .len = OVS_ATTR_NESTED, .next = ovs_nsh_key_attr_lens, }, [OVS_KEY_ATTR_IPV6_EXTHDRS] = { .len = sizeof(struct ovs_key_ipv6_exthdrs) }, }; static bool check_attr_len(unsigned int attr_len, unsigned int expected_len) { return expected_len == attr_len || expected_len == OVS_ATTR_NESTED || expected_len == OVS_ATTR_VARIABLE; } static bool is_all_zero(const u8 *fp, size_t size) { int i; if (!fp) return false; for (i = 0; i < size; i++) if (fp[i]) return false; return true; } static int __parse_flow_nlattrs(const struct nlattr *attr, const struct nlattr *a[], u64 *attrsp, bool log, bool nz) { const struct nlattr *nla; u64 attrs; int rem; attrs = *attrsp; nla_for_each_nested(nla, attr, rem) { u16 type = nla_type(nla); int expected_len; if (type > OVS_KEY_ATTR_MAX) { OVS_NLERR(log, "Key type %d is out of range max %d", type, OVS_KEY_ATTR_MAX); return -EINVAL; } if (type == OVS_KEY_ATTR_PACKET_TYPE || type == OVS_KEY_ATTR_ND_EXTENSIONS || type == OVS_KEY_ATTR_TUNNEL_INFO) { OVS_NLERR(log, "Key type %d is not supported", type); return -EINVAL; } if (attrs & (1ULL << type)) { OVS_NLERR(log, "Duplicate key (type %d).", type); return -EINVAL; } expected_len = ovs_key_lens[type].len; if (!check_attr_len(nla_len(nla), expected_len)) { OVS_NLERR(log, "Key %d has unexpected len %d expected %d", type, nla_len(nla), expected_len); return -EINVAL; } if (!nz || !is_all_zero(nla_data(nla), nla_len(nla))) { attrs |= 1ULL << type; a[type] = nla; } } if (rem) { OVS_NLERR(log, "Message has %d unknown bytes.", rem); return -EINVAL; } *attrsp = attrs; return 0; } static int parse_flow_mask_nlattrs(const struct nlattr *attr, const struct nlattr *a[], u64 *attrsp, bool log) { return __parse_flow_nlattrs(attr, a, attrsp, log, true); } int parse_flow_nlattrs(const struct nlattr *attr, const struct nlattr *a[], u64 *attrsp, bool log) { return __parse_flow_nlattrs(attr, a, attrsp, log, false); } static int genev_tun_opt_from_nlattr(const struct nlattr *a, struct sw_flow_match *match, bool is_mask, bool log) { unsigned long opt_key_offset; if (nla_len(a) > sizeof(match->key->tun_opts)) { OVS_NLERR(log, "Geneve option length err (len %d, max %zu).", nla_len(a), sizeof(match->key->tun_opts)); return -EINVAL; } if (nla_len(a) % 4 != 0) { OVS_NLERR(log, "Geneve opt len %d is not a multiple of 4.", nla_len(a)); return -EINVAL; } /* We need to record the length of the options passed * down, otherwise packets with the same format but * additional options will be silently matched. */ if (!is_mask) { SW_FLOW_KEY_PUT(match, tun_opts_len, nla_len(a), false); } else { /* This is somewhat unusual because it looks at * both the key and mask while parsing the * attributes (and by extension assumes the key * is parsed first). Normally, we would verify * that each is the correct length and that the * attributes line up in the validate function. * However, that is difficult because this is * variable length and we won't have the * information later. */ if (match->key->tun_opts_len != nla_len(a)) { OVS_NLERR(log, "Geneve option len %d != mask len %d", match->key->tun_opts_len, nla_len(a)); return -EINVAL; } SW_FLOW_KEY_PUT(match, tun_opts_len, 0xff, true); } opt_key_offset = TUN_METADATA_OFFSET(nla_len(a)); SW_FLOW_KEY_MEMCPY_OFFSET(match, opt_key_offset, nla_data(a), nla_len(a), is_mask); return 0; } static int vxlan_tun_opt_from_nlattr(const struct nlattr *attr, struct sw_flow_match *match, bool is_mask, bool log) { struct nlattr *a; int rem; unsigned long opt_key_offset; struct vxlan_metadata opts; BUILD_BUG_ON(sizeof(opts) > sizeof(match->key->tun_opts)); memset(&opts, 0, sizeof(opts)); nla_for_each_nested(a, attr, rem) { int type = nla_type(a); if (type > OVS_VXLAN_EXT_MAX) { OVS_NLERR(log, "VXLAN extension %d out of range max %d", type, OVS_VXLAN_EXT_MAX); return -EINVAL; } if (!check_attr_len(nla_len(a), ovs_vxlan_ext_key_lens[type].len)) { OVS_NLERR(log, "VXLAN extension %d has unexpected len %d expected %d", type, nla_len(a), ovs_vxlan_ext_key_lens[type].len); return -EINVAL; } switch (type) { case OVS_VXLAN_EXT_GBP: opts.gbp = nla_get_u32(a); break; default: OVS_NLERR(log, "Unknown VXLAN extension attribute %d", type); return -EINVAL; } } if (rem) { OVS_NLERR(log, "VXLAN extension message has %d unknown bytes.", rem); return -EINVAL; } if (!is_mask) SW_FLOW_KEY_PUT(match, tun_opts_len, sizeof(opts), false); else SW_FLOW_KEY_PUT(match, tun_opts_len, 0xff, true); opt_key_offset = TUN_METADATA_OFFSET(sizeof(opts)); SW_FLOW_KEY_MEMCPY_OFFSET(match, opt_key_offset, &opts, sizeof(opts), is_mask); return 0; } static int erspan_tun_opt_from_nlattr(const struct nlattr *a, struct sw_flow_match *match, bool is_mask, bool log) { unsigned long opt_key_offset; BUILD_BUG_ON(sizeof(struct erspan_metadata) > sizeof(match->key->tun_opts)); if (nla_len(a) > sizeof(match->key->tun_opts)) { OVS_NLERR(log, "ERSPAN option length err (len %d, max %zu).", nla_len(a), sizeof(match->key->tun_opts)); return -EINVAL; } if (!is_mask) SW_FLOW_KEY_PUT(match, tun_opts_len, sizeof(struct erspan_metadata), false); else SW_FLOW_KEY_PUT(match, tun_opts_len, 0xff, true); opt_key_offset = TUN_METADATA_OFFSET(nla_len(a)); SW_FLOW_KEY_MEMCPY_OFFSET(match, opt_key_offset, nla_data(a), nla_len(a), is_mask); return 0; } static int ip_tun_from_nlattr(const struct nlattr *attr, struct sw_flow_match *match, bool is_mask, bool log) { bool ttl = false, ipv4 = false, ipv6 = false; IP_TUNNEL_DECLARE_FLAGS(tun_flags) = { }; bool info_bridge_mode = false; int opts_type = 0; struct nlattr *a; int rem; nla_for_each_nested(a, attr, rem) { int type = nla_type(a); int err; if (type > OVS_TUNNEL_KEY_ATTR_MAX) { OVS_NLERR(log, "Tunnel attr %d out of range max %d", type, OVS_TUNNEL_KEY_ATTR_MAX); return -EINVAL; } if (!check_attr_len(nla_len(a), ovs_tunnel_key_lens[type].len)) { OVS_NLERR(log, "Tunnel attr %d has unexpected len %d expected %d", type, nla_len(a), ovs_tunnel_key_lens[type].len); return -EINVAL; } switch (type) { case OVS_TUNNEL_KEY_ATTR_ID: SW_FLOW_KEY_PUT(match, tun_key.tun_id, nla_get_be64(a), is_mask); __set_bit(IP_TUNNEL_KEY_BIT, tun_flags); break; case OVS_TUNNEL_KEY_ATTR_IPV4_SRC: SW_FLOW_KEY_PUT(match, tun_key.u.ipv4.src, nla_get_in_addr(a), is_mask); ipv4 = true; break; case OVS_TUNNEL_KEY_ATTR_IPV4_DST: SW_FLOW_KEY_PUT(match, tun_key.u.ipv4.dst, nla_get_in_addr(a), is_mask); ipv4 = true; break; case OVS_TUNNEL_KEY_ATTR_IPV6_SRC: SW_FLOW_KEY_PUT(match, tun_key.u.ipv6.src, nla_get_in6_addr(a), is_mask); ipv6 = true; break; case OVS_TUNNEL_KEY_ATTR_IPV6_DST: SW_FLOW_KEY_PUT(match, tun_key.u.ipv6.dst, nla_get_in6_addr(a), is_mask); ipv6 = true; break; case OVS_TUNNEL_KEY_ATTR_TOS: SW_FLOW_KEY_PUT(match, tun_key.tos, nla_get_u8(a), is_mask); break; case OVS_TUNNEL_KEY_ATTR_TTL: SW_FLOW_KEY_PUT(match, tun_key.ttl, nla_get_u8(a), is_mask); ttl = true; break; case OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT: __set_bit(IP_TUNNEL_DONT_FRAGMENT_BIT, tun_flags); break; case OVS_TUNNEL_KEY_ATTR_CSUM: __set_bit(IP_TUNNEL_CSUM_BIT, tun_flags); break; case OVS_TUNNEL_KEY_ATTR_TP_SRC: SW_FLOW_KEY_PUT(match, tun_key.tp_src, nla_get_be16(a), is_mask); break; case OVS_TUNNEL_KEY_ATTR_TP_DST: SW_FLOW_KEY_PUT(match, tun_key.tp_dst, nla_get_be16(a), is_mask); break; case OVS_TUNNEL_KEY_ATTR_OAM: __set_bit(IP_TUNNEL_OAM_BIT, tun_flags); break; case OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS: if (opts_type) { OVS_NLERR(log, "Multiple metadata blocks provided"); return -EINVAL; } err = genev_tun_opt_from_nlattr(a, match, is_mask, log); if (err) return err; __set_bit(IP_TUNNEL_GENEVE_OPT_BIT, tun_flags); opts_type = type; break; case OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS: if (opts_type) { OVS_NLERR(log, "Multiple metadata blocks provided"); return -EINVAL; } err = vxlan_tun_opt_from_nlattr(a, match, is_mask, log); if (err) return err; __set_bit(IP_TUNNEL_VXLAN_OPT_BIT, tun_flags); opts_type = type; break; case OVS_TUNNEL_KEY_ATTR_PAD: break; case OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS: if (opts_type) { OVS_NLERR(log, "Multiple metadata blocks provided"); return -EINVAL; } err = erspan_tun_opt_from_nlattr(a, match, is_mask, log); if (err) return err; __set_bit(IP_TUNNEL_ERSPAN_OPT_BIT, tun_flags); opts_type = type; break; case OVS_TUNNEL_KEY_ATTR_IPV4_INFO_BRIDGE: info_bridge_mode = true; ipv4 = true; break; default: OVS_NLERR(log, "Unknown IP tunnel attribute %d", type); return -EINVAL; } } SW_FLOW_KEY_BITMAP_COPY(match, tun_key.tun_flags, tun_flags, __IP_TUNNEL_FLAG_NUM, is_mask); if (is_mask) SW_FLOW_KEY_MEMSET_FIELD(match, tun_proto, 0xff, true); else SW_FLOW_KEY_PUT(match, tun_proto, ipv6 ? AF_INET6 : AF_INET, false); if (rem > 0) { OVS_NLERR(log, "IP tunnel attribute has %d unknown bytes.", rem); return -EINVAL; } if (ipv4 && ipv6) { OVS_NLERR(log, "Mixed IPv4 and IPv6 tunnel attributes"); return -EINVAL; } if (!is_mask) { if (!ipv4 && !ipv6) { OVS_NLERR(log, "IP tunnel dst address not specified"); return -EINVAL; } if (ipv4) { if (info_bridge_mode) { __clear_bit(IP_TUNNEL_KEY_BIT, tun_flags); if (match->key->tun_key.u.ipv4.src || match->key->tun_key.u.ipv4.dst || match->key->tun_key.tp_src || match->key->tun_key.tp_dst || match->key->tun_key.ttl || match->key->tun_key.tos || !ip_tunnel_flags_empty(tun_flags)) { OVS_NLERR(log, "IPv4 tun info is not correct"); return -EINVAL; } } else if (!match->key->tun_key.u.ipv4.dst) { OVS_NLERR(log, "IPv4 tunnel dst address is zero"); return -EINVAL; } } if (ipv6 && ipv6_addr_any(&match->key->tun_key.u.ipv6.dst)) { OVS_NLERR(log, "IPv6 tunnel dst address is zero"); return -EINVAL; } if (!ttl && !info_bridge_mode) { OVS_NLERR(log, "IP tunnel TTL not specified."); return -EINVAL; } } return opts_type; } static int vxlan_opt_to_nlattr(struct sk_buff *skb, const void *tun_opts, int swkey_tun_opts_len) { const struct vxlan_metadata *opts = tun_opts; struct nlattr *nla; nla = nla_nest_start_noflag(skb, OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS); if (!nla) return -EMSGSIZE; if (nla_put_u32(skb, OVS_VXLAN_EXT_GBP, opts->gbp) < 0) return -EMSGSIZE; nla_nest_end(skb, nla); return 0; } static int __ip_tun_to_nlattr(struct sk_buff *skb, const struct ip_tunnel_key *output, const void *tun_opts, int swkey_tun_opts_len, unsigned short tun_proto, u8 mode) { if (test_bit(IP_TUNNEL_KEY_BIT, output->tun_flags) && nla_put_be64(skb, OVS_TUNNEL_KEY_ATTR_ID, output->tun_id, OVS_TUNNEL_KEY_ATTR_PAD)) return -EMSGSIZE; if (mode & IP_TUNNEL_INFO_BRIDGE) return nla_put_flag(skb, OVS_TUNNEL_KEY_ATTR_IPV4_INFO_BRIDGE) ? -EMSGSIZE : 0; switch (tun_proto) { case AF_INET: if (output->u.ipv4.src && nla_put_in_addr(skb, OVS_TUNNEL_KEY_ATTR_IPV4_SRC, output->u.ipv4.src)) return -EMSGSIZE; if (output->u.ipv4.dst && nla_put_in_addr(skb, OVS_TUNNEL_KEY_ATTR_IPV4_DST, output->u.ipv4.dst)) return -EMSGSIZE; break; case AF_INET6: if (!ipv6_addr_any(&output->u.ipv6.src) && nla_put_in6_addr(skb, OVS_TUNNEL_KEY_ATTR_IPV6_SRC, &output->u.ipv6.src)) return -EMSGSIZE; if (!ipv6_addr_any(&output->u.ipv6.dst) && nla_put_in6_addr(skb, OVS_TUNNEL_KEY_ATTR_IPV6_DST, &output->u.ipv6.dst)) return -EMSGSIZE; break; } if (output->tos && nla_put_u8(skb, OVS_TUNNEL_KEY_ATTR_TOS, output->tos)) return -EMSGSIZE; if (nla_put_u8(skb, OVS_TUNNEL_KEY_ATTR_TTL, output->ttl)) return -EMSGSIZE; if (test_bit(IP_TUNNEL_DONT_FRAGMENT_BIT, output->tun_flags) && nla_put_flag(skb, OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT)) return -EMSGSIZE; if (test_bit(IP_TUNNEL_CSUM_BIT, output->tun_flags) && nla_put_flag(skb, OVS_TUNNEL_KEY_ATTR_CSUM)) return -EMSGSIZE; if (output->tp_src && nla_put_be16(skb, OVS_TUNNEL_KEY_ATTR_TP_SRC, output->tp_src)) return -EMSGSIZE; if (output->tp_dst && nla_put_be16(skb, OVS_TUNNEL_KEY_ATTR_TP_DST, output->tp_dst)) return -EMSGSIZE; if (test_bit(IP_TUNNEL_OAM_BIT, output->tun_flags) && nla_put_flag(skb, OVS_TUNNEL_KEY_ATTR_OAM)) return -EMSGSIZE; if (swkey_tun_opts_len) { if (test_bit(IP_TUNNEL_GENEVE_OPT_BIT, output->tun_flags) && nla_put(skb, OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS, swkey_tun_opts_len, tun_opts)) return -EMSGSIZE; else if (test_bit(IP_TUNNEL_VXLAN_OPT_BIT, output->tun_flags) && vxlan_opt_to_nlattr(skb, tun_opts, swkey_tun_opts_len)) return -EMSGSIZE; else if (test_bit(IP_TUNNEL_ERSPAN_OPT_BIT, output->tun_flags) && nla_put(skb, OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS, swkey_tun_opts_len, tun_opts)) return -EMSGSIZE; } return 0; } static int ip_tun_to_nlattr(struct sk_buff *skb, const struct ip_tunnel_key *output, const void *tun_opts, int swkey_tun_opts_len, unsigned short tun_proto, u8 mode) { struct nlattr *nla; int err; nla = nla_nest_start_noflag(skb, OVS_KEY_ATTR_TUNNEL); if (!nla) return -EMSGSIZE; err = __ip_tun_to_nlattr(skb, output, tun_opts, swkey_tun_opts_len, tun_proto, mode); if (err) return err; nla_nest_end(skb, nla); return 0; } int ovs_nla_put_tunnel_info(struct sk_buff *skb, struct ip_tunnel_info *tun_info) { return __ip_tun_to_nlattr(skb, &tun_info->key, ip_tunnel_info_opts(tun_info), tun_info->options_len, ip_tunnel_info_af(tun_info), tun_info->mode); } static int encode_vlan_from_nlattrs(struct sw_flow_match *match, const struct nlattr *a[], bool is_mask, bool inner) { __be16 tci = 0; __be16 tpid = 0; if (a[OVS_KEY_ATTR_VLAN]) tci = nla_get_be16(a[OVS_KEY_ATTR_VLAN]); if (a[OVS_KEY_ATTR_ETHERTYPE]) tpid = nla_get_be16(a[OVS_KEY_ATTR_ETHERTYPE]); if (likely(!inner)) { SW_FLOW_KEY_PUT(match, eth.vlan.tpid, tpid, is_mask); SW_FLOW_KEY_PUT(match, eth.vlan.tci, tci, is_mask); } else { SW_FLOW_KEY_PUT(match, eth.cvlan.tpid, tpid, is_mask); SW_FLOW_KEY_PUT(match, eth.cvlan.tci, tci, is_mask); } return 0; } static int validate_vlan_from_nlattrs(const struct sw_flow_match *match, u64 key_attrs, bool inner, const struct nlattr **a, bool log) { __be16 tci = 0; if (!((key_attrs & (1 << OVS_KEY_ATTR_ETHERNET)) && (key_attrs & (1 << OVS_KEY_ATTR_ETHERTYPE)) && eth_type_vlan(nla_get_be16(a[OVS_KEY_ATTR_ETHERTYPE])))) { /* Not a VLAN. */ return 0; } if (!((key_attrs & (1 << OVS_KEY_ATTR_VLAN)) && (key_attrs & (1 << OVS_KEY_ATTR_ENCAP)))) { OVS_NLERR(log, "Invalid %s frame", (inner) ? "C-VLAN" : "VLAN"); return -EINVAL; } if (a[OVS_KEY_ATTR_VLAN]) tci = nla_get_be16(a[OVS_KEY_ATTR_VLAN]); if (!(tci & htons(VLAN_CFI_MASK))) { if (tci) { OVS_NLERR(log, "%s TCI does not have VLAN_CFI_MASK bit set.", (inner) ? "C-VLAN" : "VLAN"); return -EINVAL; } else if (nla_len(a[OVS_KEY_ATTR_ENCAP])) { /* Corner case for truncated VLAN header. */ OVS_NLERR(log, "Truncated %s header has non-zero encap attribute.", (inner) ? "C-VLAN" : "VLAN"); return -EINVAL; } } return 1; } static int validate_vlan_mask_from_nlattrs(const struct sw_flow_match *match, u64 key_attrs, bool inner, const struct nlattr **a, bool log) { __be16 tci = 0; __be16 tpid = 0; bool encap_valid = !!(match->key->eth.vlan.tci & htons(VLAN_CFI_MASK)); bool i_encap_valid = !!(match->key->eth.cvlan.tci & htons(VLAN_CFI_MASK)); if (!(key_attrs & (1 << OVS_KEY_ATTR_ENCAP))) { /* Not a VLAN. */ return 0; } if ((!inner && !encap_valid) || (inner && !i_encap_valid)) { OVS_NLERR(log, "Encap mask attribute is set for non-%s frame.", (inner) ? "C-VLAN" : "VLAN"); return -EINVAL; } if (a[OVS_KEY_ATTR_VLAN]) tci = nla_get_be16(a[OVS_KEY_ATTR_VLAN]); if (a[OVS_KEY_ATTR_ETHERTYPE]) tpid = nla_get_be16(a[OVS_KEY_ATTR_ETHERTYPE]); if (tpid != htons(0xffff)) { OVS_NLERR(log, "Must have an exact match on %s TPID (mask=%x).", (inner) ? "C-VLAN" : "VLAN", ntohs(tpid)); return -EINVAL; } if (!(tci & htons(VLAN_CFI_MASK))) { OVS_NLERR(log, "%s TCI mask does not have exact match for VLAN_CFI_MASK bit.", (inner) ? "C-VLAN" : "VLAN"); return -EINVAL; } return 1; } static int __parse_vlan_from_nlattrs(struct sw_flow_match *match, u64 *key_attrs, bool inner, const struct nlattr **a, bool is_mask, bool log) { int err; const struct nlattr *encap; if (!is_mask) err = validate_vlan_from_nlattrs(match, *key_attrs, inner, a, log); else err = validate_vlan_mask_from_nlattrs(match, *key_attrs, inner, a, log); if (err <= 0) return err; err = encode_vlan_from_nlattrs(match, a, is_mask, inner); if (err) return err; *key_attrs &= ~(1 << OVS_KEY_ATTR_ENCAP); *key_attrs &= ~(1 << OVS_KEY_ATTR_VLAN); *key_attrs &= ~(1 << OVS_KEY_ATTR_ETHERTYPE); encap = a[OVS_KEY_ATTR_ENCAP]; if (!is_mask) err = parse_flow_nlattrs(encap, a, key_attrs, log); else err = parse_flow_mask_nlattrs(encap, a, key_attrs, log); return err; } static int parse_vlan_from_nlattrs(struct sw_flow_match *match, u64 *key_attrs, const struct nlattr **a, bool is_mask, bool log) { int err; bool encap_valid = false; err = __parse_vlan_from_nlattrs(match, key_attrs, false, a, is_mask, log); if (err) return err; encap_valid = !!(match->key->eth.vlan.tci & htons(VLAN_CFI_MASK)); if (encap_valid) { err = __parse_vlan_from_nlattrs(match, key_attrs, true, a, is_mask, log); if (err) return err; } return 0; } static int parse_eth_type_from_nlattrs(struct sw_flow_match *match, u64 *attrs, const struct nlattr **a, bool is_mask, bool log) { __be16 eth_type; eth_type = nla_get_be16(a[OVS_KEY_ATTR_ETHERTYPE]); if (is_mask) { /* Always exact match EtherType. */ eth_type = htons(0xffff); } else if (!eth_proto_is_802_3(eth_type)) { OVS_NLERR(log, "EtherType %x is less than min %x", ntohs(eth_type), ETH_P_802_3_MIN); return -EINVAL; } SW_FLOW_KEY_PUT(match, eth.type, eth_type, is_mask); *attrs &= ~(1 << OVS_KEY_ATTR_ETHERTYPE); return 0; } static int metadata_from_nlattrs(struct net *net, struct sw_flow_match *match, u64 *attrs, const struct nlattr **a, bool is_mask, bool log) { u8 mac_proto = MAC_PROTO_ETHERNET; if (*attrs & (1 << OVS_KEY_ATTR_DP_HASH)) { u32 hash_val = nla_get_u32(a[OVS_KEY_ATTR_DP_HASH]); SW_FLOW_KEY_PUT(match, ovs_flow_hash, hash_val, is_mask); *attrs &= ~(1 << OVS_KEY_ATTR_DP_HASH); } if (*attrs & (1 << OVS_KEY_ATTR_RECIRC_ID)) { u32 recirc_id = nla_get_u32(a[OVS_KEY_ATTR_RECIRC_ID]); SW_FLOW_KEY_PUT(match, recirc_id, recirc_id, is_mask); *attrs &= ~(1 << OVS_KEY_ATTR_RECIRC_ID); } if (*attrs & (1 << OVS_KEY_ATTR_PRIORITY)) { SW_FLOW_KEY_PUT(match, phy.priority, nla_get_u32(a[OVS_KEY_ATTR_PRIORITY]), is_mask); *attrs &= ~(1 << OVS_KEY_ATTR_PRIORITY); } if (*attrs & (1 << OVS_KEY_ATTR_IN_PORT)) { u32 in_port = nla_get_u32(a[OVS_KEY_ATTR_IN_PORT]); if (is_mask) { in_port = 0xffffffff; /* Always exact match in_port. */ } else if (in_port >= DP_MAX_PORTS) { OVS_NLERR(log, "Port %d exceeds max allowable %d", in_port, DP_MAX_PORTS); return -EINVAL; } SW_FLOW_KEY_PUT(match, phy.in_port, in_port, is_mask); *attrs &= ~(1 << OVS_KEY_ATTR_IN_PORT); } else if (!is_mask) { SW_FLOW_KEY_PUT(match, phy.in_port, DP_MAX_PORTS, is_mask); } if (*attrs & (1 << OVS_KEY_ATTR_SKB_MARK)) { uint32_t mark = nla_get_u32(a[OVS_KEY_ATTR_SKB_MARK]); SW_FLOW_KEY_PUT(match, phy.skb_mark, mark, is_mask); *attrs &= ~(1 << OVS_KEY_ATTR_SKB_MARK); } if (*attrs & (1 << OVS_KEY_ATTR_TUNNEL)) { if (ip_tun_from_nlattr(a[OVS_KEY_ATTR_TUNNEL], match, is_mask, log) < 0) return -EINVAL; *attrs &= ~(1 << OVS_KEY_ATTR_TUNNEL); } if (*attrs & (1 << OVS_KEY_ATTR_CT_STATE) && ovs_ct_verify(net, OVS_KEY_ATTR_CT_STATE)) { u32 ct_state = nla_get_u32(a[OVS_KEY_ATTR_CT_STATE]); if (ct_state & ~CT_SUPPORTED_MASK) { OVS_NLERR(log, "ct_state flags %08x unsupported", ct_state); return -EINVAL; } SW_FLOW_KEY_PUT(match, ct_state, ct_state, is_mask); *attrs &= ~(1ULL << OVS_KEY_ATTR_CT_STATE); } if (*attrs & (1 << OVS_KEY_ATTR_CT_ZONE) && ovs_ct_verify(net, OVS_KEY_ATTR_CT_ZONE)) { u16 ct_zone = nla_get_u16(a[OVS_KEY_ATTR_CT_ZONE]); SW_FLOW_KEY_PUT(match, ct_zone, ct_zone, is_mask); *attrs &= ~(1ULL << OVS_KEY_ATTR_CT_ZONE); } if (*attrs & (1 << OVS_KEY_ATTR_CT_MARK) && ovs_ct_verify(net, OVS_KEY_ATTR_CT_MARK)) { u32 mark = nla_get_u32(a[OVS_KEY_ATTR_CT_MARK]); SW_FLOW_KEY_PUT(match, ct.mark, mark, is_mask); *attrs &= ~(1ULL << OVS_KEY_ATTR_CT_MARK); } if (*attrs & (1 << OVS_KEY_ATTR_CT_LABELS) && ovs_ct_verify(net, OVS_KEY_ATTR_CT_LABELS)) { const struct ovs_key_ct_labels *cl; cl = nla_data(a[OVS_KEY_ATTR_CT_LABELS]); SW_FLOW_KEY_MEMCPY(match, ct.labels, cl->ct_labels, sizeof(*cl), is_mask); *attrs &= ~(1ULL << OVS_KEY_ATTR_CT_LABELS); } if (*attrs & (1ULL << OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4)) { const struct ovs_key_ct_tuple_ipv4 *ct; ct = nla_data(a[OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4]); SW_FLOW_KEY_PUT(match, ipv4.ct_orig.src, ct->ipv4_src, is_mask); SW_FLOW_KEY_PUT(match, ipv4.ct_orig.dst, ct->ipv4_dst, is_mask); SW_FLOW_KEY_PUT(match, ct.orig_tp.src, ct->src_port, is_mask); SW_FLOW_KEY_PUT(match, ct.orig_tp.dst, ct->dst_port, is_mask); SW_FLOW_KEY_PUT(match, ct_orig_proto, ct->ipv4_proto, is_mask); *attrs &= ~(1ULL << OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4); } if (*attrs & (1ULL << OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6)) { const struct ovs_key_ct_tuple_ipv6 *ct; ct = nla_data(a[OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6]); SW_FLOW_KEY_MEMCPY(match, ipv6.ct_orig.src, &ct->ipv6_src, sizeof(match->key->ipv6.ct_orig.src), is_mask); SW_FLOW_KEY_MEMCPY(match, ipv6.ct_orig.dst, &ct->ipv6_dst, sizeof(match->key->ipv6.ct_orig.dst), is_mask); SW_FLOW_KEY_PUT(match, ct.orig_tp.src, ct->src_port, is_mask); SW_FLOW_KEY_PUT(match, ct.orig_tp.dst, ct->dst_port, is_mask); SW_FLOW_KEY_PUT(match, ct_orig_proto, ct->ipv6_proto, is_mask); *attrs &= ~(1ULL << OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6); } /* For layer 3 packets the Ethernet type is provided * and treated as metadata but no MAC addresses are provided. */ if (!(*attrs & (1ULL << OVS_KEY_ATTR_ETHERNET)) && (*attrs & (1ULL << OVS_KEY_ATTR_ETHERTYPE))) mac_proto = MAC_PROTO_NONE; /* Always exact match mac_proto */ SW_FLOW_KEY_PUT(match, mac_proto, is_mask ? 0xff : mac_proto, is_mask); if (mac_proto == MAC_PROTO_NONE) return parse_eth_type_from_nlattrs(match, attrs, a, is_mask, log); return 0; } int nsh_hdr_from_nlattr(const struct nlattr *attr, struct nshhdr *nh, size_t size) { struct nlattr *a; int rem; u8 flags = 0; u8 ttl = 0; int mdlen = 0; /* validate_nsh has check this, so we needn't do duplicate check here */ if (size < NSH_BASE_HDR_LEN) return -ENOBUFS; nla_for_each_nested(a, attr, rem) { int type = nla_type(a); switch (type) { case OVS_NSH_KEY_ATTR_BASE: { const struct ovs_nsh_key_base *base = nla_data(a); flags = base->flags; ttl = base->ttl; nh->np = base->np; nh->mdtype = base->mdtype; nh->path_hdr = base->path_hdr; break; } case OVS_NSH_KEY_ATTR_MD1: mdlen = nla_len(a); if (mdlen > size - NSH_BASE_HDR_LEN) return -ENOBUFS; memcpy(&nh->md1, nla_data(a), mdlen); break; case OVS_NSH_KEY_ATTR_MD2: mdlen = nla_len(a); if (mdlen > size - NSH_BASE_HDR_LEN) return -ENOBUFS; memcpy(&nh->md2, nla_data(a), mdlen); break; default: return -EINVAL; } } /* nsh header length = NSH_BASE_HDR_LEN + mdlen */ nh->ver_flags_ttl_len = 0; nsh_set_flags_ttl_len(nh, flags, ttl, NSH_BASE_HDR_LEN + mdlen); return 0; } int nsh_key_from_nlattr(const struct nlattr *attr, struct ovs_key_nsh *nsh, struct ovs_key_nsh *nsh_mask) { struct nlattr *a; int rem; /* validate_nsh has check this, so we needn't do duplicate check here */ nla_for_each_nested(a, attr, rem) { int type = nla_type(a); switch (type) { case OVS_NSH_KEY_ATTR_BASE: { const struct ovs_nsh_key_base *base = nla_data(a); const struct ovs_nsh_key_base *base_mask = base + 1; nsh->base = *base; nsh_mask->base = *base_mask; break; } case OVS_NSH_KEY_ATTR_MD1: { const struct ovs_nsh_key_md1 *md1 = nla_data(a); const struct ovs_nsh_key_md1 *md1_mask = md1 + 1; memcpy(nsh->context, md1->context, sizeof(*md1)); memcpy(nsh_mask->context, md1_mask->context, sizeof(*md1_mask)); break; } case OVS_NSH_KEY_ATTR_MD2: /* Not supported yet */ return -ENOTSUPP; default: return -EINVAL; } } return 0; } static int nsh_key_put_from_nlattr(const struct nlattr *attr, struct sw_flow_match *match, bool is_mask, bool is_push_nsh, bool log) { struct nlattr *a; int rem; bool has_base = false; bool has_md1 = false; bool has_md2 = false; u8 mdtype = 0; int mdlen = 0; if (WARN_ON(is_push_nsh && is_mask)) return -EINVAL; nla_for_each_nested(a, attr, rem) { int type = nla_type(a); int i; if (type > OVS_NSH_KEY_ATTR_MAX) { OVS_NLERR(log, "nsh attr %d is out of range max %d", type, OVS_NSH_KEY_ATTR_MAX); return -EINVAL; } if (!check_attr_len(nla_len(a), ovs_nsh_key_attr_lens[type].len)) { OVS_NLERR( log, "nsh attr %d has unexpected len %d expected %d", type, nla_len(a), ovs_nsh_key_attr_lens[type].len ); return -EINVAL; } switch (type) { case OVS_NSH_KEY_ATTR_BASE: { const struct ovs_nsh_key_base *base = nla_data(a); has_base = true; mdtype = base->mdtype; SW_FLOW_KEY_PUT(match, nsh.base.flags, base->flags, is_mask); SW_FLOW_KEY_PUT(match, nsh.base.ttl, base->ttl, is_mask); SW_FLOW_KEY_PUT(match, nsh.base.mdtype, base->mdtype, is_mask); SW_FLOW_KEY_PUT(match, nsh.base.np, base->np, is_mask); SW_FLOW_KEY_PUT(match, nsh.base.path_hdr, base->path_hdr, is_mask); break; } case OVS_NSH_KEY_ATTR_MD1: { const struct ovs_nsh_key_md1 *md1 = nla_data(a); has_md1 = true; for (i = 0; i < NSH_MD1_CONTEXT_SIZE; i++) SW_FLOW_KEY_PUT(match, nsh.context[i], md1->context[i], is_mask); break; } case OVS_NSH_KEY_ATTR_MD2: if (!is_push_nsh) /* Not supported MD type 2 yet */ return -ENOTSUPP; has_md2 = true; mdlen = nla_len(a); if (mdlen > NSH_CTX_HDRS_MAX_LEN || mdlen <= 0) { OVS_NLERR( log, "Invalid MD length %d for MD type %d", mdlen, mdtype ); return -EINVAL; } break; default: OVS_NLERR(log, "Unknown nsh attribute %d", type); return -EINVAL; } } if (rem > 0) { OVS_NLERR(log, "nsh attribute has %d unknown bytes.", rem); return -EINVAL; } if (has_md1 && has_md2) { OVS_NLERR( 1, "invalid nsh attribute: md1 and md2 are exclusive." ); return -EINVAL; } if (!is_mask) { if ((has_md1 && mdtype != NSH_M_TYPE1) || (has_md2 && mdtype != NSH_M_TYPE2)) { OVS_NLERR(1, "nsh attribute has unmatched MD type %d.", mdtype); return -EINVAL; } if (is_push_nsh && (!has_base || (!has_md1 && !has_md2))) { OVS_NLERR( 1, "push_nsh: missing base or metadata attributes" ); return -EINVAL; } } return 0; } static int ovs_key_from_nlattrs(struct net *net, struct sw_flow_match *match, u64 attrs, const struct nlattr **a, bool is_mask, bool log) { int err; err = metadata_from_nlattrs(net, match, &attrs, a, is_mask, log); if (err) return err; if (attrs & (1 << OVS_KEY_ATTR_ETHERNET)) { const struct ovs_key_ethernet *eth_key; eth_key = nla_data(a[OVS_KEY_ATTR_ETHERNET]); SW_FLOW_KEY_MEMCPY(match, eth.src, eth_key->eth_src, ETH_ALEN, is_mask); SW_FLOW_KEY_MEMCPY(match, eth.dst, eth_key->eth_dst, ETH_ALEN, is_mask); attrs &= ~(1 << OVS_KEY_ATTR_ETHERNET); if (attrs & (1 << OVS_KEY_ATTR_VLAN)) { /* VLAN attribute is always parsed before getting here since it * may occur multiple times. */ OVS_NLERR(log, "VLAN attribute unexpected."); return -EINVAL; } if (attrs & (1 << OVS_KEY_ATTR_ETHERTYPE)) { err = parse_eth_type_from_nlattrs(match, &attrs, a, is_mask, log); if (err) return err; } else if (!is_mask) { SW_FLOW_KEY_PUT(match, eth.type, htons(ETH_P_802_2), is_mask); } } else if (!match->key->eth.type) { OVS_NLERR(log, "Either Ethernet header or EtherType is required."); return -EINVAL; } if (attrs & (1 << OVS_KEY_ATTR_IPV4)) { const struct ovs_key_ipv4 *ipv4_key; ipv4_key = nla_data(a[OVS_KEY_ATTR_IPV4]); if (!is_mask && ipv4_key->ipv4_frag > OVS_FRAG_TYPE_MAX) { OVS_NLERR(log, "IPv4 frag type %d is out of range max %d", ipv4_key->ipv4_frag, OVS_FRAG_TYPE_MAX); return -EINVAL; } SW_FLOW_KEY_PUT(match, ip.proto, ipv4_key->ipv4_proto, is_mask); SW_FLOW_KEY_PUT(match, ip.tos, ipv4_key->ipv4_tos, is_mask); SW_FLOW_KEY_PUT(match, ip.ttl, ipv4_key->ipv4_ttl, is_mask); SW_FLOW_KEY_PUT(match, ip.frag, ipv4_key->ipv4_frag, is_mask); SW_FLOW_KEY_PUT(match, ipv4.addr.src, ipv4_key->ipv4_src, is_mask); SW_FLOW_KEY_PUT(match, ipv4.addr.dst, ipv4_key->ipv4_dst, is_mask); attrs &= ~(1 << OVS_KEY_ATTR_IPV4); } if (attrs & (1 << OVS_KEY_ATTR_IPV6)) { const struct ovs_key_ipv6 *ipv6_key; ipv6_key = nla_data(a[OVS_KEY_ATTR_IPV6]); if (!is_mask && ipv6_key->ipv6_frag > OVS_FRAG_TYPE_MAX) { OVS_NLERR(log, "IPv6 frag type %d is out of range max %d", ipv6_key->ipv6_frag, OVS_FRAG_TYPE_MAX); return -EINVAL; } if (!is_mask && ipv6_key->ipv6_label & htonl(0xFFF00000)) { OVS_NLERR(log, "IPv6 flow label %x is out of range (max=%x)", ntohl(ipv6_key->ipv6_label), (1 << 20) - 1); return -EINVAL; } SW_FLOW_KEY_PUT(match, ipv6.label, ipv6_key->ipv6_label, is_mask); SW_FLOW_KEY_PUT(match, ip.proto, ipv6_key->ipv6_proto, is_mask); SW_FLOW_KEY_PUT(match, ip.tos, ipv6_key->ipv6_tclass, is_mask); SW_FLOW_KEY_PUT(match, ip.ttl, ipv6_key->ipv6_hlimit, is_mask); SW_FLOW_KEY_PUT(match, ip.frag, ipv6_key->ipv6_frag, is_mask); SW_FLOW_KEY_MEMCPY(match, ipv6.addr.src, ipv6_key->ipv6_src, sizeof(match->key->ipv6.addr.src), is_mask); SW_FLOW_KEY_MEMCPY(match, ipv6.addr.dst, ipv6_key->ipv6_dst, sizeof(match->key->ipv6.addr.dst), is_mask); attrs &= ~(1 << OVS_KEY_ATTR_IPV6); } if (attrs & (1ULL << OVS_KEY_ATTR_IPV6_EXTHDRS)) { const struct ovs_key_ipv6_exthdrs *ipv6_exthdrs_key; ipv6_exthdrs_key = nla_data(a[OVS_KEY_ATTR_IPV6_EXTHDRS]); SW_FLOW_KEY_PUT(match, ipv6.exthdrs, ipv6_exthdrs_key->hdrs, is_mask); attrs &= ~(1ULL << OVS_KEY_ATTR_IPV6_EXTHDRS); } if (attrs & (1 << OVS_KEY_ATTR_ARP)) { const struct ovs_key_arp *arp_key; arp_key = nla_data(a[OVS_KEY_ATTR_ARP]); if (!is_mask && (arp_key->arp_op & htons(0xff00))) { OVS_NLERR(log, "Unknown ARP opcode (opcode=%d).", arp_key->arp_op); return -EINVAL; } SW_FLOW_KEY_PUT(match, ipv4.addr.src, arp_key->arp_sip, is_mask); SW_FLOW_KEY_PUT(match, ipv4.addr.dst, arp_key->arp_tip, is_mask); SW_FLOW_KEY_PUT(match, ip.proto, ntohs(arp_key->arp_op), is_mask); SW_FLOW_KEY_MEMCPY(match, ipv4.arp.sha, arp_key->arp_sha, ETH_ALEN, is_mask); SW_FLOW_KEY_MEMCPY(match, ipv4.arp.tha, arp_key->arp_tha, ETH_ALEN, is_mask); attrs &= ~(1 << OVS_KEY_ATTR_ARP); } if (attrs & (1 << OVS_KEY_ATTR_NSH)) { if (nsh_key_put_from_nlattr(a[OVS_KEY_ATTR_NSH], match, is_mask, false, log) < 0) return -EINVAL; attrs &= ~(1 << OVS_KEY_ATTR_NSH); } if (attrs & (1 << OVS_KEY_ATTR_MPLS)) { const struct ovs_key_mpls *mpls_key; u32 hdr_len; u32 label_count, label_count_mask, i; mpls_key = nla_data(a[OVS_KEY_ATTR_MPLS]); hdr_len = nla_len(a[OVS_KEY_ATTR_MPLS]); label_count = hdr_len / sizeof(struct ovs_key_mpls); if (label_count == 0 || label_count > MPLS_LABEL_DEPTH || hdr_len % sizeof(struct ovs_key_mpls)) return -EINVAL; label_count_mask = GENMASK(label_count - 1, 0); for (i = 0 ; i < label_count; i++) SW_FLOW_KEY_PUT(match, mpls.lse[i], mpls_key[i].mpls_lse, is_mask); SW_FLOW_KEY_PUT(match, mpls.num_labels_mask, label_count_mask, is_mask); attrs &= ~(1 << OVS_KEY_ATTR_MPLS); } if (attrs & (1 << OVS_KEY_ATTR_TCP)) { const struct ovs_key_tcp *tcp_key; tcp_key = nla_data(a[OVS_KEY_ATTR_TCP]); SW_FLOW_KEY_PUT(match, tp.src, tcp_key->tcp_src, is_mask); SW_FLOW_KEY_PUT(match, tp.dst, tcp_key->tcp_dst, is_mask); attrs &= ~(1 << OVS_KEY_ATTR_TCP); } if (attrs & (1 << OVS_KEY_ATTR_TCP_FLAGS)) { SW_FLOW_KEY_PUT(match, tp.flags, nla_get_be16(a[OVS_KEY_ATTR_TCP_FLAGS]), is_mask); attrs &= ~(1 << OVS_KEY_ATTR_TCP_FLAGS); } if (attrs & (1 << OVS_KEY_ATTR_UDP)) { const struct ovs_key_udp *udp_key; udp_key = nla_data(a[OVS_KEY_ATTR_UDP]); SW_FLOW_KEY_PUT(match, tp.src, udp_key->udp_src, is_mask); SW_FLOW_KEY_PUT(match, tp.dst, udp_key->udp_dst, is_mask); attrs &= ~(1 << OVS_KEY_ATTR_UDP); } if (attrs & (1 << OVS_KEY_ATTR_SCTP)) { const struct ovs_key_sctp *sctp_key; sctp_key = nla_data(a[OVS_KEY_ATTR_SCTP]); SW_FLOW_KEY_PUT(match, tp.src, sctp_key->sctp_src, is_mask); SW_FLOW_KEY_PUT(match, tp.dst, sctp_key->sctp_dst, is_mask); attrs &= ~(1 << OVS_KEY_ATTR_SCTP); } if (attrs & (1 << OVS_KEY_ATTR_ICMP)) { const struct ovs_key_icmp *icmp_key; icmp_key = nla_data(a[OVS_KEY_ATTR_ICMP]); SW_FLOW_KEY_PUT(match, tp.src, htons(icmp_key->icmp_type), is_mask); SW_FLOW_KEY_PUT(match, tp.dst, htons(icmp_key->icmp_code), is_mask); attrs &= ~(1 << OVS_KEY_ATTR_ICMP); } if (attrs & (1 << OVS_KEY_ATTR_ICMPV6)) { const struct ovs_key_icmpv6 *icmpv6_key; icmpv6_key = nla_data(a[OVS_KEY_ATTR_ICMPV6]); SW_FLOW_KEY_PUT(match, tp.src, htons(icmpv6_key->icmpv6_type), is_mask); SW_FLOW_KEY_PUT(match, tp.dst, htons(icmpv6_key->icmpv6_code), is_mask); attrs &= ~(1 << OVS_KEY_ATTR_ICMPV6); } if (attrs & (1 << OVS_KEY_ATTR_ND)) { const struct ovs_key_nd *nd_key; nd_key = nla_data(a[OVS_KEY_ATTR_ND]); SW_FLOW_KEY_MEMCPY(match, ipv6.nd.target, nd_key->nd_target, sizeof(match->key->ipv6.nd.target), is_mask); SW_FLOW_KEY_MEMCPY(match, ipv6.nd.sll, nd_key->nd_sll, ETH_ALEN, is_mask); SW_FLOW_KEY_MEMCPY(match, ipv6.nd.tll, nd_key->nd_tll, ETH_ALEN, is_mask); attrs &= ~(1 << OVS_KEY_ATTR_ND); } if (attrs != 0) { OVS_NLERR(log, "Unknown key attributes %llx", (unsigned long long)attrs); return -EINVAL; } return 0; } static void nlattr_set(struct nlattr *attr, u8 val, const struct ovs_len_tbl *tbl) { struct nlattr *nla; int rem; /* The nlattr stream should already have been validated */ nla_for_each_nested(nla, attr, rem) { if (tbl[nla_type(nla)].len == OVS_ATTR_NESTED) nlattr_set(nla, val, tbl[nla_type(nla)].next ? : tbl); else memset(nla_data(nla), val, nla_len(nla)); if (nla_type(nla) == OVS_KEY_ATTR_CT_STATE) *(u32 *)nla_data(nla) &= CT_SUPPORTED_MASK; } } static void mask_set_nlattr(struct nlattr *attr, u8 val) { nlattr_set(attr, val, ovs_key_lens); } /** * ovs_nla_get_match - parses Netlink attributes into a flow key and * mask. In case the 'mask' is NULL, the flow is treated as exact match * flow. Otherwise, it is treated as a wildcarded flow, except the mask * does not include any don't care bit. * @net: Used to determine per-namespace field support. * @match: receives the extracted flow match information. * @nla_key: Netlink attribute holding nested %OVS_KEY_ATTR_* Netlink attribute * sequence. The fields should of the packet that triggered the creation * of this flow. * @nla_mask: Optional. Netlink attribute holding nested %OVS_KEY_ATTR_* * Netlink attribute specifies the mask field of the wildcarded flow. * @log: Boolean to allow kernel error logging. Normally true, but when * probing for feature compatibility this should be passed in as false to * suppress unnecessary error logging. */ int ovs_nla_get_match(struct net *net, struct sw_flow_match *match, const struct nlattr *nla_key, const struct nlattr *nla_mask, bool log) { const struct nlattr *a[OVS_KEY_ATTR_MAX + 1]; struct nlattr *newmask = NULL; u64 key_attrs = 0; u64 mask_attrs = 0; int err; err = parse_flow_nlattrs(nla_key, a, &key_attrs, log); if (err) return err; err = parse_vlan_from_nlattrs(match, &key_attrs, a, false, log); if (err) return err; err = ovs_key_from_nlattrs(net, match, key_attrs, a, false, log); if (err) return err; if (match->mask) { if (!nla_mask) { /* Create an exact match mask. We need to set to 0xff * all the 'match->mask' fields that have been touched * in 'match->key'. We cannot simply memset * 'match->mask', because padding bytes and fields not * specified in 'match->key' should be left to 0. * Instead, we use a stream of netlink attributes, * copied from 'key' and set to 0xff. * ovs_key_from_nlattrs() will take care of filling * 'match->mask' appropriately. */ newmask = kmemdup(nla_key, nla_total_size(nla_len(nla_key)), GFP_KERNEL); if (!newmask) return -ENOMEM; mask_set_nlattr(newmask, 0xff); /* The userspace does not send tunnel attributes that * are 0, but we should not wildcard them nonetheless. */ if (match->key->tun_proto) SW_FLOW_KEY_MEMSET_FIELD(match, tun_key, 0xff, true); nla_mask = newmask; } err = parse_flow_mask_nlattrs(nla_mask, a, &mask_attrs, log); if (err) goto free_newmask; /* Always match on tci. */ SW_FLOW_KEY_PUT(match, eth.vlan.tci, htons(0xffff), true); SW_FLOW_KEY_PUT(match, eth.cvlan.tci, htons(0xffff), true); err = parse_vlan_from_nlattrs(match, &mask_attrs, a, true, log); if (err) goto free_newmask; err = ovs_key_from_nlattrs(net, match, mask_attrs, a, true, log); if (err) goto free_newmask; } if (!match_validate(match, key_attrs, mask_attrs, log)) err = -EINVAL; free_newmask: kfree(newmask); return err; } static size_t get_ufid_len(const struct nlattr *attr, bool log) { size_t len; if (!attr) return 0; len = nla_len(attr); if (len < 1 || len > MAX_UFID_LENGTH) { OVS_NLERR(log, "ufid size %u bytes exceeds the range (1, %d)", nla_len(attr), MAX_UFID_LENGTH); return 0; } return len; } /* Initializes 'flow->ufid', returning true if 'attr' contains a valid UFID, * or false otherwise. */ bool ovs_nla_get_ufid(struct sw_flow_id *sfid, const struct nlattr *attr, bool log) { sfid->ufid_len = get_ufid_len(attr, log); if (sfid->ufid_len) memcpy(sfid->ufid, nla_data(attr), sfid->ufid_len); return sfid->ufid_len; } int ovs_nla_get_identifier(struct sw_flow_id *sfid, const struct nlattr *ufid, const struct sw_flow_key *key, bool log) { struct sw_flow_key *new_key; if (ovs_nla_get_ufid(sfid, ufid, log)) return 0; /* If UFID was not provided, use unmasked key. */ new_key = kmalloc(sizeof(*new_key), GFP_KERNEL); if (!new_key) return -ENOMEM; memcpy(new_key, key, sizeof(*key)); sfid->unmasked_key = new_key; return 0; } u32 ovs_nla_get_ufid_flags(const struct nlattr *attr) { return attr ? nla_get_u32(attr) : 0; } /** * ovs_nla_get_flow_metadata - parses Netlink attributes into a flow key. * @net: Network namespace. * @key: Receives extracted in_port, priority, tun_key, skb_mark and conntrack * metadata. * @a: Array of netlink attributes holding parsed %OVS_KEY_ATTR_* Netlink * attributes. * @attrs: Bit mask for the netlink attributes included in @a. * @log: Boolean to allow kernel error logging. Normally true, but when * probing for feature compatibility this should be passed in as false to * suppress unnecessary error logging. * * This parses a series of Netlink attributes that form a flow key, which must * take the same form accepted by flow_from_nlattrs(), but only enough of it to * get the metadata, that is, the parts of the flow key that cannot be * extracted from the packet itself. * * This must be called before the packet key fields are filled in 'key'. */ int ovs_nla_get_flow_metadata(struct net *net, const struct nlattr *a[OVS_KEY_ATTR_MAX + 1], u64 attrs, struct sw_flow_key *key, bool log) { struct sw_flow_match match; memset(&match, 0, sizeof(match)); match.key = key; key->ct_state = 0; key->ct_zone = 0; key->ct_orig_proto = 0; memset(&key->ct, 0, sizeof(key->ct)); memset(&key->ipv4.ct_orig, 0, sizeof(key->ipv4.ct_orig)); memset(&key->ipv6.ct_orig, 0, sizeof(key->ipv6.ct_orig)); key->phy.in_port = DP_MAX_PORTS; return metadata_from_nlattrs(net, &match, &attrs, a, false, log); } static int ovs_nla_put_vlan(struct sk_buff *skb, const struct vlan_head *vh, bool is_mask) { __be16 eth_type = !is_mask ? vh->tpid : htons(0xffff); if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE, eth_type) || nla_put_be16(skb, OVS_KEY_ATTR_VLAN, vh->tci)) return -EMSGSIZE; return 0; } static int nsh_key_to_nlattr(const struct ovs_key_nsh *nsh, bool is_mask, struct sk_buff *skb) { struct nlattr *start; start = nla_nest_start_noflag(skb, OVS_KEY_ATTR_NSH); if (!start) return -EMSGSIZE; if (nla_put(skb, OVS_NSH_KEY_ATTR_BASE, sizeof(nsh->base), &nsh->base)) goto nla_put_failure; if (is_mask || nsh->base.mdtype == NSH_M_TYPE1) { if (nla_put(skb, OVS_NSH_KEY_ATTR_MD1, sizeof(nsh->context), nsh->context)) goto nla_put_failure; } /* Don't support MD type 2 yet */ nla_nest_end(skb, start); return 0; nla_put_failure: return -EMSGSIZE; } static int __ovs_nla_put_key(const struct sw_flow_key *swkey, const struct sw_flow_key *output, bool is_mask, struct sk_buff *skb) { struct ovs_key_ethernet *eth_key; struct nlattr *nla; struct nlattr *encap = NULL; struct nlattr *in_encap = NULL; if (nla_put_u32(skb, OVS_KEY_ATTR_RECIRC_ID, output->recirc_id)) goto nla_put_failure; if (nla_put_u32(skb, OVS_KEY_ATTR_DP_HASH, output->ovs_flow_hash)) goto nla_put_failure; if (nla_put_u32(skb, OVS_KEY_ATTR_PRIORITY, output->phy.priority)) goto nla_put_failure; if ((swkey->tun_proto || is_mask)) { const void *opts = NULL; if (ip_tunnel_is_options_present(output->tun_key.tun_flags)) opts = TUN_METADATA_OPTS(output, swkey->tun_opts_len); if (ip_tun_to_nlattr(skb, &output->tun_key, opts, swkey->tun_opts_len, swkey->tun_proto, 0)) goto nla_put_failure; } if (swkey->phy.in_port == DP_MAX_PORTS) { if (is_mask && (output->phy.in_port == 0xffff)) if (nla_put_u32(skb, OVS_KEY_ATTR_IN_PORT, 0xffffffff)) goto nla_put_failure; } else { u16 upper_u16; upper_u16 = !is_mask ? 0 : 0xffff; if (nla_put_u32(skb, OVS_KEY_ATTR_IN_PORT, (upper_u16 << 16) | output->phy.in_port)) goto nla_put_failure; } if (nla_put_u32(skb, OVS_KEY_ATTR_SKB_MARK, output->phy.skb_mark)) goto nla_put_failure; if (ovs_ct_put_key(swkey, output, skb)) goto nla_put_failure; if (ovs_key_mac_proto(swkey) == MAC_PROTO_ETHERNET) { nla = nla_reserve(skb, OVS_KEY_ATTR_ETHERNET, sizeof(*eth_key)); if (!nla) goto nla_put_failure; eth_key = nla_data(nla); ether_addr_copy(eth_key->eth_src, output->eth.src); ether_addr_copy(eth_key->eth_dst, output->eth.dst); if (swkey->eth.vlan.tci || eth_type_vlan(swkey->eth.type)) { if (ovs_nla_put_vlan(skb, &output->eth.vlan, is_mask)) goto nla_put_failure; encap = nla_nest_start_noflag(skb, OVS_KEY_ATTR_ENCAP); if (!swkey->eth.vlan.tci) goto unencap; if (swkey->eth.cvlan.tci || eth_type_vlan(swkey->eth.type)) { if (ovs_nla_put_vlan(skb, &output->eth.cvlan, is_mask)) goto nla_put_failure; in_encap = nla_nest_start_noflag(skb, OVS_KEY_ATTR_ENCAP); if (!swkey->eth.cvlan.tci) goto unencap; } } if (swkey->eth.type == htons(ETH_P_802_2)) { /* * Ethertype 802.2 is represented in the netlink with omitted * OVS_KEY_ATTR_ETHERTYPE in the flow key attribute, and * 0xffff in the mask attribute. Ethertype can also * be wildcarded. */ if (is_mask && output->eth.type) if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE, output->eth.type)) goto nla_put_failure; goto unencap; } } if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE, output->eth.type)) goto nla_put_failure; if (eth_type_vlan(swkey->eth.type)) { /* There are 3 VLAN tags, we don't know anything about the rest * of the packet, so truncate here. */ WARN_ON_ONCE(!(encap && in_encap)); goto unencap; } if (swkey->eth.type == htons(ETH_P_IP)) { struct ovs_key_ipv4 *ipv4_key; nla = nla_reserve(skb, OVS_KEY_ATTR_IPV4, sizeof(*ipv4_key)); if (!nla) goto nla_put_failure; ipv4_key = nla_data(nla); ipv4_key->ipv4_src = output->ipv4.addr.src; ipv4_key->ipv4_dst = output->ipv4.addr.dst; ipv4_key->ipv4_proto = output->ip.proto; ipv4_key->ipv4_tos = output->ip.tos; ipv4_key->ipv4_ttl = output->ip.ttl; ipv4_key->ipv4_frag = output->ip.frag; } else if (swkey->eth.type == htons(ETH_P_IPV6)) { struct ovs_key_ipv6 *ipv6_key; struct ovs_key_ipv6_exthdrs *ipv6_exthdrs_key; nla = nla_reserve(skb, OVS_KEY_ATTR_IPV6, sizeof(*ipv6_key)); if (!nla) goto nla_put_failure; ipv6_key = nla_data(nla); memcpy(ipv6_key->ipv6_src, &output->ipv6.addr.src, sizeof(ipv6_key->ipv6_src)); memcpy(ipv6_key->ipv6_dst, &output->ipv6.addr.dst, sizeof(ipv6_key->ipv6_dst)); ipv6_key->ipv6_label = output->ipv6.label; ipv6_key->ipv6_proto = output->ip.proto; ipv6_key->ipv6_tclass = output->ip.tos; ipv6_key->ipv6_hlimit = output->ip.ttl; ipv6_key->ipv6_frag = output->ip.frag; nla = nla_reserve(skb, OVS_KEY_ATTR_IPV6_EXTHDRS, sizeof(*ipv6_exthdrs_key)); if (!nla) goto nla_put_failure; ipv6_exthdrs_key = nla_data(nla); ipv6_exthdrs_key->hdrs = output->ipv6.exthdrs; } else if (swkey->eth.type == htons(ETH_P_NSH)) { if (nsh_key_to_nlattr(&output->nsh, is_mask, skb)) goto nla_put_failure; } else if (swkey->eth.type == htons(ETH_P_ARP) || swkey->eth.type == htons(ETH_P_RARP)) { struct ovs_key_arp *arp_key; nla = nla_reserve(skb, OVS_KEY_ATTR_ARP, sizeof(*arp_key)); if (!nla) goto nla_put_failure; arp_key = nla_data(nla); memset(arp_key, 0, sizeof(struct ovs_key_arp)); arp_key->arp_sip = output->ipv4.addr.src; arp_key->arp_tip = output->ipv4.addr.dst; arp_key->arp_op = htons(output->ip.proto); ether_addr_copy(arp_key->arp_sha, output->ipv4.arp.sha); ether_addr_copy(arp_key->arp_tha, output->ipv4.arp.tha); } else if (eth_p_mpls(swkey->eth.type)) { u8 i, num_labels; struct ovs_key_mpls *mpls_key; num_labels = hweight_long(output->mpls.num_labels_mask); nla = nla_reserve(skb, OVS_KEY_ATTR_MPLS, num_labels * sizeof(*mpls_key)); if (!nla) goto nla_put_failure; mpls_key = nla_data(nla); for (i = 0; i < num_labels; i++) mpls_key[i].mpls_lse = output->mpls.lse[i]; } if ((swkey->eth.type == htons(ETH_P_IP) || swkey->eth.type == htons(ETH_P_IPV6)) && swkey->ip.frag != OVS_FRAG_TYPE_LATER) { if (swkey->ip.proto == IPPROTO_TCP) { struct ovs_key_tcp *tcp_key; nla = nla_reserve(skb, OVS_KEY_ATTR_TCP, sizeof(*tcp_key)); if (!nla) goto nla_put_failure; tcp_key = nla_data(nla); tcp_key->tcp_src = output->tp.src; tcp_key->tcp_dst = output->tp.dst; if (nla_put_be16(skb, OVS_KEY_ATTR_TCP_FLAGS, output->tp.flags)) goto nla_put_failure; } else if (swkey->ip.proto == IPPROTO_UDP) { struct ovs_key_udp *udp_key; nla = nla_reserve(skb, OVS_KEY_ATTR_UDP, sizeof(*udp_key)); if (!nla) goto nla_put_failure; udp_key = nla_data(nla); udp_key->udp_src = output->tp.src; udp_key->udp_dst = output->tp.dst; } else if (swkey->ip.proto == IPPROTO_SCTP) { struct ovs_key_sctp *sctp_key; nla = nla_reserve(skb, OVS_KEY_ATTR_SCTP, sizeof(*sctp_key)); if (!nla) goto nla_put_failure; sctp_key = nla_data(nla); sctp_key->sctp_src = output->tp.src; sctp_key->sctp_dst = output->tp.dst; } else if (swkey->eth.type == htons(ETH_P_IP) && swkey->ip.proto == IPPROTO_ICMP) { struct ovs_key_icmp *icmp_key; nla = nla_reserve(skb, OVS_KEY_ATTR_ICMP, sizeof(*icmp_key)); if (!nla) goto nla_put_failure; icmp_key = nla_data(nla); icmp_key->icmp_type = ntohs(output->tp.src); icmp_key->icmp_code = ntohs(output->tp.dst); } else if (swkey->eth.type == htons(ETH_P_IPV6) && swkey->ip.proto == IPPROTO_ICMPV6) { struct ovs_key_icmpv6 *icmpv6_key; nla = nla_reserve(skb, OVS_KEY_ATTR_ICMPV6, sizeof(*icmpv6_key)); if (!nla) goto nla_put_failure; icmpv6_key = nla_data(nla); icmpv6_key->icmpv6_type = ntohs(output->tp.src); icmpv6_key->icmpv6_code = ntohs(output->tp.dst); if (swkey->tp.src == htons(NDISC_NEIGHBOUR_SOLICITATION) || swkey->tp.src == htons(NDISC_NEIGHBOUR_ADVERTISEMENT)) { struct ovs_key_nd *nd_key; nla = nla_reserve(skb, OVS_KEY_ATTR_ND, sizeof(*nd_key)); if (!nla) goto nla_put_failure; nd_key = nla_data(nla); memcpy(nd_key->nd_target, &output->ipv6.nd.target, sizeof(nd_key->nd_target)); ether_addr_copy(nd_key->nd_sll, output->ipv6.nd.sll); ether_addr_copy(nd_key->nd_tll, output->ipv6.nd.tll); } } } unencap: if (in_encap) nla_nest_end(skb, in_encap); if (encap) nla_nest_end(skb, encap); return 0; nla_put_failure: return -EMSGSIZE; } int ovs_nla_put_key(const struct sw_flow_key *swkey, const struct sw_flow_key *output, int attr, bool is_mask, struct sk_buff *skb) { int err; struct nlattr *nla; nla = nla_nest_start_noflag(skb, attr); if (!nla) return -EMSGSIZE; err = __ovs_nla_put_key(swkey, output, is_mask, skb); if (err) return err; nla_nest_end(skb, nla); return 0; } /* Called with ovs_mutex or RCU read lock. */ int ovs_nla_put_identifier(const struct sw_flow *flow, struct sk_buff *skb) { if (ovs_identifier_is_ufid(&flow->id)) return nla_put(skb, OVS_FLOW_ATTR_UFID, flow->id.ufid_len, flow->id.ufid); return ovs_nla_put_key(flow->id.unmasked_key, flow->id.unmasked_key, OVS_FLOW_ATTR_KEY, false, skb); } /* Called with ovs_mutex or RCU read lock. */ int ovs_nla_put_masked_key(const struct sw_flow *flow, struct sk_buff *skb) { return ovs_nla_put_key(&flow->key, &flow->key, OVS_FLOW_ATTR_KEY, false, skb); } /* Called with ovs_mutex or RCU read lock. */ int ovs_nla_put_mask(const struct sw_flow *flow, struct sk_buff *skb) { return ovs_nla_put_key(&flow->key, &flow->mask->key, OVS_FLOW_ATTR_MASK, true, skb); } #define MAX_ACTIONS_BUFSIZE (32 * 1024) static struct sw_flow_actions *nla_alloc_flow_actions(int size) { struct sw_flow_actions *sfa; WARN_ON_ONCE(size > MAX_ACTIONS_BUFSIZE); sfa = kmalloc(kmalloc_size_roundup(sizeof(*sfa) + size), GFP_KERNEL); if (!sfa) return ERR_PTR(-ENOMEM); sfa->actions_len = 0; return sfa; } static void ovs_nla_free_nested_actions(const struct nlattr *actions, int len); static void ovs_nla_free_check_pkt_len_action(const struct nlattr *action) { const struct nlattr *a; int rem; nla_for_each_nested(a, action, rem) { switch (nla_type(a)) { case OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL: case OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER: ovs_nla_free_nested_actions(nla_data(a), nla_len(a)); break; } } } static void ovs_nla_free_clone_action(const struct nlattr *action) { const struct nlattr *a = nla_data(action); int rem = nla_len(action); switch (nla_type(a)) { case OVS_CLONE_ATTR_EXEC: /* The real list of actions follows this attribute. */ a = nla_next(a, &rem); ovs_nla_free_nested_actions(a, rem); break; } } static void ovs_nla_free_dec_ttl_action(const struct nlattr *action) { const struct nlattr *a = nla_data(action); switch (nla_type(a)) { case OVS_DEC_TTL_ATTR_ACTION: ovs_nla_free_nested_actions(nla_data(a), nla_len(a)); break; } } static void ovs_nla_free_sample_action(const struct nlattr *action) { const struct nlattr *a = nla_data(action); int rem = nla_len(action); switch (nla_type(a)) { case OVS_SAMPLE_ATTR_ARG: /* The real list of actions follows this attribute. */ a = nla_next(a, &rem); ovs_nla_free_nested_actions(a, rem); break; } } static void ovs_nla_free_set_action(const struct nlattr *a) { const struct nlattr *ovs_key = nla_data(a); struct ovs_tunnel_info *ovs_tun; switch (nla_type(ovs_key)) { case OVS_KEY_ATTR_TUNNEL_INFO: ovs_tun = nla_data(ovs_key); dst_release((struct dst_entry *)ovs_tun->tun_dst); break; } } static void ovs_nla_free_nested_actions(const struct nlattr *actions, int len) { const struct nlattr *a; int rem; /* Whenever new actions are added, the need to update this * function should be considered. */ BUILD_BUG_ON(OVS_ACTION_ATTR_MAX != 25); if (!actions) return; nla_for_each_attr(a, actions, len, rem) { switch (nla_type(a)) { case OVS_ACTION_ATTR_CHECK_PKT_LEN: ovs_nla_free_check_pkt_len_action(a); break; case OVS_ACTION_ATTR_CLONE: ovs_nla_free_clone_action(a); break; case OVS_ACTION_ATTR_CT: ovs_ct_free_action(a); break; case OVS_ACTION_ATTR_DEC_TTL: ovs_nla_free_dec_ttl_action(a); break; case OVS_ACTION_ATTR_SAMPLE: ovs_nla_free_sample_action(a); break; case OVS_ACTION_ATTR_SET: ovs_nla_free_set_action(a); break; } } } void ovs_nla_free_flow_actions(struct sw_flow_actions *sf_acts) { if (!sf_acts) return; ovs_nla_free_nested_actions(sf_acts->actions, sf_acts->actions_len); kfree(sf_acts); } static void __ovs_nla_free_flow_actions(struct rcu_head *head) { ovs_nla_free_flow_actions(container_of(head, struct sw_flow_actions, rcu)); } /* Schedules 'sf_acts' to be freed after the next RCU grace period. * The caller must hold rcu_read_lock for this to be sensible. */ void ovs_nla_free_flow_actions_rcu(struct sw_flow_actions *sf_acts) { call_rcu(&sf_acts->rcu, __ovs_nla_free_flow_actions); } static struct nlattr *reserve_sfa_size(struct sw_flow_actions **sfa, int attr_len, bool log) { struct sw_flow_actions *acts; int new_acts_size; size_t req_size = NLA_ALIGN(attr_len); int next_offset = offsetof(struct sw_flow_actions, actions) + (*sfa)->actions_len; if (req_size <= (ksize(*sfa) - next_offset)) goto out; new_acts_size = max(next_offset + req_size, ksize(*sfa) * 2); if (new_acts_size > MAX_ACTIONS_BUFSIZE) { if ((next_offset + req_size) > MAX_ACTIONS_BUFSIZE) { OVS_NLERR(log, "Flow action size exceeds max %u", MAX_ACTIONS_BUFSIZE); return ERR_PTR(-EMSGSIZE); } new_acts_size = MAX_ACTIONS_BUFSIZE; } acts = nla_alloc_flow_actions(new_acts_size); if (IS_ERR(acts)) return (void *)acts; memcpy(acts->actions, (*sfa)->actions, (*sfa)->actions_len); acts->actions_len = (*sfa)->actions_len; acts->orig_len = (*sfa)->orig_len; kfree(*sfa); *sfa = acts; out: (*sfa)->actions_len += req_size; return (struct nlattr *) ((unsigned char *)(*sfa) + next_offset); } static struct nlattr *__add_action(struct sw_flow_actions **sfa, int attrtype, void *data, int len, bool log) { struct nlattr *a; a = reserve_sfa_size(sfa, nla_attr_size(len), log); if (IS_ERR(a)) return a; a->nla_type = attrtype; a->nla_len = nla_attr_size(len); if (data) memcpy(nla_data(a), data, len); memset((unsigned char *) a + a->nla_len, 0, nla_padlen(len)); return a; } int ovs_nla_add_action(struct sw_flow_actions **sfa, int attrtype, void *data, int len, bool log) { struct nlattr *a; a = __add_action(sfa, attrtype, data, len, log); return PTR_ERR_OR_ZERO(a); } static inline int add_nested_action_start(struct sw_flow_actions **sfa, int attrtype, bool log) { int used = (*sfa)->actions_len; int err; err = ovs_nla_add_action(sfa, attrtype, NULL, 0, log); if (err) return err; return used; } static inline void add_nested_action_end(struct sw_flow_actions *sfa, int st_offset) { struct nlattr *a = (struct nlattr *) ((unsigned char *)sfa->actions + st_offset); a->nla_len = sfa->actions_len - st_offset; } static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr, const struct sw_flow_key *key, struct sw_flow_actions **sfa, __be16 eth_type, __be16 vlan_tci, u32 mpls_label_count, bool log, u32 depth); static int validate_and_copy_sample(struct net *net, const struct nlattr *attr, const struct sw_flow_key *key, struct sw_flow_actions **sfa, __be16 eth_type, __be16 vlan_tci, u32 mpls_label_count, bool log, bool last, u32 depth) { const struct nlattr *attrs[OVS_SAMPLE_ATTR_MAX + 1]; const struct nlattr *probability, *actions; const struct nlattr *a; int rem, start, err; struct sample_arg arg; memset(attrs, 0, sizeof(attrs)); nla_for_each_nested(a, attr, rem) { int type = nla_type(a); if (!type || type > OVS_SAMPLE_ATTR_MAX || attrs[type]) return -EINVAL; attrs[type] = a; } if (rem) return -EINVAL; probability = attrs[OVS_SAMPLE_ATTR_PROBABILITY]; if (!probability || nla_len(probability) != sizeof(u32)) return -EINVAL; actions = attrs[OVS_SAMPLE_ATTR_ACTIONS]; if (!actions || (nla_len(actions) && nla_len(actions) < NLA_HDRLEN)) return -EINVAL; /* validation done, copy sample action. */ start = add_nested_action_start(sfa, OVS_ACTION_ATTR_SAMPLE, log); if (start < 0) return start; /* When both skb and flow may be changed, put the sample * into a deferred fifo. On the other hand, if only skb * may be modified, the actions can be executed in place. * * Do this analysis at the flow installation time. * Set 'clone_action->exec' to true if the actions can be * executed without being deferred. * * If the sample is the last action, it can always be excuted * rather than deferred. */ arg.exec = last || !actions_may_change_flow(actions); arg.probability = nla_get_u32(probability); err = ovs_nla_add_action(sfa, OVS_SAMPLE_ATTR_ARG, &arg, sizeof(arg), log); if (err) return err; err = __ovs_nla_copy_actions(net, actions, key, sfa, eth_type, vlan_tci, mpls_label_count, log, depth + 1); if (err) return err; add_nested_action_end(*sfa, start); return 0; } static int validate_and_copy_dec_ttl(struct net *net, const struct nlattr *attr, const struct sw_flow_key *key, struct sw_flow_actions **sfa, __be16 eth_type, __be16 vlan_tci, u32 mpls_label_count, bool log, u32 depth) { const struct nlattr *attrs[OVS_DEC_TTL_ATTR_MAX + 1]; int start, action_start, err, rem; const struct nlattr *a, *actions; memset(attrs, 0, sizeof(attrs)); nla_for_each_nested(a, attr, rem) { int type = nla_type(a); /* Ignore unknown attributes to be future proof. */ if (type > OVS_DEC_TTL_ATTR_MAX) continue; if (!type || attrs[type]) { OVS_NLERR(log, "Duplicate or invalid key (type %d).", type); return -EINVAL; } attrs[type] = a; } if (rem) { OVS_NLERR(log, "Message has %d unknown bytes.", rem); return -EINVAL; } actions = attrs[OVS_DEC_TTL_ATTR_ACTION]; if (!actions || (nla_len(actions) && nla_len(actions) < NLA_HDRLEN)) { OVS_NLERR(log, "Missing valid actions attribute."); return -EINVAL; } start = add_nested_action_start(sfa, OVS_ACTION_ATTR_DEC_TTL, log); if (start < 0) return start; action_start = add_nested_action_start(sfa, OVS_DEC_TTL_ATTR_ACTION, log); if (action_start < 0) return action_start; err = __ovs_nla_copy_actions(net, actions, key, sfa, eth_type, vlan_tci, mpls_label_count, log, depth + 1); if (err) return err; add_nested_action_end(*sfa, action_start); add_nested_action_end(*sfa, start); return 0; } static int validate_and_copy_clone(struct net *net, const struct nlattr *attr, const struct sw_flow_key *key, struct sw_flow_actions **sfa, __be16 eth_type, __be16 vlan_tci, u32 mpls_label_count, bool log, bool last, u32 depth) { int start, err; u32 exec; if (nla_len(attr) && nla_len(attr) < NLA_HDRLEN) return -EINVAL; start = add_nested_action_start(sfa, OVS_ACTION_ATTR_CLONE, log); if (start < 0) return start; exec = last || !actions_may_change_flow(attr); err = ovs_nla_add_action(sfa, OVS_CLONE_ATTR_EXEC, &exec, sizeof(exec), log); if (err) return err; err = __ovs_nla_copy_actions(net, attr, key, sfa, eth_type, vlan_tci, mpls_label_count, log, depth + 1); if (err) return err; add_nested_action_end(*sfa, start); return 0; } void ovs_match_init(struct sw_flow_match *match, struct sw_flow_key *key, bool reset_key, struct sw_flow_mask *mask) { memset(match, 0, sizeof(*match)); match->key = key; match->mask = mask; if (reset_key) memset(key, 0, sizeof(*key)); if (mask) { memset(&mask->key, 0, sizeof(mask->key)); mask->range.start = mask->range.end = 0; } } static int validate_geneve_opts(struct sw_flow_key *key) { struct geneve_opt *option; int opts_len = key->tun_opts_len; bool crit_opt = false; option = (struct geneve_opt *)TUN_METADATA_OPTS(key, key->tun_opts_len); while (opts_len > 0) { int len; if (opts_len < sizeof(*option)) return -EINVAL; len = sizeof(*option) + option->length * 4; if (len > opts_len) return -EINVAL; crit_opt |= !!(option->type & GENEVE_CRIT_OPT_TYPE); option = (struct geneve_opt *)((u8 *)option + len); opts_len -= len; } if (crit_opt) __set_bit(IP_TUNNEL_CRIT_OPT_BIT, key->tun_key.tun_flags); return 0; } static int validate_and_copy_set_tun(const struct nlattr *attr, struct sw_flow_actions **sfa, bool log) { IP_TUNNEL_DECLARE_FLAGS(dst_opt_type) = { }; struct sw_flow_match match; struct sw_flow_key key; struct metadata_dst *tun_dst; struct ip_tunnel_info *tun_info; struct ovs_tunnel_info *ovs_tun; struct nlattr *a; int err = 0, start, opts_type; ovs_match_init(&match, &key, true, NULL); opts_type = ip_tun_from_nlattr(nla_data(attr), &match, false, log); if (opts_type < 0) return opts_type; if (key.tun_opts_len) { switch (opts_type) { case OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS: err = validate_geneve_opts(&key); if (err < 0) return err; __set_bit(IP_TUNNEL_GENEVE_OPT_BIT, dst_opt_type); break; case OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS: __set_bit(IP_TUNNEL_VXLAN_OPT_BIT, dst_opt_type); break; case OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS: __set_bit(IP_TUNNEL_ERSPAN_OPT_BIT, dst_opt_type); break; } } start = add_nested_action_start(sfa, OVS_ACTION_ATTR_SET, log); if (start < 0) return start; tun_dst = metadata_dst_alloc(key.tun_opts_len, METADATA_IP_TUNNEL, GFP_KERNEL); if (!tun_dst) return -ENOMEM; err = dst_cache_init(&tun_dst->u.tun_info.dst_cache, GFP_KERNEL); if (err) { dst_release((struct dst_entry *)tun_dst); return err; } a = __add_action(sfa, OVS_KEY_ATTR_TUNNEL_INFO, NULL, sizeof(*ovs_tun), log); if (IS_ERR(a)) { dst_release((struct dst_entry *)tun_dst); return PTR_ERR(a); } ovs_tun = nla_data(a); ovs_tun->tun_dst = tun_dst; tun_info = &tun_dst->u.tun_info; tun_info->mode = IP_TUNNEL_INFO_TX; if (key.tun_proto == AF_INET6) tun_info->mode |= IP_TUNNEL_INFO_IPV6; else if (key.tun_proto == AF_INET && key.tun_key.u.ipv4.dst == 0) tun_info->mode |= IP_TUNNEL_INFO_BRIDGE; tun_info->key = key.tun_key; /* We need to store the options in the action itself since * everything else will go away after flow setup. We can append * it to tun_info and then point there. */ ip_tunnel_info_opts_set(tun_info, TUN_METADATA_OPTS(&key, key.tun_opts_len), key.tun_opts_len, dst_opt_type); add_nested_action_end(*sfa, start); return err; } static bool validate_nsh(const struct nlattr *attr, bool is_mask, bool is_push_nsh, bool log) { struct sw_flow_match match; struct sw_flow_key key; int ret = 0; ovs_match_init(&match, &key, true, NULL); ret = nsh_key_put_from_nlattr(attr, &match, is_mask, is_push_nsh, log); return !ret; } /* Return false if there are any non-masked bits set. * Mask follows data immediately, before any netlink padding. */ static bool validate_masked(u8 *data, int len) { u8 *mask = data + len; while (len--) if (*data++ & ~*mask++) return false; return true; } static int validate_set(const struct nlattr *a, const struct sw_flow_key *flow_key, struct sw_flow_actions **sfa, bool *skip_copy, u8 mac_proto, __be16 eth_type, bool masked, bool log) { const struct nlattr *ovs_key = nla_data(a); int key_type = nla_type(ovs_key); size_t key_len; /* There can be only one key in a action */ if (nla_total_size(nla_len(ovs_key)) != nla_len(a)) return -EINVAL; key_len = nla_len(ovs_key); if (masked) key_len /= 2; if (key_type > OVS_KEY_ATTR_MAX || !check_attr_len(key_len, ovs_key_lens[key_type].len)) return -EINVAL; if (masked && !validate_masked(nla_data(ovs_key), key_len)) return -EINVAL; switch (key_type) { case OVS_KEY_ATTR_PRIORITY: case OVS_KEY_ATTR_SKB_MARK: case OVS_KEY_ATTR_CT_MARK: case OVS_KEY_ATTR_CT_LABELS: break; case OVS_KEY_ATTR_ETHERNET: if (mac_proto != MAC_PROTO_ETHERNET) return -EINVAL; break; case OVS_KEY_ATTR_TUNNEL: { int err; if (masked) return -EINVAL; /* Masked tunnel set not supported. */ *skip_copy = true; err = validate_and_copy_set_tun(a, sfa, log); if (err) return err; break; } case OVS_KEY_ATTR_IPV4: { const struct ovs_key_ipv4 *ipv4_key; if (eth_type != htons(ETH_P_IP)) return -EINVAL; ipv4_key = nla_data(ovs_key); if (masked) { const struct ovs_key_ipv4 *mask = ipv4_key + 1; /* Non-writeable fields. */ if (mask->ipv4_proto || mask->ipv4_frag) return -EINVAL; } else { if (ipv4_key->ipv4_proto != flow_key->ip.proto) return -EINVAL; if (ipv4_key->ipv4_frag != flow_key->ip.frag) return -EINVAL; } break; } case OVS_KEY_ATTR_IPV6: { const struct ovs_key_ipv6 *ipv6_key; if (eth_type != htons(ETH_P_IPV6)) return -EINVAL; ipv6_key = nla_data(ovs_key); if (masked) { const struct ovs_key_ipv6 *mask = ipv6_key + 1; /* Non-writeable fields. */ if (mask->ipv6_proto || mask->ipv6_frag) return -EINVAL; /* Invalid bits in the flow label mask? */ if (ntohl(mask->ipv6_label) & 0xFFF00000) return -EINVAL; } else { if (ipv6_key->ipv6_proto != flow_key->ip.proto) return -EINVAL; if (ipv6_key->ipv6_frag != flow_key->ip.frag) return -EINVAL; } if (ntohl(ipv6_key->ipv6_label) & 0xFFF00000) return -EINVAL; break; } case OVS_KEY_ATTR_TCP: if ((eth_type != htons(ETH_P_IP) && eth_type != htons(ETH_P_IPV6)) || flow_key->ip.proto != IPPROTO_TCP) return -EINVAL; break; case OVS_KEY_ATTR_UDP: if ((eth_type != htons(ETH_P_IP) && eth_type != htons(ETH_P_IPV6)) || flow_key->ip.proto != IPPROTO_UDP) return -EINVAL; break; case OVS_KEY_ATTR_MPLS: if (!eth_p_mpls(eth_type)) return -EINVAL; break; case OVS_KEY_ATTR_SCTP: if ((eth_type != htons(ETH_P_IP) && eth_type != htons(ETH_P_IPV6)) || flow_key->ip.proto != IPPROTO_SCTP) return -EINVAL; break; case OVS_KEY_ATTR_NSH: if (eth_type != htons(ETH_P_NSH)) return -EINVAL; if (!validate_nsh(nla_data(a), masked, false, log)) return -EINVAL; break; default: return -EINVAL; } /* Convert non-masked non-tunnel set actions to masked set actions. */ if (!masked && key_type != OVS_KEY_ATTR_TUNNEL) { int start, len = key_len * 2; struct nlattr *at; *skip_copy = true; start = add_nested_action_start(sfa, OVS_ACTION_ATTR_SET_TO_MASKED, log); if (start < 0) return start; at = __add_action(sfa, key_type, NULL, len, log); if (IS_ERR(at)) return PTR_ERR(at); memcpy(nla_data(at), nla_data(ovs_key), key_len); /* Key. */ memset(nla_data(at) + key_len, 0xff, key_len); /* Mask. */ /* Clear non-writeable bits from otherwise writeable fields. */ if (key_type == OVS_KEY_ATTR_IPV6) { struct ovs_key_ipv6 *mask = nla_data(at) + key_len; mask->ipv6_label &= htonl(0x000FFFFF); } add_nested_action_end(*sfa, start); } return 0; } static int validate_userspace(const struct nlattr *attr) { static const struct nla_policy userspace_policy[OVS_USERSPACE_ATTR_MAX + 1] = { [OVS_USERSPACE_ATTR_PID] = {.type = NLA_U32 }, [OVS_USERSPACE_ATTR_USERDATA] = {.type = NLA_UNSPEC }, [OVS_USERSPACE_ATTR_EGRESS_TUN_PORT] = {.type = NLA_U32 }, }; struct nlattr *a[OVS_USERSPACE_ATTR_MAX + 1]; int error; error = nla_parse_nested_deprecated(a, OVS_USERSPACE_ATTR_MAX, attr, userspace_policy, NULL); if (error) return error; if (!a[OVS_USERSPACE_ATTR_PID] || !nla_get_u32(a[OVS_USERSPACE_ATTR_PID])) return -EINVAL; return 0; } static const struct nla_policy cpl_policy[OVS_CHECK_PKT_LEN_ATTR_MAX + 1] = { [OVS_CHECK_PKT_LEN_ATTR_PKT_LEN] = {.type = NLA_U16 }, [OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER] = {.type = NLA_NESTED }, [OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL] = {.type = NLA_NESTED }, }; static int validate_and_copy_check_pkt_len(struct net *net, const struct nlattr *attr, const struct sw_flow_key *key, struct sw_flow_actions **sfa, __be16 eth_type, __be16 vlan_tci, u32 mpls_label_count, bool log, bool last, u32 depth) { const struct nlattr *acts_if_greater, *acts_if_lesser_eq; struct nlattr *a[OVS_CHECK_PKT_LEN_ATTR_MAX + 1]; struct check_pkt_len_arg arg; int nested_acts_start; int start, err; err = nla_parse_deprecated_strict(a, OVS_CHECK_PKT_LEN_ATTR_MAX, nla_data(attr), nla_len(attr), cpl_policy, NULL); if (err) return err; if (!a[OVS_CHECK_PKT_LEN_ATTR_PKT_LEN] || !nla_get_u16(a[OVS_CHECK_PKT_LEN_ATTR_PKT_LEN])) return -EINVAL; acts_if_lesser_eq = a[OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL]; acts_if_greater = a[OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER]; /* Both the nested action should be present. */ if (!acts_if_greater || !acts_if_lesser_eq) return -EINVAL; /* validation done, copy the nested actions. */ start = add_nested_action_start(sfa, OVS_ACTION_ATTR_CHECK_PKT_LEN, log); if (start < 0) return start; arg.pkt_len = nla_get_u16(a[OVS_CHECK_PKT_LEN_ATTR_PKT_LEN]); arg.exec_for_lesser_equal = last || !actions_may_change_flow(acts_if_lesser_eq); arg.exec_for_greater = last || !actions_may_change_flow(acts_if_greater); err = ovs_nla_add_action(sfa, OVS_CHECK_PKT_LEN_ATTR_ARG, &arg, sizeof(arg), log); if (err) return err; nested_acts_start = add_nested_action_start(sfa, OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL, log); if (nested_acts_start < 0) return nested_acts_start; err = __ovs_nla_copy_actions(net, acts_if_lesser_eq, key, sfa, eth_type, vlan_tci, mpls_label_count, log, depth + 1); if (err) return err; add_nested_action_end(*sfa, nested_acts_start); nested_acts_start = add_nested_action_start(sfa, OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER, log); if (nested_acts_start < 0) return nested_acts_start; err = __ovs_nla_copy_actions(net, acts_if_greater, key, sfa, eth_type, vlan_tci, mpls_label_count, log, depth + 1); if (err) return err; add_nested_action_end(*sfa, nested_acts_start); add_nested_action_end(*sfa, start); return 0; } static int validate_psample(const struct nlattr *attr) { static const struct nla_policy policy[OVS_PSAMPLE_ATTR_MAX + 1] = { [OVS_PSAMPLE_ATTR_GROUP] = { .type = NLA_U32 }, [OVS_PSAMPLE_ATTR_COOKIE] = { .type = NLA_BINARY, .len = OVS_PSAMPLE_COOKIE_MAX_SIZE, }, }; struct nlattr *a[OVS_PSAMPLE_ATTR_MAX + 1]; int err; if (!IS_ENABLED(CONFIG_PSAMPLE)) return -EOPNOTSUPP; err = nla_parse_nested(a, OVS_PSAMPLE_ATTR_MAX, attr, policy, NULL); if (err) return err; return a[OVS_PSAMPLE_ATTR_GROUP] ? 0 : -EINVAL; } static int copy_action(const struct nlattr *from, struct sw_flow_actions **sfa, bool log) { int totlen = NLA_ALIGN(from->nla_len); struct nlattr *to; to = reserve_sfa_size(sfa, from->nla_len, log); if (IS_ERR(to)) return PTR_ERR(to); memcpy(to, from, totlen); return 0; } static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr, const struct sw_flow_key *key, struct sw_flow_actions **sfa, __be16 eth_type, __be16 vlan_tci, u32 mpls_label_count, bool log, u32 depth) { u8 mac_proto = ovs_key_mac_proto(key); const struct nlattr *a; int rem, err; if (depth > OVS_COPY_ACTIONS_MAX_DEPTH) return -EOVERFLOW; nla_for_each_nested(a, attr, rem) { /* Expected argument lengths, (u32)-1 for variable length. */ static const u32 action_lens[OVS_ACTION_ATTR_MAX + 1] = { [OVS_ACTION_ATTR_OUTPUT] = sizeof(u32), [OVS_ACTION_ATTR_RECIRC] = sizeof(u32), [OVS_ACTION_ATTR_USERSPACE] = (u32)-1, [OVS_ACTION_ATTR_PUSH_MPLS] = sizeof(struct ovs_action_push_mpls), [OVS_ACTION_ATTR_POP_MPLS] = sizeof(__be16), [OVS_ACTION_ATTR_PUSH_VLAN] = sizeof(struct ovs_action_push_vlan), [OVS_ACTION_ATTR_POP_VLAN] = 0, [OVS_ACTION_ATTR_SET] = (u32)-1, [OVS_ACTION_ATTR_SET_MASKED] = (u32)-1, [OVS_ACTION_ATTR_SAMPLE] = (u32)-1, [OVS_ACTION_ATTR_HASH] = sizeof(struct ovs_action_hash), [OVS_ACTION_ATTR_CT] = (u32)-1, [OVS_ACTION_ATTR_CT_CLEAR] = 0, [OVS_ACTION_ATTR_TRUNC] = sizeof(struct ovs_action_trunc), [OVS_ACTION_ATTR_PUSH_ETH] = sizeof(struct ovs_action_push_eth), [OVS_ACTION_ATTR_POP_ETH] = 0, [OVS_ACTION_ATTR_PUSH_NSH] = (u32)-1, [OVS_ACTION_ATTR_POP_NSH] = 0, [OVS_ACTION_ATTR_METER] = sizeof(u32), [OVS_ACTION_ATTR_CLONE] = (u32)-1, [OVS_ACTION_ATTR_CHECK_PKT_LEN] = (u32)-1, [OVS_ACTION_ATTR_ADD_MPLS] = sizeof(struct ovs_action_add_mpls), [OVS_ACTION_ATTR_DEC_TTL] = (u32)-1, [OVS_ACTION_ATTR_DROP] = sizeof(u32), [OVS_ACTION_ATTR_PSAMPLE] = (u32)-1, }; const struct ovs_action_push_vlan *vlan; int type = nla_type(a); bool skip_copy; if (type > OVS_ACTION_ATTR_MAX || (action_lens[type] != nla_len(a) && action_lens[type] != (u32)-1)) return -EINVAL; skip_copy = false; switch (type) { case OVS_ACTION_ATTR_UNSPEC: return -EINVAL; case OVS_ACTION_ATTR_USERSPACE: err = validate_userspace(a); if (err) return err; break; case OVS_ACTION_ATTR_OUTPUT: if (nla_get_u32(a) >= DP_MAX_PORTS) return -EINVAL; break; case OVS_ACTION_ATTR_TRUNC: { const struct ovs_action_trunc *trunc = nla_data(a); if (trunc->max_len < ETH_HLEN) return -EINVAL; break; } case OVS_ACTION_ATTR_HASH: { const struct ovs_action_hash *act_hash = nla_data(a); switch (act_hash->hash_alg) { case OVS_HASH_ALG_L4: fallthrough; case OVS_HASH_ALG_SYM_L4: break; default: return -EINVAL; } break; } case OVS_ACTION_ATTR_POP_VLAN: if (mac_proto != MAC_PROTO_ETHERNET) return -EINVAL; vlan_tci = htons(0); break; case OVS_ACTION_ATTR_PUSH_VLAN: if (mac_proto != MAC_PROTO_ETHERNET) return -EINVAL; vlan = nla_data(a); if (!eth_type_vlan(vlan->vlan_tpid)) return -EINVAL; if (!(vlan->vlan_tci & htons(VLAN_CFI_MASK))) return -EINVAL; vlan_tci = vlan->vlan_tci; break; case OVS_ACTION_ATTR_RECIRC: break; case OVS_ACTION_ATTR_ADD_MPLS: { const struct ovs_action_add_mpls *mpls = nla_data(a); if (!eth_p_mpls(mpls->mpls_ethertype)) return -EINVAL; if (mpls->tun_flags & OVS_MPLS_L3_TUNNEL_FLAG_MASK) { if (vlan_tci & htons(VLAN_CFI_MASK) || (eth_type != htons(ETH_P_IP) && eth_type != htons(ETH_P_IPV6) && eth_type != htons(ETH_P_ARP) && eth_type != htons(ETH_P_RARP) && !eth_p_mpls(eth_type))) return -EINVAL; mpls_label_count++; } else { if (mac_proto == MAC_PROTO_ETHERNET) { mpls_label_count = 1; mac_proto = MAC_PROTO_NONE; } else { mpls_label_count++; } } eth_type = mpls->mpls_ethertype; break; } case OVS_ACTION_ATTR_PUSH_MPLS: { const struct ovs_action_push_mpls *mpls = nla_data(a); if (!eth_p_mpls(mpls->mpls_ethertype)) return -EINVAL; /* Prohibit push MPLS other than to a white list * for packets that have a known tag order. */ if (vlan_tci & htons(VLAN_CFI_MASK) || (eth_type != htons(ETH_P_IP) && eth_type != htons(ETH_P_IPV6) && eth_type != htons(ETH_P_ARP) && eth_type != htons(ETH_P_RARP) && !eth_p_mpls(eth_type))) return -EINVAL; eth_type = mpls->mpls_ethertype; mpls_label_count++; break; } case OVS_ACTION_ATTR_POP_MPLS: { __be16 proto; if (vlan_tci & htons(VLAN_CFI_MASK) || !eth_p_mpls(eth_type)) return -EINVAL; /* Disallow subsequent L2.5+ set actions and mpls_pop * actions once the last MPLS label in the packet is * popped as there is no check here to ensure that * the new eth type is valid and thus set actions could * write off the end of the packet or otherwise corrupt * it. * * Support for these actions is planned using packet * recirculation. */ proto = nla_get_be16(a); if (proto == htons(ETH_P_TEB) && mac_proto != MAC_PROTO_NONE) return -EINVAL; mpls_label_count--; if (!eth_p_mpls(proto) || !mpls_label_count) eth_type = htons(0); else eth_type = proto; break; } case OVS_ACTION_ATTR_SET: err = validate_set(a, key, sfa, &skip_copy, mac_proto, eth_type, false, log); if (err) return err; break; case OVS_ACTION_ATTR_SET_MASKED: err = validate_set(a, key, sfa, &skip_copy, mac_proto, eth_type, true, log); if (err) return err; break; case OVS_ACTION_ATTR_SAMPLE: { bool last = nla_is_last(a, rem); err = validate_and_copy_sample(net, a, key, sfa, eth_type, vlan_tci, mpls_label_count, log, last, depth); if (err) return err; skip_copy = true; break; } case OVS_ACTION_ATTR_CT: err = ovs_ct_copy_action(net, a, key, sfa, log); if (err) return err; skip_copy = true; break; case OVS_ACTION_ATTR_CT_CLEAR: break; case OVS_ACTION_ATTR_PUSH_ETH: /* Disallow pushing an Ethernet header if one * is already present */ if (mac_proto != MAC_PROTO_NONE) return -EINVAL; mac_proto = MAC_PROTO_ETHERNET; break; case OVS_ACTION_ATTR_POP_ETH: if (mac_proto != MAC_PROTO_ETHERNET) return -EINVAL; if (vlan_tci & htons(VLAN_CFI_MASK)) return -EINVAL; mac_proto = MAC_PROTO_NONE; break; case OVS_ACTION_ATTR_PUSH_NSH: if (mac_proto != MAC_PROTO_ETHERNET) { u8 next_proto; next_proto = tun_p_from_eth_p(eth_type); if (!next_proto) return -EINVAL; } mac_proto = MAC_PROTO_NONE; if (!validate_nsh(nla_data(a), false, true, true)) return -EINVAL; break; case OVS_ACTION_ATTR_POP_NSH: { __be16 inner_proto; if (eth_type != htons(ETH_P_NSH)) return -EINVAL; inner_proto = tun_p_to_eth_p(key->nsh.base.np); if (!inner_proto) return -EINVAL; if (key->nsh.base.np == TUN_P_ETHERNET) mac_proto = MAC_PROTO_ETHERNET; else mac_proto = MAC_PROTO_NONE; break; } case OVS_ACTION_ATTR_METER: /* Non-existent meters are simply ignored. */ break; case OVS_ACTION_ATTR_CLONE: { bool last = nla_is_last(a, rem); err = validate_and_copy_clone(net, a, key, sfa, eth_type, vlan_tci, mpls_label_count, log, last, depth); if (err) return err; skip_copy = true; break; } case OVS_ACTION_ATTR_CHECK_PKT_LEN: { bool last = nla_is_last(a, rem); err = validate_and_copy_check_pkt_len(net, a, key, sfa, eth_type, vlan_tci, mpls_label_count, log, last, depth); if (err) return err; skip_copy = true; break; } case OVS_ACTION_ATTR_DEC_TTL: err = validate_and_copy_dec_ttl(net, a, key, sfa, eth_type, vlan_tci, mpls_label_count, log, depth); if (err) return err; skip_copy = true; break; case OVS_ACTION_ATTR_DROP: if (!nla_is_last(a, rem)) return -EINVAL; break; case OVS_ACTION_ATTR_PSAMPLE: err = validate_psample(a); if (err) return err; break; default: OVS_NLERR(log, "Unknown Action type %d", type); return -EINVAL; } if (!skip_copy) { err = copy_action(a, sfa, log); if (err) return err; } } if (rem > 0) return -EINVAL; return 0; } /* 'key' must be the masked key. */ int ovs_nla_copy_actions(struct net *net, const struct nlattr *attr, const struct sw_flow_key *key, struct sw_flow_actions **sfa, bool log) { int err; u32 mpls_label_count = 0; *sfa = nla_alloc_flow_actions(min(nla_len(attr), MAX_ACTIONS_BUFSIZE)); if (IS_ERR(*sfa)) return PTR_ERR(*sfa); if (eth_p_mpls(key->eth.type)) mpls_label_count = hweight_long(key->mpls.num_labels_mask); (*sfa)->orig_len = nla_len(attr); err = __ovs_nla_copy_actions(net, attr, key, sfa, key->eth.type, key->eth.vlan.tci, mpls_label_count, log, 0); if (err) ovs_nla_free_flow_actions(*sfa); return err; } static int sample_action_to_attr(const struct nlattr *attr, struct sk_buff *skb) { struct nlattr *start, *ac_start = NULL, *sample_arg; int err = 0, rem = nla_len(attr); const struct sample_arg *arg; struct nlattr *actions; start = nla_nest_start_noflag(skb, OVS_ACTION_ATTR_SAMPLE); if (!start) return -EMSGSIZE; sample_arg = nla_data(attr); arg = nla_data(sample_arg); actions = nla_next(sample_arg, &rem); if (nla_put_u32(skb, OVS_SAMPLE_ATTR_PROBABILITY, arg->probability)) { err = -EMSGSIZE; goto out; } ac_start = nla_nest_start_noflag(skb, OVS_SAMPLE_ATTR_ACTIONS); if (!ac_start) { err = -EMSGSIZE; goto out; } err = ovs_nla_put_actions(actions, rem, skb); out: if (err) { nla_nest_cancel(skb, ac_start); nla_nest_cancel(skb, start); } else { nla_nest_end(skb, ac_start); nla_nest_end(skb, start); } return err; } static int clone_action_to_attr(const struct nlattr *attr, struct sk_buff *skb) { struct nlattr *start; int err = 0, rem = nla_len(attr); start = nla_nest_start_noflag(skb, OVS_ACTION_ATTR_CLONE); if (!start) return -EMSGSIZE; /* Skipping the OVS_CLONE_ATTR_EXEC that is always the first attribute. */ attr = nla_next(nla_data(attr), &rem); err = ovs_nla_put_actions(attr, rem, skb); if (err) nla_nest_cancel(skb, start); else nla_nest_end(skb, start); return err; } static int check_pkt_len_action_to_attr(const struct nlattr *attr, struct sk_buff *skb) { struct nlattr *start, *ac_start = NULL; const struct check_pkt_len_arg *arg; const struct nlattr *a, *cpl_arg; int err = 0, rem = nla_len(attr); start = nla_nest_start_noflag(skb, OVS_ACTION_ATTR_CHECK_PKT_LEN); if (!start) return -EMSGSIZE; /* The first nested attribute in 'attr' is always * 'OVS_CHECK_PKT_LEN_ATTR_ARG'. */ cpl_arg = nla_data(attr); arg = nla_data(cpl_arg); if (nla_put_u16(skb, OVS_CHECK_PKT_LEN_ATTR_PKT_LEN, arg->pkt_len)) { err = -EMSGSIZE; goto out; } /* Second nested attribute in 'attr' is always * 'OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL'. */ a = nla_next(cpl_arg, &rem); ac_start = nla_nest_start_noflag(skb, OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL); if (!ac_start) { err = -EMSGSIZE; goto out; } err = ovs_nla_put_actions(nla_data(a), nla_len(a), skb); if (err) { nla_nest_cancel(skb, ac_start); goto out; } else { nla_nest_end(skb, ac_start); } /* Third nested attribute in 'attr' is always * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER. */ a = nla_next(a, &rem); ac_start = nla_nest_start_noflag(skb, OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER); if (!ac_start) { err = -EMSGSIZE; goto out; } err = ovs_nla_put_actions(nla_data(a), nla_len(a), skb); if (err) { nla_nest_cancel(skb, ac_start); goto out; } else { nla_nest_end(skb, ac_start); } nla_nest_end(skb, start); return 0; out: nla_nest_cancel(skb, start); return err; } static int dec_ttl_action_to_attr(const struct nlattr *attr, struct sk_buff *skb) { struct nlattr *start, *action_start; const struct nlattr *a; int err = 0, rem; start = nla_nest_start_noflag(skb, OVS_ACTION_ATTR_DEC_TTL); if (!start) return -EMSGSIZE; nla_for_each_attr(a, nla_data(attr), nla_len(attr), rem) { switch (nla_type(a)) { case OVS_DEC_TTL_ATTR_ACTION: action_start = nla_nest_start_noflag(skb, OVS_DEC_TTL_ATTR_ACTION); if (!action_start) { err = -EMSGSIZE; goto out; } err = ovs_nla_put_actions(nla_data(a), nla_len(a), skb); if (err) goto out; nla_nest_end(skb, action_start); break; default: /* Ignore all other option to be future compatible */ break; } } nla_nest_end(skb, start); return 0; out: nla_nest_cancel(skb, start); return err; } static int set_action_to_attr(const struct nlattr *a, struct sk_buff *skb) { const struct nlattr *ovs_key = nla_data(a); int key_type = nla_type(ovs_key); struct nlattr *start; int err; switch (key_type) { case OVS_KEY_ATTR_TUNNEL_INFO: { struct ovs_tunnel_info *ovs_tun = nla_data(ovs_key); struct ip_tunnel_info *tun_info = &ovs_tun->tun_dst->u.tun_info; start = nla_nest_start_noflag(skb, OVS_ACTION_ATTR_SET); if (!start) return -EMSGSIZE; err = ip_tun_to_nlattr(skb, &tun_info->key, ip_tunnel_info_opts(tun_info), tun_info->options_len, ip_tunnel_info_af(tun_info), tun_info->mode); if (err) return err; nla_nest_end(skb, start); break; } default: if (nla_put(skb, OVS_ACTION_ATTR_SET, nla_len(a), ovs_key)) return -EMSGSIZE; break; } return 0; } static int masked_set_action_to_set_action_attr(const struct nlattr *a, struct sk_buff *skb) { const struct nlattr *ovs_key = nla_data(a); struct nlattr *nla; size_t key_len = nla_len(ovs_key) / 2; /* Revert the conversion we did from a non-masked set action to * masked set action. */ nla = nla_nest_start_noflag(skb, OVS_ACTION_ATTR_SET); if (!nla) return -EMSGSIZE; if (nla_put(skb, nla_type(ovs_key), key_len, nla_data(ovs_key))) return -EMSGSIZE; nla_nest_end(skb, nla); return 0; } int ovs_nla_put_actions(const struct nlattr *attr, int len, struct sk_buff *skb) { const struct nlattr *a; int rem, err; nla_for_each_attr(a, attr, len, rem) { int type = nla_type(a); switch (type) { case OVS_ACTION_ATTR_SET: err = set_action_to_attr(a, skb); if (err) return err; break; case OVS_ACTION_ATTR_SET_TO_MASKED: err = masked_set_action_to_set_action_attr(a, skb); if (err) return err; break; case OVS_ACTION_ATTR_SAMPLE: err = sample_action_to_attr(a, skb); if (err) return err; break; case OVS_ACTION_ATTR_CT: err = ovs_ct_action_to_attr(nla_data(a), skb); if (err) return err; break; case OVS_ACTION_ATTR_CLONE: err = clone_action_to_attr(a, skb); if (err) return err; break; case OVS_ACTION_ATTR_CHECK_PKT_LEN: err = check_pkt_len_action_to_attr(a, skb); if (err) return err; break; case OVS_ACTION_ATTR_DEC_TTL: err = dec_ttl_action_to_attr(a, skb); if (err) return err; break; default: if (nla_put(skb, type, nla_len(a), nla_data(a))) return -EMSGSIZE; break; } } return 0; } |
| 85 79 2 6 79 10 75 60 3 10 2 4 71 17 15 5 12 3 3 10 10 63 28 6 15 4 85 29 63 37 9 28 9 75 37 30 25 29 4 10 15 30 12 2 3 13 13 5 1 4 4 26 13 1 2 10 1 9 1 8 8 7 1 8 7 1 5 3 5 4 4 8 5 2 1 4 12 4 8 5 5 2 16 16 16 20 4 16 4 12 27 27 27 28 30 30 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 | // SPDX-License-Identifier: GPL-2.0-only /* * linux/fs/msdos/namei.c * * Written 1992,1993 by Werner Almesberger * Hidden files 1995 by Albert Cahalan <albert@ccs.neu.edu> <adc@coe.neu.edu> * Rewritten for constant inumbers 1999 by Al Viro */ #include <linux/module.h> #include <linux/iversion.h> #include "fat.h" /* Characters that are undesirable in an MS-DOS file name */ static unsigned char bad_chars[] = "*?<>|\""; static unsigned char bad_if_strict[] = "+=,; "; /***** Formats an MS-DOS file name. Rejects invalid names. */ static int msdos_format_name(const unsigned char *name, int len, unsigned char *res, struct fat_mount_options *opts) /* * name is the proposed name, len is its length, res is * the resulting name, opts->name_check is either (r)elaxed, * (n)ormal or (s)trict, opts->dotsOK allows dots at the * beginning of name (for hidden files) */ { unsigned char *walk; unsigned char c; int space; if (name[0] == '.') { /* dotfile because . and .. already done */ if (opts->dotsOK) { /* Get rid of dot - test for it elsewhere */ name++; len--; } else return -EINVAL; } /* * disallow names that _really_ start with a dot */ space = 1; c = 0; for (walk = res; len && walk - res < 8; walk++) { c = *name++; len--; if (opts->name_check != 'r' && strchr(bad_chars, c)) return -EINVAL; if (opts->name_check == 's' && strchr(bad_if_strict, c)) return -EINVAL; if (c >= 'A' && c <= 'Z' && opts->name_check == 's') return -EINVAL; if (c < ' ' || c == ':' || c == '\\') return -EINVAL; /* * 0xE5 is legal as a first character, but we must substitute * 0x05 because 0xE5 marks deleted files. Yes, DOS really * does this. * It seems that Microsoft hacked DOS to support non-US * characters after the 0xE5 character was already in use to * mark deleted files. */ if ((res == walk) && (c == 0xE5)) c = 0x05; if (c == '.') break; space = (c == ' '); *walk = (!opts->nocase && c >= 'a' && c <= 'z') ? c - 32 : c; } if (space) return -EINVAL; if (opts->name_check == 's' && len && c != '.') { c = *name++; len--; if (c != '.') return -EINVAL; } while (c != '.' && len--) c = *name++; if (c == '.') { while (walk - res < 8) *walk++ = ' '; while (len > 0 && walk - res < MSDOS_NAME) { c = *name++; len--; if (opts->name_check != 'r' && strchr(bad_chars, c)) return -EINVAL; if (opts->name_check == 's' && strchr(bad_if_strict, c)) return -EINVAL; if (c < ' ' || c == ':' || c == '\\') return -EINVAL; if (c == '.') { if (opts->name_check == 's') return -EINVAL; break; } if (c >= 'A' && c <= 'Z' && opts->name_check == 's') return -EINVAL; space = c == ' '; if (!opts->nocase && c >= 'a' && c <= 'z') *walk++ = c - 32; else *walk++ = c; } if (space) return -EINVAL; if (opts->name_check == 's' && len) return -EINVAL; } while (walk - res < MSDOS_NAME) *walk++ = ' '; return 0; } /***** Locates a directory entry. Uses unformatted name. */ static int msdos_find(struct inode *dir, const unsigned char *name, int len, struct fat_slot_info *sinfo) { struct msdos_sb_info *sbi = MSDOS_SB(dir->i_sb); unsigned char msdos_name[MSDOS_NAME]; int err; err = msdos_format_name(name, len, msdos_name, &sbi->options); if (err) return -ENOENT; err = fat_scan(dir, msdos_name, sinfo); if (!err && sbi->options.dotsOK) { if (name[0] == '.') { if (!(sinfo->de->attr & ATTR_HIDDEN)) err = -ENOENT; } else { if (sinfo->de->attr & ATTR_HIDDEN) err = -ENOENT; } if (err) brelse(sinfo->bh); } return err; } /* * Compute the hash for the msdos name corresponding to the dentry. * Note: if the name is invalid, we leave the hash code unchanged so * that the existing dentry can be used. The msdos fs routines will * return ENOENT or EINVAL as appropriate. */ static int msdos_hash(const struct dentry *dentry, struct qstr *qstr) { struct fat_mount_options *options = &MSDOS_SB(dentry->d_sb)->options; unsigned char msdos_name[MSDOS_NAME]; int error; error = msdos_format_name(qstr->name, qstr->len, msdos_name, options); if (!error) qstr->hash = full_name_hash(dentry, msdos_name, MSDOS_NAME); return 0; } /* * Compare two msdos names. If either of the names are invalid, * we fall back to doing the standard name comparison. */ static int msdos_cmp(const struct dentry *dentry, unsigned int len, const char *str, const struct qstr *name) { struct fat_mount_options *options = &MSDOS_SB(dentry->d_sb)->options; unsigned char a_msdos_name[MSDOS_NAME], b_msdos_name[MSDOS_NAME]; int error; error = msdos_format_name(name->name, name->len, a_msdos_name, options); if (error) goto old_compare; error = msdos_format_name(str, len, b_msdos_name, options); if (error) goto old_compare; error = memcmp(a_msdos_name, b_msdos_name, MSDOS_NAME); out: return error; old_compare: error = 1; if (name->len == len) error = memcmp(name->name, str, len); goto out; } static const struct dentry_operations msdos_dentry_operations = { .d_hash = msdos_hash, .d_compare = msdos_cmp, }; /* * AV. Wrappers for FAT sb operations. Is it wise? */ /***** Get inode using directory and name */ static struct dentry *msdos_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags) { struct super_block *sb = dir->i_sb; struct fat_slot_info sinfo; struct inode *inode; int err; mutex_lock(&MSDOS_SB(sb)->s_lock); err = msdos_find(dir, dentry->d_name.name, dentry->d_name.len, &sinfo); switch (err) { case -ENOENT: inode = NULL; break; case 0: inode = fat_build_inode(sb, sinfo.de, sinfo.i_pos); brelse(sinfo.bh); break; default: inode = ERR_PTR(err); } mutex_unlock(&MSDOS_SB(sb)->s_lock); return d_splice_alias(inode, dentry); } /***** Creates a directory entry (name is already formatted). */ static int msdos_add_entry(struct inode *dir, const unsigned char *name, int is_dir, int is_hid, int cluster, struct timespec64 *ts, struct fat_slot_info *sinfo) { struct msdos_sb_info *sbi = MSDOS_SB(dir->i_sb); struct msdos_dir_entry de; __le16 time, date; int err; memcpy(de.name, name, MSDOS_NAME); de.attr = is_dir ? ATTR_DIR : ATTR_ARCH; if (is_hid) de.attr |= ATTR_HIDDEN; de.lcase = 0; fat_time_unix2fat(sbi, ts, &time, &date, NULL); de.cdate = de.adate = 0; de.ctime = 0; de.ctime_cs = 0; de.time = time; de.date = date; fat_set_start(&de, cluster); de.size = 0; err = fat_add_entries(dir, &de, 1, sinfo); if (err) return err; fat_truncate_time(dir, ts, S_CTIME|S_MTIME); if (IS_DIRSYNC(dir)) (void)fat_sync_inode(dir); else mark_inode_dirty(dir); return 0; } /***** Create a file */ static int msdos_create(struct mnt_idmap *idmap, struct inode *dir, struct dentry *dentry, umode_t mode, bool excl) { struct super_block *sb = dir->i_sb; struct inode *inode = NULL; struct fat_slot_info sinfo; struct timespec64 ts; unsigned char msdos_name[MSDOS_NAME]; int err, is_hid; mutex_lock(&MSDOS_SB(sb)->s_lock); err = msdos_format_name(dentry->d_name.name, dentry->d_name.len, msdos_name, &MSDOS_SB(sb)->options); if (err) goto out; is_hid = (dentry->d_name.name[0] == '.') && (msdos_name[0] != '.'); /* Have to do it due to foo vs. .foo conflicts */ if (!fat_scan(dir, msdos_name, &sinfo)) { brelse(sinfo.bh); err = -EINVAL; goto out; } ts = current_time(dir); err = msdos_add_entry(dir, msdos_name, 0, is_hid, 0, &ts, &sinfo); if (err) goto out; inode = fat_build_inode(sb, sinfo.de, sinfo.i_pos); brelse(sinfo.bh); if (IS_ERR(inode)) { err = PTR_ERR(inode); goto out; } fat_truncate_time(inode, &ts, S_ATIME|S_CTIME|S_MTIME); /* timestamp is already written, so mark_inode_dirty() is unneeded. */ d_instantiate(dentry, inode); out: mutex_unlock(&MSDOS_SB(sb)->s_lock); if (!err) err = fat_flush_inodes(sb, dir, inode); return err; } /***** Remove a directory */ static int msdos_rmdir(struct inode *dir, struct dentry *dentry) { struct super_block *sb = dir->i_sb; struct inode *inode = d_inode(dentry); struct fat_slot_info sinfo; int err; mutex_lock(&MSDOS_SB(sb)->s_lock); err = fat_dir_empty(inode); if (err) goto out; err = msdos_find(dir, dentry->d_name.name, dentry->d_name.len, &sinfo); if (err) goto out; err = fat_remove_entries(dir, &sinfo); /* and releases bh */ if (err) goto out; drop_nlink(dir); clear_nlink(inode); fat_truncate_time(inode, NULL, S_CTIME); fat_detach(inode); out: mutex_unlock(&MSDOS_SB(sb)->s_lock); if (!err) err = fat_flush_inodes(sb, dir, inode); return err; } /***** Make a directory */ static int msdos_mkdir(struct mnt_idmap *idmap, struct inode *dir, struct dentry *dentry, umode_t mode) { struct super_block *sb = dir->i_sb; struct fat_slot_info sinfo; struct inode *inode; unsigned char msdos_name[MSDOS_NAME]; struct timespec64 ts; int err, is_hid, cluster; mutex_lock(&MSDOS_SB(sb)->s_lock); err = msdos_format_name(dentry->d_name.name, dentry->d_name.len, msdos_name, &MSDOS_SB(sb)->options); if (err) goto out; is_hid = (dentry->d_name.name[0] == '.') && (msdos_name[0] != '.'); /* foo vs .foo situation */ if (!fat_scan(dir, msdos_name, &sinfo)) { brelse(sinfo.bh); err = -EINVAL; goto out; } ts = current_time(dir); cluster = fat_alloc_new_dir(dir, &ts); if (cluster < 0) { err = cluster; goto out; } err = msdos_add_entry(dir, msdos_name, 1, is_hid, cluster, &ts, &sinfo); if (err) goto out_free; inc_nlink(dir); inode = fat_build_inode(sb, sinfo.de, sinfo.i_pos); brelse(sinfo.bh); if (IS_ERR(inode)) { err = PTR_ERR(inode); /* the directory was completed, just return a error */ goto out; } set_nlink(inode, 2); fat_truncate_time(inode, &ts, S_ATIME|S_CTIME|S_MTIME); /* timestamp is already written, so mark_inode_dirty() is unneeded. */ d_instantiate(dentry, inode); mutex_unlock(&MSDOS_SB(sb)->s_lock); fat_flush_inodes(sb, dir, inode); return 0; out_free: fat_free_clusters(dir, cluster); out: mutex_unlock(&MSDOS_SB(sb)->s_lock); return err; } /***** Unlink a file */ static int msdos_unlink(struct inode *dir, struct dentry *dentry) { struct inode *inode = d_inode(dentry); struct super_block *sb = inode->i_sb; struct fat_slot_info sinfo; int err; mutex_lock(&MSDOS_SB(sb)->s_lock); err = msdos_find(dir, dentry->d_name.name, dentry->d_name.len, &sinfo); if (err) goto out; err = fat_remove_entries(dir, &sinfo); /* and releases bh */ if (err) goto out; clear_nlink(inode); fat_truncate_time(inode, NULL, S_CTIME); fat_detach(inode); out: mutex_unlock(&MSDOS_SB(sb)->s_lock); if (!err) err = fat_flush_inodes(sb, dir, inode); return err; } static int do_msdos_rename(struct inode *old_dir, unsigned char *old_name, struct dentry *old_dentry, struct inode *new_dir, unsigned char *new_name, struct dentry *new_dentry, int is_hid) { struct buffer_head *dotdot_bh; struct msdos_dir_entry *dotdot_de; struct inode *old_inode, *new_inode; struct fat_slot_info old_sinfo, sinfo; struct timespec64 ts; loff_t new_i_pos; int err, old_attrs, is_dir, update_dotdot, corrupt = 0; old_sinfo.bh = sinfo.bh = dotdot_bh = NULL; old_inode = d_inode(old_dentry); new_inode = d_inode(new_dentry); err = fat_scan(old_dir, old_name, &old_sinfo); if (err) { err = -EIO; goto out; } is_dir = S_ISDIR(old_inode->i_mode); update_dotdot = (is_dir && old_dir != new_dir); if (update_dotdot) { if (fat_get_dotdot_entry(old_inode, &dotdot_bh, &dotdot_de)) { err = -EIO; goto out; } } old_attrs = MSDOS_I(old_inode)->i_attrs; err = fat_scan(new_dir, new_name, &sinfo); if (!err) { if (!new_inode) { /* "foo" -> ".foo" case. just change the ATTR_HIDDEN */ if (sinfo.de != old_sinfo.de) { err = -EINVAL; goto out; } if (is_hid) MSDOS_I(old_inode)->i_attrs |= ATTR_HIDDEN; else MSDOS_I(old_inode)->i_attrs &= ~ATTR_HIDDEN; if (IS_DIRSYNC(old_dir)) { err = fat_sync_inode(old_inode); if (err) { MSDOS_I(old_inode)->i_attrs = old_attrs; goto out; } } else mark_inode_dirty(old_inode); inode_inc_iversion(old_dir); fat_truncate_time(old_dir, NULL, S_CTIME|S_MTIME); if (IS_DIRSYNC(old_dir)) (void)fat_sync_inode(old_dir); else mark_inode_dirty(old_dir); goto out; } } ts = current_time(old_inode); if (new_inode) { if (err) goto out; if (is_dir) { err = fat_dir_empty(new_inode); if (err) goto out; } new_i_pos = MSDOS_I(new_inode)->i_pos; fat_detach(new_inode); } else { err = msdos_add_entry(new_dir, new_name, is_dir, is_hid, 0, &ts, &sinfo); if (err) goto out; new_i_pos = sinfo.i_pos; } inode_inc_iversion(new_dir); fat_detach(old_inode); fat_attach(old_inode, new_i_pos); if (is_hid) MSDOS_I(old_inode)->i_attrs |= ATTR_HIDDEN; else MSDOS_I(old_inode)->i_attrs &= ~ATTR_HIDDEN; if (IS_DIRSYNC(new_dir)) { err = fat_sync_inode(old_inode); if (err) goto error_inode; } else mark_inode_dirty(old_inode); if (update_dotdot) { fat_set_start(dotdot_de, MSDOS_I(new_dir)->i_logstart); mark_buffer_dirty_inode(dotdot_bh, old_inode); if (IS_DIRSYNC(new_dir)) { err = sync_dirty_buffer(dotdot_bh); if (err) goto error_dotdot; } drop_nlink(old_dir); if (!new_inode) inc_nlink(new_dir); } err = fat_remove_entries(old_dir, &old_sinfo); /* and releases bh */ old_sinfo.bh = NULL; if (err) goto error_dotdot; inode_inc_iversion(old_dir); fat_truncate_time(old_dir, &ts, S_CTIME|S_MTIME); if (IS_DIRSYNC(old_dir)) (void)fat_sync_inode(old_dir); else mark_inode_dirty(old_dir); if (new_inode) { drop_nlink(new_inode); if (is_dir) drop_nlink(new_inode); fat_truncate_time(new_inode, &ts, S_CTIME); } out: brelse(sinfo.bh); brelse(dotdot_bh); brelse(old_sinfo.bh); return err; error_dotdot: /* data cluster is shared, serious corruption */ corrupt = 1; if (update_dotdot) { fat_set_start(dotdot_de, MSDOS_I(old_dir)->i_logstart); mark_buffer_dirty_inode(dotdot_bh, old_inode); corrupt |= sync_dirty_buffer(dotdot_bh); } error_inode: fat_detach(old_inode); fat_attach(old_inode, old_sinfo.i_pos); MSDOS_I(old_inode)->i_attrs = old_attrs; if (new_inode) { fat_attach(new_inode, new_i_pos); if (corrupt) corrupt |= fat_sync_inode(new_inode); } else { /* * If new entry was not sharing the data cluster, it * shouldn't be serious corruption. */ int err2 = fat_remove_entries(new_dir, &sinfo); if (corrupt) corrupt |= err2; sinfo.bh = NULL; } if (corrupt < 0) { fat_fs_error(new_dir->i_sb, "%s: Filesystem corrupted (i_pos %lld)", __func__, sinfo.i_pos); } goto out; } /***** Rename, a wrapper for rename_same_dir & rename_diff_dir */ static int msdos_rename(struct mnt_idmap *idmap, struct inode *old_dir, struct dentry *old_dentry, struct inode *new_dir, struct dentry *new_dentry, unsigned int flags) { struct super_block *sb = old_dir->i_sb; unsigned char old_msdos_name[MSDOS_NAME], new_msdos_name[MSDOS_NAME]; int err, is_hid; if (flags & ~RENAME_NOREPLACE) return -EINVAL; mutex_lock(&MSDOS_SB(sb)->s_lock); err = msdos_format_name(old_dentry->d_name.name, old_dentry->d_name.len, old_msdos_name, &MSDOS_SB(old_dir->i_sb)->options); if (err) goto out; err = msdos_format_name(new_dentry->d_name.name, new_dentry->d_name.len, new_msdos_name, &MSDOS_SB(new_dir->i_sb)->options); if (err) goto out; is_hid = (new_dentry->d_name.name[0] == '.') && (new_msdos_name[0] != '.'); err = do_msdos_rename(old_dir, old_msdos_name, old_dentry, new_dir, new_msdos_name, new_dentry, is_hid); out: mutex_unlock(&MSDOS_SB(sb)->s_lock); if (!err) err = fat_flush_inodes(sb, old_dir, new_dir); return err; } static const struct inode_operations msdos_dir_inode_operations = { .create = msdos_create, .lookup = msdos_lookup, .unlink = msdos_unlink, .mkdir = msdos_mkdir, .rmdir = msdos_rmdir, .rename = msdos_rename, .setattr = fat_setattr, .getattr = fat_getattr, .update_time = fat_update_time, }; static void setup(struct super_block *sb) { MSDOS_SB(sb)->dir_ops = &msdos_dir_inode_operations; sb->s_d_op = &msdos_dentry_operations; sb->s_flags |= SB_NOATIME; } static int msdos_fill_super(struct super_block *sb, struct fs_context *fc) { return fat_fill_super(sb, fc, setup); } static int msdos_get_tree(struct fs_context *fc) { return get_tree_bdev(fc, msdos_fill_super); } static int msdos_parse_param(struct fs_context *fc, struct fs_parameter *param) { return fat_parse_param(fc, param, false); } static const struct fs_context_operations msdos_context_ops = { .parse_param = msdos_parse_param, .get_tree = msdos_get_tree, .reconfigure = fat_reconfigure, .free = fat_free_fc, }; static int msdos_init_fs_context(struct fs_context *fc) { int err; /* Initialize with is_vfat == false */ err = fat_init_fs_context(fc, false); if (err) return err; fc->ops = &msdos_context_ops; return 0; } static struct file_system_type msdos_fs_type = { .owner = THIS_MODULE, .name = "msdos", .kill_sb = kill_block_super, .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP, .init_fs_context = msdos_init_fs_context, .parameters = fat_param_spec, }; MODULE_ALIAS_FS("msdos"); static int __init init_msdos_fs(void) { return register_filesystem(&msdos_fs_type); } static void __exit exit_msdos_fs(void) { unregister_filesystem(&msdos_fs_type); } MODULE_LICENSE("GPL"); MODULE_AUTHOR("Werner Almesberger"); MODULE_DESCRIPTION("MS-DOS filesystem support"); module_init(init_msdos_fs) module_exit(exit_msdos_fs) |
| 4 6 10 3 10 8 1 5 2 2 1 4 10 5 2 4 2 3 3 4 2 4 2 5 1 6 2066 12 2055 2 1 1 1 1 10 6 4 8 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 2533 2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569 2570 2571 2572 2573 2574 2575 2576 2577 2578 2579 2580 2581 2582 2583 2584 2585 2586 2587 2588 2589 2590 2591 2592 2593 2594 2595 2596 2597 2598 2599 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 2701 2702 2703 2704 2705 2706 2707 2708 2709 2710 2711 2712 2713 2714 2715 2716 2717 2718 2719 2720 2721 2722 2723 2724 2725 2726 2727 2728 2729 2730 2731 2732 2733 2734 2735 2736 2737 2738 2739 2740 2741 2742 2743 2744 2745 2746 2747 2748 2749 2750 2751 2752 2753 2754 2755 2756 2757 2758 2759 2760 2761 2762 2763 2764 2765 2766 2767 2768 2769 2770 2771 2772 2773 2774 2775 2776 2777 2778 2779 2780 2781 2782 2783 2784 2785 2786 2787 2788 2789 2790 2791 2792 2793 2794 2795 2796 2797 2798 2799 2800 2801 2802 2803 2804 2805 2806 2807 2808 2809 2810 2811 2812 2813 2814 2815 2816 2817 2818 2819 2820 2821 2822 2823 2824 2825 2826 2827 2828 2829 2830 2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843 2844 2845 2846 2847 2848 2849 2850 2851 2852 2853 2854 2855 2856 2857 2858 2859 2860 2861 2862 2863 2864 2865 2866 2867 2868 2869 2870 2871 2872 2873 2874 2875 2876 2877 2878 2879 2880 2881 2882 2883 2884 2885 2886 2887 2888 2889 2890 2891 2892 2893 2894 2895 2896 2897 2898 2899 2900 2901 2902 2903 2904 2905 2906 2907 2908 2909 2910 2911 2912 2913 2914 2915 2916 2917 2918 2919 2920 2921 2922 2923 2924 2925 2926 2927 2928 2929 2930 2931 2932 2933 2934 2935 2936 2937 2938 2939 2940 2941 2942 2943 2944 2945 2946 2947 2948 2949 2950 2951 2952 2953 2954 2955 2956 2957 2958 2959 2960 2961 2962 2963 2964 2965 2966 2967 2968 2969 2970 2971 2972 2973 2974 2975 2976 2977 2978 2979 2980 2981 2982 2983 2984 2985 2986 2987 2988 2989 2990 2991 2992 2993 2994 2995 2996 2997 2998 2999 3000 3001 3002 3003 3004 3005 3006 3007 3008 3009 3010 3011 3012 3013 3014 3015 3016 3017 3018 3019 3020 3021 3022 3023 3024 3025 3026 3027 3028 3029 3030 3031 3032 3033 3034 3035 3036 3037 3038 3039 3040 3041 3042 3043 3044 3045 3046 3047 3048 3049 3050 3051 3052 3053 3054 3055 3056 3057 3058 3059 3060 3061 3062 3063 3064 3065 3066 3067 3068 3069 3070 3071 3072 3073 3074 3075 3076 3077 3078 | // SPDX-License-Identifier: GPL-2.0 #include <linux/kernel.h> #include <linux/errno.h> #include <linux/init.h> #include <linux/slab.h> #include <linux/mm.h> #include <linux/module.h> #include <linux/moduleparam.h> #include <linux/scatterlist.h> #include <linux/mutex.h> #include <linux/timer.h> #include <linux/usb.h> #define SIMPLE_IO_TIMEOUT 10000 /* in milliseconds */ /*-------------------------------------------------------------------------*/ static int override_alt = -1; module_param_named(alt, override_alt, int, 0644); MODULE_PARM_DESC(alt, ">= 0 to override altsetting selection"); static void complicated_callback(struct urb *urb); /*-------------------------------------------------------------------------*/ /* FIXME make these public somewhere; usbdevfs.h? */ /* Parameter for usbtest driver. */ struct usbtest_param_32 { /* inputs */ __u32 test_num; /* 0..(TEST_CASES-1) */ __u32 iterations; __u32 length; __u32 vary; __u32 sglen; /* outputs */ __s32 duration_sec; __s32 duration_usec; }; /* * Compat parameter to the usbtest driver. * This supports older user space binaries compiled with 64 bit compiler. */ struct usbtest_param_64 { /* inputs */ __u32 test_num; /* 0..(TEST_CASES-1) */ __u32 iterations; __u32 length; __u32 vary; __u32 sglen; /* outputs */ __s64 duration_sec; __s64 duration_usec; }; /* IOCTL interface to the driver. */ #define USBTEST_REQUEST_32 _IOWR('U', 100, struct usbtest_param_32) /* COMPAT IOCTL interface to the driver. */ #define USBTEST_REQUEST_64 _IOWR('U', 100, struct usbtest_param_64) /*-------------------------------------------------------------------------*/ #define GENERIC /* let probe() bind using module params */ /* Some devices that can be used for testing will have "real" drivers. * Entries for those need to be enabled here by hand, after disabling * that "real" driver. */ //#define IBOT2 /* grab iBOT2 webcams */ //#define KEYSPAN_19Qi /* grab un-renumerated serial adapter */ /*-------------------------------------------------------------------------*/ struct usbtest_info { const char *name; u8 ep_in; /* bulk/intr source */ u8 ep_out; /* bulk/intr sink */ unsigned autoconf:1; unsigned ctrl_out:1; unsigned iso:1; /* try iso in/out */ unsigned intr:1; /* try interrupt in/out */ int alt; }; /* this is accessed only through usbfs ioctl calls. * one ioctl to issue a test ... one lock per device. * tests create other threads if they need them. * urbs and buffers are allocated dynamically, * and data generated deterministically. */ struct usbtest_dev { struct usb_interface *intf; struct usbtest_info *info; int in_pipe; int out_pipe; int in_iso_pipe; int out_iso_pipe; int in_int_pipe; int out_int_pipe; struct usb_endpoint_descriptor *iso_in, *iso_out; struct usb_endpoint_descriptor *int_in, *int_out; struct mutex lock; #define TBUF_SIZE 256 u8 *buf; }; static struct usb_device *testdev_to_usbdev(struct usbtest_dev *test) { return interface_to_usbdev(test->intf); } /* set up all urbs so they can be used with either bulk or interrupt */ #define INTERRUPT_RATE 1 /* msec/transfer */ #define ERROR(tdev, fmt, args...) \ dev_err(&(tdev)->intf->dev , fmt , ## args) #define WARNING(tdev, fmt, args...) \ dev_warn(&(tdev)->intf->dev , fmt , ## args) #define GUARD_BYTE 0xA5 #define MAX_SGLEN 128 /*-------------------------------------------------------------------------*/ static inline void endpoint_update(int edi, struct usb_host_endpoint **in, struct usb_host_endpoint **out, struct usb_host_endpoint *e) { if (edi) { if (!*in) *in = e; } else { if (!*out) *out = e; } } static int get_endpoints(struct usbtest_dev *dev, struct usb_interface *intf) { int tmp; struct usb_host_interface *alt; struct usb_host_endpoint *in, *out; struct usb_host_endpoint *iso_in, *iso_out; struct usb_host_endpoint *int_in, *int_out; struct usb_device *udev; for (tmp = 0; tmp < intf->num_altsetting; tmp++) { unsigned ep; in = out = NULL; iso_in = iso_out = NULL; int_in = int_out = NULL; alt = intf->altsetting + tmp; if (override_alt >= 0 && override_alt != alt->desc.bAlternateSetting) continue; /* take the first altsetting with in-bulk + out-bulk; * ignore other endpoints and altsettings. */ for (ep = 0; ep < alt->desc.bNumEndpoints; ep++) { struct usb_host_endpoint *e; int edi; e = alt->endpoint + ep; edi = usb_endpoint_dir_in(&e->desc); switch (usb_endpoint_type(&e->desc)) { case USB_ENDPOINT_XFER_BULK: endpoint_update(edi, &in, &out, e); continue; case USB_ENDPOINT_XFER_INT: if (dev->info->intr) endpoint_update(edi, &int_in, &int_out, e); continue; case USB_ENDPOINT_XFER_ISOC: if (dev->info->iso) endpoint_update(edi, &iso_in, &iso_out, e); fallthrough; default: continue; } } if ((in && out) || iso_in || iso_out || int_in || int_out) goto found; } return -EINVAL; found: udev = testdev_to_usbdev(dev); dev->info->alt = alt->desc.bAlternateSetting; if (alt->desc.bAlternateSetting != 0) { tmp = usb_set_interface(udev, alt->desc.bInterfaceNumber, alt->desc.bAlternateSetting); if (tmp < 0) return tmp; } if (in) dev->in_pipe = usb_rcvbulkpipe(udev, in->desc.bEndpointAddress & USB_ENDPOINT_NUMBER_MASK); if (out) dev->out_pipe = usb_sndbulkpipe(udev, out->desc.bEndpointAddress & USB_ENDPOINT_NUMBER_MASK); if (iso_in) { dev->iso_in = &iso_in->desc; dev->in_iso_pipe = usb_rcvisocpipe(udev, iso_in->desc.bEndpointAddress & USB_ENDPOINT_NUMBER_MASK); } if (iso_out) { dev->iso_out = &iso_out->desc; dev->out_iso_pipe = usb_sndisocpipe(udev, iso_out->desc.bEndpointAddress & USB_ENDPOINT_NUMBER_MASK); } if (int_in) { dev->int_in = &int_in->desc; dev->in_int_pipe = usb_rcvintpipe(udev, int_in->desc.bEndpointAddress & USB_ENDPOINT_NUMBER_MASK); } if (int_out) { dev->int_out = &int_out->desc; dev->out_int_pipe = usb_sndintpipe(udev, int_out->desc.bEndpointAddress & USB_ENDPOINT_NUMBER_MASK); } return 0; } /*-------------------------------------------------------------------------*/ /* Support for testing basic non-queued I/O streams. * * These just package urbs as requests that can be easily canceled. * Each urb's data buffer is dynamically allocated; callers can fill * them with non-zero test data (or test for it) when appropriate. */ static void simple_callback(struct urb *urb) { complete(urb->context); } static struct urb *usbtest_alloc_urb( struct usb_device *udev, int pipe, unsigned long bytes, unsigned transfer_flags, unsigned offset, u8 bInterval, usb_complete_t complete_fn) { struct urb *urb; urb = usb_alloc_urb(0, GFP_KERNEL); if (!urb) return urb; if (bInterval) usb_fill_int_urb(urb, udev, pipe, NULL, bytes, complete_fn, NULL, bInterval); else usb_fill_bulk_urb(urb, udev, pipe, NULL, bytes, complete_fn, NULL); urb->interval = (udev->speed == USB_SPEED_HIGH) ? (INTERRUPT_RATE << 3) : INTERRUPT_RATE; urb->transfer_flags = transfer_flags; if (usb_pipein(pipe)) urb->transfer_flags |= URB_SHORT_NOT_OK; if ((bytes + offset) == 0) return urb; if (urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP) urb->transfer_buffer = usb_alloc_coherent(udev, bytes + offset, GFP_KERNEL, &urb->transfer_dma); else urb->transfer_buffer = kmalloc(bytes + offset, GFP_KERNEL); if (!urb->transfer_buffer) { usb_free_urb(urb); return NULL; } /* To test unaligned transfers add an offset and fill the unused memory with a guard value */ if (offset) { memset(urb->transfer_buffer, GUARD_BYTE, offset); urb->transfer_buffer += offset; if (urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP) urb->transfer_dma += offset; } /* For inbound transfers use guard byte so that test fails if data not correctly copied */ memset(urb->transfer_buffer, usb_pipein(urb->pipe) ? GUARD_BYTE : 0, bytes); return urb; } static struct urb *simple_alloc_urb( struct usb_device *udev, int pipe, unsigned long bytes, u8 bInterval) { return usbtest_alloc_urb(udev, pipe, bytes, URB_NO_TRANSFER_DMA_MAP, 0, bInterval, simple_callback); } static struct urb *complicated_alloc_urb( struct usb_device *udev, int pipe, unsigned long bytes, u8 bInterval) { return usbtest_alloc_urb(udev, pipe, bytes, URB_NO_TRANSFER_DMA_MAP, 0, bInterval, complicated_callback); } static unsigned pattern; static unsigned mod_pattern; module_param_named(pattern, mod_pattern, uint, S_IRUGO | S_IWUSR); MODULE_PARM_DESC(mod_pattern, "i/o pattern (0 == zeroes)"); static unsigned get_maxpacket(struct usb_device *udev, int pipe) { struct usb_host_endpoint *ep; ep = usb_pipe_endpoint(udev, pipe); return le16_to_cpup(&ep->desc.wMaxPacketSize); } static int ss_isoc_get_packet_num(struct usb_device *udev, int pipe) { struct usb_host_endpoint *ep = usb_pipe_endpoint(udev, pipe); return USB_SS_MULT(ep->ss_ep_comp.bmAttributes) * (1 + ep->ss_ep_comp.bMaxBurst); } static void simple_fill_buf(struct urb *urb) { unsigned i; u8 *buf = urb->transfer_buffer; unsigned len = urb->transfer_buffer_length; unsigned maxpacket; switch (pattern) { default: fallthrough; case 0: memset(buf, 0, len); break; case 1: /* mod63 */ maxpacket = get_maxpacket(urb->dev, urb->pipe); for (i = 0; i < len; i++) *buf++ = (u8) ((i % maxpacket) % 63); break; } } static inline unsigned long buffer_offset(void *buf) { return (unsigned long)buf & (ARCH_KMALLOC_MINALIGN - 1); } static int check_guard_bytes(struct usbtest_dev *tdev, struct urb *urb) { u8 *buf = urb->transfer_buffer; u8 *guard = buf - buffer_offset(buf); unsigned i; for (i = 0; guard < buf; i++, guard++) { if (*guard != GUARD_BYTE) { ERROR(tdev, "guard byte[%d] %d (not %d)\n", i, *guard, GUARD_BYTE); return -EINVAL; } } return 0; } static int simple_check_buf(struct usbtest_dev *tdev, struct urb *urb) { unsigned i; u8 expected; u8 *buf = urb->transfer_buffer; unsigned len = urb->actual_length; unsigned maxpacket = get_maxpacket(urb->dev, urb->pipe); int ret = check_guard_bytes(tdev, urb); if (ret) return ret; for (i = 0; i < len; i++, buf++) { switch (pattern) { /* all-zeroes has no synchronization issues */ case 0: expected = 0; break; /* mod63 stays in sync with short-terminated transfers, * or otherwise when host and gadget agree on how large * each usb transfer request should be. resync is done * with set_interface or set_config. */ case 1: /* mod63 */ expected = (i % maxpacket) % 63; break; /* always fail unsupported patterns */ default: expected = !*buf; break; } if (*buf == expected) continue; ERROR(tdev, "buf[%d] = %d (not %d)\n", i, *buf, expected); return -EINVAL; } return 0; } static void simple_free_urb(struct urb *urb) { unsigned long offset = buffer_offset(urb->transfer_buffer); if (urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP) usb_free_coherent( urb->dev, urb->transfer_buffer_length + offset, urb->transfer_buffer - offset, urb->transfer_dma - offset); else kfree(urb->transfer_buffer - offset); usb_free_urb(urb); } static int simple_io( struct usbtest_dev *tdev, struct urb *urb, int iterations, int vary, int expected, const char *label ) { struct usb_device *udev = urb->dev; int max = urb->transfer_buffer_length; struct completion completion; int retval = 0; unsigned long expire; urb->context = &completion; while (retval == 0 && iterations-- > 0) { init_completion(&completion); if (usb_pipeout(urb->pipe)) { simple_fill_buf(urb); urb->transfer_flags |= URB_ZERO_PACKET; } retval = usb_submit_urb(urb, GFP_KERNEL); if (retval != 0) break; expire = msecs_to_jiffies(SIMPLE_IO_TIMEOUT); if (!wait_for_completion_timeout(&completion, expire)) { usb_kill_urb(urb); retval = (urb->status == -ENOENT ? -ETIMEDOUT : urb->status); } else { retval = urb->status; } urb->dev = udev; if (retval == 0 && usb_pipein(urb->pipe)) retval = simple_check_buf(tdev, urb); if (vary) { int len = urb->transfer_buffer_length; len += vary; len %= max; if (len == 0) len = (vary < max) ? vary : max; urb->transfer_buffer_length = len; } /* FIXME if endpoint halted, clear halt (and log) */ } urb->transfer_buffer_length = max; if (expected != retval) dev_err(&udev->dev, "%s failed, iterations left %d, status %d (not %d)\n", label, iterations, retval, expected); return retval; } /*-------------------------------------------------------------------------*/ /* We use scatterlist primitives to test queued I/O. * Yes, this also tests the scatterlist primitives. */ static void free_sglist(struct scatterlist *sg, int nents) { unsigned i; if (!sg) return; for (i = 0; i < nents; i++) { if (!sg_page(&sg[i])) continue; kfree(sg_virt(&sg[i])); } kfree(sg); } static struct scatterlist * alloc_sglist(int nents, int max, int vary, struct usbtest_dev *dev, int pipe) { struct scatterlist *sg; unsigned int n_size = 0; unsigned i; unsigned size = max; unsigned maxpacket = get_maxpacket(interface_to_usbdev(dev->intf), pipe); if (max == 0) return NULL; sg = kmalloc_array(nents, sizeof(*sg), GFP_KERNEL); if (!sg) return NULL; sg_init_table(sg, nents); for (i = 0; i < nents; i++) { char *buf; unsigned j; buf = kzalloc(size, GFP_KERNEL); if (!buf) { free_sglist(sg, i); return NULL; } /* kmalloc pages are always physically contiguous! */ sg_set_buf(&sg[i], buf, size); switch (pattern) { case 0: /* already zeroed */ break; case 1: for (j = 0; j < size; j++) *buf++ = (u8) (((j + n_size) % maxpacket) % 63); n_size += size; break; } if (vary) { size += vary; size %= max; if (size == 0) size = (vary < max) ? vary : max; } } return sg; } struct sg_timeout { struct timer_list timer; struct usb_sg_request *req; }; static void sg_timeout(struct timer_list *t) { struct sg_timeout *timeout = from_timer(timeout, t, timer); usb_sg_cancel(timeout->req); } static int perform_sglist( struct usbtest_dev *tdev, unsigned iterations, int pipe, struct usb_sg_request *req, struct scatterlist *sg, int nents ) { struct usb_device *udev = testdev_to_usbdev(tdev); int retval = 0; struct sg_timeout timeout = { .req = req, }; timer_setup_on_stack(&timeout.timer, sg_timeout, 0); while (retval == 0 && iterations-- > 0) { retval = usb_sg_init(req, udev, pipe, (udev->speed == USB_SPEED_HIGH) ? (INTERRUPT_RATE << 3) : INTERRUPT_RATE, sg, nents, 0, GFP_KERNEL); if (retval) break; mod_timer(&timeout.timer, jiffies + msecs_to_jiffies(SIMPLE_IO_TIMEOUT)); usb_sg_wait(req); if (!del_timer_sync(&timeout.timer)) retval = -ETIMEDOUT; else retval = req->status; destroy_timer_on_stack(&timeout.timer); /* FIXME check resulting data pattern */ /* FIXME if endpoint halted, clear halt (and log) */ } /* FIXME for unlink or fault handling tests, don't report * failure if retval is as we expected ... */ if (retval) ERROR(tdev, "perform_sglist failed, " "iterations left %d, status %d\n", iterations, retval); return retval; } /*-------------------------------------------------------------------------*/ /* unqueued control message testing * * there's a nice set of device functional requirements in chapter 9 of the * usb 2.0 spec, which we can apply to ANY device, even ones that don't use * special test firmware. * * we know the device is configured (or suspended) by the time it's visible * through usbfs. we can't change that, so we won't test enumeration (which * worked 'well enough' to get here, this time), power management (ditto), * or remote wakeup (which needs human interaction). */ static unsigned realworld = 1; module_param(realworld, uint, 0); MODULE_PARM_DESC(realworld, "clear to demand stricter spec compliance"); static int get_altsetting(struct usbtest_dev *dev) { struct usb_interface *iface = dev->intf; struct usb_device *udev = interface_to_usbdev(iface); int retval; retval = usb_control_msg(udev, usb_rcvctrlpipe(udev, 0), USB_REQ_GET_INTERFACE, USB_DIR_IN|USB_RECIP_INTERFACE, 0, iface->altsetting[0].desc.bInterfaceNumber, dev->buf, 1, USB_CTRL_GET_TIMEOUT); switch (retval) { case 1: return dev->buf[0]; case 0: retval = -ERANGE; fallthrough; default: return retval; } } static int set_altsetting(struct usbtest_dev *dev, int alternate) { struct usb_interface *iface = dev->intf; struct usb_device *udev; if (alternate < 0 || alternate >= 256) return -EINVAL; udev = interface_to_usbdev(iface); return usb_set_interface(udev, iface->altsetting[0].desc.bInterfaceNumber, alternate); } static int is_good_config(struct usbtest_dev *tdev, int len) { struct usb_config_descriptor *config; if (len < (int)sizeof(*config)) return 0; config = (struct usb_config_descriptor *) tdev->buf; switch (config->bDescriptorType) { case USB_DT_CONFIG: case USB_DT_OTHER_SPEED_CONFIG: if (config->bLength != 9) { ERROR(tdev, "bogus config descriptor length\n"); return 0; } /* this bit 'must be 1' but often isn't */ if (!realworld && !(config->bmAttributes & 0x80)) { ERROR(tdev, "high bit of config attributes not set\n"); return 0; } if (config->bmAttributes & 0x1f) { /* reserved == 0 */ ERROR(tdev, "reserved config bits set\n"); return 0; } break; default: return 0; } if (le16_to_cpu(config->wTotalLength) == len) /* read it all */ return 1; if (le16_to_cpu(config->wTotalLength) >= TBUF_SIZE) /* max partial read */ return 1; ERROR(tdev, "bogus config descriptor read size\n"); return 0; } static int is_good_ext(struct usbtest_dev *tdev, u8 *buf) { struct usb_ext_cap_descriptor *ext; u32 attr; ext = (struct usb_ext_cap_descriptor *) buf; if (ext->bLength != USB_DT_USB_EXT_CAP_SIZE) { ERROR(tdev, "bogus usb 2.0 extension descriptor length\n"); return 0; } attr = le32_to_cpu(ext->bmAttributes); /* bits[1:15] is used and others are reserved */ if (attr & ~0xfffe) { /* reserved == 0 */ ERROR(tdev, "reserved bits set\n"); return 0; } return 1; } static int is_good_ss_cap(struct usbtest_dev *tdev, u8 *buf) { struct usb_ss_cap_descriptor *ss; ss = (struct usb_ss_cap_descriptor *) buf; if (ss->bLength != USB_DT_USB_SS_CAP_SIZE) { ERROR(tdev, "bogus superspeed device capability descriptor length\n"); return 0; } /* * only bit[1] of bmAttributes is used for LTM and others are * reserved */ if (ss->bmAttributes & ~0x02) { /* reserved == 0 */ ERROR(tdev, "reserved bits set in bmAttributes\n"); return 0; } /* bits[0:3] of wSpeedSupported is used and others are reserved */ if (le16_to_cpu(ss->wSpeedSupported) & ~0x0f) { /* reserved == 0 */ ERROR(tdev, "reserved bits set in wSpeedSupported\n"); return 0; } return 1; } static int is_good_con_id(struct usbtest_dev *tdev, u8 *buf) { struct usb_ss_container_id_descriptor *con_id; con_id = (struct usb_ss_container_id_descriptor *) buf; if (con_id->bLength != USB_DT_USB_SS_CONTN_ID_SIZE) { ERROR(tdev, "bogus container id descriptor length\n"); return 0; } if (con_id->bReserved) { /* reserved == 0 */ ERROR(tdev, "reserved bits set\n"); return 0; } return 1; } /* sanity test for standard requests working with usb_control_mesg() and some * of the utility functions which use it. * * this doesn't test how endpoint halts behave or data toggles get set, since * we won't do I/O to bulk/interrupt endpoints here (which is how to change * halt or toggle). toggle testing is impractical without support from hcds. * * this avoids failing devices linux would normally work with, by not testing * config/altsetting operations for devices that only support their defaults. * such devices rarely support those needless operations. * * NOTE that since this is a sanity test, it's not examining boundary cases * to see if usbcore, hcd, and device all behave right. such testing would * involve varied read sizes and other operation sequences. */ static int ch9_postconfig(struct usbtest_dev *dev) { struct usb_interface *iface = dev->intf; struct usb_device *udev = interface_to_usbdev(iface); int i, alt, retval; /* [9.2.3] if there's more than one altsetting, we need to be able to * set and get each one. mostly trusts the descriptors from usbcore. */ for (i = 0; i < iface->num_altsetting; i++) { /* 9.2.3 constrains the range here */ alt = iface->altsetting[i].desc.bAlternateSetting; if (alt < 0 || alt >= iface->num_altsetting) { dev_err(&iface->dev, "invalid alt [%d].bAltSetting = %d\n", i, alt); } /* [real world] get/set unimplemented if there's only one */ if (realworld && iface->num_altsetting == 1) continue; /* [9.4.10] set_interface */ retval = set_altsetting(dev, alt); if (retval) { dev_err(&iface->dev, "can't set_interface = %d, %d\n", alt, retval); return retval; } /* [9.4.4] get_interface always works */ retval = get_altsetting(dev); if (retval != alt) { dev_err(&iface->dev, "get alt should be %d, was %d\n", alt, retval); return (retval < 0) ? retval : -EDOM; } } /* [real world] get_config unimplemented if there's only one */ if (!realworld || udev->descriptor.bNumConfigurations != 1) { int expected = udev->actconfig->desc.bConfigurationValue; /* [9.4.2] get_configuration always works * ... although some cheap devices (like one TI Hub I've got) * won't return config descriptors except before set_config. */ retval = usb_control_msg(udev, usb_rcvctrlpipe(udev, 0), USB_REQ_GET_CONFIGURATION, USB_DIR_IN | USB_RECIP_DEVICE, 0, 0, dev->buf, 1, USB_CTRL_GET_TIMEOUT); if (retval != 1 || dev->buf[0] != expected) { dev_err(&iface->dev, "get config --> %d %d (1 %d)\n", retval, dev->buf[0], expected); return (retval < 0) ? retval : -EDOM; } } /* there's always [9.4.3] a device descriptor [9.6.1] */ retval = usb_get_descriptor(udev, USB_DT_DEVICE, 0, dev->buf, sizeof(udev->descriptor)); if (retval != sizeof(udev->descriptor)) { dev_err(&iface->dev, "dev descriptor --> %d\n", retval); return (retval < 0) ? retval : -EDOM; } /* * there's always [9.4.3] a bos device descriptor [9.6.2] in USB * 3.0 spec */ if (le16_to_cpu(udev->descriptor.bcdUSB) >= 0x0210) { struct usb_bos_descriptor *bos = NULL; struct usb_dev_cap_header *header = NULL; unsigned total, num, length; u8 *buf; retval = usb_get_descriptor(udev, USB_DT_BOS, 0, dev->buf, sizeof(*udev->bos->desc)); if (retval != sizeof(*udev->bos->desc)) { dev_err(&iface->dev, "bos descriptor --> %d\n", retval); return (retval < 0) ? retval : -EDOM; } bos = (struct usb_bos_descriptor *)dev->buf; total = le16_to_cpu(bos->wTotalLength); num = bos->bNumDeviceCaps; if (total > TBUF_SIZE) total = TBUF_SIZE; /* * get generic device-level capability descriptors [9.6.2] * in USB 3.0 spec */ retval = usb_get_descriptor(udev, USB_DT_BOS, 0, dev->buf, total); if (retval != total) { dev_err(&iface->dev, "bos descriptor set --> %d\n", retval); return (retval < 0) ? retval : -EDOM; } length = sizeof(*udev->bos->desc); buf = dev->buf; for (i = 0; i < num; i++) { buf += length; if (buf + sizeof(struct usb_dev_cap_header) > dev->buf + total) break; header = (struct usb_dev_cap_header *)buf; length = header->bLength; if (header->bDescriptorType != USB_DT_DEVICE_CAPABILITY) { dev_warn(&udev->dev, "not device capability descriptor, skip\n"); continue; } switch (header->bDevCapabilityType) { case USB_CAP_TYPE_EXT: if (buf + USB_DT_USB_EXT_CAP_SIZE > dev->buf + total || !is_good_ext(dev, buf)) { dev_err(&iface->dev, "bogus usb 2.0 extension descriptor\n"); return -EDOM; } break; case USB_SS_CAP_TYPE: if (buf + USB_DT_USB_SS_CAP_SIZE > dev->buf + total || !is_good_ss_cap(dev, buf)) { dev_err(&iface->dev, "bogus superspeed device capability descriptor\n"); return -EDOM; } break; case CONTAINER_ID_TYPE: if (buf + USB_DT_USB_SS_CONTN_ID_SIZE > dev->buf + total || !is_good_con_id(dev, buf)) { dev_err(&iface->dev, "bogus container id descriptor\n"); return -EDOM; } break; default: break; } } } /* there's always [9.4.3] at least one config descriptor [9.6.3] */ for (i = 0; i < udev->descriptor.bNumConfigurations; i++) { retval = usb_get_descriptor(udev, USB_DT_CONFIG, i, dev->buf, TBUF_SIZE); if (!is_good_config(dev, retval)) { dev_err(&iface->dev, "config [%d] descriptor --> %d\n", i, retval); return (retval < 0) ? retval : -EDOM; } /* FIXME cross-checking udev->config[i] to make sure usbcore * parsed it right (etc) would be good testing paranoia */ } /* and sometimes [9.2.6.6] speed dependent descriptors */ if (le16_to_cpu(udev->descriptor.bcdUSB) == 0x0200) { struct usb_qualifier_descriptor *d = NULL; /* device qualifier [9.6.2] */ retval = usb_get_descriptor(udev, USB_DT_DEVICE_QUALIFIER, 0, dev->buf, sizeof(struct usb_qualifier_descriptor)); if (retval == -EPIPE) { if (udev->speed == USB_SPEED_HIGH) { dev_err(&iface->dev, "hs dev qualifier --> %d\n", retval); return retval; } /* usb2.0 but not high-speed capable; fine */ } else if (retval != sizeof(struct usb_qualifier_descriptor)) { dev_err(&iface->dev, "dev qualifier --> %d\n", retval); return (retval < 0) ? retval : -EDOM; } else d = (struct usb_qualifier_descriptor *) dev->buf; /* might not have [9.6.2] any other-speed configs [9.6.4] */ if (d) { unsigned max = d->bNumConfigurations; for (i = 0; i < max; i++) { retval = usb_get_descriptor(udev, USB_DT_OTHER_SPEED_CONFIG, i, dev->buf, TBUF_SIZE); if (!is_good_config(dev, retval)) { dev_err(&iface->dev, "other speed config --> %d\n", retval); return (retval < 0) ? retval : -EDOM; } } } } /* FIXME fetch strings from at least the device descriptor */ /* [9.4.5] get_status always works */ retval = usb_get_std_status(udev, USB_RECIP_DEVICE, 0, dev->buf); if (retval) { dev_err(&iface->dev, "get dev status --> %d\n", retval); return retval; } /* FIXME configuration.bmAttributes says if we could try to set/clear * the device's remote wakeup feature ... if we can, test that here */ retval = usb_get_std_status(udev, USB_RECIP_INTERFACE, iface->altsetting[0].desc.bInterfaceNumber, dev->buf); if (retval) { dev_err(&iface->dev, "get interface status --> %d\n", retval); return retval; } /* FIXME get status for each endpoint in the interface */ return 0; } /*-------------------------------------------------------------------------*/ /* use ch9 requests to test whether: * (a) queues work for control, keeping N subtests queued and * active (auto-resubmit) for M loops through the queue. * (b) protocol stalls (control-only) will autorecover. * it's not like bulk/intr; no halt clearing. * (c) short control reads are reported and handled. * (d) queues are always processed in-order */ struct ctrl_ctx { spinlock_t lock; struct usbtest_dev *dev; struct completion complete; unsigned count; unsigned pending; int status; struct urb **urb; struct usbtest_param_32 *param; int last; }; #define NUM_SUBCASES 16 /* how many test subcases here? */ struct subcase { struct usb_ctrlrequest setup; int number; int expected; }; static void ctrl_complete(struct urb *urb) { struct ctrl_ctx *ctx = urb->context; struct usb_ctrlrequest *reqp; struct subcase *subcase; int status = urb->status; unsigned long flags; reqp = (struct usb_ctrlrequest *)urb->setup_packet; subcase = container_of(reqp, struct subcase, setup); spin_lock_irqsave(&ctx->lock, flags); ctx->count--; ctx->pending--; /* queue must transfer and complete in fifo order, unless * usb_unlink_urb() is used to unlink something not at the * physical queue head (not tested). */ if (subcase->number > 0) { if ((subcase->number - ctx->last) != 1) { ERROR(ctx->dev, "subcase %d completed out of order, last %d\n", subcase->number, ctx->last); status = -EDOM; ctx->last = subcase->number; goto error; } } ctx->last = subcase->number; /* succeed or fault in only one way? */ if (status == subcase->expected) status = 0; /* async unlink for cleanup? */ else if (status != -ECONNRESET) { /* some faults are allowed, not required */ if (subcase->expected > 0 && ( ((status == -subcase->expected /* happened */ || status == 0)))) /* didn't */ status = 0; /* sometimes more than one fault is allowed */ else if (subcase->number == 12 && status == -EPIPE) status = 0; else ERROR(ctx->dev, "subtest %d error, status %d\n", subcase->number, status); } /* unexpected status codes mean errors; ideally, in hardware */ if (status) { error: if (ctx->status == 0) { int i; ctx->status = status; ERROR(ctx->dev, "control queue %02x.%02x, err %d, " "%d left, subcase %d, len %d/%d\n", reqp->bRequestType, reqp->bRequest, status, ctx->count, subcase->number, urb->actual_length, urb->transfer_buffer_length); /* FIXME this "unlink everything" exit route should * be a separate test case. */ /* unlink whatever's still pending */ for (i = 1; i < ctx->param->sglen; i++) { struct urb *u = ctx->urb[ (i + subcase->number) % ctx->param->sglen]; if (u == urb || !u->dev) continue; spin_unlock(&ctx->lock); status = usb_unlink_urb(u); spin_lock(&ctx->lock); switch (status) { case -EINPROGRESS: case -EBUSY: case -EIDRM: continue; default: ERROR(ctx->dev, "urb unlink --> %d\n", status); } } status = ctx->status; } } /* resubmit if we need to, else mark this as done */ if ((status == 0) && (ctx->pending < ctx->count)) { status = usb_submit_urb(urb, GFP_ATOMIC); if (status != 0) { ERROR(ctx->dev, "can't resubmit ctrl %02x.%02x, err %d\n", reqp->bRequestType, reqp->bRequest, status); urb->dev = NULL; } else ctx->pending++; } else urb->dev = NULL; /* signal completion when nothing's queued */ if (ctx->pending == 0) complete(&ctx->complete); spin_unlock_irqrestore(&ctx->lock, flags); } static int test_ctrl_queue(struct usbtest_dev *dev, struct usbtest_param_32 *param) { struct usb_device *udev = testdev_to_usbdev(dev); struct urb **urb; struct ctrl_ctx context; int i; if (param->sglen == 0 || param->iterations > UINT_MAX / param->sglen) return -EOPNOTSUPP; spin_lock_init(&context.lock); context.dev = dev; init_completion(&context.complete); context.count = param->sglen * param->iterations; context.pending = 0; context.status = -ENOMEM; context.param = param; context.last = -1; /* allocate and init the urbs we'll queue. * as with bulk/intr sglists, sglen is the queue depth; it also * controls which subtests run (more tests than sglen) or rerun. */ urb = kcalloc(param->sglen, sizeof(struct urb *), GFP_KERNEL); if (!urb) return -ENOMEM; for (i = 0; i < param->sglen; i++) { int pipe = usb_rcvctrlpipe(udev, 0); unsigned len; struct urb *u; struct usb_ctrlrequest req; struct subcase *reqp; /* sign of this variable means: * -: tested code must return this (negative) error code * +: tested code may return this (negative too) error code */ int expected = 0; /* requests here are mostly expected to succeed on any * device, but some are chosen to trigger protocol stalls * or short reads. */ memset(&req, 0, sizeof(req)); req.bRequest = USB_REQ_GET_DESCRIPTOR; req.bRequestType = USB_DIR_IN|USB_RECIP_DEVICE; switch (i % NUM_SUBCASES) { case 0: /* get device descriptor */ req.wValue = cpu_to_le16(USB_DT_DEVICE << 8); len = sizeof(struct usb_device_descriptor); break; case 1: /* get first config descriptor (only) */ req.wValue = cpu_to_le16((USB_DT_CONFIG << 8) | 0); len = sizeof(struct usb_config_descriptor); break; case 2: /* get altsetting (OFTEN STALLS) */ req.bRequest = USB_REQ_GET_INTERFACE; req.bRequestType = USB_DIR_IN|USB_RECIP_INTERFACE; /* index = 0 means first interface */ len = 1; expected = EPIPE; break; case 3: /* get interface status */ req.bRequest = USB_REQ_GET_STATUS; req.bRequestType = USB_DIR_IN|USB_RECIP_INTERFACE; /* interface 0 */ len = 2; break; case 4: /* get device status */ req.bRequest = USB_REQ_GET_STATUS; req.bRequestType = USB_DIR_IN|USB_RECIP_DEVICE; len = 2; break; case 5: /* get device qualifier (MAY STALL) */ req.wValue = cpu_to_le16 (USB_DT_DEVICE_QUALIFIER << 8); len = sizeof(struct usb_qualifier_descriptor); if (udev->speed != USB_SPEED_HIGH) expected = EPIPE; break; case 6: /* get first config descriptor, plus interface */ req.wValue = cpu_to_le16((USB_DT_CONFIG << 8) | 0); len = sizeof(struct usb_config_descriptor); len += sizeof(struct usb_interface_descriptor); break; case 7: /* get interface descriptor (ALWAYS STALLS) */ req.wValue = cpu_to_le16 (USB_DT_INTERFACE << 8); /* interface == 0 */ len = sizeof(struct usb_interface_descriptor); expected = -EPIPE; break; /* NOTE: two consecutive stalls in the queue here. * that tests fault recovery a bit more aggressively. */ case 8: /* clear endpoint halt (MAY STALL) */ req.bRequest = USB_REQ_CLEAR_FEATURE; req.bRequestType = USB_RECIP_ENDPOINT; /* wValue 0 == ep halt */ /* wIndex 0 == ep0 (shouldn't halt!) */ len = 0; pipe = usb_sndctrlpipe(udev, 0); expected = EPIPE; break; case 9: /* get endpoint status */ req.bRequest = USB_REQ_GET_STATUS; req.bRequestType = USB_DIR_IN|USB_RECIP_ENDPOINT; /* endpoint 0 */ len = 2; break; case 10: /* trigger short read (EREMOTEIO) */ req.wValue = cpu_to_le16((USB_DT_CONFIG << 8) | 0); len = 1024; expected = -EREMOTEIO; break; /* NOTE: two consecutive _different_ faults in the queue. */ case 11: /* get endpoint descriptor (ALWAYS STALLS) */ req.wValue = cpu_to_le16(USB_DT_ENDPOINT << 8); /* endpoint == 0 */ len = sizeof(struct usb_interface_descriptor); expected = EPIPE; break; /* NOTE: sometimes even a third fault in the queue! */ case 12: /* get string 0 descriptor (MAY STALL) */ req.wValue = cpu_to_le16(USB_DT_STRING << 8); /* string == 0, for language IDs */ len = sizeof(struct usb_interface_descriptor); /* may succeed when > 4 languages */ expected = EREMOTEIO; /* or EPIPE, if no strings */ break; case 13: /* short read, resembling case 10 */ req.wValue = cpu_to_le16((USB_DT_CONFIG << 8) | 0); /* last data packet "should" be DATA1, not DATA0 */ if (udev->speed == USB_SPEED_SUPER) len = 1024 - 512; else len = 1024 - udev->descriptor.bMaxPacketSize0; expected = -EREMOTEIO; break; case 14: /* short read; try to fill the last packet */ req.wValue = cpu_to_le16((USB_DT_DEVICE << 8) | 0); /* device descriptor size == 18 bytes */ len = udev->descriptor.bMaxPacketSize0; if (udev->speed == USB_SPEED_SUPER) len = 512; switch (len) { case 8: len = 24; break; case 16: len = 32; break; } expected = -EREMOTEIO; break; case 15: req.wValue = cpu_to_le16(USB_DT_BOS << 8); if (udev->bos) len = le16_to_cpu(udev->bos->desc->wTotalLength); else len = sizeof(struct usb_bos_descriptor); if (le16_to_cpu(udev->descriptor.bcdUSB) < 0x0201) expected = -EPIPE; break; default: ERROR(dev, "bogus number of ctrl queue testcases!\n"); context.status = -EINVAL; goto cleanup; } req.wLength = cpu_to_le16(len); urb[i] = u = simple_alloc_urb(udev, pipe, len, 0); if (!u) goto cleanup; reqp = kmalloc(sizeof(*reqp), GFP_KERNEL); if (!reqp) goto cleanup; reqp->setup = req; reqp->number = i % NUM_SUBCASES; reqp->expected = expected; u->setup_packet = (char *) &reqp->setup; u->context = &context; u->complete = ctrl_complete; } /* queue the urbs */ context.urb = urb; spin_lock_irq(&context.lock); for (i = 0; i < param->sglen; i++) { context.status = usb_submit_urb(urb[i], GFP_ATOMIC); if (context.status != 0) { ERROR(dev, "can't submit urb[%d], status %d\n", i, context.status); context.count = context.pending; break; } context.pending++; } spin_unlock_irq(&context.lock); /* FIXME set timer and time out; provide a disconnect hook */ /* wait for the last one to complete */ if (context.pending > 0) wait_for_completion(&context.complete); cleanup: for (i = 0; i < param->sglen; i++) { if (!urb[i]) continue; urb[i]->dev = udev; kfree(urb[i]->setup_packet); simple_free_urb(urb[i]); } kfree(urb); return context.status; } #undef NUM_SUBCASES /*-------------------------------------------------------------------------*/ static void unlink1_callback(struct urb *urb) { int status = urb->status; /* we "know" -EPIPE (stall) never happens */ if (!status) status = usb_submit_urb(urb, GFP_ATOMIC); if (status) { urb->status = status; complete(urb->context); } } static int unlink1(struct usbtest_dev *dev, int pipe, int size, int async) { struct urb *urb; struct completion completion; int retval = 0; init_completion(&completion); urb = simple_alloc_urb(testdev_to_usbdev(dev), pipe, size, 0); if (!urb) return -ENOMEM; urb->context = &completion; urb->complete = unlink1_callback; if (usb_pipeout(urb->pipe)) { simple_fill_buf(urb); urb->transfer_flags |= URB_ZERO_PACKET; } /* keep the endpoint busy. there are lots of hc/hcd-internal * states, and testing should get to all of them over time. * * FIXME want additional tests for when endpoint is STALLing * due to errors, or is just NAKing requests. */ retval = usb_submit_urb(urb, GFP_KERNEL); if (retval != 0) { dev_err(&dev->intf->dev, "submit fail %d\n", retval); return retval; } /* unlinking that should always work. variable delay tests more * hcd states and code paths, even with little other system load. */ msleep(jiffies % (2 * INTERRUPT_RATE)); if (async) { while (!completion_done(&completion)) { retval = usb_unlink_urb(urb); if (retval == 0 && usb_pipein(urb->pipe)) retval = simple_check_buf(dev, urb); switch (retval) { case -EBUSY: case -EIDRM: /* we can't unlink urbs while they're completing * or if they've completed, and we haven't * resubmitted. "normal" drivers would prevent * resubmission, but since we're testing unlink * paths, we can't. */ ERROR(dev, "unlink retry\n"); continue; case 0: case -EINPROGRESS: break; default: dev_err(&dev->intf->dev, "unlink fail %d\n", retval); return retval; } break; } } else usb_kill_urb(urb); wait_for_completion(&completion); retval = urb->status; simple_free_urb(urb); if (async) return (retval == -ECONNRESET) ? 0 : retval - 1000; else return (retval == -ENOENT || retval == -EPERM) ? 0 : retval - 2000; } static int unlink_simple(struct usbtest_dev *dev, int pipe, int len) { int retval = 0; /* test sync and async paths */ retval = unlink1(dev, pipe, len, 1); if (!retval) retval = unlink1(dev, pipe, len, 0); return retval; } /*-------------------------------------------------------------------------*/ struct queued_ctx { struct completion complete; atomic_t pending; unsigned num; int status; struct urb **urbs; }; static void unlink_queued_callback(struct urb *urb) { int status = urb->status; struct queued_ctx *ctx = urb->context; if (ctx->status) goto done; if (urb == ctx->urbs[ctx->num - 4] || urb == ctx->urbs[ctx->num - 2]) { if (status == -ECONNRESET) goto done; /* What error should we report if the URB completed normally? */ } if (status != 0) ctx->status = status; done: if (atomic_dec_and_test(&ctx->pending)) complete(&ctx->complete); } static int unlink_queued(struct usbtest_dev *dev, int pipe, unsigned num, unsigned size) { struct queued_ctx ctx; struct usb_device *udev = testdev_to_usbdev(dev); void *buf; dma_addr_t buf_dma; int i; int retval = -ENOMEM; init_completion(&ctx.complete); atomic_set(&ctx.pending, 1); /* One more than the actual value */ ctx.num = num; ctx.status = 0; buf = usb_alloc_coherent(udev, size, GFP_KERNEL, &buf_dma); if (!buf) return retval; memset(buf, 0, size); /* Allocate and init the urbs we'll queue */ ctx.urbs = kcalloc(num, sizeof(struct urb *), GFP_KERNEL); if (!ctx.urbs) goto free_buf; for (i = 0; i < num; i++) { ctx.urbs[i] = usb_alloc_urb(0, GFP_KERNEL); if (!ctx.urbs[i]) goto free_urbs; usb_fill_bulk_urb(ctx.urbs[i], udev, pipe, buf, size, unlink_queued_callback, &ctx); ctx.urbs[i]->transfer_dma = buf_dma; ctx.urbs[i]->transfer_flags = URB_NO_TRANSFER_DMA_MAP; if (usb_pipeout(ctx.urbs[i]->pipe)) { simple_fill_buf(ctx.urbs[i]); ctx.urbs[i]->transfer_flags |= URB_ZERO_PACKET; } } /* Submit all the URBs and then unlink URBs num - 4 and num - 2. */ for (i = 0; i < num; i++) { atomic_inc(&ctx.pending); retval = usb_submit_urb(ctx.urbs[i], GFP_KERNEL); if (retval != 0) { dev_err(&dev->intf->dev, "submit urbs[%d] fail %d\n", i, retval); atomic_dec(&ctx.pending); ctx.status = retval; break; } } if (i == num) { usb_unlink_urb(ctx.urbs[num - 4]); usb_unlink_urb(ctx.urbs[num - 2]); } else { while (--i >= 0) usb_unlink_urb(ctx.urbs[i]); } if (atomic_dec_and_test(&ctx.pending)) /* The extra count */ complete(&ctx.complete); wait_for_completion(&ctx.complete); retval = ctx.status; free_urbs: for (i = 0; i < num; i++) usb_free_urb(ctx.urbs[i]); kfree(ctx.urbs); free_buf: usb_free_coherent(udev, size, buf, buf_dma); return retval; } /*-------------------------------------------------------------------------*/ static int verify_not_halted(struct usbtest_dev *tdev, int ep, struct urb *urb) { int retval; u16 status; /* shouldn't look or act halted */ retval = usb_get_std_status(urb->dev, USB_RECIP_ENDPOINT, ep, &status); if (retval < 0) { ERROR(tdev, "ep %02x couldn't get no-halt status, %d\n", ep, retval); return retval; } if (status != 0) { ERROR(tdev, "ep %02x bogus status: %04x != 0\n", ep, status); return -EINVAL; } retval = simple_io(tdev, urb, 1, 0, 0, __func__); if (retval != 0) return -EINVAL; return 0; } static int verify_halted(struct usbtest_dev *tdev, int ep, struct urb *urb) { int retval; u16 status; /* should look and act halted */ retval = usb_get_std_status(urb->dev, USB_RECIP_ENDPOINT, ep, &status); if (retval < 0) { ERROR(tdev, "ep %02x couldn't get halt status, %d\n", ep, retval); return retval; } if (status != 1) { ERROR(tdev, "ep %02x bogus status: %04x != 1\n", ep, status); return -EINVAL; } retval = simple_io(tdev, urb, 1, 0, -EPIPE, __func__); if (retval != -EPIPE) return -EINVAL; retval = simple_io(tdev, urb, 1, 0, -EPIPE, "verify_still_halted"); if (retval != -EPIPE) return -EINVAL; return 0; } static int test_halt(struct usbtest_dev *tdev, int ep, struct urb *urb) { int retval; /* shouldn't look or act halted now */ retval = verify_not_halted(tdev, ep, urb); if (retval < 0) return retval; /* set halt (protocol test only), verify it worked */ retval = usb_control_msg(urb->dev, usb_sndctrlpipe(urb->dev, 0), USB_REQ_SET_FEATURE, USB_RECIP_ENDPOINT, USB_ENDPOINT_HALT, ep, NULL, 0, USB_CTRL_SET_TIMEOUT); if (retval < 0) { ERROR(tdev, "ep %02x couldn't set halt, %d\n", ep, retval); return retval; } retval = verify_halted(tdev, ep, urb); if (retval < 0) { int ret; /* clear halt anyways, else further tests will fail */ ret = usb_clear_halt(urb->dev, urb->pipe); if (ret) ERROR(tdev, "ep %02x couldn't clear halt, %d\n", ep, ret); return retval; } /* clear halt (tests API + protocol), verify it worked */ retval = usb_clear_halt(urb->dev, urb->pipe); if (retval < 0) { ERROR(tdev, "ep %02x couldn't clear halt, %d\n", ep, retval); return retval; } retval = verify_not_halted(tdev, ep, urb); if (retval < 0) return retval; /* NOTE: could also verify SET_INTERFACE clear halts ... */ return 0; } static int test_toggle_sync(struct usbtest_dev *tdev, int ep, struct urb *urb) { int retval; /* clear initial data toggle to DATA0 */ retval = usb_clear_halt(urb->dev, urb->pipe); if (retval < 0) { ERROR(tdev, "ep %02x couldn't clear halt, %d\n", ep, retval); return retval; } /* transfer 3 data packets, should be DATA0, DATA1, DATA0 */ retval = simple_io(tdev, urb, 1, 0, 0, __func__); if (retval != 0) return -EINVAL; /* clear halt resets device side data toggle, host should react to it */ retval = usb_clear_halt(urb->dev, urb->pipe); if (retval < 0) { ERROR(tdev, "ep %02x couldn't clear halt, %d\n", ep, retval); return retval; } /* host should use DATA0 again after clear halt */ retval = simple_io(tdev, urb, 1, 0, 0, __func__); return retval; } static int halt_simple(struct usbtest_dev *dev) { int ep; int retval = 0; struct urb *urb; struct usb_device *udev = testdev_to_usbdev(dev); if (udev->speed == USB_SPEED_SUPER) urb = simple_alloc_urb(udev, 0, 1024, 0); else urb = simple_alloc_urb(udev, 0, 512, 0); if (urb == NULL) return -ENOMEM; if (dev->in_pipe) { ep = usb_pipeendpoint(dev->in_pipe) | USB_DIR_IN; urb->pipe = dev->in_pipe; retval = test_halt(dev, ep, urb); if (retval < 0) goto done; } if (dev->out_pipe) { ep = usb_pipeendpoint(dev->out_pipe); urb->pipe = dev->out_pipe; retval = test_halt(dev, ep, urb); } done: simple_free_urb(urb); return retval; } static int toggle_sync_simple(struct usbtest_dev *dev) { int ep; int retval = 0; struct urb *urb; struct usb_device *udev = testdev_to_usbdev(dev); unsigned maxp = get_maxpacket(udev, dev->out_pipe); /* * Create a URB that causes a transfer of uneven amount of data packets * This way the clear toggle has an impact on the data toggle sequence. * Use 2 maxpacket length packets and one zero packet. */ urb = simple_alloc_urb(udev, 0, 2 * maxp, 0); if (urb == NULL) return -ENOMEM; urb->transfer_flags |= URB_ZERO_PACKET; ep = usb_pipeendpoint(dev->out_pipe); urb->pipe = dev->out_pipe; retval = test_toggle_sync(dev, ep, urb); simple_free_urb(urb); return retval; } /*-------------------------------------------------------------------------*/ /* Control OUT tests use the vendor control requests from Intel's * USB 2.0 compliance test device: write a buffer, read it back. * * Intel's spec only _requires_ that it work for one packet, which * is pretty weak. Some HCDs place limits here; most devices will * need to be able to handle more than one OUT data packet. We'll * try whatever we're told to try. */ static int ctrl_out(struct usbtest_dev *dev, unsigned count, unsigned length, unsigned vary, unsigned offset) { unsigned i, j, len; int retval; u8 *buf; char *what = "?"; struct usb_device *udev; if (length < 1 || length > 0xffff || vary >= length) return -EINVAL; buf = kmalloc(length + offset, GFP_KERNEL); if (!buf) return -ENOMEM; buf += offset; udev = testdev_to_usbdev(dev); len = length; retval = 0; /* NOTE: hardware might well act differently if we pushed it * with lots back-to-back queued requests. */ for (i = 0; i < count; i++) { /* write patterned data */ for (j = 0; j < len; j++) buf[j] = (u8)(i + j); retval = usb_control_msg(udev, usb_sndctrlpipe(udev, 0), 0x5b, USB_DIR_OUT|USB_TYPE_VENDOR, 0, 0, buf, len, USB_CTRL_SET_TIMEOUT); if (retval != len) { what = "write"; if (retval >= 0) { ERROR(dev, "ctrl_out, wlen %d (expected %d)\n", retval, len); retval = -EBADMSG; } break; } /* read it back -- assuming nothing intervened!! */ retval = usb_control_msg(udev, usb_rcvctrlpipe(udev, 0), 0x5c, USB_DIR_IN|USB_TYPE_VENDOR, 0, 0, buf, len, USB_CTRL_GET_TIMEOUT); if (retval != len) { what = "read"; if (retval >= 0) { ERROR(dev, "ctrl_out, rlen %d (expected %d)\n", retval, len); retval = -EBADMSG; } break; } /* fail if we can't verify */ for (j = 0; j < len; j++) { if (buf[j] != (u8)(i + j)) { ERROR(dev, "ctrl_out, byte %d is %d not %d\n", j, buf[j], (u8)(i + j)); retval = -EBADMSG; break; } } if (retval < 0) { what = "verify"; break; } len += vary; /* [real world] the "zero bytes IN" case isn't really used. * hardware can easily trip up in this weird case, since its * status stage is IN, not OUT like other ep0in transfers. */ if (len > length) len = realworld ? 1 : 0; } if (retval < 0) ERROR(dev, "ctrl_out %s failed, code %d, count %d\n", what, retval, i); kfree(buf - offset); return retval; } /*-------------------------------------------------------------------------*/ /* ISO/BULK tests ... mimics common usage * - buffer length is split into N packets (mostly maxpacket sized) * - multi-buffers according to sglen */ struct transfer_context { unsigned count; unsigned pending; spinlock_t lock; struct completion done; int submit_error; unsigned long errors; unsigned long packet_count; struct usbtest_dev *dev; bool is_iso; }; static void complicated_callback(struct urb *urb) { struct transfer_context *ctx = urb->context; unsigned long flags; spin_lock_irqsave(&ctx->lock, flags); ctx->count--; ctx->packet_count += urb->number_of_packets; if (urb->error_count > 0) ctx->errors += urb->error_count; else if (urb->status != 0) ctx->errors += (ctx->is_iso ? urb->number_of_packets : 1); else if (urb->actual_length != urb->transfer_buffer_length) ctx->errors++; else if (check_guard_bytes(ctx->dev, urb) != 0) ctx->errors++; if (urb->status == 0 && ctx->count > (ctx->pending - 1) && !ctx->submit_error) { int status = usb_submit_urb(urb, GFP_ATOMIC); switch (status) { case 0: goto done; default: dev_err(&ctx->dev->intf->dev, "resubmit err %d\n", status); fallthrough; case -ENODEV: /* disconnected */ case -ESHUTDOWN: /* endpoint disabled */ ctx->submit_error = 1; break; } } ctx->pending--; if (ctx->pending == 0) { if (ctx->errors) dev_err(&ctx->dev->intf->dev, "during the test, %lu errors out of %lu\n", ctx->errors, ctx->packet_count); complete(&ctx->done); } done: spin_unlock_irqrestore(&ctx->lock, flags); } static struct urb *iso_alloc_urb( struct usb_device *udev, int pipe, struct usb_endpoint_descriptor *desc, long bytes, unsigned offset ) { struct urb *urb; unsigned i, maxp, packets; if (bytes < 0 || !desc) return NULL; maxp = usb_endpoint_maxp(desc); if (udev->speed >= USB_SPEED_SUPER) maxp *= ss_isoc_get_packet_num(udev, pipe); else maxp *= usb_endpoint_maxp_mult(desc); packets = DIV_ROUND_UP(bytes, maxp); urb = usb_alloc_urb(packets, GFP_KERNEL); if (!urb) return urb; urb->dev = udev; urb->pipe = pipe; urb->number_of_packets = packets; urb->transfer_buffer_length = bytes; urb->transfer_buffer = usb_alloc_coherent(udev, bytes + offset, GFP_KERNEL, &urb->transfer_dma); if (!urb->transfer_buffer) { usb_free_urb(urb); return NULL; } if (offset) { memset(urb->transfer_buffer, GUARD_BYTE, offset); urb->transfer_buffer += offset; urb->transfer_dma += offset; } /* For inbound transfers use guard byte so that test fails if data not correctly copied */ memset(urb->transfer_buffer, usb_pipein(urb->pipe) ? GUARD_BYTE : 0, bytes); for (i = 0; i < packets; i++) { /* here, only the last packet will be short */ urb->iso_frame_desc[i].length = min((unsigned) bytes, maxp); bytes -= urb->iso_frame_desc[i].length; urb->iso_frame_desc[i].offset = maxp * i; } urb->complete = complicated_callback; /* urb->context = SET BY CALLER */ urb->interval = 1 << (desc->bInterval - 1); urb->transfer_flags = URB_ISO_ASAP | URB_NO_TRANSFER_DMA_MAP; return urb; } static int test_queue(struct usbtest_dev *dev, struct usbtest_param_32 *param, int pipe, struct usb_endpoint_descriptor *desc, unsigned offset) { struct transfer_context context; struct usb_device *udev; unsigned i; unsigned long packets = 0; int status = 0; struct urb **urbs; if (!param->sglen || param->iterations > UINT_MAX / param->sglen) return -EINVAL; if (param->sglen > MAX_SGLEN) return -EINVAL; urbs = kcalloc(param->sglen, sizeof(*urbs), GFP_KERNEL); if (!urbs) return -ENOMEM; memset(&context, 0, sizeof(context)); context.count = param->iterations * param->sglen; context.dev = dev; context.is_iso = !!desc; init_completion(&context.done); spin_lock_init(&context.lock); udev = testdev_to_usbdev(dev); for (i = 0; i < param->sglen; i++) { if (context.is_iso) urbs[i] = iso_alloc_urb(udev, pipe, desc, param->length, offset); else urbs[i] = complicated_alloc_urb(udev, pipe, param->length, 0); if (!urbs[i]) { status = -ENOMEM; goto fail; } packets += urbs[i]->number_of_packets; urbs[i]->context = &context; } packets *= param->iterations; if (context.is_iso) { int transaction_num; if (udev->speed >= USB_SPEED_SUPER) transaction_num = ss_isoc_get_packet_num(udev, pipe); else transaction_num = usb_endpoint_maxp_mult(desc); dev_info(&dev->intf->dev, "iso period %d %sframes, wMaxPacket %d, transactions: %d\n", 1 << (desc->bInterval - 1), (udev->speed >= USB_SPEED_HIGH) ? "micro" : "", usb_endpoint_maxp(desc), transaction_num); dev_info(&dev->intf->dev, "total %lu msec (%lu packets)\n", (packets * (1 << (desc->bInterval - 1))) / ((udev->speed >= USB_SPEED_HIGH) ? 8 : 1), packets); } spin_lock_irq(&context.lock); for (i = 0; i < param->sglen; i++) { ++context.pending; status = usb_submit_urb(urbs[i], GFP_ATOMIC); if (status < 0) { ERROR(dev, "submit iso[%d], error %d\n", i, status); if (i == 0) { spin_unlock_irq(&context.lock); goto fail; } simple_free_urb(urbs[i]); urbs[i] = NULL; context.pending--; context.submit_error = 1; break; } } spin_unlock_irq(&context.lock); wait_for_completion(&context.done); for (i = 0; i < param->sglen; i++) { if (urbs[i]) simple_free_urb(urbs[i]); } /* * Isochronous transfers are expected to fail sometimes. As an * arbitrary limit, we will report an error if any submissions * fail or if the transfer failure rate is > 10%. */ if (status != 0) ; else if (context.submit_error) status = -EACCES; else if (context.errors > (context.is_iso ? context.packet_count / 10 : 0)) status = -EIO; kfree(urbs); return status; fail: for (i = 0; i < param->sglen; i++) { if (urbs[i]) simple_free_urb(urbs[i]); } kfree(urbs); return status; } static int test_unaligned_bulk( struct usbtest_dev *tdev, int pipe, unsigned length, int iterations, unsigned transfer_flags, const char *label) { int retval; struct urb *urb = usbtest_alloc_urb(testdev_to_usbdev(tdev), pipe, length, transfer_flags, 1, 0, simple_callback); if (!urb) return -ENOMEM; retval = simple_io(tdev, urb, iterations, 0, 0, label); simple_free_urb(urb); return retval; } /* Run tests. */ static int usbtest_do_ioctl(struct usb_interface *intf, struct usbtest_param_32 *param) { struct usbtest_dev *dev = usb_get_intfdata(intf); struct usb_device *udev = testdev_to_usbdev(dev); struct urb *urb; struct scatterlist *sg; struct usb_sg_request req; unsigned i; int retval = -EOPNOTSUPP; if (param->iterations <= 0) return -EINVAL; if (param->sglen > MAX_SGLEN) return -EINVAL; /* * Just a bunch of test cases that every HCD is expected to handle. * * Some may need specific firmware, though it'd be good to have * one firmware image to handle all the test cases. * * FIXME add more tests! cancel requests, verify the data, control * queueing, concurrent read+write threads, and so on. */ switch (param->test_num) { case 0: dev_info(&intf->dev, "TEST 0: NOP\n"); retval = 0; break; /* Simple non-queued bulk I/O tests */ case 1: if (dev->out_pipe == 0) break; dev_info(&intf->dev, "TEST 1: write %d bytes %u times\n", param->length, param->iterations); urb = simple_alloc_urb(udev, dev->out_pipe, param->length, 0); if (!urb) { retval = -ENOMEM; break; } /* FIRMWARE: bulk sink (maybe accepts short writes) */ retval = simple_io(dev, urb, param->iterations, 0, 0, "test1"); simple_free_urb(urb); break; case 2: if (dev->in_pipe == 0) break; dev_info(&intf->dev, "TEST 2: read %d bytes %u times\n", param->length, param->iterations); urb = simple_alloc_urb(udev, dev->in_pipe, param->length, 0); if (!urb) { retval = -ENOMEM; break; } /* FIRMWARE: bulk source (maybe generates short writes) */ retval = simple_io(dev, urb, param->iterations, 0, 0, "test2"); simple_free_urb(urb); break; case 3: if (dev->out_pipe == 0 || param->vary == 0) break; dev_info(&intf->dev, "TEST 3: write/%d 0..%d bytes %u times\n", param->vary, param->length, param->iterations); urb = simple_alloc_urb(udev, dev->out_pipe, param->length, 0); if (!urb) { retval = -ENOMEM; break; } /* FIRMWARE: bulk sink (maybe accepts short writes) */ retval = simple_io(dev, urb, param->iterations, param->vary, 0, "test3"); simple_free_urb(urb); break; case 4: if (dev->in_pipe == 0 || param->vary == 0) break; dev_info(&intf->dev, "TEST 4: read/%d 0..%d bytes %u times\n", param->vary, param->length, param->iterations); urb = simple_alloc_urb(udev, dev->in_pipe, param->length, 0); if (!urb) { retval = -ENOMEM; break; } /* FIRMWARE: bulk source (maybe generates short writes) */ retval = simple_io(dev, urb, param->iterations, param->vary, 0, "test4"); simple_free_urb(urb); break; /* Queued bulk I/O tests */ case 5: if (dev->out_pipe == 0 || param->sglen == 0) break; dev_info(&intf->dev, "TEST 5: write %d sglists %d entries of %d bytes\n", param->iterations, param->sglen, param->length); sg = alloc_sglist(param->sglen, param->length, 0, dev, dev->out_pipe); if (!sg) { retval = -ENOMEM; break; } /* FIRMWARE: bulk sink (maybe accepts short writes) */ retval = perform_sglist(dev, param->iterations, dev->out_pipe, &req, sg, param->sglen); free_sglist(sg, param->sglen); break; case 6: if (dev->in_pipe == 0 || param->sglen == 0) break; dev_info(&intf->dev, "TEST 6: read %d sglists %d entries of %d bytes\n", param->iterations, param->sglen, param->length); sg = alloc_sglist(param->sglen, param->length, 0, dev, dev->in_pipe); if (!sg) { retval = -ENOMEM; break; } /* FIRMWARE: bulk source (maybe generates short writes) */ retval = perform_sglist(dev, param->iterations, dev->in_pipe, &req, sg, param->sglen); free_sglist(sg, param->sglen); break; case 7: if (dev->out_pipe == 0 || param->sglen == 0 || param->vary == 0) break; dev_info(&intf->dev, "TEST 7: write/%d %d sglists %d entries 0..%d bytes\n", param->vary, param->iterations, param->sglen, param->length); sg = alloc_sglist(param->sglen, param->length, param->vary, dev, dev->out_pipe); if (!sg) { retval = -ENOMEM; break; } /* FIRMWARE: bulk sink (maybe accepts short writes) */ retval = perform_sglist(dev, param->iterations, dev->out_pipe, &req, sg, param->sglen); free_sglist(sg, param->sglen); break; case 8: if (dev->in_pipe == 0 || param->sglen == 0 || param->vary == 0) break; dev_info(&intf->dev, "TEST 8: read/%d %d sglists %d entries 0..%d bytes\n", param->vary, param->iterations, param->sglen, param->length); sg = alloc_sglist(param->sglen, param->length, param->vary, dev, dev->in_pipe); if (!sg) { retval = -ENOMEM; break; } /* FIRMWARE: bulk source (maybe generates short writes) */ retval = perform_sglist(dev, param->iterations, dev->in_pipe, &req, sg, param->sglen); free_sglist(sg, param->sglen); break; /* non-queued sanity tests for control (chapter 9 subset) */ case 9: retval = 0; dev_info(&intf->dev, "TEST 9: ch9 (subset) control tests, %d times\n", param->iterations); for (i = param->iterations; retval == 0 && i--; /* NOP */) retval = ch9_postconfig(dev); if (retval) dev_err(&intf->dev, "ch9 subset failed, " "iterations left %d\n", i); break; /* queued control messaging */ case 10: retval = 0; dev_info(&intf->dev, "TEST 10: queue %d control calls, %d times\n", param->sglen, param->iterations); retval = test_ctrl_queue(dev, param); break; /* simple non-queued unlinks (ring with one urb) */ case 11: if (dev->in_pipe == 0 || !param->length) break; retval = 0; dev_info(&intf->dev, "TEST 11: unlink %d reads of %d\n", param->iterations, param->length); for (i = param->iterations; retval == 0 && i--; /* NOP */) retval = unlink_simple(dev, dev->in_pipe, param->length); if (retval) dev_err(&intf->dev, "unlink reads failed %d, " "iterations left %d\n", retval, i); break; case 12: if (dev->out_pipe == 0 || !param->length) break; retval = 0; dev_info(&intf->dev, "TEST 12: unlink %d writes of %d\n", param->iterations, param->length); for (i = param->iterations; retval == 0 && i--; /* NOP */) retval = unlink_simple(dev, dev->out_pipe, param->length); if (retval) dev_err(&intf->dev, "unlink writes failed %d, " "iterations left %d\n", retval, i); break; /* ep halt tests */ case 13: if (dev->out_pipe == 0 && dev->in_pipe == 0) break; retval = 0; dev_info(&intf->dev, "TEST 13: set/clear %d halts\n", param->iterations); for (i = param->iterations; retval == 0 && i--; /* NOP */) retval = halt_simple(dev); if (retval) ERROR(dev, "halts failed, iterations left %d\n", i); break; /* control write tests */ case 14: if (!dev->info->ctrl_out) break; dev_info(&intf->dev, "TEST 14: %d ep0out, %d..%d vary %d\n", param->iterations, realworld ? 1 : 0, param->length, param->vary); retval = ctrl_out(dev, param->iterations, param->length, param->vary, 0); break; /* iso write tests */ case 15: if (dev->out_iso_pipe == 0 || param->sglen == 0) break; dev_info(&intf->dev, "TEST 15: write %d iso, %d entries of %d bytes\n", param->iterations, param->sglen, param->length); /* FIRMWARE: iso sink */ retval = test_queue(dev, param, dev->out_iso_pipe, dev->iso_out, 0); break; /* iso read tests */ case 16: if (dev->in_iso_pipe == 0 || param->sglen == 0) break; dev_info(&intf->dev, "TEST 16: read %d iso, %d entries of %d bytes\n", param->iterations, param->sglen, param->length); /* FIRMWARE: iso source */ retval = test_queue(dev, param, dev->in_iso_pipe, dev->iso_in, 0); break; /* FIXME scatterlist cancel (needs helper thread) */ /* Tests for bulk I/O using DMA mapping by core and odd address */ case 17: if (dev->out_pipe == 0) break; dev_info(&intf->dev, "TEST 17: write odd addr %d bytes %u times core map\n", param->length, param->iterations); retval = test_unaligned_bulk( dev, dev->out_pipe, param->length, param->iterations, 0, "test17"); break; case 18: if (dev->in_pipe == 0) break; dev_info(&intf->dev, "TEST 18: read odd addr %d bytes %u times core map\n", param->length, param->iterations); retval = test_unaligned_bulk( dev, dev->in_pipe, param->length, param->iterations, 0, "test18"); break; /* Tests for bulk I/O using premapped coherent buffer and odd address */ case 19: if (dev->out_pipe == 0) break; dev_info(&intf->dev, "TEST 19: write odd addr %d bytes %u times premapped\n", param->length, param->iterations); retval = test_unaligned_bulk( dev, dev->out_pipe, param->length, param->iterations, URB_NO_TRANSFER_DMA_MAP, "test19"); break; case 20: if (dev->in_pipe == 0) break; dev_info(&intf->dev, "TEST 20: read odd addr %d bytes %u times premapped\n", param->length, param->iterations); retval = test_unaligned_bulk( dev, dev->in_pipe, param->length, param->iterations, URB_NO_TRANSFER_DMA_MAP, "test20"); break; /* control write tests with unaligned buffer */ case 21: if (!dev->info->ctrl_out) break; dev_info(&intf->dev, "TEST 21: %d ep0out odd addr, %d..%d vary %d\n", param->iterations, realworld ? 1 : 0, param->length, param->vary); retval = ctrl_out(dev, param->iterations, param->length, param->vary, 1); break; /* unaligned iso tests */ case 22: if (dev->out_iso_pipe == 0 || param->sglen == 0) break; dev_info(&intf->dev, "TEST 22: write %d iso odd, %d entries of %d bytes\n", param->iterations, param->sglen, param->length); retval = test_queue(dev, param, dev->out_iso_pipe, dev->iso_out, 1); break; case 23: if (dev->in_iso_pipe == 0 || param->sglen == 0) break; dev_info(&intf->dev, "TEST 23: read %d iso odd, %d entries of %d bytes\n", param->iterations, param->sglen, param->length); retval = test_queue(dev, param, dev->in_iso_pipe, dev->iso_in, 1); break; /* unlink URBs from a bulk-OUT queue */ case 24: if (dev->out_pipe == 0 || !param->length || param->sglen < 4) break; retval = 0; dev_info(&intf->dev, "TEST 24: unlink from %d queues of " "%d %d-byte writes\n", param->iterations, param->sglen, param->length); for (i = param->iterations; retval == 0 && i > 0; --i) { retval = unlink_queued(dev, dev->out_pipe, param->sglen, param->length); if (retval) { dev_err(&intf->dev, "unlink queued writes failed %d, " "iterations left %d\n", retval, i); break; } } break; /* Simple non-queued interrupt I/O tests */ case 25: if (dev->out_int_pipe == 0) break; dev_info(&intf->dev, "TEST 25: write %d bytes %u times\n", param->length, param->iterations); urb = simple_alloc_urb(udev, dev->out_int_pipe, param->length, dev->int_out->bInterval); if (!urb) { retval = -ENOMEM; break; } /* FIRMWARE: interrupt sink (maybe accepts short writes) */ retval = simple_io(dev, urb, param->iterations, 0, 0, "test25"); simple_free_urb(urb); break; case 26: if (dev->in_int_pipe == 0) break; dev_info(&intf->dev, "TEST 26: read %d bytes %u times\n", param->length, param->iterations); urb = simple_alloc_urb(udev, dev->in_int_pipe, param->length, dev->int_in->bInterval); if (!urb) { retval = -ENOMEM; break; } /* FIRMWARE: interrupt source (maybe generates short writes) */ retval = simple_io(dev, urb, param->iterations, 0, 0, "test26"); simple_free_urb(urb); break; case 27: /* We do performance test, so ignore data compare */ if (dev->out_pipe == 0 || param->sglen == 0 || pattern != 0) break; dev_info(&intf->dev, "TEST 27: bulk write %dMbytes\n", (param->iterations * param->sglen * param->length) / (1024 * 1024)); retval = test_queue(dev, param, dev->out_pipe, NULL, 0); break; case 28: if (dev->in_pipe == 0 || param->sglen == 0 || pattern != 0) break; dev_info(&intf->dev, "TEST 28: bulk read %dMbytes\n", (param->iterations * param->sglen * param->length) / (1024 * 1024)); retval = test_queue(dev, param, dev->in_pipe, NULL, 0); break; /* Test data Toggle/seq_nr clear between bulk out transfers */ case 29: if (dev->out_pipe == 0) break; retval = 0; dev_info(&intf->dev, "TEST 29: Clear toggle between bulk writes %d times\n", param->iterations); for (i = param->iterations; retval == 0 && i > 0; --i) retval = toggle_sync_simple(dev); if (retval) ERROR(dev, "toggle sync failed, iterations left %d\n", i); break; } return retval; } /*-------------------------------------------------------------------------*/ /* We only have this one interface to user space, through usbfs. * User mode code can scan usbfs to find N different devices (maybe on * different busses) to use when testing, and allocate one thread per * test. So discovery is simplified, and we have no device naming issues. * * Don't use these only as stress/load tests. Use them along with * other USB bus activity: plugging, unplugging, mousing, mp3 playback, * video capture, and so on. Run different tests at different times, in * different sequences. Nothing here should interact with other devices, * except indirectly by consuming USB bandwidth and CPU resources for test * threads and request completion. But the only way to know that for sure * is to test when HC queues are in use by many devices. * * WARNING: Because usbfs grabs udev->dev.sem before calling this ioctl(), * it locks out usbcore in certain code paths. Notably, if you disconnect * the device-under-test, hub_wq will wait block forever waiting for the * ioctl to complete ... so that usb_disconnect() can abort the pending * urbs and then call usbtest_disconnect(). To abort a test, you're best * off just killing the userspace task and waiting for it to exit. */ static int usbtest_ioctl(struct usb_interface *intf, unsigned int code, void *buf) { struct usbtest_dev *dev = usb_get_intfdata(intf); struct usbtest_param_64 *param_64 = buf; struct usbtest_param_32 temp; struct usbtest_param_32 *param_32 = buf; struct timespec64 start; struct timespec64 end; struct timespec64 duration; int retval = -EOPNOTSUPP; /* FIXME USBDEVFS_CONNECTINFO doesn't say how fast the device is. */ pattern = mod_pattern; if (mutex_lock_interruptible(&dev->lock)) return -ERESTARTSYS; /* FIXME: What if a system sleep starts while a test is running? */ /* some devices, like ez-usb default devices, need a non-default * altsetting to have any active endpoints. some tests change * altsettings; force a default so most tests don't need to check. */ if (dev->info->alt >= 0) { if (intf->altsetting->desc.bInterfaceNumber) { retval = -ENODEV; goto free_mutex; } retval = set_altsetting(dev, dev->info->alt); if (retval) { dev_err(&intf->dev, "set altsetting to %d failed, %d\n", dev->info->alt, retval); goto free_mutex; } } switch (code) { case USBTEST_REQUEST_64: temp.test_num = param_64->test_num; temp.iterations = param_64->iterations; temp.length = param_64->length; temp.sglen = param_64->sglen; temp.vary = param_64->vary; param_32 = &temp; break; case USBTEST_REQUEST_32: break; default: retval = -EOPNOTSUPP; goto free_mutex; } ktime_get_ts64(&start); retval = usbtest_do_ioctl(intf, param_32); if (retval < 0) goto free_mutex; ktime_get_ts64(&end); duration = timespec64_sub(end, start); temp.duration_sec = duration.tv_sec; temp.duration_usec = duration.tv_nsec/NSEC_PER_USEC; switch (code) { case USBTEST_REQUEST_32: param_32->duration_sec = temp.duration_sec; param_32->duration_usec = temp.duration_usec; break; case USBTEST_REQUEST_64: param_64->duration_sec = temp.duration_sec; param_64->duration_usec = temp.duration_usec; break; } free_mutex: mutex_unlock(&dev->lock); return retval; } /*-------------------------------------------------------------------------*/ static unsigned force_interrupt; module_param(force_interrupt, uint, 0); MODULE_PARM_DESC(force_interrupt, "0 = test default; else interrupt"); #ifdef GENERIC static unsigned short vendor; module_param(vendor, ushort, 0); MODULE_PARM_DESC(vendor, "vendor code (from usb-if)"); static unsigned short product; module_param(product, ushort, 0); MODULE_PARM_DESC(product, "product code (from vendor)"); #endif static int usbtest_probe(struct usb_interface *intf, const struct usb_device_id *id) { struct usb_device *udev; struct usbtest_dev *dev; struct usbtest_info *info; char *rtest, *wtest; char *irtest, *iwtest; char *intrtest, *intwtest; udev = interface_to_usbdev(intf); #ifdef GENERIC /* specify devices by module parameters? */ if (id->match_flags == 0) { /* vendor match required, product match optional */ if (!vendor || le16_to_cpu(udev->descriptor.idVendor) != (u16)vendor) return -ENODEV; if (product && le16_to_cpu(udev->descriptor.idProduct) != (u16)product) return -ENODEV; dev_info(&intf->dev, "matched module params, " "vend=0x%04x prod=0x%04x\n", le16_to_cpu(udev->descriptor.idVendor), le16_to_cpu(udev->descriptor.idProduct)); } #endif dev = kzalloc(sizeof(*dev), GFP_KERNEL); if (!dev) return -ENOMEM; info = (struct usbtest_info *) id->driver_info; dev->info = info; mutex_init(&dev->lock); dev->intf = intf; /* cacheline-aligned scratch for i/o */ dev->buf = kmalloc(TBUF_SIZE, GFP_KERNEL); if (dev->buf == NULL) { kfree(dev); return -ENOMEM; } /* NOTE this doesn't yet test the handful of difference that are * visible with high speed interrupts: bigger maxpacket (1K) and * "high bandwidth" modes (up to 3 packets/uframe). */ rtest = wtest = ""; irtest = iwtest = ""; intrtest = intwtest = ""; if (force_interrupt || udev->speed == USB_SPEED_LOW) { if (info->ep_in) { dev->in_pipe = usb_rcvintpipe(udev, info->ep_in); rtest = " intr-in"; } if (info->ep_out) { dev->out_pipe = usb_sndintpipe(udev, info->ep_out); wtest = " intr-out"; } } else { if (override_alt >= 0 || info->autoconf) { int status; status = get_endpoints(dev, intf); if (status < 0) { WARNING(dev, "couldn't get endpoints, %d\n", status); kfree(dev->buf); kfree(dev); return status; } /* may find bulk or ISO pipes */ } else { if (info->ep_in) dev->in_pipe = usb_rcvbulkpipe(udev, info->ep_in); if (info->ep_out) dev->out_pipe = usb_sndbulkpipe(udev, info->ep_out); } if (dev->in_pipe) rtest = " bulk-in"; if (dev->out_pipe) wtest = " bulk-out"; if (dev->in_iso_pipe) irtest = " iso-in"; if (dev->out_iso_pipe) iwtest = " iso-out"; if (dev->in_int_pipe) intrtest = " int-in"; if (dev->out_int_pipe) intwtest = " int-out"; } usb_set_intfdata(intf, dev); dev_info(&intf->dev, "%s\n", info->name); dev_info(&intf->dev, "%s {control%s%s%s%s%s%s%s} tests%s\n", usb_speed_string(udev->speed), info->ctrl_out ? " in/out" : "", rtest, wtest, irtest, iwtest, intrtest, intwtest, info->alt >= 0 ? " (+alt)" : ""); return 0; } static int usbtest_suspend(struct usb_interface *intf, pm_message_t message) { return 0; } static int usbtest_resume(struct usb_interface *intf) { return 0; } static void usbtest_disconnect(struct usb_interface *intf) { struct usbtest_dev *dev = usb_get_intfdata(intf); usb_set_intfdata(intf, NULL); dev_dbg(&intf->dev, "disconnect\n"); kfree(dev->buf); kfree(dev); } /* Basic testing only needs a device that can source or sink bulk traffic. * Any device can test control transfers (default with GENERIC binding). * * Several entries work with the default EP0 implementation that's built * into EZ-USB chips. There's a default vendor ID which can be overridden * by (very) small config EEPROMS, but otherwise all these devices act * identically until firmware is loaded: only EP0 works. It turns out * to be easy to make other endpoints work, without modifying that EP0 * behavior. For now, we expect that kind of firmware. */ /* an21xx or fx versions of ez-usb */ static struct usbtest_info ez1_info = { .name = "EZ-USB device", .ep_in = 2, .ep_out = 2, .alt = 1, }; /* fx2 version of ez-usb */ static struct usbtest_info ez2_info = { .name = "FX2 device", .ep_in = 6, .ep_out = 2, .alt = 1, }; /* ezusb family device with dedicated usb test firmware, */ static struct usbtest_info fw_info = { .name = "usb test device", .ep_in = 2, .ep_out = 2, .alt = 1, .autoconf = 1, /* iso and ctrl_out need autoconf */ .ctrl_out = 1, .iso = 1, /* iso_ep's are #8 in/out */ }; /* peripheral running Linux and 'zero.c' test firmware, or * its user-mode cousin. different versions of this use * different hardware with the same vendor/product codes. * host side MUST rely on the endpoint descriptors. */ static struct usbtest_info gz_info = { .name = "Linux gadget zero", .autoconf = 1, .ctrl_out = 1, .iso = 1, .intr = 1, .alt = 0, }; static struct usbtest_info um_info = { .name = "Linux user mode test driver", .autoconf = 1, .alt = -1, }; static struct usbtest_info um2_info = { .name = "Linux user mode ISO test driver", .autoconf = 1, .iso = 1, .alt = -1, }; #ifdef IBOT2 /* this is a nice source of high speed bulk data; * uses an FX2, with firmware provided in the device */ static struct usbtest_info ibot2_info = { .name = "iBOT2 webcam", .ep_in = 2, .alt = -1, }; #endif #ifdef GENERIC /* we can use any device to test control traffic */ static struct usbtest_info generic_info = { .name = "Generic USB device", .alt = -1, }; #endif static const struct usb_device_id id_table[] = { /*-------------------------------------------------------------*/ /* EZ-USB devices which download firmware to replace (or in our * case augment) the default device implementation. */ /* generic EZ-USB FX controller */ { USB_DEVICE(0x0547, 0x2235), .driver_info = (unsigned long) &ez1_info, }, /* CY3671 development board with EZ-USB FX */ { USB_DEVICE(0x0547, 0x0080), .driver_info = (unsigned long) &ez1_info, }, /* generic EZ-USB FX2 controller (or development board) */ { USB_DEVICE(0x04b4, 0x8613), .driver_info = (unsigned long) &ez2_info, }, /* re-enumerated usb test device firmware */ { USB_DEVICE(0xfff0, 0xfff0), .driver_info = (unsigned long) &fw_info, }, /* "Gadget Zero" firmware runs under Linux */ { USB_DEVICE(0x0525, 0xa4a0), .driver_info = (unsigned long) &gz_info, }, /* so does a user-mode variant */ { USB_DEVICE(0x0525, 0xa4a4), .driver_info = (unsigned long) &um_info, }, /* ... and a user-mode variant that talks iso */ { USB_DEVICE(0x0525, 0xa4a3), .driver_info = (unsigned long) &um2_info, }, #ifdef KEYSPAN_19Qi /* Keyspan 19qi uses an21xx (original EZ-USB) */ /* this does not coexist with the real Keyspan 19qi driver! */ { USB_DEVICE(0x06cd, 0x010b), .driver_info = (unsigned long) &ez1_info, }, #endif /*-------------------------------------------------------------*/ #ifdef IBOT2 /* iBOT2 makes a nice source of high speed bulk-in data */ /* this does not coexist with a real iBOT2 driver! */ { USB_DEVICE(0x0b62, 0x0059), .driver_info = (unsigned long) &ibot2_info, }, #endif /*-------------------------------------------------------------*/ #ifdef GENERIC /* module params can specify devices to use for control tests */ { .driver_info = (unsigned long) &generic_info, }, #endif /*-------------------------------------------------------------*/ { } }; MODULE_DEVICE_TABLE(usb, id_table); static struct usb_driver usbtest_driver = { .name = "usbtest", .id_table = id_table, .probe = usbtest_probe, .unlocked_ioctl = usbtest_ioctl, .disconnect = usbtest_disconnect, .suspend = usbtest_suspend, .resume = usbtest_resume, }; /*-------------------------------------------------------------------------*/ static int __init usbtest_init(void) { #ifdef GENERIC if (vendor) pr_debug("params: vend=0x%04x prod=0x%04x\n", vendor, product); #endif return usb_register(&usbtest_driver); } module_init(usbtest_init); static void __exit usbtest_exit(void) { usb_deregister(&usbtest_driver); } module_exit(usbtest_exit); MODULE_DESCRIPTION("USB Core/HCD Testing Driver"); MODULE_LICENSE("GPL"); |
| 6 6 6 6 6 6 6 6 6 6 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 | /* * Copyright (c) 2014 Chelsio, Inc. All rights reserved. * Copyright (c) 2014 Intel Corporation. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU * General Public License (GPL) Version 2, available from the file * COPYING in the main directory of this source tree, or the * OpenIB.org BSD license below: * * Redistribution and use in source and binary forms, with or * without modification, are permitted provided that the following * conditions are met: * * - Redistributions of source code must retain the above * copyright notice, this list of conditions and the following * disclaimer. * * - Redistributions in binary form must reproduce the above * copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials * provided with the distribution. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. */ #include "iwpm_util.h" #define IWPM_MAPINFO_HASH_SIZE 512 #define IWPM_MAPINFO_HASH_MASK (IWPM_MAPINFO_HASH_SIZE - 1) #define IWPM_REMINFO_HASH_SIZE 64 #define IWPM_REMINFO_HASH_MASK (IWPM_REMINFO_HASH_SIZE - 1) #define IWPM_MSG_SIZE 512 static LIST_HEAD(iwpm_nlmsg_req_list); static DEFINE_SPINLOCK(iwpm_nlmsg_req_lock); static struct hlist_head *iwpm_hash_bucket; static DEFINE_SPINLOCK(iwpm_mapinfo_lock); static struct hlist_head *iwpm_reminfo_bucket; static DEFINE_SPINLOCK(iwpm_reminfo_lock); static struct iwpm_admin_data iwpm_admin; /** * iwpm_init - Allocate resources for the iwarp port mapper * @nl_client: The index of the netlink client * * Should be called when network interface goes up. */ int iwpm_init(u8 nl_client) { iwpm_hash_bucket = kcalloc(IWPM_MAPINFO_HASH_SIZE, sizeof(struct hlist_head), GFP_KERNEL); if (!iwpm_hash_bucket) return -ENOMEM; iwpm_reminfo_bucket = kcalloc(IWPM_REMINFO_HASH_SIZE, sizeof(struct hlist_head), GFP_KERNEL); if (!iwpm_reminfo_bucket) { kfree(iwpm_hash_bucket); return -ENOMEM; } iwpm_set_registration(nl_client, IWPM_REG_UNDEF); pr_debug("%s: Mapinfo and reminfo tables are created\n", __func__); return 0; } static void free_hash_bucket(void); static void free_reminfo_bucket(void); /** * iwpm_exit - Deallocate resources for the iwarp port mapper * @nl_client: The index of the netlink client * * Should be called when network interface goes down. */ int iwpm_exit(u8 nl_client) { free_hash_bucket(); free_reminfo_bucket(); pr_debug("%s: Resources are destroyed\n", __func__); iwpm_set_registration(nl_client, IWPM_REG_UNDEF); return 0; } static struct hlist_head *get_mapinfo_hash_bucket(struct sockaddr_storage *, struct sockaddr_storage *); /** * iwpm_create_mapinfo - Store local and mapped IPv4/IPv6 address * info in a hash table * @local_sockaddr: Local ip/tcp address * @mapped_sockaddr: Mapped local ip/tcp address * @nl_client: The index of the netlink client * @map_flags: IWPM mapping flags */ int iwpm_create_mapinfo(struct sockaddr_storage *local_sockaddr, struct sockaddr_storage *mapped_sockaddr, u8 nl_client, u32 map_flags) { struct hlist_head *hash_bucket_head = NULL; struct iwpm_mapping_info *map_info; unsigned long flags; int ret = -EINVAL; map_info = kzalloc(sizeof(struct iwpm_mapping_info), GFP_KERNEL); if (!map_info) return -ENOMEM; memcpy(&map_info->local_sockaddr, local_sockaddr, sizeof(struct sockaddr_storage)); memcpy(&map_info->mapped_sockaddr, mapped_sockaddr, sizeof(struct sockaddr_storage)); map_info->nl_client = nl_client; map_info->map_flags = map_flags; spin_lock_irqsave(&iwpm_mapinfo_lock, flags); if (iwpm_hash_bucket) { hash_bucket_head = get_mapinfo_hash_bucket( &map_info->local_sockaddr, &map_info->mapped_sockaddr); if (hash_bucket_head) { hlist_add_head(&map_info->hlist_node, hash_bucket_head); ret = 0; } } spin_unlock_irqrestore(&iwpm_mapinfo_lock, flags); if (!hash_bucket_head) kfree(map_info); return ret; } /** * iwpm_remove_mapinfo - Remove local and mapped IPv4/IPv6 address * info from the hash table * @local_sockaddr: Local ip/tcp address * @mapped_local_addr: Mapped local ip/tcp address * * Returns err code if mapping info is not found in the hash table, * otherwise returns 0 */ int iwpm_remove_mapinfo(struct sockaddr_storage *local_sockaddr, struct sockaddr_storage *mapped_local_addr) { struct hlist_node *tmp_hlist_node; struct hlist_head *hash_bucket_head; struct iwpm_mapping_info *map_info = NULL; unsigned long flags; int ret = -EINVAL; spin_lock_irqsave(&iwpm_mapinfo_lock, flags); if (iwpm_hash_bucket) { hash_bucket_head = get_mapinfo_hash_bucket( local_sockaddr, mapped_local_addr); if (!hash_bucket_head) goto remove_mapinfo_exit; hlist_for_each_entry_safe(map_info, tmp_hlist_node, hash_bucket_head, hlist_node) { if (!iwpm_compare_sockaddr(&map_info->mapped_sockaddr, mapped_local_addr)) { hlist_del_init(&map_info->hlist_node); kfree(map_info); ret = 0; break; } } } remove_mapinfo_exit: spin_unlock_irqrestore(&iwpm_mapinfo_lock, flags); return ret; } static void free_hash_bucket(void) { struct hlist_node *tmp_hlist_node; struct iwpm_mapping_info *map_info; unsigned long flags; int i; /* remove all the mapinfo data from the list */ spin_lock_irqsave(&iwpm_mapinfo_lock, flags); for (i = 0; i < IWPM_MAPINFO_HASH_SIZE; i++) { hlist_for_each_entry_safe(map_info, tmp_hlist_node, &iwpm_hash_bucket[i], hlist_node) { hlist_del_init(&map_info->hlist_node); kfree(map_info); } } /* free the hash list */ kfree(iwpm_hash_bucket); iwpm_hash_bucket = NULL; spin_unlock_irqrestore(&iwpm_mapinfo_lock, flags); } static void free_reminfo_bucket(void) { struct hlist_node *tmp_hlist_node; struct iwpm_remote_info *rem_info; unsigned long flags; int i; /* remove all the remote info from the list */ spin_lock_irqsave(&iwpm_reminfo_lock, flags); for (i = 0; i < IWPM_REMINFO_HASH_SIZE; i++) { hlist_for_each_entry_safe(rem_info, tmp_hlist_node, &iwpm_reminfo_bucket[i], hlist_node) { hlist_del_init(&rem_info->hlist_node); kfree(rem_info); } } /* free the hash list */ kfree(iwpm_reminfo_bucket); iwpm_reminfo_bucket = NULL; spin_unlock_irqrestore(&iwpm_reminfo_lock, flags); } static struct hlist_head *get_reminfo_hash_bucket(struct sockaddr_storage *, struct sockaddr_storage *); void iwpm_add_remote_info(struct iwpm_remote_info *rem_info) { struct hlist_head *hash_bucket_head; unsigned long flags; spin_lock_irqsave(&iwpm_reminfo_lock, flags); if (iwpm_reminfo_bucket) { hash_bucket_head = get_reminfo_hash_bucket( &rem_info->mapped_loc_sockaddr, &rem_info->mapped_rem_sockaddr); if (hash_bucket_head) hlist_add_head(&rem_info->hlist_node, hash_bucket_head); } spin_unlock_irqrestore(&iwpm_reminfo_lock, flags); } /** * iwpm_get_remote_info - Get the remote connecting peer address info * * @mapped_loc_addr: Mapped local address of the listening peer * @mapped_rem_addr: Mapped remote address of the connecting peer * @remote_addr: To store the remote address of the connecting peer * @nl_client: The index of the netlink client * * The remote address info is retrieved and provided to the client in * the remote_addr. After that it is removed from the hash table */ int iwpm_get_remote_info(struct sockaddr_storage *mapped_loc_addr, struct sockaddr_storage *mapped_rem_addr, struct sockaddr_storage *remote_addr, u8 nl_client) { struct hlist_node *tmp_hlist_node; struct hlist_head *hash_bucket_head; struct iwpm_remote_info *rem_info = NULL; unsigned long flags; int ret = -EINVAL; spin_lock_irqsave(&iwpm_reminfo_lock, flags); if (iwpm_reminfo_bucket) { hash_bucket_head = get_reminfo_hash_bucket( mapped_loc_addr, mapped_rem_addr); if (!hash_bucket_head) goto get_remote_info_exit; hlist_for_each_entry_safe(rem_info, tmp_hlist_node, hash_bucket_head, hlist_node) { if (!iwpm_compare_sockaddr(&rem_info->mapped_loc_sockaddr, mapped_loc_addr) && !iwpm_compare_sockaddr(&rem_info->mapped_rem_sockaddr, mapped_rem_addr)) { memcpy(remote_addr, &rem_info->remote_sockaddr, sizeof(struct sockaddr_storage)); iwpm_print_sockaddr(remote_addr, "get_remote_info: Remote sockaddr:"); hlist_del_init(&rem_info->hlist_node); kfree(rem_info); ret = 0; break; } } } get_remote_info_exit: spin_unlock_irqrestore(&iwpm_reminfo_lock, flags); return ret; } struct iwpm_nlmsg_request *iwpm_get_nlmsg_request(__u32 nlmsg_seq, u8 nl_client, gfp_t gfp) { struct iwpm_nlmsg_request *nlmsg_request; unsigned long flags; nlmsg_request = kzalloc(sizeof(struct iwpm_nlmsg_request), gfp); if (!nlmsg_request) return NULL; spin_lock_irqsave(&iwpm_nlmsg_req_lock, flags); list_add_tail(&nlmsg_request->inprocess_list, &iwpm_nlmsg_req_list); spin_unlock_irqrestore(&iwpm_nlmsg_req_lock, flags); kref_init(&nlmsg_request->kref); kref_get(&nlmsg_request->kref); nlmsg_request->nlmsg_seq = nlmsg_seq; nlmsg_request->nl_client = nl_client; nlmsg_request->request_done = 0; nlmsg_request->err_code = 0; sema_init(&nlmsg_request->sem, 1); down(&nlmsg_request->sem); return nlmsg_request; } void iwpm_free_nlmsg_request(struct kref *kref) { struct iwpm_nlmsg_request *nlmsg_request; unsigned long flags; nlmsg_request = container_of(kref, struct iwpm_nlmsg_request, kref); spin_lock_irqsave(&iwpm_nlmsg_req_lock, flags); list_del_init(&nlmsg_request->inprocess_list); spin_unlock_irqrestore(&iwpm_nlmsg_req_lock, flags); if (!nlmsg_request->request_done) pr_debug("%s Freeing incomplete nlmsg request (seq = %u).\n", __func__, nlmsg_request->nlmsg_seq); kfree(nlmsg_request); } struct iwpm_nlmsg_request *iwpm_find_nlmsg_request(__u32 echo_seq) { struct iwpm_nlmsg_request *nlmsg_request; struct iwpm_nlmsg_request *found_request = NULL; unsigned long flags; spin_lock_irqsave(&iwpm_nlmsg_req_lock, flags); list_for_each_entry(nlmsg_request, &iwpm_nlmsg_req_list, inprocess_list) { if (nlmsg_request->nlmsg_seq == echo_seq) { found_request = nlmsg_request; kref_get(&nlmsg_request->kref); break; } } spin_unlock_irqrestore(&iwpm_nlmsg_req_lock, flags); return found_request; } int iwpm_wait_complete_req(struct iwpm_nlmsg_request *nlmsg_request) { int ret; ret = down_timeout(&nlmsg_request->sem, IWPM_NL_TIMEOUT); if (ret) { ret = -EINVAL; pr_info("%s: Timeout %d sec for netlink request (seq = %u)\n", __func__, (IWPM_NL_TIMEOUT/HZ), nlmsg_request->nlmsg_seq); } else { ret = nlmsg_request->err_code; } kref_put(&nlmsg_request->kref, iwpm_free_nlmsg_request); return ret; } int iwpm_get_nlmsg_seq(void) { return atomic_inc_return(&iwpm_admin.nlmsg_seq); } /* valid client */ u32 iwpm_get_registration(u8 nl_client) { return iwpm_admin.reg_list[nl_client]; } /* valid client */ void iwpm_set_registration(u8 nl_client, u32 reg) { iwpm_admin.reg_list[nl_client] = reg; } /* valid client */ u32 iwpm_check_registration(u8 nl_client, u32 reg) { return (iwpm_get_registration(nl_client) & reg); } int iwpm_compare_sockaddr(struct sockaddr_storage *a_sockaddr, struct sockaddr_storage *b_sockaddr) { if (a_sockaddr->ss_family != b_sockaddr->ss_family) return 1; if (a_sockaddr->ss_family == AF_INET) { struct sockaddr_in *a4_sockaddr = (struct sockaddr_in *)a_sockaddr; struct sockaddr_in *b4_sockaddr = (struct sockaddr_in *)b_sockaddr; if (!memcmp(&a4_sockaddr->sin_addr, &b4_sockaddr->sin_addr, sizeof(struct in_addr)) && a4_sockaddr->sin_port == b4_sockaddr->sin_port) return 0; } else if (a_sockaddr->ss_family == AF_INET6) { struct sockaddr_in6 *a6_sockaddr = (struct sockaddr_in6 *)a_sockaddr; struct sockaddr_in6 *b6_sockaddr = (struct sockaddr_in6 *)b_sockaddr; if (!memcmp(&a6_sockaddr->sin6_addr, &b6_sockaddr->sin6_addr, sizeof(struct in6_addr)) && a6_sockaddr->sin6_port == b6_sockaddr->sin6_port) return 0; } else { pr_err("%s: Invalid sockaddr family\n", __func__); } return 1; } struct sk_buff *iwpm_create_nlmsg(u32 nl_op, struct nlmsghdr **nlh, int nl_client) { struct sk_buff *skb = NULL; skb = dev_alloc_skb(IWPM_MSG_SIZE); if (!skb) goto create_nlmsg_exit; if (!(ibnl_put_msg(skb, nlh, 0, 0, nl_client, nl_op, NLM_F_REQUEST))) { pr_warn("%s: Unable to put the nlmsg header\n", __func__); dev_kfree_skb(skb); skb = NULL; } create_nlmsg_exit: return skb; } int iwpm_parse_nlmsg(struct netlink_callback *cb, int policy_max, const struct nla_policy *nlmsg_policy, struct nlattr *nltb[], const char *msg_type) { int nlh_len = 0; int ret; const char *err_str = ""; ret = nlmsg_validate_deprecated(cb->nlh, nlh_len, policy_max - 1, nlmsg_policy, NULL); if (ret) { err_str = "Invalid attribute"; goto parse_nlmsg_error; } ret = nlmsg_parse_deprecated(cb->nlh, nlh_len, nltb, policy_max - 1, nlmsg_policy, NULL); if (ret) { err_str = "Unable to parse the nlmsg"; goto parse_nlmsg_error; } ret = iwpm_validate_nlmsg_attr(nltb, policy_max); if (ret) { err_str = "Invalid NULL attribute"; goto parse_nlmsg_error; } return 0; parse_nlmsg_error: pr_warn("%s: %s (msg type %s ret = %d)\n", __func__, err_str, msg_type, ret); return ret; } void iwpm_print_sockaddr(struct sockaddr_storage *sockaddr, char *msg) { struct sockaddr_in6 *sockaddr_v6; struct sockaddr_in *sockaddr_v4; switch (sockaddr->ss_family) { case AF_INET: sockaddr_v4 = (struct sockaddr_in *)sockaddr; pr_debug("%s IPV4 %pI4: %u(0x%04X)\n", msg, &sockaddr_v4->sin_addr, ntohs(sockaddr_v4->sin_port), ntohs(sockaddr_v4->sin_port)); break; case AF_INET6: sockaddr_v6 = (struct sockaddr_in6 *)sockaddr; pr_debug("%s IPV6 %pI6: %u(0x%04X)\n", msg, &sockaddr_v6->sin6_addr, ntohs(sockaddr_v6->sin6_port), ntohs(sockaddr_v6->sin6_port)); break; default: break; } } static u32 iwpm_ipv6_jhash(struct sockaddr_in6 *ipv6_sockaddr) { u32 ipv6_hash = jhash(&ipv6_sockaddr->sin6_addr, sizeof(struct in6_addr), 0); u32 hash = jhash_2words(ipv6_hash, (__force u32) ipv6_sockaddr->sin6_port, 0); return hash; } static u32 iwpm_ipv4_jhash(struct sockaddr_in *ipv4_sockaddr) { u32 ipv4_hash = jhash(&ipv4_sockaddr->sin_addr, sizeof(struct in_addr), 0); u32 hash = jhash_2words(ipv4_hash, (__force u32) ipv4_sockaddr->sin_port, 0); return hash; } static int get_hash_bucket(struct sockaddr_storage *a_sockaddr, struct sockaddr_storage *b_sockaddr, u32 *hash) { u32 a_hash, b_hash; if (a_sockaddr->ss_family == AF_INET) { a_hash = iwpm_ipv4_jhash((struct sockaddr_in *) a_sockaddr); b_hash = iwpm_ipv4_jhash((struct sockaddr_in *) b_sockaddr); } else if (a_sockaddr->ss_family == AF_INET6) { a_hash = iwpm_ipv6_jhash((struct sockaddr_in6 *) a_sockaddr); b_hash = iwpm_ipv6_jhash((struct sockaddr_in6 *) b_sockaddr); } else { pr_err("%s: Invalid sockaddr family\n", __func__); return -EINVAL; } if (a_hash == b_hash) /* if port mapper isn't available */ *hash = a_hash; else *hash = jhash_2words(a_hash, b_hash, 0); return 0; } static struct hlist_head *get_mapinfo_hash_bucket(struct sockaddr_storage *local_sockaddr, struct sockaddr_storage *mapped_sockaddr) { u32 hash; int ret; ret = get_hash_bucket(local_sockaddr, mapped_sockaddr, &hash); if (ret) return NULL; return &iwpm_hash_bucket[hash & IWPM_MAPINFO_HASH_MASK]; } static struct hlist_head *get_reminfo_hash_bucket(struct sockaddr_storage *mapped_loc_sockaddr, struct sockaddr_storage *mapped_rem_sockaddr) { u32 hash; int ret; ret = get_hash_bucket(mapped_loc_sockaddr, mapped_rem_sockaddr, &hash); if (ret) return NULL; return &iwpm_reminfo_bucket[hash & IWPM_REMINFO_HASH_MASK]; } static int send_mapinfo_num(u32 mapping_num, u8 nl_client, int iwpm_pid) { struct sk_buff *skb = NULL; struct nlmsghdr *nlh; u32 msg_seq; const char *err_str = ""; int ret = -EINVAL; skb = iwpm_create_nlmsg(RDMA_NL_IWPM_MAPINFO_NUM, &nlh, nl_client); if (!skb) { err_str = "Unable to create a nlmsg"; goto mapinfo_num_error; } nlh->nlmsg_seq = iwpm_get_nlmsg_seq(); msg_seq = 0; err_str = "Unable to put attribute of mapinfo number nlmsg"; ret = ibnl_put_attr(skb, nlh, sizeof(u32), &msg_seq, IWPM_NLA_MAPINFO_SEQ); if (ret) goto mapinfo_num_error; ret = ibnl_put_attr(skb, nlh, sizeof(u32), &mapping_num, IWPM_NLA_MAPINFO_SEND_NUM); if (ret) goto mapinfo_num_error; nlmsg_end(skb, nlh); ret = rdma_nl_unicast(&init_net, skb, iwpm_pid); if (ret) { skb = NULL; err_str = "Unable to send a nlmsg"; goto mapinfo_num_error; } pr_debug("%s: Sent mapping number = %u\n", __func__, mapping_num); return 0; mapinfo_num_error: pr_info("%s: %s\n", __func__, err_str); dev_kfree_skb(skb); return ret; } static int send_nlmsg_done(struct sk_buff *skb, u8 nl_client, int iwpm_pid) { struct nlmsghdr *nlh = NULL; int ret = 0; if (!skb) return ret; if (!(ibnl_put_msg(skb, &nlh, 0, 0, nl_client, RDMA_NL_IWPM_MAPINFO, NLM_F_MULTI))) { pr_warn("%s Unable to put NLMSG_DONE\n", __func__); dev_kfree_skb(skb); return -ENOMEM; } nlh->nlmsg_type = NLMSG_DONE; ret = rdma_nl_unicast(&init_net, skb, iwpm_pid); if (ret) pr_warn("%s Unable to send a nlmsg\n", __func__); return ret; } int iwpm_send_mapinfo(u8 nl_client, int iwpm_pid) { struct iwpm_mapping_info *map_info; struct sk_buff *skb = NULL; struct nlmsghdr *nlh; int skb_num = 0, mapping_num = 0; int i = 0, nlmsg_bytes = 0; unsigned long flags; const char *err_str = ""; int ret; skb = dev_alloc_skb(NLMSG_GOODSIZE); if (!skb) { ret = -ENOMEM; err_str = "Unable to allocate skb"; goto send_mapping_info_exit; } skb_num++; spin_lock_irqsave(&iwpm_mapinfo_lock, flags); ret = -EINVAL; for (i = 0; i < IWPM_MAPINFO_HASH_SIZE; i++) { hlist_for_each_entry(map_info, &iwpm_hash_bucket[i], hlist_node) { if (map_info->nl_client != nl_client) continue; nlh = NULL; if (!(ibnl_put_msg(skb, &nlh, 0, 0, nl_client, RDMA_NL_IWPM_MAPINFO, NLM_F_MULTI))) { ret = -ENOMEM; err_str = "Unable to put the nlmsg header"; goto send_mapping_info_unlock; } err_str = "Unable to put attribute of the nlmsg"; ret = ibnl_put_attr(skb, nlh, sizeof(struct sockaddr_storage), &map_info->local_sockaddr, IWPM_NLA_MAPINFO_LOCAL_ADDR); if (ret) goto send_mapping_info_unlock; ret = ibnl_put_attr(skb, nlh, sizeof(struct sockaddr_storage), &map_info->mapped_sockaddr, IWPM_NLA_MAPINFO_MAPPED_ADDR); if (ret) goto send_mapping_info_unlock; if (iwpm_ulib_version > IWPM_UABI_VERSION_MIN) { ret = ibnl_put_attr(skb, nlh, sizeof(u32), &map_info->map_flags, IWPM_NLA_MAPINFO_FLAGS); if (ret) goto send_mapping_info_unlock; } nlmsg_end(skb, nlh); iwpm_print_sockaddr(&map_info->local_sockaddr, "send_mapping_info: Local sockaddr:"); iwpm_print_sockaddr(&map_info->mapped_sockaddr, "send_mapping_info: Mapped local sockaddr:"); mapping_num++; nlmsg_bytes += nlh->nlmsg_len; /* check if all mappings can fit in one skb */ if (NLMSG_GOODSIZE - nlmsg_bytes < nlh->nlmsg_len * 2) { /* and leave room for NLMSG_DONE */ nlmsg_bytes = 0; skb_num++; spin_unlock_irqrestore(&iwpm_mapinfo_lock, flags); /* send the skb */ ret = send_nlmsg_done(skb, nl_client, iwpm_pid); skb = NULL; if (ret) { err_str = "Unable to send map info"; goto send_mapping_info_exit; } if (skb_num == IWPM_MAPINFO_SKB_COUNT) { ret = -ENOMEM; err_str = "Insufficient skbs for map info"; goto send_mapping_info_exit; } skb = dev_alloc_skb(NLMSG_GOODSIZE); if (!skb) { ret = -ENOMEM; err_str = "Unable to allocate skb"; goto send_mapping_info_exit; } spin_lock_irqsave(&iwpm_mapinfo_lock, flags); } } } send_mapping_info_unlock: spin_unlock_irqrestore(&iwpm_mapinfo_lock, flags); send_mapping_info_exit: if (ret) { pr_warn("%s: %s (ret = %d)\n", __func__, err_str, ret); dev_kfree_skb(skb); return ret; } send_nlmsg_done(skb, nl_client, iwpm_pid); return send_mapinfo_num(mapping_num, nl_client, iwpm_pid); } int iwpm_mapinfo_available(void) { unsigned long flags; int full_bucket = 0, i = 0; spin_lock_irqsave(&iwpm_mapinfo_lock, flags); if (iwpm_hash_bucket) { for (i = 0; i < IWPM_MAPINFO_HASH_SIZE; i++) { if (!hlist_empty(&iwpm_hash_bucket[i])) { full_bucket = 1; break; } } } spin_unlock_irqrestore(&iwpm_mapinfo_lock, flags); return full_bucket; } int iwpm_send_hello(u8 nl_client, int iwpm_pid, u16 abi_version) { struct sk_buff *skb = NULL; struct nlmsghdr *nlh; const char *err_str; int ret = -EINVAL; skb = iwpm_create_nlmsg(RDMA_NL_IWPM_HELLO, &nlh, nl_client); if (!skb) { err_str = "Unable to create a nlmsg"; goto hello_num_error; } nlh->nlmsg_seq = iwpm_get_nlmsg_seq(); err_str = "Unable to put attribute of abi_version into nlmsg"; ret = ibnl_put_attr(skb, nlh, sizeof(u16), &abi_version, IWPM_NLA_HELLO_ABI_VERSION); if (ret) goto hello_num_error; nlmsg_end(skb, nlh); ret = rdma_nl_unicast(&init_net, skb, iwpm_pid); if (ret) { skb = NULL; err_str = "Unable to send a nlmsg"; goto hello_num_error; } pr_debug("%s: Sent hello abi_version = %u\n", __func__, abi_version); return 0; hello_num_error: pr_info("%s: %s\n", __func__, err_str); dev_kfree_skb(skb); return ret; } |
| 55 55 122 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 | /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_TRACE_EVENT_H #define _LINUX_TRACE_EVENT_H #include <linux/ring_buffer.h> #include <linux/trace_seq.h> #include <linux/percpu.h> #include <linux/hardirq.h> #include <linux/perf_event.h> #include <linux/tracepoint.h> struct trace_array; struct array_buffer; struct tracer; struct dentry; struct bpf_prog; union bpf_attr; /* Used for event string fields when they are NULL */ #define EVENT_NULL_STR "(null)" const char *trace_print_flags_seq(struct trace_seq *p, const char *delim, unsigned long flags, const struct trace_print_flags *flag_array); const char *trace_print_symbols_seq(struct trace_seq *p, unsigned long val, const struct trace_print_flags *symbol_array); #if BITS_PER_LONG == 32 const char *trace_print_flags_seq_u64(struct trace_seq *p, const char *delim, unsigned long long flags, const struct trace_print_flags_u64 *flag_array); const char *trace_print_symbols_seq_u64(struct trace_seq *p, unsigned long long val, const struct trace_print_flags_u64 *symbol_array); #endif const char *trace_print_bitmask_seq(struct trace_seq *p, void *bitmask_ptr, unsigned int bitmask_size); const char *trace_print_hex_seq(struct trace_seq *p, const unsigned char *buf, int len, bool concatenate); const char *trace_print_array_seq(struct trace_seq *p, const void *buf, int count, size_t el_size); const char * trace_print_hex_dump_seq(struct trace_seq *p, const char *prefix_str, int prefix_type, int rowsize, int groupsize, const void *buf, size_t len, bool ascii); struct trace_iterator; struct trace_event; int trace_raw_output_prep(struct trace_iterator *iter, struct trace_event *event); extern __printf(2, 3) void trace_event_printf(struct trace_iterator *iter, const char *fmt, ...); /* Used to find the offset and length of dynamic fields in trace events */ struct trace_dynamic_info { #ifdef CONFIG_CPU_BIG_ENDIAN u16 len; u16 offset; #else u16 offset; u16 len; #endif } __packed; /* * The trace entry - the most basic unit of tracing. This is what * is printed in the end as a single line in the trace output, such as: * * bash-15816 [01] 235.197585: idle_cpu <- irq_enter */ struct trace_entry { unsigned short type; unsigned char flags; unsigned char preempt_count; int pid; }; #define TRACE_EVENT_TYPE_MAX \ ((1 << (sizeof(((struct trace_entry *)0)->type) * 8)) - 1) /* * Trace iterator - used by printout routines who present trace * results to users and which routines might sleep, etc: */ struct trace_iterator { struct trace_array *tr; struct tracer *trace; struct array_buffer *array_buffer; void *private; int cpu_file; struct mutex mutex; struct ring_buffer_iter **buffer_iter; unsigned long iter_flags; void *temp; /* temp holder */ unsigned int temp_size; char *fmt; /* modified format holder */ unsigned int fmt_size; atomic_t wait_index; /* trace_seq for __print_flags() and __print_symbolic() etc. */ struct trace_seq tmp_seq; cpumask_var_t started; /* Set when the file is closed to prevent new waiters */ bool closed; /* it's true when current open file is snapshot */ bool snapshot; /* The below is zeroed out in pipe_read */ struct trace_seq seq; struct trace_entry *ent; unsigned long lost_events; int leftover; int ent_size; int cpu; u64 ts; loff_t pos; long idx; /* All new field here will be zeroed out in pipe_read */ }; enum trace_iter_flags { TRACE_FILE_LAT_FMT = 1, TRACE_FILE_ANNOTATE = 2, TRACE_FILE_TIME_IN_NS = 4, }; typedef enum print_line_t (*trace_print_func)(struct trace_iterator *iter, int flags, struct trace_event *event); struct trace_event_functions { trace_print_func trace; trace_print_func raw; trace_print_func hex; trace_print_func binary; }; struct trace_event { struct hlist_node node; int type; struct trace_event_functions *funcs; }; extern int register_trace_event(struct trace_event *event); extern int unregister_trace_event(struct trace_event *event); /* Return values for print_line callback */ enum print_line_t { TRACE_TYPE_PARTIAL_LINE = 0, /* Retry after flushing the seq */ TRACE_TYPE_HANDLED = 1, TRACE_TYPE_UNHANDLED = 2, /* Relay to other output functions */ TRACE_TYPE_NO_CONSUME = 3 /* Handled but ask to not consume */ }; enum print_line_t trace_handle_return(struct trace_seq *s); static inline void tracing_generic_entry_update(struct trace_entry *entry, unsigned short type, unsigned int trace_ctx) { entry->preempt_count = trace_ctx & 0xff; entry->pid = current->pid; entry->type = type; entry->flags = trace_ctx >> 16; } unsigned int tracing_gen_ctx_irq_test(unsigned int irqs_status); enum trace_flag_type { TRACE_FLAG_IRQS_OFF = 0x01, TRACE_FLAG_IRQS_NOSUPPORT = 0x02, TRACE_FLAG_NEED_RESCHED = 0x04, TRACE_FLAG_HARDIRQ = 0x08, TRACE_FLAG_SOFTIRQ = 0x10, TRACE_FLAG_PREEMPT_RESCHED = 0x20, TRACE_FLAG_NMI = 0x40, TRACE_FLAG_BH_OFF = 0x80, }; #ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT static inline unsigned int tracing_gen_ctx_flags(unsigned long irqflags) { unsigned int irq_status = irqs_disabled_flags(irqflags) ? TRACE_FLAG_IRQS_OFF : 0; return tracing_gen_ctx_irq_test(irq_status); } static inline unsigned int tracing_gen_ctx(void) { unsigned long irqflags; local_save_flags(irqflags); return tracing_gen_ctx_flags(irqflags); } #else static inline unsigned int tracing_gen_ctx_flags(unsigned long irqflags) { return tracing_gen_ctx_irq_test(TRACE_FLAG_IRQS_NOSUPPORT); } static inline unsigned int tracing_gen_ctx(void) { return tracing_gen_ctx_irq_test(TRACE_FLAG_IRQS_NOSUPPORT); } #endif static inline unsigned int tracing_gen_ctx_dec(void) { unsigned int trace_ctx; trace_ctx = tracing_gen_ctx(); /* * Subtract one from the preemption counter if preemption is enabled, * see trace_event_buffer_reserve()for details. */ if (IS_ENABLED(CONFIG_PREEMPTION)) trace_ctx--; return trace_ctx; } struct trace_event_file; struct ring_buffer_event * trace_event_buffer_lock_reserve(struct trace_buffer **current_buffer, struct trace_event_file *trace_file, int type, unsigned long len, unsigned int trace_ctx); #define TRACE_RECORD_CMDLINE BIT(0) #define TRACE_RECORD_TGID BIT(1) void tracing_record_taskinfo(struct task_struct *task, int flags); void tracing_record_taskinfo_sched_switch(struct task_struct *prev, struct task_struct *next, int flags); void tracing_record_cmdline(struct task_struct *task); void tracing_record_tgid(struct task_struct *task); int trace_output_call(struct trace_iterator *iter, char *name, char *fmt, ...) __printf(3, 4); struct event_filter; enum trace_reg { TRACE_REG_REGISTER, TRACE_REG_UNREGISTER, #ifdef CONFIG_PERF_EVENTS TRACE_REG_PERF_REGISTER, TRACE_REG_PERF_UNREGISTER, TRACE_REG_PERF_OPEN, TRACE_REG_PERF_CLOSE, /* * These (ADD/DEL) use a 'boolean' return value, where 1 (true) means a * custom action was taken and the default action is not to be * performed. */ TRACE_REG_PERF_ADD, TRACE_REG_PERF_DEL, #endif }; struct trace_event_call; #define TRACE_FUNCTION_TYPE ((const char *)~0UL) struct trace_event_fields { const char *type; union { struct { const char *name; const int size; const int align; const int is_signed; const int filter_type; const int len; }; int (*define_fields)(struct trace_event_call *); }; }; struct trace_event_class { const char *system; void *probe; #ifdef CONFIG_PERF_EVENTS void *perf_probe; #endif int (*reg)(struct trace_event_call *event, enum trace_reg type, void *data); struct trace_event_fields *fields_array; struct list_head *(*get_fields)(struct trace_event_call *); struct list_head fields; int (*raw_init)(struct trace_event_call *); }; extern int trace_event_reg(struct trace_event_call *event, enum trace_reg type, void *data); struct trace_event_buffer { struct trace_buffer *buffer; struct ring_buffer_event *event; struct trace_event_file *trace_file; void *entry; unsigned int trace_ctx; struct pt_regs *regs; }; void *trace_event_buffer_reserve(struct trace_event_buffer *fbuffer, struct trace_event_file *trace_file, unsigned long len); void trace_event_buffer_commit(struct trace_event_buffer *fbuffer); enum { TRACE_EVENT_FL_FILTERED_BIT, TRACE_EVENT_FL_CAP_ANY_BIT, TRACE_EVENT_FL_NO_SET_FILTER_BIT, TRACE_EVENT_FL_IGNORE_ENABLE_BIT, TRACE_EVENT_FL_TRACEPOINT_BIT, TRACE_EVENT_FL_DYNAMIC_BIT, TRACE_EVENT_FL_KPROBE_BIT, TRACE_EVENT_FL_UPROBE_BIT, TRACE_EVENT_FL_EPROBE_BIT, TRACE_EVENT_FL_FPROBE_BIT, TRACE_EVENT_FL_CUSTOM_BIT, }; /* * Event flags: * FILTERED - The event has a filter attached * CAP_ANY - Any user can enable for perf * NO_SET_FILTER - Set when filter has error and is to be ignored * IGNORE_ENABLE - For trace internal events, do not enable with debugfs file * TRACEPOINT - Event is a tracepoint * DYNAMIC - Event is a dynamic event (created at run time) * KPROBE - Event is a kprobe * UPROBE - Event is a uprobe * EPROBE - Event is an event probe * FPROBE - Event is an function probe * CUSTOM - Event is a custom event (to be attached to an exsiting tracepoint) * This is set when the custom event has not been attached * to a tracepoint yet, then it is cleared when it is. */ enum { TRACE_EVENT_FL_FILTERED = (1 << TRACE_EVENT_FL_FILTERED_BIT), TRACE_EVENT_FL_CAP_ANY = (1 << TRACE_EVENT_FL_CAP_ANY_BIT), TRACE_EVENT_FL_NO_SET_FILTER = (1 << TRACE_EVENT_FL_NO_SET_FILTER_BIT), TRACE_EVENT_FL_IGNORE_ENABLE = (1 << TRACE_EVENT_FL_IGNORE_ENABLE_BIT), TRACE_EVENT_FL_TRACEPOINT = (1 << TRACE_EVENT_FL_TRACEPOINT_BIT), TRACE_EVENT_FL_DYNAMIC = (1 << TRACE_EVENT_FL_DYNAMIC_BIT), TRACE_EVENT_FL_KPROBE = (1 << TRACE_EVENT_FL_KPROBE_BIT), TRACE_EVENT_FL_UPROBE = (1 << TRACE_EVENT_FL_UPROBE_BIT), TRACE_EVENT_FL_EPROBE = (1 << TRACE_EVENT_FL_EPROBE_BIT), TRACE_EVENT_FL_FPROBE = (1 << TRACE_EVENT_FL_FPROBE_BIT), TRACE_EVENT_FL_CUSTOM = (1 << TRACE_EVENT_FL_CUSTOM_BIT), }; #define TRACE_EVENT_FL_UKPROBE (TRACE_EVENT_FL_KPROBE | TRACE_EVENT_FL_UPROBE) struct trace_event_call { struct list_head list; struct trace_event_class *class; union { char *name; /* Set TRACE_EVENT_FL_TRACEPOINT flag when using "tp" */ struct tracepoint *tp; }; struct trace_event event; char *print_fmt; struct event_filter *filter; /* * Static events can disappear with modules, * where as dynamic ones need their own ref count. */ union { void *module; atomic_t refcnt; }; void *data; /* See the TRACE_EVENT_FL_* flags above */ int flags; /* static flags of different events */ #ifdef CONFIG_PERF_EVENTS int perf_refcount; struct hlist_head __percpu *perf_events; struct bpf_prog_array __rcu *prog_array; int (*perf_perm)(struct trace_event_call *, struct perf_event *); #endif }; #ifdef CONFIG_DYNAMIC_EVENTS bool trace_event_dyn_try_get_ref(struct trace_event_call *call); void trace_event_dyn_put_ref(struct trace_event_call *call); bool trace_event_dyn_busy(struct trace_event_call *call); #else static inline bool trace_event_dyn_try_get_ref(struct trace_event_call *call) { /* Without DYNAMIC_EVENTS configured, nothing should be calling this */ return false; } static inline void trace_event_dyn_put_ref(struct trace_event_call *call) { } static inline bool trace_event_dyn_busy(struct trace_event_call *call) { /* Nothing should call this without DYNAIMIC_EVENTS configured. */ return true; } #endif static inline bool trace_event_try_get_ref(struct trace_event_call *call) { if (call->flags & TRACE_EVENT_FL_DYNAMIC) return trace_event_dyn_try_get_ref(call); else return try_module_get(call->module); } static inline void trace_event_put_ref(struct trace_event_call *call) { if (call->flags & TRACE_EVENT_FL_DYNAMIC) trace_event_dyn_put_ref(call); else module_put(call->module); } #ifdef CONFIG_PERF_EVENTS static inline bool bpf_prog_array_valid(struct trace_event_call *call) { /* * This inline function checks whether call->prog_array * is valid or not. The function is called in various places, * outside rcu_read_lock/unlock, as a heuristic to speed up execution. * * If this function returns true, and later call->prog_array * becomes false inside rcu_read_lock/unlock region, * we bail out then. If this function return false, * there is a risk that we might miss a few events if the checking * were delayed until inside rcu_read_lock/unlock region and * call->prog_array happened to become non-NULL then. * * Here, READ_ONCE() is used instead of rcu_access_pointer(). * rcu_access_pointer() requires the actual definition of * "struct bpf_prog_array" while READ_ONCE() only needs * a declaration of the same type. */ return !!READ_ONCE(call->prog_array); } #endif static inline const char * trace_event_name(struct trace_event_call *call) { if (call->flags & TRACE_EVENT_FL_CUSTOM) return call->name; else if (call->flags & TRACE_EVENT_FL_TRACEPOINT) return call->tp ? call->tp->name : NULL; else return call->name; } static inline struct list_head * trace_get_fields(struct trace_event_call *event_call) { if (!event_call->class->get_fields) return &event_call->class->fields; return event_call->class->get_fields(event_call); } struct trace_subsystem_dir; enum { EVENT_FILE_FL_ENABLED_BIT, EVENT_FILE_FL_RECORDED_CMD_BIT, EVENT_FILE_FL_RECORDED_TGID_BIT, EVENT_FILE_FL_FILTERED_BIT, EVENT_FILE_FL_NO_SET_FILTER_BIT, EVENT_FILE_FL_SOFT_MODE_BIT, EVENT_FILE_FL_SOFT_DISABLED_BIT, EVENT_FILE_FL_TRIGGER_MODE_BIT, EVENT_FILE_FL_TRIGGER_COND_BIT, EVENT_FILE_FL_PID_FILTER_BIT, EVENT_FILE_FL_WAS_ENABLED_BIT, EVENT_FILE_FL_FREED_BIT, }; extern struct trace_event_file *trace_get_event_file(const char *instance, const char *system, const char *event); extern void trace_put_event_file(struct trace_event_file *file); #define MAX_DYNEVENT_CMD_LEN (2048) enum dynevent_type { DYNEVENT_TYPE_SYNTH = 1, DYNEVENT_TYPE_KPROBE, DYNEVENT_TYPE_NONE, }; struct dynevent_cmd; typedef int (*dynevent_create_fn_t)(struct dynevent_cmd *cmd); struct dynevent_cmd { struct seq_buf seq; const char *event_name; unsigned int n_fields; enum dynevent_type type; dynevent_create_fn_t run_command; void *private_data; }; extern int dynevent_create(struct dynevent_cmd *cmd); extern int synth_event_delete(const char *name); extern void synth_event_cmd_init(struct dynevent_cmd *cmd, char *buf, int maxlen); extern int __synth_event_gen_cmd_start(struct dynevent_cmd *cmd, const char *name, struct module *mod, ...); #define synth_event_gen_cmd_start(cmd, name, mod, ...) \ __synth_event_gen_cmd_start(cmd, name, mod, ## __VA_ARGS__, NULL) struct synth_field_desc { const char *type; const char *name; }; extern int synth_event_gen_cmd_array_start(struct dynevent_cmd *cmd, const char *name, struct module *mod, struct synth_field_desc *fields, unsigned int n_fields); extern int synth_event_create(const char *name, struct synth_field_desc *fields, unsigned int n_fields, struct module *mod); extern int synth_event_add_field(struct dynevent_cmd *cmd, const char *type, const char *name); extern int synth_event_add_field_str(struct dynevent_cmd *cmd, const char *type_name); extern int synth_event_add_fields(struct dynevent_cmd *cmd, struct synth_field_desc *fields, unsigned int n_fields); #define synth_event_gen_cmd_end(cmd) \ dynevent_create(cmd) struct synth_event; struct synth_event_trace_state { struct trace_event_buffer fbuffer; struct synth_trace_event *entry; struct trace_buffer *buffer; struct synth_event *event; unsigned int cur_field; unsigned int n_u64; bool disabled; bool add_next; bool add_name; }; extern int synth_event_trace(struct trace_event_file *file, unsigned int n_vals, ...); extern int synth_event_trace_array(struct trace_event_file *file, u64 *vals, unsigned int n_vals); extern int synth_event_trace_start(struct trace_event_file *file, struct synth_event_trace_state *trace_state); extern int synth_event_add_next_val(u64 val, struct synth_event_trace_state *trace_state); extern int synth_event_add_val(const char *field_name, u64 val, struct synth_event_trace_state *trace_state); extern int synth_event_trace_end(struct synth_event_trace_state *trace_state); extern int kprobe_event_delete(const char *name); extern void kprobe_event_cmd_init(struct dynevent_cmd *cmd, char *buf, int maxlen); #define kprobe_event_gen_cmd_start(cmd, name, loc, ...) \ __kprobe_event_gen_cmd_start(cmd, false, name, loc, ## __VA_ARGS__, NULL) #define kretprobe_event_gen_cmd_start(cmd, name, loc, ...) \ __kprobe_event_gen_cmd_start(cmd, true, name, loc, ## __VA_ARGS__, NULL) extern int __kprobe_event_gen_cmd_start(struct dynevent_cmd *cmd, bool kretprobe, const char *name, const char *loc, ...); #define kprobe_event_add_fields(cmd, ...) \ __kprobe_event_add_fields(cmd, ## __VA_ARGS__, NULL) #define kprobe_event_add_field(cmd, field) \ __kprobe_event_add_fields(cmd, field, NULL) extern int __kprobe_event_add_fields(struct dynevent_cmd *cmd, ...); #define kprobe_event_gen_cmd_end(cmd) \ dynevent_create(cmd) #define kretprobe_event_gen_cmd_end(cmd) \ dynevent_create(cmd) /* * Event file flags: * ENABLED - The event is enabled * RECORDED_CMD - The comms should be recorded at sched_switch * RECORDED_TGID - The tgids should be recorded at sched_switch * FILTERED - The event has a filter attached * NO_SET_FILTER - Set when filter has error and is to be ignored * SOFT_MODE - The event is enabled/disabled by SOFT_DISABLED * SOFT_DISABLED - When set, do not trace the event (even though its * tracepoint may be enabled) * TRIGGER_MODE - When set, invoke the triggers associated with the event * TRIGGER_COND - When set, one or more triggers has an associated filter * PID_FILTER - When set, the event is filtered based on pid * WAS_ENABLED - Set when enabled to know to clear trace on module removal * FREED - File descriptor is freed, all fields should be considered invalid */ enum { EVENT_FILE_FL_ENABLED = (1 << EVENT_FILE_FL_ENABLED_BIT), EVENT_FILE_FL_RECORDED_CMD = (1 << EVENT_FILE_FL_RECORDED_CMD_BIT), EVENT_FILE_FL_RECORDED_TGID = (1 << EVENT_FILE_FL_RECORDED_TGID_BIT), EVENT_FILE_FL_FILTERED = (1 << EVENT_FILE_FL_FILTERED_BIT), EVENT_FILE_FL_NO_SET_FILTER = (1 << EVENT_FILE_FL_NO_SET_FILTER_BIT), EVENT_FILE_FL_SOFT_MODE = (1 << EVENT_FILE_FL_SOFT_MODE_BIT), EVENT_FILE_FL_SOFT_DISABLED = (1 << EVENT_FILE_FL_SOFT_DISABLED_BIT), EVENT_FILE_FL_TRIGGER_MODE = (1 << EVENT_FILE_FL_TRIGGER_MODE_BIT), EVENT_FILE_FL_TRIGGER_COND = (1 << EVENT_FILE_FL_TRIGGER_COND_BIT), EVENT_FILE_FL_PID_FILTER = (1 << EVENT_FILE_FL_PID_FILTER_BIT), EVENT_FILE_FL_WAS_ENABLED = (1 << EVENT_FILE_FL_WAS_ENABLED_BIT), EVENT_FILE_FL_FREED = (1 << EVENT_FILE_FL_FREED_BIT), }; struct trace_event_file { struct list_head list; struct trace_event_call *event_call; struct event_filter __rcu *filter; struct eventfs_inode *ei; struct trace_array *tr; struct trace_subsystem_dir *system; struct list_head triggers; /* * 32 bit flags: * bit 0: enabled * bit 1: enabled cmd record * bit 2: enable/disable with the soft disable bit * bit 3: soft disabled * bit 4: trigger enabled * * Note: The bits must be set atomically to prevent races * from other writers. Reads of flags do not need to be in * sync as they occur in critical sections. But the way flags * is currently used, these changes do not affect the code * except that when a change is made, it may have a slight * delay in propagating the changes to other CPUs due to * caching and such. Which is mostly OK ;-) */ unsigned long flags; refcount_t ref; /* ref count for opened files */ atomic_t sm_ref; /* soft-mode reference counter */ atomic_t tm_ref; /* trigger-mode reference counter */ }; #define __TRACE_EVENT_FLAGS(name, value) \ static int __init trace_init_flags_##name(void) \ { \ event_##name.flags |= value; \ return 0; \ } \ early_initcall(trace_init_flags_##name); #define __TRACE_EVENT_PERF_PERM(name, expr...) \ static int perf_perm_##name(struct trace_event_call *tp_event, \ struct perf_event *p_event) \ { \ return ({ expr; }); \ } \ static int __init trace_init_perf_perm_##name(void) \ { \ event_##name.perf_perm = &perf_perm_##name; \ return 0; \ } \ early_initcall(trace_init_perf_perm_##name); #define PERF_MAX_TRACE_SIZE 8192 #define MAX_FILTER_STR_VAL 256U /* Should handle KSYM_SYMBOL_LEN */ enum event_trigger_type { ETT_NONE = (0), ETT_TRACE_ONOFF = (1 << 0), ETT_SNAPSHOT = (1 << 1), ETT_STACKTRACE = (1 << 2), ETT_EVENT_ENABLE = (1 << 3), ETT_EVENT_HIST = (1 << 4), ETT_HIST_ENABLE = (1 << 5), ETT_EVENT_EPROBE = (1 << 6), }; extern int filter_match_preds(struct event_filter *filter, void *rec); extern enum event_trigger_type event_triggers_call(struct trace_event_file *file, struct trace_buffer *buffer, void *rec, struct ring_buffer_event *event); extern void event_triggers_post_call(struct trace_event_file *file, enum event_trigger_type tt); bool trace_event_ignore_this_pid(struct trace_event_file *trace_file); bool __trace_trigger_soft_disabled(struct trace_event_file *file); /** * trace_trigger_soft_disabled - do triggers and test if soft disabled * @file: The file pointer of the event to test * * If any triggers without filters are attached to this event, they * will be called here. If the event is soft disabled and has no * triggers that require testing the fields, it will return true, * otherwise false. */ static __always_inline bool trace_trigger_soft_disabled(struct trace_event_file *file) { unsigned long eflags = file->flags; if (likely(!(eflags & (EVENT_FILE_FL_TRIGGER_MODE | EVENT_FILE_FL_SOFT_DISABLED | EVENT_FILE_FL_PID_FILTER)))) return false; if (likely(eflags & EVENT_FILE_FL_TRIGGER_COND)) return false; return __trace_trigger_soft_disabled(file); } #ifdef CONFIG_BPF_EVENTS unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx); int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie); void perf_event_detach_bpf_prog(struct perf_event *event); int perf_event_query_prog_array(struct perf_event *event, void __user *info); struct bpf_raw_tp_link; int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link); int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link); struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name); void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp); int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id, u32 *fd_type, const char **buf, u64 *probe_offset, u64 *probe_addr, unsigned long *missed); int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog); int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog); #else static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx) { return 1; } static inline int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie) { return -EOPNOTSUPP; } static inline void perf_event_detach_bpf_prog(struct perf_event *event) { } static inline int perf_event_query_prog_array(struct perf_event *event, void __user *info) { return -EOPNOTSUPP; } struct bpf_raw_tp_link; static inline int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link) { return -EOPNOTSUPP; } static inline int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link) { return -EOPNOTSUPP; } static inline struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name) { return NULL; } static inline void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp) { } static inline int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id, u32 *fd_type, const char **buf, u64 *probe_offset, u64 *probe_addr, unsigned long *missed) { return -EOPNOTSUPP; } static inline int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) { return -EOPNOTSUPP; } static inline int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) { return -EOPNOTSUPP; } #endif enum { FILTER_OTHER = 0, FILTER_STATIC_STRING, FILTER_DYN_STRING, FILTER_RDYN_STRING, FILTER_PTR_STRING, FILTER_TRACE_FN, FILTER_CPUMASK, FILTER_COMM, FILTER_CPU, FILTER_STACKTRACE, }; extern int trace_event_raw_init(struct trace_event_call *call); extern int trace_define_field(struct trace_event_call *call, const char *type, const char *name, int offset, int size, int is_signed, int filter_type); extern int trace_add_event_call(struct trace_event_call *call); extern int trace_remove_event_call(struct trace_event_call *call); extern int trace_event_get_offsets(struct trace_event_call *call); int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set); int trace_set_clr_event(const char *system, const char *event, int set); int trace_array_set_clr_event(struct trace_array *tr, const char *system, const char *event, bool enable); /* * The double __builtin_constant_p is because gcc will give us an error * if we try to allocate the static variable to fmt if it is not a * constant. Even with the outer if statement optimizing out. */ #define event_trace_printk(ip, fmt, args...) \ do { \ __trace_printk_check_format(fmt, ##args); \ tracing_record_cmdline(current); \ if (__builtin_constant_p(fmt)) { \ static const char *trace_printk_fmt \ __section("__trace_printk_fmt") = \ __builtin_constant_p(fmt) ? fmt : NULL; \ \ __trace_bprintk(ip, trace_printk_fmt, ##args); \ } else \ __trace_printk(ip, fmt, ##args); \ } while (0) #ifdef CONFIG_PERF_EVENTS struct perf_event; DECLARE_PER_CPU(struct pt_regs, perf_trace_regs); extern int perf_trace_init(struct perf_event *event); extern void perf_trace_destroy(struct perf_event *event); extern int perf_trace_add(struct perf_event *event, int flags); extern void perf_trace_del(struct perf_event *event, int flags); #ifdef CONFIG_KPROBE_EVENTS extern int perf_kprobe_init(struct perf_event *event, bool is_retprobe); extern void perf_kprobe_destroy(struct perf_event *event); extern int bpf_get_kprobe_info(const struct perf_event *event, u32 *fd_type, const char **symbol, u64 *probe_offset, u64 *probe_addr, unsigned long *missed, bool perf_type_tracepoint); #endif #ifdef CONFIG_UPROBE_EVENTS extern int perf_uprobe_init(struct perf_event *event, unsigned long ref_ctr_offset, bool is_retprobe); extern void perf_uprobe_destroy(struct perf_event *event); extern int bpf_get_uprobe_info(const struct perf_event *event, u32 *fd_type, const char **filename, u64 *probe_offset, u64 *probe_addr, bool perf_type_tracepoint); #endif extern int ftrace_profile_set_filter(struct perf_event *event, int event_id, char *filter_str); extern void ftrace_profile_free_filter(struct perf_event *event); void perf_trace_buf_update(void *record, u16 type); void *perf_trace_buf_alloc(int size, struct pt_regs **regs, int *rctxp); int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie); void perf_event_free_bpf_prog(struct perf_event *event); void bpf_trace_run1(struct bpf_raw_tp_link *link, u64 arg1); void bpf_trace_run2(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2); void bpf_trace_run3(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2, u64 arg3); void bpf_trace_run4(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2, u64 arg3, u64 arg4); void bpf_trace_run5(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2, u64 arg3, u64 arg4, u64 arg5); void bpf_trace_run6(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2, u64 arg3, u64 arg4, u64 arg5, u64 arg6); void bpf_trace_run7(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2, u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7); void bpf_trace_run8(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2, u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7, u64 arg8); void bpf_trace_run9(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2, u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7, u64 arg8, u64 arg9); void bpf_trace_run10(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2, u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7, u64 arg8, u64 arg9, u64 arg10); void bpf_trace_run11(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2, u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7, u64 arg8, u64 arg9, u64 arg10, u64 arg11); void bpf_trace_run12(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2, u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7, u64 arg8, u64 arg9, u64 arg10, u64 arg11, u64 arg12); void perf_trace_run_bpf_submit(void *raw_data, int size, int rctx, struct trace_event_call *call, u64 count, struct pt_regs *regs, struct hlist_head *head, struct task_struct *task); static inline void perf_trace_buf_submit(void *raw_data, int size, int rctx, u16 type, u64 count, struct pt_regs *regs, void *head, struct task_struct *task) { perf_tp_event(type, count, raw_data, size, regs, head, rctx, task); } #endif #define TRACE_EVENT_STR_MAX 512 /* * gcc warns that you can not use a va_list in an inlined * function. But lets me make it into a macro :-/ */ #define __trace_event_vstr_len(fmt, va) \ ({ \ va_list __ap; \ int __ret; \ \ va_copy(__ap, *(va)); \ __ret = vsnprintf(NULL, 0, fmt, __ap) + 1; \ va_end(__ap); \ \ min(__ret, TRACE_EVENT_STR_MAX); \ }) #endif /* _LINUX_TRACE_EVENT_H */ /* * Note: we keep the TRACE_CUSTOM_EVENT outside the include file ifdef protection. * This is due to the way trace custom events work. If a file includes two * trace event headers under one "CREATE_CUSTOM_TRACE_EVENTS" the first include * will override the TRACE_CUSTOM_EVENT and break the second include. */ #ifndef TRACE_CUSTOM_EVENT #define DECLARE_CUSTOM_EVENT_CLASS(name, proto, args, tstruct, assign, print) #define DEFINE_CUSTOM_EVENT(template, name, proto, args) #define TRACE_CUSTOM_EVENT(name, proto, args, struct, assign, print) #endif /* ifdef TRACE_CUSTOM_EVENT (see note above) */ |
| 36 11 3 12 20 31 31 31 2 2 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 | /* * Copyright (c) 2006, 2017 Oracle and/or its affiliates. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU * General Public License (GPL) Version 2, available from the file * COPYING in the main directory of this source tree, or the * OpenIB.org BSD license below: * * Redistribution and use in source and binary forms, with or * without modification, are permitted provided that the following * conditions are met: * * - Redistributions of source code must retain the above * copyright notice, this list of conditions and the following * disclaimer. * * - Redistributions in binary form must reproduce the above * copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials * provided with the distribution. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. * */ #include <linux/kernel.h> #include <linux/slab.h> #include <linux/in.h> #include <net/net_namespace.h> #include <net/netns/generic.h> #include <linux/ipv6.h> #include "rds_single_path.h" #include "rds.h" #include "loop.h" static DEFINE_SPINLOCK(loop_conns_lock); static LIST_HEAD(loop_conns); static atomic_t rds_loop_unloading = ATOMIC_INIT(0); static void rds_loop_set_unloading(void) { atomic_set(&rds_loop_unloading, 1); } static bool rds_loop_is_unloading(struct rds_connection *conn) { return atomic_read(&rds_loop_unloading) != 0; } /* * This 'loopback' transport is a special case for flows that originate * and terminate on the same machine. * * Connection build-up notices if the destination address is thought of * as a local address by a transport. At that time it decides to use the * loopback transport instead of the bound transport of the sending socket. * * The loopback transport's sending path just hands the sent rds_message * straight to the receiving path via an embedded rds_incoming. */ /* * Usually a message transits both the sender and receiver's conns as it * flows to the receiver. In the loopback case, though, the receive path * is handed the sending conn so the sense of the addresses is reversed. */ static int rds_loop_xmit(struct rds_connection *conn, struct rds_message *rm, unsigned int hdr_off, unsigned int sg, unsigned int off) { struct scatterlist *sgp = &rm->data.op_sg[sg]; int ret = sizeof(struct rds_header) + be32_to_cpu(rm->m_inc.i_hdr.h_len); /* Do not send cong updates to loopback */ if (rm->m_inc.i_hdr.h_flags & RDS_FLAG_CONG_BITMAP) { rds_cong_map_updated(conn->c_fcong, ~(u64) 0); ret = min_t(int, ret, sgp->length - conn->c_xmit_data_off); goto out; } BUG_ON(hdr_off || sg || off); rds_inc_init(&rm->m_inc, conn, &conn->c_laddr); /* For the embedded inc. Matching put is in loop_inc_free() */ rds_message_addref(rm); rds_recv_incoming(conn, &conn->c_laddr, &conn->c_faddr, &rm->m_inc, GFP_KERNEL); rds_send_drop_acked(conn, be64_to_cpu(rm->m_inc.i_hdr.h_sequence), NULL); rds_inc_put(&rm->m_inc); out: return ret; } /* * See rds_loop_xmit(). Since our inc is embedded in the rm, we * make sure the rm lives at least until the inc is done. */ static void rds_loop_inc_free(struct rds_incoming *inc) { struct rds_message *rm = container_of(inc, struct rds_message, m_inc); rds_message_put(rm); } /* we need to at least give the thread something to succeed */ static int rds_loop_recv_path(struct rds_conn_path *cp) { return 0; } struct rds_loop_connection { struct list_head loop_node; struct rds_connection *conn; }; /* * Even the loopback transport needs to keep track of its connections, * so it can call rds_conn_destroy() on them on exit. N.B. there are * 1+ loopback addresses (127.*.*.*) so it's not a bug to have * multiple loopback conns allocated, although rather useless. */ static int rds_loop_conn_alloc(struct rds_connection *conn, gfp_t gfp) { struct rds_loop_connection *lc; unsigned long flags; lc = kzalloc(sizeof(struct rds_loop_connection), gfp); if (!lc) return -ENOMEM; INIT_LIST_HEAD(&lc->loop_node); lc->conn = conn; conn->c_transport_data = lc; spin_lock_irqsave(&loop_conns_lock, flags); list_add_tail(&lc->loop_node, &loop_conns); spin_unlock_irqrestore(&loop_conns_lock, flags); return 0; } static void rds_loop_conn_free(void *arg) { struct rds_loop_connection *lc = arg; unsigned long flags; rdsdebug("lc %p\n", lc); spin_lock_irqsave(&loop_conns_lock, flags); list_del(&lc->loop_node); spin_unlock_irqrestore(&loop_conns_lock, flags); kfree(lc); } static int rds_loop_conn_path_connect(struct rds_conn_path *cp) { rds_connect_complete(cp->cp_conn); return 0; } static void rds_loop_conn_path_shutdown(struct rds_conn_path *cp) { } void rds_loop_exit(void) { struct rds_loop_connection *lc, *_lc; LIST_HEAD(tmp_list); rds_loop_set_unloading(); synchronize_rcu(); /* avoid calling conn_destroy with irqs off */ spin_lock_irq(&loop_conns_lock); list_splice(&loop_conns, &tmp_list); INIT_LIST_HEAD(&loop_conns); spin_unlock_irq(&loop_conns_lock); list_for_each_entry_safe(lc, _lc, &tmp_list, loop_node) { WARN_ON(lc->conn->c_passive); rds_conn_destroy(lc->conn); } } static void rds_loop_kill_conns(struct net *net) { struct rds_loop_connection *lc, *_lc; LIST_HEAD(tmp_list); spin_lock_irq(&loop_conns_lock); list_for_each_entry_safe(lc, _lc, &loop_conns, loop_node) { struct net *c_net = read_pnet(&lc->conn->c_net); if (net != c_net) continue; list_move_tail(&lc->loop_node, &tmp_list); } spin_unlock_irq(&loop_conns_lock); list_for_each_entry_safe(lc, _lc, &tmp_list, loop_node) { WARN_ON(lc->conn->c_passive); rds_conn_destroy(lc->conn); } } static void __net_exit rds_loop_exit_net(struct net *net) { rds_loop_kill_conns(net); } static struct pernet_operations rds_loop_net_ops = { .exit = rds_loop_exit_net, }; int rds_loop_net_init(void) { return register_pernet_device(&rds_loop_net_ops); } void rds_loop_net_exit(void) { unregister_pernet_device(&rds_loop_net_ops); } /* * This is missing .xmit_* because loop doesn't go through generic * rds_send_xmit() and doesn't call rds_recv_incoming(). .listen_stop and * .laddr_check are missing because transport.c doesn't iterate over * rds_loop_transport. */ struct rds_transport rds_loop_transport = { .xmit = rds_loop_xmit, .recv_path = rds_loop_recv_path, .conn_alloc = rds_loop_conn_alloc, .conn_free = rds_loop_conn_free, .conn_path_connect = rds_loop_conn_path_connect, .conn_path_shutdown = rds_loop_conn_path_shutdown, .inc_copy_to_user = rds_message_inc_copy_to_user, .inc_free = rds_loop_inc_free, .t_name = "loopback", .t_type = RDS_TRANS_LOOP, .t_unloading = rds_loop_is_unloading, }; |
| 1 1 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 | // SPDX-License-Identifier: GPL-2.0 /* * (C) 2001 Clemson University and The University of Chicago * * See COPYING in top-level directory. */ #include "protocol.h" #include "orangefs-kernel.h" /* tags assigned to kernel upcall operations */ static __u64 next_tag_value; static DEFINE_SPINLOCK(next_tag_value_lock); /* the orangefs memory caches */ /* a cache for orangefs upcall/downcall operations */ static struct kmem_cache *op_cache; int op_cache_initialize(void) { op_cache = kmem_cache_create("orangefs_op_cache", sizeof(struct orangefs_kernel_op_s), 0, 0, NULL); if (!op_cache) { gossip_err("Cannot create orangefs_op_cache\n"); return -ENOMEM; } /* initialize our atomic tag counter */ spin_lock(&next_tag_value_lock); next_tag_value = 100; spin_unlock(&next_tag_value_lock); return 0; } int op_cache_finalize(void) { kmem_cache_destroy(op_cache); return 0; } char *get_opname_string(struct orangefs_kernel_op_s *new_op) { if (new_op) { __s32 type = new_op->upcall.type; if (type == ORANGEFS_VFS_OP_FILE_IO) return "OP_FILE_IO"; else if (type == ORANGEFS_VFS_OP_LOOKUP) return "OP_LOOKUP"; else if (type == ORANGEFS_VFS_OP_CREATE) return "OP_CREATE"; else if (type == ORANGEFS_VFS_OP_GETATTR) return "OP_GETATTR"; else if (type == ORANGEFS_VFS_OP_REMOVE) return "OP_REMOVE"; else if (type == ORANGEFS_VFS_OP_MKDIR) return "OP_MKDIR"; else if (type == ORANGEFS_VFS_OP_READDIR) return "OP_READDIR"; else if (type == ORANGEFS_VFS_OP_READDIRPLUS) return "OP_READDIRPLUS"; else if (type == ORANGEFS_VFS_OP_SETATTR) return "OP_SETATTR"; else if (type == ORANGEFS_VFS_OP_SYMLINK) return "OP_SYMLINK"; else if (type == ORANGEFS_VFS_OP_RENAME) return "OP_RENAME"; else if (type == ORANGEFS_VFS_OP_STATFS) return "OP_STATFS"; else if (type == ORANGEFS_VFS_OP_TRUNCATE) return "OP_TRUNCATE"; else if (type == ORANGEFS_VFS_OP_RA_FLUSH) return "OP_RA_FLUSH"; else if (type == ORANGEFS_VFS_OP_FS_MOUNT) return "OP_FS_MOUNT"; else if (type == ORANGEFS_VFS_OP_FS_UMOUNT) return "OP_FS_UMOUNT"; else if (type == ORANGEFS_VFS_OP_GETXATTR) return "OP_GETXATTR"; else if (type == ORANGEFS_VFS_OP_SETXATTR) return "OP_SETXATTR"; else if (type == ORANGEFS_VFS_OP_LISTXATTR) return "OP_LISTXATTR"; else if (type == ORANGEFS_VFS_OP_REMOVEXATTR) return "OP_REMOVEXATTR"; else if (type == ORANGEFS_VFS_OP_PARAM) return "OP_PARAM"; else if (type == ORANGEFS_VFS_OP_PERF_COUNT) return "OP_PERF_COUNT"; else if (type == ORANGEFS_VFS_OP_CANCEL) return "OP_CANCEL"; else if (type == ORANGEFS_VFS_OP_FSYNC) return "OP_FSYNC"; else if (type == ORANGEFS_VFS_OP_FSKEY) return "OP_FSKEY"; else if (type == ORANGEFS_VFS_OP_FEATURES) return "OP_FEATURES"; } return "OP_UNKNOWN?"; } void orangefs_new_tag(struct orangefs_kernel_op_s *op) { spin_lock(&next_tag_value_lock); op->tag = next_tag_value++; if (next_tag_value == 0) next_tag_value = 100; spin_unlock(&next_tag_value_lock); } struct orangefs_kernel_op_s *op_alloc(__s32 type) { struct orangefs_kernel_op_s *new_op = NULL; new_op = kmem_cache_zalloc(op_cache, GFP_KERNEL); if (new_op) { INIT_LIST_HEAD(&new_op->list); spin_lock_init(&new_op->lock); init_completion(&new_op->waitq); new_op->upcall.type = ORANGEFS_VFS_OP_INVALID; new_op->downcall.type = ORANGEFS_VFS_OP_INVALID; new_op->downcall.status = -1; new_op->op_state = OP_VFS_STATE_UNKNOWN; /* initialize the op specific tag and upcall credentials */ orangefs_new_tag(new_op); new_op->upcall.type = type; new_op->attempts = 0; gossip_debug(GOSSIP_CACHE_DEBUG, "Alloced OP (%p: %llu %s)\n", new_op, llu(new_op->tag), get_opname_string(new_op)); new_op->upcall.uid = from_kuid(&init_user_ns, current_fsuid()); new_op->upcall.gid = from_kgid(&init_user_ns, current_fsgid()); } else { gossip_err("op_alloc: kmem_cache_zalloc failed!\n"); } return new_op; } void op_release(struct orangefs_kernel_op_s *orangefs_op) { if (orangefs_op) { gossip_debug(GOSSIP_CACHE_DEBUG, "Releasing OP (%p: %llu)\n", orangefs_op, llu(orangefs_op->tag)); kmem_cache_free(op_cache, orangefs_op); } else { gossip_err("NULL pointer in op_release\n"); } } |
| 15 13 21 2 4 15 16 100 22 22 93 22 1 1 1 1 1 116 117 2 110 120 121 90 1 5 121 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 | // SPDX-License-Identifier: GPL-2.0 #include <linux/syscalls.h> #include <linux/slab.h> #include <linux/fs.h> #include <linux/file.h> #include <linux/mount.h> #include <linux/namei.h> #include <linux/exportfs.h> #include <linux/fs_struct.h> #include <linux/fsnotify.h> #include <linux/personality.h> #include <linux/uaccess.h> #include <linux/compat.h> #include "internal.h" #include "mount.h" static long do_sys_name_to_handle(const struct path *path, struct file_handle __user *ufh, int __user *mnt_id, int fh_flags) { long retval; struct file_handle f_handle; int handle_dwords, handle_bytes; struct file_handle *handle = NULL; /* * We need to make sure whether the file system support decoding of * the file handle if decodeable file handle was requested. */ if (!exportfs_can_encode_fh(path->dentry->d_sb->s_export_op, fh_flags)) return -EOPNOTSUPP; if (copy_from_user(&f_handle, ufh, sizeof(struct file_handle))) return -EFAULT; if (f_handle.handle_bytes > MAX_HANDLE_SZ) return -EINVAL; handle = kzalloc(struct_size(handle, f_handle, f_handle.handle_bytes), GFP_KERNEL); if (!handle) return -ENOMEM; /* convert handle size to multiple of sizeof(u32) */ handle_dwords = f_handle.handle_bytes >> 2; /* we ask for a non connectable maybe decodeable file handle */ retval = exportfs_encode_fh(path->dentry, (struct fid *)handle->f_handle, &handle_dwords, fh_flags); handle->handle_type = retval; /* convert handle size to bytes */ handle_bytes = handle_dwords * sizeof(u32); handle->handle_bytes = handle_bytes; if ((handle->handle_bytes > f_handle.handle_bytes) || (retval == FILEID_INVALID) || (retval < 0)) { /* As per old exportfs_encode_fh documentation * we could return ENOSPC to indicate overflow * But file system returned 255 always. So handle * both the values */ if (retval == FILEID_INVALID || retval == -ENOSPC) retval = -EOVERFLOW; /* * set the handle size to zero so we copy only * non variable part of the file_handle */ handle_bytes = 0; } else retval = 0; /* copy the mount id */ if (put_user(real_mount(path->mnt)->mnt_id, mnt_id) || copy_to_user(ufh, handle, struct_size(handle, f_handle, handle_bytes))) retval = -EFAULT; kfree(handle); return retval; } /** * sys_name_to_handle_at: convert name to handle * @dfd: directory relative to which name is interpreted if not absolute * @name: name that should be converted to handle. * @handle: resulting file handle * @mnt_id: mount id of the file system containing the file * @flag: flag value to indicate whether to follow symlink or not * and whether a decodable file handle is required. * * @handle->handle_size indicate the space available to store the * variable part of the file handle in bytes. If there is not * enough space, the field is updated to return the minimum * value required. */ SYSCALL_DEFINE5(name_to_handle_at, int, dfd, const char __user *, name, struct file_handle __user *, handle, int __user *, mnt_id, int, flag) { struct path path; int lookup_flags; int fh_flags; int err; if (flag & ~(AT_SYMLINK_FOLLOW | AT_EMPTY_PATH | AT_HANDLE_FID)) return -EINVAL; lookup_flags = (flag & AT_SYMLINK_FOLLOW) ? LOOKUP_FOLLOW : 0; fh_flags = (flag & AT_HANDLE_FID) ? EXPORT_FH_FID : 0; if (flag & AT_EMPTY_PATH) lookup_flags |= LOOKUP_EMPTY; err = user_path_at(dfd, name, lookup_flags, &path); if (!err) { err = do_sys_name_to_handle(&path, handle, mnt_id, fh_flags); path_put(&path); } return err; } static int get_path_from_fd(int fd, struct path *root) { if (fd == AT_FDCWD) { struct fs_struct *fs = current->fs; spin_lock(&fs->lock); *root = fs->pwd; path_get(root); spin_unlock(&fs->lock); } else { struct fd f = fdget(fd); if (!f.file) return -EBADF; *root = f.file->f_path; path_get(root); fdput(f); } return 0; } enum handle_to_path_flags { HANDLE_CHECK_PERMS = (1 << 0), HANDLE_CHECK_SUBTREE = (1 << 1), }; struct handle_to_path_ctx { struct path root; enum handle_to_path_flags flags; unsigned int fh_flags; }; static int vfs_dentry_acceptable(void *context, struct dentry *dentry) { struct handle_to_path_ctx *ctx = context; struct user_namespace *user_ns = current_user_ns(); struct dentry *d, *root = ctx->root.dentry; struct mnt_idmap *idmap = mnt_idmap(ctx->root.mnt); int retval = 0; if (!root) return 1; /* Old permission model with global CAP_DAC_READ_SEARCH. */ if (!ctx->flags) return 1; /* * It's racy as we're not taking rename_lock but we're able to ignore * permissions and we just need an approximation whether we were able * to follow a path to the file. * * It's also potentially expensive on some filesystems especially if * there is a deep path. */ d = dget(dentry); while (d != root && !IS_ROOT(d)) { struct dentry *parent = dget_parent(d); /* * We know that we have the ability to override DAC permissions * as we've verified this earlier via CAP_DAC_READ_SEARCH. But * we also need to make sure that there aren't any unmapped * inodes in the path that would prevent us from reaching the * file. */ if (!privileged_wrt_inode_uidgid(user_ns, idmap, d_inode(parent))) { dput(d); dput(parent); return retval; } dput(d); d = parent; } if (!(ctx->flags & HANDLE_CHECK_SUBTREE) || d == root) retval = 1; WARN_ON_ONCE(d != root && d != root->d_sb->s_root); dput(d); return retval; } static int do_handle_to_path(struct file_handle *handle, struct path *path, struct handle_to_path_ctx *ctx) { int handle_dwords; struct vfsmount *mnt = ctx->root.mnt; /* change the handle size to multiple of sizeof(u32) */ handle_dwords = handle->handle_bytes >> 2; path->dentry = exportfs_decode_fh_raw(mnt, (struct fid *)handle->f_handle, handle_dwords, handle->handle_type, ctx->fh_flags, vfs_dentry_acceptable, ctx); if (IS_ERR_OR_NULL(path->dentry)) { if (path->dentry == ERR_PTR(-ENOMEM)) return -ENOMEM; return -ESTALE; } path->mnt = mntget(mnt); return 0; } /* * Allow relaxed permissions of file handles if the caller has the * ability to mount the filesystem or create a bind-mount of the * provided @mountdirfd. * * In both cases the caller may be able to get an unobstructed way to * the encoded file handle. If the caller is only able to create a * bind-mount we need to verify that there are no locked mounts on top * of it that could prevent us from getting to the encoded file. * * In principle, locked mounts can prevent the caller from mounting the * filesystem but that only applies to procfs and sysfs neither of which * support decoding file handles. */ static inline bool may_decode_fh(struct handle_to_path_ctx *ctx, unsigned int o_flags) { struct path *root = &ctx->root; /* * Restrict to O_DIRECTORY to provide a deterministic API that avoids a * confusing api in the face of disconnected non-dir dentries. * * There's only one dentry for each directory inode (VFS rule)... */ if (!(o_flags & O_DIRECTORY)) return false; if (ns_capable(root->mnt->mnt_sb->s_user_ns, CAP_SYS_ADMIN)) ctx->flags = HANDLE_CHECK_PERMS; else if (is_mounted(root->mnt) && ns_capable(real_mount(root->mnt)->mnt_ns->user_ns, CAP_SYS_ADMIN) && !has_locked_children(real_mount(root->mnt), root->dentry)) ctx->flags = HANDLE_CHECK_PERMS | HANDLE_CHECK_SUBTREE; else return false; /* Are we able to override DAC permissions? */ if (!ns_capable(current_user_ns(), CAP_DAC_READ_SEARCH)) return false; ctx->fh_flags = EXPORT_FH_DIR_ONLY; return true; } static int handle_to_path(int mountdirfd, struct file_handle __user *ufh, struct path *path, unsigned int o_flags) { int retval = 0; struct file_handle f_handle; struct file_handle *handle = NULL; struct handle_to_path_ctx ctx = {}; retval = get_path_from_fd(mountdirfd, &ctx.root); if (retval) goto out_err; if (!capable(CAP_DAC_READ_SEARCH) && !may_decode_fh(&ctx, o_flags)) { retval = -EPERM; goto out_path; } if (copy_from_user(&f_handle, ufh, sizeof(struct file_handle))) { retval = -EFAULT; goto out_path; } if ((f_handle.handle_bytes > MAX_HANDLE_SZ) || (f_handle.handle_bytes == 0)) { retval = -EINVAL; goto out_path; } handle = kmalloc(struct_size(handle, f_handle, f_handle.handle_bytes), GFP_KERNEL); if (!handle) { retval = -ENOMEM; goto out_path; } /* copy the full handle */ *handle = f_handle; if (copy_from_user(&handle->f_handle, &ufh->f_handle, f_handle.handle_bytes)) { retval = -EFAULT; goto out_handle; } retval = do_handle_to_path(handle, path, &ctx); out_handle: kfree(handle); out_path: path_put(&ctx.root); out_err: return retval; } static long do_handle_open(int mountdirfd, struct file_handle __user *ufh, int open_flag) { long retval = 0; struct path path; struct file *file; int fd; retval = handle_to_path(mountdirfd, ufh, &path, open_flag); if (retval) return retval; fd = get_unused_fd_flags(open_flag); if (fd < 0) { path_put(&path); return fd; } file = file_open_root(&path, "", open_flag, 0); if (IS_ERR(file)) { put_unused_fd(fd); retval = PTR_ERR(file); } else { retval = fd; fd_install(fd, file); } path_put(&path); return retval; } /** * sys_open_by_handle_at: Open the file handle * @mountdirfd: directory file descriptor * @handle: file handle to be opened * @flags: open flags. * * @mountdirfd indicate the directory file descriptor * of the mount point. file handle is decoded relative * to the vfsmount pointed by the @mountdirfd. @flags * value is same as the open(2) flags. */ SYSCALL_DEFINE3(open_by_handle_at, int, mountdirfd, struct file_handle __user *, handle, int, flags) { long ret; if (force_o_largefile()) flags |= O_LARGEFILE; ret = do_handle_open(mountdirfd, handle, flags); return ret; } #ifdef CONFIG_COMPAT /* * Exactly like fs/open.c:sys_open_by_handle_at(), except that it * doesn't set the O_LARGEFILE flag. */ COMPAT_SYSCALL_DEFINE3(open_by_handle_at, int, mountdirfd, struct file_handle __user *, handle, int, flags) { return do_handle_open(mountdirfd, handle, flags); } #endif |
| 15 15 4 1 5 5 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 | // SPDX-License-Identifier: GPL-2.0-only /* iptables module for using new netfilter netlink queue * * (C) 2005 by Harald Welte <laforge@netfilter.org> */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <linux/module.h> #include <linux/skbuff.h> #include <linux/netfilter.h> #include <linux/netfilter_arp.h> #include <linux/netfilter/x_tables.h> #include <linux/netfilter/xt_NFQUEUE.h> #include <net/netfilter/nf_queue.h> MODULE_AUTHOR("Harald Welte <laforge@netfilter.org>"); MODULE_DESCRIPTION("Xtables: packet forwarding to netlink"); MODULE_LICENSE("GPL"); MODULE_ALIAS("ipt_NFQUEUE"); MODULE_ALIAS("ip6t_NFQUEUE"); MODULE_ALIAS("arpt_NFQUEUE"); static u32 jhash_initval __read_mostly; static unsigned int nfqueue_tg(struct sk_buff *skb, const struct xt_action_param *par) { const struct xt_NFQ_info *tinfo = par->targinfo; return NF_QUEUE_NR(tinfo->queuenum); } static unsigned int nfqueue_tg_v1(struct sk_buff *skb, const struct xt_action_param *par) { const struct xt_NFQ_info_v1 *info = par->targinfo; u32 queue = info->queuenum; if (info->queues_total > 1) { queue = nfqueue_hash(skb, queue, info->queues_total, xt_family(par), jhash_initval); } return NF_QUEUE_NR(queue); } static unsigned int nfqueue_tg_v2(struct sk_buff *skb, const struct xt_action_param *par) { const struct xt_NFQ_info_v2 *info = par->targinfo; unsigned int ret = nfqueue_tg_v1(skb, par); if (info->bypass) ret |= NF_VERDICT_FLAG_QUEUE_BYPASS; return ret; } static int nfqueue_tg_check(const struct xt_tgchk_param *par) { const struct xt_NFQ_info_v3 *info = par->targinfo; u32 maxid; init_hashrandom(&jhash_initval); if (info->queues_total == 0) { pr_info_ratelimited("number of total queues is 0\n"); return -EINVAL; } maxid = info->queues_total - 1 + info->queuenum; if (maxid > 0xffff) { pr_info_ratelimited("number of queues (%u) out of range (got %u)\n", info->queues_total, maxid); return -ERANGE; } if (par->target->revision == 2 && info->flags > 1) return -EINVAL; if (par->target->revision == 3 && info->flags & ~NFQ_FLAG_MASK) return -EINVAL; return 0; } static unsigned int nfqueue_tg_v3(struct sk_buff *skb, const struct xt_action_param *par) { const struct xt_NFQ_info_v3 *info = par->targinfo; u32 queue = info->queuenum; int ret; if (info->queues_total > 1) { if (info->flags & NFQ_FLAG_CPU_FANOUT) { int cpu = smp_processor_id(); queue = info->queuenum + cpu % info->queues_total; } else { queue = nfqueue_hash(skb, queue, info->queues_total, xt_family(par), jhash_initval); } } ret = NF_QUEUE_NR(queue); if (info->flags & NFQ_FLAG_BYPASS) ret |= NF_VERDICT_FLAG_QUEUE_BYPASS; return ret; } static struct xt_target nfqueue_tg_reg[] __read_mostly = { { .name = "NFQUEUE", .family = NFPROTO_UNSPEC, .target = nfqueue_tg, .targetsize = sizeof(struct xt_NFQ_info), .me = THIS_MODULE, }, { .name = "NFQUEUE", .revision = 1, .family = NFPROTO_UNSPEC, .checkentry = nfqueue_tg_check, .target = nfqueue_tg_v1, .targetsize = sizeof(struct xt_NFQ_info_v1), .me = THIS_MODULE, }, { .name = "NFQUEUE", .revision = 2, .family = NFPROTO_UNSPEC, .checkentry = nfqueue_tg_check, .target = nfqueue_tg_v2, .targetsize = sizeof(struct xt_NFQ_info_v2), .me = THIS_MODULE, }, { .name = "NFQUEUE", .revision = 3, .family = NFPROTO_UNSPEC, .checkentry = nfqueue_tg_check, .target = nfqueue_tg_v3, .targetsize = sizeof(struct xt_NFQ_info_v3), .me = THIS_MODULE, }, }; static int __init nfqueue_tg_init(void) { return xt_register_targets(nfqueue_tg_reg, ARRAY_SIZE(nfqueue_tg_reg)); } static void __exit nfqueue_tg_exit(void) { xt_unregister_targets(nfqueue_tg_reg, ARRAY_SIZE(nfqueue_tg_reg)); } module_init(nfqueue_tg_init); module_exit(nfqueue_tg_exit); |
| 22 49 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 | // SPDX-License-Identifier: GPL-2.0-or-later /* * net/dccp/timer.c * * An implementation of the DCCP protocol * Arnaldo Carvalho de Melo <acme@conectiva.com.br> */ #include <linux/dccp.h> #include <linux/skbuff.h> #include <linux/export.h> #include "dccp.h" /* sysctl variables governing numbers of retransmission attempts */ int sysctl_dccp_request_retries __read_mostly = TCP_SYN_RETRIES; int sysctl_dccp_retries1 __read_mostly = TCP_RETR1; int sysctl_dccp_retries2 __read_mostly = TCP_RETR2; static void dccp_write_err(struct sock *sk) { sk->sk_err = READ_ONCE(sk->sk_err_soft) ? : ETIMEDOUT; sk_error_report(sk); dccp_send_reset(sk, DCCP_RESET_CODE_ABORTED); dccp_done(sk); __DCCP_INC_STATS(DCCP_MIB_ABORTONTIMEOUT); } /* A write timeout has occurred. Process the after effects. */ static int dccp_write_timeout(struct sock *sk) { const struct inet_connection_sock *icsk = inet_csk(sk); int retry_until; if (sk->sk_state == DCCP_REQUESTING || sk->sk_state == DCCP_PARTOPEN) { if (icsk->icsk_retransmits != 0) dst_negative_advice(sk); retry_until = icsk->icsk_syn_retries ? : sysctl_dccp_request_retries; } else { if (icsk->icsk_retransmits >= sysctl_dccp_retries1) { /* NOTE. draft-ietf-tcpimpl-pmtud-01.txt requires pmtu black hole detection. :-( It is place to make it. It is not made. I do not want to make it. It is disguisting. It does not work in any case. Let me to cite the same draft, which requires for us to implement this: "The one security concern raised by this memo is that ICMP black holes are often caused by over-zealous security administrators who block all ICMP messages. It is vitally important that those who design and deploy security systems understand the impact of strict filtering on upper-layer protocols. The safest web site in the world is worthless if most TCP implementations cannot transfer data from it. It would be far nicer to have all of the black holes fixed rather than fixing all of the TCP implementations." Golden words :-). */ dst_negative_advice(sk); } retry_until = sysctl_dccp_retries2; /* * FIXME: see tcp_write_timout and tcp_out_of_resources */ } if (icsk->icsk_retransmits >= retry_until) { /* Has it gone just too far? */ dccp_write_err(sk); return 1; } return 0; } /* * The DCCP retransmit timer. */ static void dccp_retransmit_timer(struct sock *sk) { struct inet_connection_sock *icsk = inet_csk(sk); /* * More than 4MSL (8 minutes) has passed, a RESET(aborted) was * sent, no need to retransmit, this sock is dead. */ if (dccp_write_timeout(sk)) return; /* * We want to know the number of packets retransmitted, not the * total number of retransmissions of clones of original packets. */ if (icsk->icsk_retransmits == 0) __DCCP_INC_STATS(DCCP_MIB_TIMEOUTS); if (dccp_retransmit_skb(sk) != 0) { /* * Retransmission failed because of local congestion, * do not backoff. */ if (--icsk->icsk_retransmits == 0) icsk->icsk_retransmits = 1; inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, min(icsk->icsk_rto, TCP_RESOURCE_PROBE_INTERVAL), DCCP_RTO_MAX); return; } icsk->icsk_backoff++; icsk->icsk_rto = min(icsk->icsk_rto << 1, DCCP_RTO_MAX); inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, icsk->icsk_rto, DCCP_RTO_MAX); if (icsk->icsk_retransmits > sysctl_dccp_retries1) __sk_dst_reset(sk); } static void dccp_write_timer(struct timer_list *t) { struct inet_connection_sock *icsk = from_timer(icsk, t, icsk_retransmit_timer); struct sock *sk = &icsk->icsk_inet.sk; int event = 0; bh_lock_sock(sk); if (sock_owned_by_user(sk)) { /* Try again later */ sk_reset_timer(sk, &icsk->icsk_retransmit_timer, jiffies + (HZ / 20)); goto out; } if (sk->sk_state == DCCP_CLOSED || !icsk->icsk_pending) goto out; if (time_after(icsk->icsk_timeout, jiffies)) { sk_reset_timer(sk, &icsk->icsk_retransmit_timer, icsk->icsk_timeout); goto out; } event = icsk->icsk_pending; icsk->icsk_pending = 0; switch (event) { case ICSK_TIME_RETRANS: dccp_retransmit_timer(sk); break; } out: bh_unlock_sock(sk); sock_put(sk); } static void dccp_keepalive_timer(struct timer_list *t) { struct sock *sk = from_timer(sk, t, sk_timer); pr_err("dccp should not use a keepalive timer !\n"); sock_put(sk); } /* This is the same as tcp_delack_timer, sans prequeue & mem_reclaim stuff */ static void dccp_delack_timer(struct timer_list *t) { struct inet_connection_sock *icsk = from_timer(icsk, t, icsk_delack_timer); struct sock *sk = &icsk->icsk_inet.sk; bh_lock_sock(sk); if (sock_owned_by_user(sk)) { /* Try again later. */ __NET_INC_STATS(sock_net(sk), LINUX_MIB_DELAYEDACKLOCKED); sk_reset_timer(sk, &icsk->icsk_delack_timer, jiffies + TCP_DELACK_MIN); goto out; } if (sk->sk_state == DCCP_CLOSED || !(icsk->icsk_ack.pending & ICSK_ACK_TIMER)) goto out; if (time_after(icsk->icsk_ack.timeout, jiffies)) { sk_reset_timer(sk, &icsk->icsk_delack_timer, icsk->icsk_ack.timeout); goto out; } icsk->icsk_ack.pending &= ~ICSK_ACK_TIMER; if (inet_csk_ack_scheduled(sk)) { if (!inet_csk_in_pingpong_mode(sk)) { /* Delayed ACK missed: inflate ATO. */ icsk->icsk_ack.ato = min_t(u32, icsk->icsk_ack.ato << 1, icsk->icsk_rto); } else { /* Delayed ACK missed: leave pingpong mode and * deflate ATO. */ inet_csk_exit_pingpong_mode(sk); icsk->icsk_ack.ato = TCP_ATO_MIN; } dccp_send_ack(sk); __NET_INC_STATS(sock_net(sk), LINUX_MIB_DELAYEDACKS); } out: bh_unlock_sock(sk); sock_put(sk); } /** * dccp_write_xmitlet - Workhorse for CCID packet dequeueing interface * @t: pointer to the tasklet associated with this handler * * See the comments above %ccid_dequeueing_decision for supported modes. */ static void dccp_write_xmitlet(struct tasklet_struct *t) { struct dccp_sock *dp = from_tasklet(dp, t, dccps_xmitlet); struct sock *sk = &dp->dccps_inet_connection.icsk_inet.sk; bh_lock_sock(sk); if (sock_owned_by_user(sk)) sk_reset_timer(sk, &dccp_sk(sk)->dccps_xmit_timer, jiffies + 1); else dccp_write_xmit(sk); bh_unlock_sock(sk); sock_put(sk); } static void dccp_write_xmit_timer(struct timer_list *t) { struct dccp_sock *dp = from_timer(dp, t, dccps_xmit_timer); dccp_write_xmitlet(&dp->dccps_xmitlet); } void dccp_init_xmit_timers(struct sock *sk) { struct dccp_sock *dp = dccp_sk(sk); tasklet_setup(&dp->dccps_xmitlet, dccp_write_xmitlet); timer_setup(&dp->dccps_xmit_timer, dccp_write_xmit_timer, 0); inet_csk_init_xmit_timers(sk, &dccp_write_timer, &dccp_delack_timer, &dccp_keepalive_timer); } static ktime_t dccp_timestamp_seed; /** * dccp_timestamp - 10s of microseconds time source * Returns the number of 10s of microseconds since loading DCCP. This is native * DCCP time difference format (RFC 4340, sec. 13). * Please note: This will wrap around about circa every 11.9 hours. */ u32 dccp_timestamp(void) { u64 delta = (u64)ktime_us_delta(ktime_get_real(), dccp_timestamp_seed); do_div(delta, 10); return delta; } EXPORT_SYMBOL_GPL(dccp_timestamp); void __init dccp_timestamping_init(void) { dccp_timestamp_seed = ktime_get_real(); } |
| 88 88 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 | // SPDX-License-Identifier: GPL-2.0-or-later /* * CRC32C *@Article{castagnoli-crc, * author = { Guy Castagnoli and Stefan Braeuer and Martin Herrman}, * title = {{Optimization of Cyclic Redundancy-Check Codes with 24 * and 32 Parity Bits}}, * journal = IEEE Transactions on Communication, * year = {1993}, * volume = {41}, * number = {6}, * pages = {}, * month = {June}, *} * Used by the iSCSI driver, possibly others, and derived from * the iscsi-crc.c module of the linux-iscsi driver at * http://linux-iscsi.sourceforge.net. * * Following the example of lib/crc32, this function is intended to be * flexible and useful for all users. Modules that currently have their * own crc32c, but hopefully may be able to use this one are: * net/sctp (please add all your doco to here if you change to * use this one!) * <endoflist> * * Copyright (c) 2004 Cisco Systems, Inc. */ #include <crypto/hash.h> #include <linux/err.h> #include <linux/init.h> #include <linux/kernel.h> #include <linux/module.h> #include <linux/crc32c.h> static struct crypto_shash *tfm; u32 crc32c(u32 crc, const void *address, unsigned int length) { SHASH_DESC_ON_STACK(shash, tfm); u32 ret, *ctx = (u32 *)shash_desc_ctx(shash); int err; shash->tfm = tfm; *ctx = crc; err = crypto_shash_update(shash, address, length); BUG_ON(err); ret = *ctx; barrier_data(ctx); return ret; } EXPORT_SYMBOL(crc32c); static int __init libcrc32c_mod_init(void) { tfm = crypto_alloc_shash("crc32c", 0, 0); return PTR_ERR_OR_ZERO(tfm); } static void __exit libcrc32c_mod_fini(void) { crypto_free_shash(tfm); } module_init(libcrc32c_mod_init); module_exit(libcrc32c_mod_fini); MODULE_AUTHOR("Clay Haapala <chaapala@cisco.com>"); MODULE_DESCRIPTION("CRC32c (Castagnoli) calculations"); MODULE_LICENSE("GPL"); MODULE_SOFTDEP("pre: crc32c"); |
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 | // SPDX-License-Identifier: GPL-2.0 #ifndef _LINUX_MM_SLOT_H #define _LINUX_MM_SLOT_H #include <linux/hashtable.h> #include <linux/slab.h> /* * struct mm_slot - hash lookup from mm to mm_slot * @hash: link to the mm_slots hash list * @mm_node: link into the mm_slots list * @mm: the mm that this information is valid for */ struct mm_slot { struct hlist_node hash; struct list_head mm_node; struct mm_struct *mm; }; #define mm_slot_entry(ptr, type, member) \ container_of(ptr, type, member) static inline void *mm_slot_alloc(struct kmem_cache *cache) { if (!cache) /* initialization failed */ return NULL; return kmem_cache_zalloc(cache, GFP_KERNEL); } static inline void mm_slot_free(struct kmem_cache *cache, void *objp) { kmem_cache_free(cache, objp); } #define mm_slot_lookup(_hashtable, _mm) \ ({ \ struct mm_slot *tmp_slot, *mm_slot = NULL; \ \ hash_for_each_possible(_hashtable, tmp_slot, hash, (unsigned long)_mm) \ if (_mm == tmp_slot->mm) { \ mm_slot = tmp_slot; \ break; \ } \ \ mm_slot; \ }) #define mm_slot_insert(_hashtable, _mm, _mm_slot) \ ({ \ _mm_slot->mm = _mm; \ hash_add(_hashtable, &_mm_slot->hash, (unsigned long)_mm); \ }) #endif /* _LINUX_MM_SLOT_H */ |
| 2 1 1 3 4 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 | // SPDX-License-Identifier: GPL-2.0-only /* * (C) 1999-2001 Paul `Rusty' Russell * (C) 2002-2006 Netfilter Core Team <coreteam@netfilter.org> * (C) 2011 Patrick McHardy <kaber@trash.net> */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <linux/module.h> #include <linux/skbuff.h> #include <linux/netfilter.h> #include <linux/netfilter/x_tables.h> #include <net/netfilter/nf_nat.h> static int xt_nat_checkentry_v0(const struct xt_tgchk_param *par) { const struct nf_nat_ipv4_multi_range_compat *mr = par->targinfo; if (mr->rangesize != 1) { pr_info_ratelimited("multiple ranges no longer supported\n"); return -EINVAL; } return nf_ct_netns_get(par->net, par->family); } static int xt_nat_checkentry(const struct xt_tgchk_param *par) { return nf_ct_netns_get(par->net, par->family); } static void xt_nat_destroy(const struct xt_tgdtor_param *par) { nf_ct_netns_put(par->net, par->family); } static void xt_nat_convert_range(struct nf_nat_range2 *dst, const struct nf_nat_ipv4_range *src) { memset(&dst->min_addr, 0, sizeof(dst->min_addr)); memset(&dst->max_addr, 0, sizeof(dst->max_addr)); memset(&dst->base_proto, 0, sizeof(dst->base_proto)); dst->flags = src->flags; dst->min_addr.ip = src->min_ip; dst->max_addr.ip = src->max_ip; dst->min_proto = src->min; dst->max_proto = src->max; } static unsigned int xt_snat_target_v0(struct sk_buff *skb, const struct xt_action_param *par) { const struct nf_nat_ipv4_multi_range_compat *mr = par->targinfo; struct nf_nat_range2 range; enum ip_conntrack_info ctinfo; struct nf_conn *ct; ct = nf_ct_get(skb, &ctinfo); WARN_ON(!(ct != NULL && (ctinfo == IP_CT_NEW || ctinfo == IP_CT_RELATED || ctinfo == IP_CT_RELATED_REPLY))); xt_nat_convert_range(&range, &mr->range[0]); return nf_nat_setup_info(ct, &range, NF_NAT_MANIP_SRC); } static unsigned int xt_dnat_target_v0(struct sk_buff *skb, const struct xt_action_param *par) { const struct nf_nat_ipv4_multi_range_compat *mr = par->targinfo; struct nf_nat_range2 range; enum ip_conntrack_info ctinfo; struct nf_conn *ct; ct = nf_ct_get(skb, &ctinfo); WARN_ON(!(ct != NULL && (ctinfo == IP_CT_NEW || ctinfo == IP_CT_RELATED))); xt_nat_convert_range(&range, &mr->range[0]); return nf_nat_setup_info(ct, &range, NF_NAT_MANIP_DST); } static unsigned int xt_snat_target_v1(struct sk_buff *skb, const struct xt_action_param *par) { const struct nf_nat_range *range_v1 = par->targinfo; struct nf_nat_range2 range; enum ip_conntrack_info ctinfo; struct nf_conn *ct; ct = nf_ct_get(skb, &ctinfo); WARN_ON(!(ct != NULL && (ctinfo == IP_CT_NEW || ctinfo == IP_CT_RELATED || ctinfo == IP_CT_RELATED_REPLY))); memcpy(&range, range_v1, sizeof(*range_v1)); memset(&range.base_proto, 0, sizeof(range.base_proto)); return nf_nat_setup_info(ct, &range, NF_NAT_MANIP_SRC); } static unsigned int xt_dnat_target_v1(struct sk_buff *skb, const struct xt_action_param *par) { const struct nf_nat_range *range_v1 = par->targinfo; struct nf_nat_range2 range; enum ip_conntrack_info ctinfo; struct nf_conn *ct; ct = nf_ct_get(skb, &ctinfo); WARN_ON(!(ct != NULL && (ctinfo == IP_CT_NEW || ctinfo == IP_CT_RELATED))); memcpy(&range, range_v1, sizeof(*range_v1)); memset(&range.base_proto, 0, sizeof(range.base_proto)); return nf_nat_setup_info(ct, &range, NF_NAT_MANIP_DST); } static unsigned int xt_snat_target_v2(struct sk_buff *skb, const struct xt_action_param *par) { const struct nf_nat_range2 *range = par->targinfo; enum ip_conntrack_info ctinfo; struct nf_conn *ct; ct = nf_ct_get(skb, &ctinfo); WARN_ON(!(ct != NULL && (ctinfo == IP_CT_NEW || ctinfo == IP_CT_RELATED || ctinfo == IP_CT_RELATED_REPLY))); return nf_nat_setup_info(ct, range, NF_NAT_MANIP_SRC); } static unsigned int xt_dnat_target_v2(struct sk_buff *skb, const struct xt_action_param *par) { const struct nf_nat_range2 *range = par->targinfo; enum ip_conntrack_info ctinfo; struct nf_conn *ct; ct = nf_ct_get(skb, &ctinfo); WARN_ON(!(ct != NULL && (ctinfo == IP_CT_NEW || ctinfo == IP_CT_RELATED))); return nf_nat_setup_info(ct, range, NF_NAT_MANIP_DST); } static struct xt_target xt_nat_target_reg[] __read_mostly = { { .name = "SNAT", .revision = 0, .checkentry = xt_nat_checkentry_v0, .destroy = xt_nat_destroy, .target = xt_snat_target_v0, .targetsize = sizeof(struct nf_nat_ipv4_multi_range_compat), .family = NFPROTO_IPV4, .table = "nat", .hooks = (1 << NF_INET_POST_ROUTING) | (1 << NF_INET_LOCAL_IN), .me = THIS_MODULE, }, { .name = "DNAT", .revision = 0, .checkentry = xt_nat_checkentry_v0, .destroy = xt_nat_destroy, .target = xt_dnat_target_v0, .targetsize = sizeof(struct nf_nat_ipv4_multi_range_compat), .family = NFPROTO_IPV4, .table = "nat", .hooks = (1 << NF_INET_PRE_ROUTING) | (1 << NF_INET_LOCAL_OUT), .me = THIS_MODULE, }, { .name = "SNAT", .revision = 1, .checkentry = xt_nat_checkentry, .destroy = xt_nat_destroy, .target = xt_snat_target_v1, .targetsize = sizeof(struct nf_nat_range), .table = "nat", .hooks = (1 << NF_INET_POST_ROUTING) | (1 << NF_INET_LOCAL_IN), .me = THIS_MODULE, }, { .name = "DNAT", .revision = 1, .checkentry = xt_nat_checkentry, .destroy = xt_nat_destroy, .target = xt_dnat_target_v1, .targetsize = sizeof(struct nf_nat_range), .table = "nat", .hooks = (1 << NF_INET_PRE_ROUTING) | (1 << NF_INET_LOCAL_OUT), .me = THIS_MODULE, }, { .name = "SNAT", .revision = 2, .checkentry = xt_nat_checkentry, .destroy = xt_nat_destroy, .target = xt_snat_target_v2, .targetsize = sizeof(struct nf_nat_range2), .table = "nat", .hooks = (1 << NF_INET_POST_ROUTING) | (1 << NF_INET_LOCAL_IN), .me = THIS_MODULE, }, { .name = "DNAT", .revision = 2, .checkentry = xt_nat_checkentry, .destroy = xt_nat_destroy, .target = xt_dnat_target_v2, .targetsize = sizeof(struct nf_nat_range2), .table = "nat", .hooks = (1 << NF_INET_PRE_ROUTING) | (1 << NF_INET_LOCAL_OUT), .me = THIS_MODULE, }, }; static int __init xt_nat_init(void) { return xt_register_targets(xt_nat_target_reg, ARRAY_SIZE(xt_nat_target_reg)); } static void __exit xt_nat_exit(void) { xt_unregister_targets(xt_nat_target_reg, ARRAY_SIZE(xt_nat_target_reg)); } module_init(xt_nat_init); module_exit(xt_nat_exit); MODULE_LICENSE("GPL"); MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>"); MODULE_ALIAS("ipt_SNAT"); MODULE_ALIAS("ipt_DNAT"); MODULE_ALIAS("ip6t_SNAT"); MODULE_ALIAS("ip6t_DNAT"); MODULE_DESCRIPTION("SNAT and DNAT targets support"); |
| 1705 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 | /* SPDX-License-Identifier: GPL-2.0-or-later */ #ifndef _NET_RPS_H #define _NET_RPS_H #include <linux/types.h> #include <linux/static_key.h> #include <net/sock.h> #include <net/hotdata.h> #ifdef CONFIG_RPS extern struct static_key_false rps_needed; extern struct static_key_false rfs_needed; /* * This structure holds an RPS map which can be of variable length. The * map is an array of CPUs. */ struct rps_map { unsigned int len; struct rcu_head rcu; u16 cpus[]; }; #define RPS_MAP_SIZE(_num) (sizeof(struct rps_map) + ((_num) * sizeof(u16))) /* * The rps_dev_flow structure contains the mapping of a flow to a CPU, the * tail pointer for that CPU's input queue at the time of last enqueue, and * a hardware filter index. */ struct rps_dev_flow { u16 cpu; u16 filter; unsigned int last_qtail; }; #define RPS_NO_FILTER 0xffff /* * The rps_dev_flow_table structure contains a table of flow mappings. */ struct rps_dev_flow_table { unsigned int mask; struct rcu_head rcu; struct rps_dev_flow flows[]; }; #define RPS_DEV_FLOW_TABLE_SIZE(_num) (sizeof(struct rps_dev_flow_table) + \ ((_num) * sizeof(struct rps_dev_flow))) /* * The rps_sock_flow_table contains mappings of flows to the last CPU * on which they were processed by the application (set in recvmsg). * Each entry is a 32bit value. Upper part is the high-order bits * of flow hash, lower part is CPU number. * rps_cpu_mask is used to partition the space, depending on number of * possible CPUs : rps_cpu_mask = roundup_pow_of_two(nr_cpu_ids) - 1 * For example, if 64 CPUs are possible, rps_cpu_mask = 0x3f, * meaning we use 32-6=26 bits for the hash. */ struct rps_sock_flow_table { u32 mask; u32 ents[] ____cacheline_aligned_in_smp; }; #define RPS_SOCK_FLOW_TABLE_SIZE(_num) (offsetof(struct rps_sock_flow_table, ents[_num])) #define RPS_NO_CPU 0xffff static inline void rps_record_sock_flow(struct rps_sock_flow_table *table, u32 hash) { unsigned int index = hash & table->mask; u32 val = hash & ~net_hotdata.rps_cpu_mask; /* We only give a hint, preemption can change CPU under us */ val |= raw_smp_processor_id(); /* The following WRITE_ONCE() is paired with the READ_ONCE() * here, and another one in get_rps_cpu(). */ if (READ_ONCE(table->ents[index]) != val) WRITE_ONCE(table->ents[index], val); } #endif /* CONFIG_RPS */ static inline void sock_rps_record_flow_hash(__u32 hash) { #ifdef CONFIG_RPS struct rps_sock_flow_table *sock_flow_table; if (!hash) return; rcu_read_lock(); sock_flow_table = rcu_dereference(net_hotdata.rps_sock_flow_table); if (sock_flow_table) rps_record_sock_flow(sock_flow_table, hash); rcu_read_unlock(); #endif } static inline void sock_rps_record_flow(const struct sock *sk) { #ifdef CONFIG_RPS if (static_branch_unlikely(&rfs_needed)) { /* Reading sk->sk_rxhash might incur an expensive cache line * miss. * * TCP_ESTABLISHED does cover almost all states where RFS * might be useful, and is cheaper [1] than testing : * IPv4: inet_sk(sk)->inet_daddr * IPv6: ipv6_addr_any(&sk->sk_v6_daddr) * OR an additional socket flag * [1] : sk_state and sk_prot are in the same cache line. */ if (sk->sk_state == TCP_ESTABLISHED) { /* This READ_ONCE() is paired with the WRITE_ONCE() * from sock_rps_save_rxhash() and sock_rps_reset_rxhash(). */ sock_rps_record_flow_hash(READ_ONCE(sk->sk_rxhash)); } } #endif } static inline u32 rps_input_queue_tail_incr(struct softnet_data *sd) { #ifdef CONFIG_RPS return ++sd->input_queue_tail; #else return 0; #endif } static inline void rps_input_queue_tail_save(u32 *dest, u32 tail) { #ifdef CONFIG_RPS WRITE_ONCE(*dest, tail); #endif } static inline void rps_input_queue_head_add(struct softnet_data *sd, int val) { #ifdef CONFIG_RPS WRITE_ONCE(sd->input_queue_head, sd->input_queue_head + val); #endif } static inline void rps_input_queue_head_incr(struct softnet_data *sd) { rps_input_queue_head_add(sd, 1); } #endif /* _NET_RPS_H */ |
| 10 11 8 1 10 15 15 1 5 10 9 10 10 10 10 2 2 2 2 1 1 2 13 13 3 3 1 2 3 13 13 1 5 5 25 21 4 2 15 1 2 5 23 3 7 6 5 4 3 2 10 3 3 3 3 5 5 4 5 1 4 5 3 2 5 2 1 1 1 1 25 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 | // SPDX-License-Identifier: GPL-2.0-or-later /* * Sunplus spca504(abc) spca533 spca536 library * Copyright (C) 2005 Michel Xhaard mxhaard@magic.fr * * V4L2 by Jean-Francois Moine <http://moinejf.free.fr> */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #define MODULE_NAME "sunplus" #include "gspca.h" #include "jpeg.h" MODULE_AUTHOR("Michel Xhaard <mxhaard@users.sourceforge.net>"); MODULE_DESCRIPTION("GSPCA/SPCA5xx USB Camera Driver"); MODULE_LICENSE("GPL"); #define QUALITY 85 /* specific webcam descriptor */ struct sd { struct gspca_dev gspca_dev; /* !! must be the first item */ bool autogain; u8 bridge; #define BRIDGE_SPCA504 0 #define BRIDGE_SPCA504B 1 #define BRIDGE_SPCA504C 2 #define BRIDGE_SPCA533 3 #define BRIDGE_SPCA536 4 u8 subtype; #define AiptekMiniPenCam13 1 #define LogitechClickSmart420 2 #define LogitechClickSmart820 3 #define MegapixV4 4 #define MegaImageVI 5 u8 jpeg_hdr[JPEG_HDR_SZ]; }; static const struct v4l2_pix_format vga_mode[] = { {320, 240, V4L2_PIX_FMT_JPEG, V4L2_FIELD_NONE, .bytesperline = 320, .sizeimage = 320 * 240 * 3 / 8 + 590, .colorspace = V4L2_COLORSPACE_JPEG, .priv = 2}, {640, 480, V4L2_PIX_FMT_JPEG, V4L2_FIELD_NONE, .bytesperline = 640, .sizeimage = 640 * 480 * 3 / 8 + 590, .colorspace = V4L2_COLORSPACE_JPEG, .priv = 1}, }; static const struct v4l2_pix_format custom_mode[] = { {320, 240, V4L2_PIX_FMT_JPEG, V4L2_FIELD_NONE, .bytesperline = 320, .sizeimage = 320 * 240 * 3 / 8 + 590, .colorspace = V4L2_COLORSPACE_JPEG, .priv = 2}, {464, 480, V4L2_PIX_FMT_JPEG, V4L2_FIELD_NONE, .bytesperline = 464, .sizeimage = 464 * 480 * 3 / 8 + 590, .colorspace = V4L2_COLORSPACE_JPEG, .priv = 1}, }; static const struct v4l2_pix_format vga_mode2[] = { {176, 144, V4L2_PIX_FMT_JPEG, V4L2_FIELD_NONE, .bytesperline = 176, .sizeimage = 176 * 144 * 3 / 8 + 590, .colorspace = V4L2_COLORSPACE_JPEG, .priv = 4}, {320, 240, V4L2_PIX_FMT_JPEG, V4L2_FIELD_NONE, .bytesperline = 320, .sizeimage = 320 * 240 * 3 / 8 + 590, .colorspace = V4L2_COLORSPACE_JPEG, .priv = 3}, {352, 288, V4L2_PIX_FMT_JPEG, V4L2_FIELD_NONE, .bytesperline = 352, .sizeimage = 352 * 288 * 3 / 8 + 590, .colorspace = V4L2_COLORSPACE_JPEG, .priv = 2}, {640, 480, V4L2_PIX_FMT_JPEG, V4L2_FIELD_NONE, .bytesperline = 640, .sizeimage = 640 * 480 * 3 / 8 + 590, .colorspace = V4L2_COLORSPACE_JPEG, .priv = 1}, }; #define SPCA50X_OFFSET_DATA 10 #define SPCA504_PCCAM600_OFFSET_SNAPSHOT 3 #define SPCA504_PCCAM600_OFFSET_COMPRESS 4 #define SPCA504_PCCAM600_OFFSET_MODE 5 #define SPCA504_PCCAM600_OFFSET_DATA 14 /* Frame packet header offsets for the spca533 */ #define SPCA533_OFFSET_DATA 16 #define SPCA533_OFFSET_FRAMSEQ 15 /* Frame packet header offsets for the spca536 */ #define SPCA536_OFFSET_DATA 4 #define SPCA536_OFFSET_FRAMSEQ 1 struct cmd { u8 req; u16 val; u16 idx; }; /* Initialisation data for the Creative PC-CAM 600 */ static const struct cmd spca504_pccam600_init_data[] = { /* {0xa0, 0x0000, 0x0503}, * capture mode */ {0x00, 0x0000, 0x2000}, {0x00, 0x0013, 0x2301}, {0x00, 0x0003, 0x2000}, {0x00, 0x0001, 0x21ac}, {0x00, 0x0001, 0x21a6}, {0x00, 0x0000, 0x21a7}, /* brightness */ {0x00, 0x0020, 0x21a8}, /* contrast */ {0x00, 0x0001, 0x21ac}, /* sat/hue */ {0x00, 0x0000, 0x21ad}, /* hue */ {0x00, 0x001a, 0x21ae}, /* saturation */ {0x00, 0x0002, 0x21a3}, /* gamma */ {0x30, 0x0154, 0x0008}, {0x30, 0x0004, 0x0006}, {0x30, 0x0258, 0x0009}, {0x30, 0x0004, 0x0000}, {0x30, 0x0093, 0x0004}, {0x30, 0x0066, 0x0005}, {0x00, 0x0000, 0x2000}, {0x00, 0x0013, 0x2301}, {0x00, 0x0003, 0x2000}, {0x00, 0x0013, 0x2301}, {0x00, 0x0003, 0x2000}, }; /* Creative PC-CAM 600 specific open data, sent before using the * generic initialisation data from spca504_open_data. */ static const struct cmd spca504_pccam600_open_data[] = { {0x00, 0x0001, 0x2501}, {0x20, 0x0500, 0x0001}, /* snapshot mode */ {0x00, 0x0003, 0x2880}, {0x00, 0x0001, 0x2881}, }; /* Initialisation data for the logitech clicksmart 420 */ static const struct cmd spca504A_clicksmart420_init_data[] = { /* {0xa0, 0x0000, 0x0503}, * capture mode */ {0x00, 0x0000, 0x2000}, {0x00, 0x0013, 0x2301}, {0x00, 0x0003, 0x2000}, {0x00, 0x0001, 0x21ac}, {0x00, 0x0001, 0x21a6}, {0x00, 0x0000, 0x21a7}, /* brightness */ {0x00, 0x0020, 0x21a8}, /* contrast */ {0x00, 0x0001, 0x21ac}, /* sat/hue */ {0x00, 0x0000, 0x21ad}, /* hue */ {0x00, 0x001a, 0x21ae}, /* saturation */ {0x00, 0x0002, 0x21a3}, /* gamma */ {0x30, 0x0004, 0x000a}, {0xb0, 0x0001, 0x0000}, {0xa1, 0x0080, 0x0001}, {0x30, 0x0049, 0x0000}, {0x30, 0x0060, 0x0005}, {0x0c, 0x0004, 0x0000}, {0x00, 0x0000, 0x0000}, {0x00, 0x0000, 0x2000}, {0x00, 0x0013, 0x2301}, {0x00, 0x0003, 0x2000}, }; /* clicksmart 420 open data ? */ static const struct cmd spca504A_clicksmart420_open_data[] = { {0x00, 0x0001, 0x2501}, {0x20, 0x0502, 0x0000}, {0x06, 0x0000, 0x0000}, {0x00, 0x0004, 0x2880}, {0x00, 0x0001, 0x2881}, {0xa0, 0x0000, 0x0503}, }; static const u8 qtable_creative_pccam[2][64] = { { /* Q-table Y-components */ 0x05, 0x03, 0x03, 0x05, 0x07, 0x0c, 0x0f, 0x12, 0x04, 0x04, 0x04, 0x06, 0x08, 0x11, 0x12, 0x11, 0x04, 0x04, 0x05, 0x07, 0x0c, 0x11, 0x15, 0x11, 0x04, 0x05, 0x07, 0x09, 0x0f, 0x1a, 0x18, 0x13, 0x05, 0x07, 0x0b, 0x11, 0x14, 0x21, 0x1f, 0x17, 0x07, 0x0b, 0x11, 0x13, 0x18, 0x1f, 0x22, 0x1c, 0x0f, 0x13, 0x17, 0x1a, 0x1f, 0x24, 0x24, 0x1e, 0x16, 0x1c, 0x1d, 0x1d, 0x22, 0x1e, 0x1f, 0x1e}, { /* Q-table C-components */ 0x05, 0x05, 0x07, 0x0e, 0x1e, 0x1e, 0x1e, 0x1e, 0x05, 0x06, 0x08, 0x14, 0x1e, 0x1e, 0x1e, 0x1e, 0x07, 0x08, 0x11, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x0e, 0x14, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e} }; /* FIXME: This Q-table is identical to the Creative PC-CAM one, * except for one byte. Possibly a typo? * NWG: 18/05/2003. */ static const u8 qtable_spca504_default[2][64] = { { /* Q-table Y-components */ 0x05, 0x03, 0x03, 0x05, 0x07, 0x0c, 0x0f, 0x12, 0x04, 0x04, 0x04, 0x06, 0x08, 0x11, 0x12, 0x11, 0x04, 0x04, 0x05, 0x07, 0x0c, 0x11, 0x15, 0x11, 0x04, 0x05, 0x07, 0x09, 0x0f, 0x1a, 0x18, 0x13, 0x05, 0x07, 0x0b, 0x11, 0x14, 0x21, 0x1f, 0x17, 0x07, 0x0b, 0x11, 0x13, 0x18, 0x1f, 0x22, 0x1c, 0x0f, 0x13, 0x17, 0x1a, 0x1f, 0x24, 0x24, 0x1e, 0x16, 0x1c, 0x1d, 0x1d, 0x1d /* 0x22 */ , 0x1e, 0x1f, 0x1e, }, { /* Q-table C-components */ 0x05, 0x05, 0x07, 0x0e, 0x1e, 0x1e, 0x1e, 0x1e, 0x05, 0x06, 0x08, 0x14, 0x1e, 0x1e, 0x1e, 0x1e, 0x07, 0x08, 0x11, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x0e, 0x14, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e, 0x1e} }; /* read <len> bytes to gspca_dev->usb_buf */ static void reg_r(struct gspca_dev *gspca_dev, u8 req, u16 index, u16 len) { int ret; if (len > USB_BUF_SZ) { gspca_err(gspca_dev, "reg_r: buffer overflow\n"); return; } if (len == 0) { gspca_err(gspca_dev, "reg_r: zero-length read\n"); return; } if (gspca_dev->usb_err < 0) return; ret = usb_control_msg(gspca_dev->dev, usb_rcvctrlpipe(gspca_dev->dev, 0), req, USB_DIR_IN | USB_TYPE_VENDOR | USB_RECIP_DEVICE, 0, /* value */ index, gspca_dev->usb_buf, len, 500); if (ret < 0) { pr_err("reg_r err %d\n", ret); gspca_dev->usb_err = ret; /* * Make sure the buffer is zeroed to avoid uninitialized * values. */ memset(gspca_dev->usb_buf, 0, USB_BUF_SZ); } } /* write one byte */ static void reg_w_1(struct gspca_dev *gspca_dev, u8 req, u16 value, u16 index, u16 byte) { int ret; if (gspca_dev->usb_err < 0) return; gspca_dev->usb_buf[0] = byte; ret = usb_control_msg(gspca_dev->dev, usb_sndctrlpipe(gspca_dev->dev, 0), req, USB_DIR_OUT | USB_TYPE_VENDOR | USB_RECIP_DEVICE, value, index, gspca_dev->usb_buf, 1, 500); if (ret < 0) { pr_err("reg_w_1 err %d\n", ret); gspca_dev->usb_err = ret; } } /* write req / index / value */ static void reg_w_riv(struct gspca_dev *gspca_dev, u8 req, u16 index, u16 value) { struct usb_device *dev = gspca_dev->dev; int ret; if (gspca_dev->usb_err < 0) return; ret = usb_control_msg(dev, usb_sndctrlpipe(dev, 0), req, USB_DIR_OUT | USB_TYPE_VENDOR | USB_RECIP_DEVICE, value, index, NULL, 0, 500); if (ret < 0) { pr_err("reg_w_riv err %d\n", ret); gspca_dev->usb_err = ret; return; } gspca_dbg(gspca_dev, D_USBO, "reg_w_riv: 0x%02x,0x%04x:0x%04x\n", req, index, value); } static void write_vector(struct gspca_dev *gspca_dev, const struct cmd *data, int ncmds) { while (--ncmds >= 0) { reg_w_riv(gspca_dev, data->req, data->idx, data->val); data++; } } static void setup_qtable(struct gspca_dev *gspca_dev, const u8 qtable[2][64]) { int i; /* loop over y components */ for (i = 0; i < 64; i++) reg_w_riv(gspca_dev, 0x00, 0x2800 + i, qtable[0][i]); /* loop over c components */ for (i = 0; i < 64; i++) reg_w_riv(gspca_dev, 0x00, 0x2840 + i, qtable[1][i]); } static void spca504_acknowledged_command(struct gspca_dev *gspca_dev, u8 req, u16 idx, u16 val) { reg_w_riv(gspca_dev, req, idx, val); reg_r(gspca_dev, 0x01, 0x0001, 1); gspca_dbg(gspca_dev, D_FRAM, "before wait 0x%04x\n", gspca_dev->usb_buf[0]); reg_w_riv(gspca_dev, req, idx, val); msleep(200); reg_r(gspca_dev, 0x01, 0x0001, 1); gspca_dbg(gspca_dev, D_FRAM, "after wait 0x%04x\n", gspca_dev->usb_buf[0]); } static void spca504_read_info(struct gspca_dev *gspca_dev) { int i; u8 info[6]; if (gspca_debug < D_STREAM) return; for (i = 0; i < 6; i++) { reg_r(gspca_dev, 0, i, 1); info[i] = gspca_dev->usb_buf[0]; } gspca_dbg(gspca_dev, D_STREAM, "Read info: %d %d %d %d %d %d. Should be 1,0,2,2,0,0\n", info[0], info[1], info[2], info[3], info[4], info[5]); } static void spca504A_acknowledged_command(struct gspca_dev *gspca_dev, u8 req, u16 idx, u16 val, u8 endcode, u8 count) { u16 status; reg_w_riv(gspca_dev, req, idx, val); reg_r(gspca_dev, 0x01, 0x0001, 1); if (gspca_dev->usb_err < 0) return; gspca_dbg(gspca_dev, D_FRAM, "Status 0x%02x Need 0x%02x\n", gspca_dev->usb_buf[0], endcode); if (!count) return; count = 200; while (--count > 0) { msleep(10); /* gsmart mini2 write a each wait setting 1 ms is enough */ /* reg_w_riv(gspca_dev, req, idx, val); */ reg_r(gspca_dev, 0x01, 0x0001, 1); status = gspca_dev->usb_buf[0]; if (status == endcode) { gspca_dbg(gspca_dev, D_FRAM, "status 0x%04x after wait %d\n", status, 200 - count); break; } } } static void spca504B_PollingDataReady(struct gspca_dev *gspca_dev) { int count = 10; while (--count > 0) { reg_r(gspca_dev, 0x21, 0, 1); if ((gspca_dev->usb_buf[0] & 0x01) == 0) break; msleep(10); } } static void spca504B_WaitCmdStatus(struct gspca_dev *gspca_dev) { int count = 50; while (--count > 0) { reg_r(gspca_dev, 0x21, 1, 1); if (gspca_dev->usb_buf[0] != 0) { reg_w_1(gspca_dev, 0x21, 0, 1, 0); reg_r(gspca_dev, 0x21, 1, 1); spca504B_PollingDataReady(gspca_dev); break; } msleep(10); } } static void spca50x_GetFirmware(struct gspca_dev *gspca_dev) { u8 *data; if (gspca_debug < D_STREAM) return; data = gspca_dev->usb_buf; reg_r(gspca_dev, 0x20, 0, 5); gspca_dbg(gspca_dev, D_STREAM, "FirmWare: %d %d %d %d %d\n", data[0], data[1], data[2], data[3], data[4]); reg_r(gspca_dev, 0x23, 0, 64); reg_r(gspca_dev, 0x23, 1, 64); } static void spca504B_SetSizeType(struct gspca_dev *gspca_dev) { struct sd *sd = (struct sd *) gspca_dev; u8 Size; Size = gspca_dev->cam.cam_mode[gspca_dev->curr_mode].priv; switch (sd->bridge) { case BRIDGE_SPCA533: reg_w_riv(gspca_dev, 0x31, 0, 0); spca504B_WaitCmdStatus(gspca_dev); spca504B_PollingDataReady(gspca_dev); spca50x_GetFirmware(gspca_dev); reg_w_1(gspca_dev, 0x24, 0, 8, 2); /* type */ reg_r(gspca_dev, 0x24, 8, 1); reg_w_1(gspca_dev, 0x25, 0, 4, Size); reg_r(gspca_dev, 0x25, 4, 1); /* size */ spca504B_PollingDataReady(gspca_dev); /* Init the cam width height with some values get on init ? */ reg_w_riv(gspca_dev, 0x31, 0x0004, 0x00); spca504B_WaitCmdStatus(gspca_dev); spca504B_PollingDataReady(gspca_dev); break; default: /* case BRIDGE_SPCA504B: */ /* case BRIDGE_SPCA536: */ reg_w_1(gspca_dev, 0x25, 0, 4, Size); reg_r(gspca_dev, 0x25, 4, 1); /* size */ reg_w_1(gspca_dev, 0x27, 0, 0, 6); reg_r(gspca_dev, 0x27, 0, 1); /* type */ spca504B_PollingDataReady(gspca_dev); break; case BRIDGE_SPCA504: Size += 3; if (sd->subtype == AiptekMiniPenCam13) { /* spca504a aiptek */ spca504A_acknowledged_command(gspca_dev, 0x08, Size, 0, 0x80 | (Size & 0x0f), 1); spca504A_acknowledged_command(gspca_dev, 1, 3, 0, 0x9f, 0); } else { spca504_acknowledged_command(gspca_dev, 0x08, Size, 0); } break; case BRIDGE_SPCA504C: /* capture mode */ reg_w_riv(gspca_dev, 0xa0, (0x0500 | (Size & 0x0f)), 0x00); reg_w_riv(gspca_dev, 0x20, 0x01, 0x0500 | (Size & 0x0f)); break; } } static void spca504_wait_status(struct gspca_dev *gspca_dev) { int cnt; cnt = 256; while (--cnt > 0) { /* With this we get the status, when return 0 it's all ok */ reg_r(gspca_dev, 0x06, 0x00, 1); if (gspca_dev->usb_buf[0] == 0) return; msleep(10); } } static void spca504B_setQtable(struct gspca_dev *gspca_dev) { reg_w_1(gspca_dev, 0x26, 0, 0, 3); reg_r(gspca_dev, 0x26, 0, 1); spca504B_PollingDataReady(gspca_dev); } static void setbrightness(struct gspca_dev *gspca_dev, s32 val) { struct sd *sd = (struct sd *) gspca_dev; u16 reg; reg = sd->bridge == BRIDGE_SPCA536 ? 0x20f0 : 0x21a7; reg_w_riv(gspca_dev, 0x00, reg, val); } static void setcontrast(struct gspca_dev *gspca_dev, s32 val) { struct sd *sd = (struct sd *) gspca_dev; u16 reg; reg = sd->bridge == BRIDGE_SPCA536 ? 0x20f1 : 0x21a8; reg_w_riv(gspca_dev, 0x00, reg, val); } static void setcolors(struct gspca_dev *gspca_dev, s32 val) { struct sd *sd = (struct sd *) gspca_dev; u16 reg; reg = sd->bridge == BRIDGE_SPCA536 ? 0x20f6 : 0x21ae; reg_w_riv(gspca_dev, 0x00, reg, val); } static void init_ctl_reg(struct gspca_dev *gspca_dev) { struct sd *sd = (struct sd *) gspca_dev; int pollreg = 1; switch (sd->bridge) { case BRIDGE_SPCA504: case BRIDGE_SPCA504C: pollreg = 0; fallthrough; default: /* case BRIDGE_SPCA533: */ /* case BRIDGE_SPCA504B: */ reg_w_riv(gspca_dev, 0, 0x21ad, 0x00); /* hue */ reg_w_riv(gspca_dev, 0, 0x21ac, 0x01); /* sat/hue */ reg_w_riv(gspca_dev, 0, 0x21a3, 0x00); /* gamma */ break; case BRIDGE_SPCA536: reg_w_riv(gspca_dev, 0, 0x20f5, 0x40); reg_w_riv(gspca_dev, 0, 0x20f4, 0x01); reg_w_riv(gspca_dev, 0, 0x2089, 0x00); break; } if (pollreg) spca504B_PollingDataReady(gspca_dev); } /* this function is called at probe time */ static int sd_config(struct gspca_dev *gspca_dev, const struct usb_device_id *id) { struct sd *sd = (struct sd *) gspca_dev; struct cam *cam; cam = &gspca_dev->cam; sd->bridge = id->driver_info >> 8; sd->subtype = id->driver_info; if (sd->subtype == AiptekMiniPenCam13) { /* try to get the firmware as some cam answer 2.0.1.2.2 * and should be a spca504b then overwrite that setting */ reg_r(gspca_dev, 0x20, 0, 1); switch (gspca_dev->usb_buf[0]) { case 1: break; /* (right bridge/subtype) */ case 2: sd->bridge = BRIDGE_SPCA504B; sd->subtype = 0; break; default: return -ENODEV; } } switch (sd->bridge) { default: /* case BRIDGE_SPCA504B: */ /* case BRIDGE_SPCA504: */ /* case BRIDGE_SPCA536: */ cam->cam_mode = vga_mode; cam->nmodes = ARRAY_SIZE(vga_mode); break; case BRIDGE_SPCA533: cam->cam_mode = custom_mode; if (sd->subtype == MegaImageVI) /* 320x240 only */ cam->nmodes = ARRAY_SIZE(custom_mode) - 1; else cam->nmodes = ARRAY_SIZE(custom_mode); break; case BRIDGE_SPCA504C: cam->cam_mode = vga_mode2; cam->nmodes = ARRAY_SIZE(vga_mode2); break; } return 0; } /* this function is called at probe and resume time */ static int sd_init(struct gspca_dev *gspca_dev) { struct sd *sd = (struct sd *) gspca_dev; switch (sd->bridge) { case BRIDGE_SPCA504B: reg_w_riv(gspca_dev, 0x1d, 0x00, 0); reg_w_riv(gspca_dev, 0x00, 0x2306, 0x01); reg_w_riv(gspca_dev, 0x00, 0x0d04, 0x00); reg_w_riv(gspca_dev, 0x00, 0x2000, 0x00); reg_w_riv(gspca_dev, 0x00, 0x2301, 0x13); reg_w_riv(gspca_dev, 0x00, 0x2306, 0x00); fallthrough; case BRIDGE_SPCA533: spca504B_PollingDataReady(gspca_dev); spca50x_GetFirmware(gspca_dev); break; case BRIDGE_SPCA536: spca50x_GetFirmware(gspca_dev); reg_r(gspca_dev, 0x00, 0x5002, 1); reg_w_1(gspca_dev, 0x24, 0, 0, 0); reg_r(gspca_dev, 0x24, 0, 1); spca504B_PollingDataReady(gspca_dev); reg_w_riv(gspca_dev, 0x34, 0, 0); spca504B_WaitCmdStatus(gspca_dev); break; case BRIDGE_SPCA504C: /* pccam600 */ gspca_dbg(gspca_dev, D_STREAM, "Opening SPCA504 (PC-CAM 600)\n"); reg_w_riv(gspca_dev, 0xe0, 0x0000, 0x0000); reg_w_riv(gspca_dev, 0xe0, 0x0000, 0x0001); /* reset */ spca504_wait_status(gspca_dev); if (sd->subtype == LogitechClickSmart420) write_vector(gspca_dev, spca504A_clicksmart420_open_data, ARRAY_SIZE(spca504A_clicksmart420_open_data)); else write_vector(gspca_dev, spca504_pccam600_open_data, ARRAY_SIZE(spca504_pccam600_open_data)); setup_qtable(gspca_dev, qtable_creative_pccam); break; default: /* case BRIDGE_SPCA504: */ gspca_dbg(gspca_dev, D_STREAM, "Opening SPCA504\n"); if (sd->subtype == AiptekMiniPenCam13) { spca504_read_info(gspca_dev); /* Set AE AWB Banding Type 3-> 50Hz 2-> 60Hz */ spca504A_acknowledged_command(gspca_dev, 0x24, 8, 3, 0x9e, 1); /* Twice sequential need status 0xff->0x9e->0x9d */ spca504A_acknowledged_command(gspca_dev, 0x24, 8, 3, 0x9e, 0); spca504A_acknowledged_command(gspca_dev, 0x24, 0, 0, 0x9d, 1); /******************************/ /* spca504a aiptek */ spca504A_acknowledged_command(gspca_dev, 0x08, 6, 0, 0x86, 1); /* reg_write (dev, 0, 0x2000, 0); */ /* reg_write (dev, 0, 0x2883, 1); */ /* spca504A_acknowledged_command (gspca_dev, 0x08, 6, 0, 0x86, 1); */ /* spca504A_acknowledged_command (gspca_dev, 0x24, 0, 0, 0x9D, 1); */ reg_w_riv(gspca_dev, 0x00, 0x270c, 0x05); /* L92 sno1t.txt */ reg_w_riv(gspca_dev, 0x00, 0x2310, 0x05); spca504A_acknowledged_command(gspca_dev, 0x01, 0x0f, 0, 0xff, 0); } /* setup qtable */ reg_w_riv(gspca_dev, 0, 0x2000, 0); reg_w_riv(gspca_dev, 0, 0x2883, 1); setup_qtable(gspca_dev, qtable_spca504_default); break; } return gspca_dev->usb_err; } static int sd_start(struct gspca_dev *gspca_dev) { struct sd *sd = (struct sd *) gspca_dev; int enable; /* create the JPEG header */ jpeg_define(sd->jpeg_hdr, gspca_dev->pixfmt.height, gspca_dev->pixfmt.width, 0x22); /* JPEG 411 */ jpeg_set_qual(sd->jpeg_hdr, QUALITY); if (sd->bridge == BRIDGE_SPCA504B) spca504B_setQtable(gspca_dev); spca504B_SetSizeType(gspca_dev); switch (sd->bridge) { default: /* case BRIDGE_SPCA504B: */ /* case BRIDGE_SPCA533: */ /* case BRIDGE_SPCA536: */ switch (sd->subtype) { case MegapixV4: case LogitechClickSmart820: case MegaImageVI: reg_w_riv(gspca_dev, 0xf0, 0, 0); spca504B_WaitCmdStatus(gspca_dev); reg_w_riv(gspca_dev, 0xf0, 4, 0); spca504B_WaitCmdStatus(gspca_dev); break; default: reg_w_riv(gspca_dev, 0x31, 0x0004, 0x00); spca504B_WaitCmdStatus(gspca_dev); spca504B_PollingDataReady(gspca_dev); break; } break; case BRIDGE_SPCA504: if (sd->subtype == AiptekMiniPenCam13) { spca504_read_info(gspca_dev); /* Set AE AWB Banding Type 3-> 50Hz 2-> 60Hz */ spca504A_acknowledged_command(gspca_dev, 0x24, 8, 3, 0x9e, 1); /* Twice sequential need status 0xff->0x9e->0x9d */ spca504A_acknowledged_command(gspca_dev, 0x24, 8, 3, 0x9e, 0); spca504A_acknowledged_command(gspca_dev, 0x24, 0, 0, 0x9d, 1); } else { spca504_acknowledged_command(gspca_dev, 0x24, 8, 3); spca504_read_info(gspca_dev); spca504_acknowledged_command(gspca_dev, 0x24, 8, 3); spca504_acknowledged_command(gspca_dev, 0x24, 0, 0); } spca504B_SetSizeType(gspca_dev); reg_w_riv(gspca_dev, 0x00, 0x270c, 0x05); /* L92 sno1t.txt */ reg_w_riv(gspca_dev, 0x00, 0x2310, 0x05); break; case BRIDGE_SPCA504C: if (sd->subtype == LogitechClickSmart420) { write_vector(gspca_dev, spca504A_clicksmart420_init_data, ARRAY_SIZE(spca504A_clicksmart420_init_data)); } else { write_vector(gspca_dev, spca504_pccam600_init_data, ARRAY_SIZE(spca504_pccam600_init_data)); } enable = (sd->autogain ? 0x04 : 0x01); reg_w_riv(gspca_dev, 0x0c, 0x0000, enable); /* auto exposure */ reg_w_riv(gspca_dev, 0xb0, 0x0000, enable); /* auto whiteness */ /* set default exposure compensation and whiteness balance */ reg_w_riv(gspca_dev, 0x30, 0x0001, 800); /* ~ 20 fps */ reg_w_riv(gspca_dev, 0x30, 0x0002, 1600); spca504B_SetSizeType(gspca_dev); break; } init_ctl_reg(gspca_dev); return gspca_dev->usb_err; } static void sd_stopN(struct gspca_dev *gspca_dev) { struct sd *sd = (struct sd *) gspca_dev; switch (sd->bridge) { default: /* case BRIDGE_SPCA533: */ /* case BRIDGE_SPCA536: */ /* case BRIDGE_SPCA504B: */ reg_w_riv(gspca_dev, 0x31, 0, 0); spca504B_WaitCmdStatus(gspca_dev); spca504B_PollingDataReady(gspca_dev); break; case BRIDGE_SPCA504: case BRIDGE_SPCA504C: reg_w_riv(gspca_dev, 0x00, 0x2000, 0x0000); if (sd->subtype == AiptekMiniPenCam13) { /* spca504a aiptek */ /* spca504A_acknowledged_command(gspca_dev, 0x08, 6, 0, 0x86, 1); */ spca504A_acknowledged_command(gspca_dev, 0x24, 0x00, 0x00, 0x9d, 1); spca504A_acknowledged_command(gspca_dev, 0x01, 0x0f, 0x00, 0xff, 1); } else { spca504_acknowledged_command(gspca_dev, 0x24, 0, 0); reg_w_riv(gspca_dev, 0x01, 0x000f, 0x0000); } break; } } static void sd_pkt_scan(struct gspca_dev *gspca_dev, u8 *data, /* isoc packet */ int len) /* iso packet length */ { struct sd *sd = (struct sd *) gspca_dev; int i, sof = 0; static u8 ffd9[] = {0xff, 0xd9}; /* frames are jpeg 4.1.1 without 0xff escape */ switch (sd->bridge) { case BRIDGE_SPCA533: if (data[0] == 0xff) { if (data[1] != 0x01) { /* drop packet */ /* gspca_dev->last_packet_type = DISCARD_PACKET; */ return; } sof = 1; data += SPCA533_OFFSET_DATA; len -= SPCA533_OFFSET_DATA; } else { data += 1; len -= 1; } break; case BRIDGE_SPCA536: if (data[0] == 0xff) { sof = 1; data += SPCA536_OFFSET_DATA; len -= SPCA536_OFFSET_DATA; } else { data += 2; len -= 2; } break; default: /* case BRIDGE_SPCA504: */ /* case BRIDGE_SPCA504B: */ switch (data[0]) { case 0xfe: /* start of frame */ sof = 1; data += SPCA50X_OFFSET_DATA; len -= SPCA50X_OFFSET_DATA; break; case 0xff: /* drop packet */ /* gspca_dev->last_packet_type = DISCARD_PACKET; */ return; default: data += 1; len -= 1; break; } break; case BRIDGE_SPCA504C: switch (data[0]) { case 0xfe: /* start of frame */ sof = 1; data += SPCA504_PCCAM600_OFFSET_DATA; len -= SPCA504_PCCAM600_OFFSET_DATA; break; case 0xff: /* drop packet */ /* gspca_dev->last_packet_type = DISCARD_PACKET; */ return; default: data += 1; len -= 1; break; } break; } if (sof) { /* start of frame */ gspca_frame_add(gspca_dev, LAST_PACKET, ffd9, 2); /* put the JPEG header in the new frame */ gspca_frame_add(gspca_dev, FIRST_PACKET, sd->jpeg_hdr, JPEG_HDR_SZ); } /* add 0x00 after 0xff */ i = 0; do { if (data[i] == 0xff) { gspca_frame_add(gspca_dev, INTER_PACKET, data, i + 1); len -= i; data += i; *data = 0x00; i = 0; } i++; } while (i < len); gspca_frame_add(gspca_dev, INTER_PACKET, data, len); } static int sd_s_ctrl(struct v4l2_ctrl *ctrl) { struct gspca_dev *gspca_dev = container_of(ctrl->handler, struct gspca_dev, ctrl_handler); struct sd *sd = (struct sd *)gspca_dev; gspca_dev->usb_err = 0; if (!gspca_dev->streaming) return 0; switch (ctrl->id) { case V4L2_CID_BRIGHTNESS: setbrightness(gspca_dev, ctrl->val); break; case V4L2_CID_CONTRAST: setcontrast(gspca_dev, ctrl->val); break; case V4L2_CID_SATURATION: setcolors(gspca_dev, ctrl->val); break; case V4L2_CID_AUTOGAIN: sd->autogain = ctrl->val; break; } return gspca_dev->usb_err; } static const struct v4l2_ctrl_ops sd_ctrl_ops = { .s_ctrl = sd_s_ctrl, }; static int sd_init_controls(struct gspca_dev *gspca_dev) { struct v4l2_ctrl_handler *hdl = &gspca_dev->ctrl_handler; gspca_dev->vdev.ctrl_handler = hdl; v4l2_ctrl_handler_init(hdl, 4); v4l2_ctrl_new_std(hdl, &sd_ctrl_ops, V4L2_CID_BRIGHTNESS, -128, 127, 1, 0); v4l2_ctrl_new_std(hdl, &sd_ctrl_ops, V4L2_CID_CONTRAST, 0, 255, 1, 0x20); v4l2_ctrl_new_std(hdl, &sd_ctrl_ops, V4L2_CID_SATURATION, 0, 255, 1, 0x1a); v4l2_ctrl_new_std(hdl, &sd_ctrl_ops, V4L2_CID_AUTOGAIN, 0, 1, 1, 1); if (hdl->error) { pr_err("Could not initialize controls\n"); return hdl->error; } return 0; } /* sub-driver description */ static const struct sd_desc sd_desc = { .name = MODULE_NAME, .config = sd_config, .init = sd_init, .init_controls = sd_init_controls, .start = sd_start, .stopN = sd_stopN, .pkt_scan = sd_pkt_scan, }; /* -- module initialisation -- */ #define BS(bridge, subtype) \ .driver_info = (BRIDGE_ ## bridge << 8) \ | (subtype) static const struct usb_device_id device_table[] = { {USB_DEVICE(0x041e, 0x400b), BS(SPCA504C, 0)}, {USB_DEVICE(0x041e, 0x4012), BS(SPCA504C, 0)}, {USB_DEVICE(0x041e, 0x4013), BS(SPCA504C, 0)}, {USB_DEVICE(0x0458, 0x7006), BS(SPCA504B, 0)}, {USB_DEVICE(0x0461, 0x0821), BS(SPCA533, 0)}, {USB_DEVICE(0x046d, 0x0905), BS(SPCA533, LogitechClickSmart820)}, {USB_DEVICE(0x046d, 0x0960), BS(SPCA504C, LogitechClickSmart420)}, {USB_DEVICE(0x0471, 0x0322), BS(SPCA504B, 0)}, {USB_DEVICE(0x04a5, 0x3003), BS(SPCA504B, 0)}, {USB_DEVICE(0x04a5, 0x3008), BS(SPCA533, 0)}, {USB_DEVICE(0x04a5, 0x300a), BS(SPCA533, 0)}, {USB_DEVICE(0x04f1, 0x1001), BS(SPCA504B, 0)}, {USB_DEVICE(0x04fc, 0x500c), BS(SPCA504B, 0)}, {USB_DEVICE(0x04fc, 0x504a), BS(SPCA504, AiptekMiniPenCam13)}, {USB_DEVICE(0x04fc, 0x504b), BS(SPCA504B, 0)}, {USB_DEVICE(0x04fc, 0x5330), BS(SPCA533, 0)}, {USB_DEVICE(0x04fc, 0x5360), BS(SPCA536, 0)}, {USB_DEVICE(0x04fc, 0xffff), BS(SPCA504B, 0)}, {USB_DEVICE(0x052b, 0x1507), BS(SPCA533, MegapixV4)}, {USB_DEVICE(0x052b, 0x1513), BS(SPCA533, MegapixV4)}, {USB_DEVICE(0x052b, 0x1803), BS(SPCA533, MegaImageVI)}, {USB_DEVICE(0x0546, 0x3155), BS(SPCA533, 0)}, {USB_DEVICE(0x0546, 0x3191), BS(SPCA504B, 0)}, {USB_DEVICE(0x0546, 0x3273), BS(SPCA504B, 0)}, {USB_DEVICE(0x055f, 0xc211), BS(SPCA536, 0)}, {USB_DEVICE(0x055f, 0xc230), BS(SPCA533, 0)}, {USB_DEVICE(0x055f, 0xc232), BS(SPCA533, 0)}, {USB_DEVICE(0x055f, 0xc360), BS(SPCA536, 0)}, {USB_DEVICE(0x055f, 0xc420), BS(SPCA504, 0)}, {USB_DEVICE(0x055f, 0xc430), BS(SPCA533, 0)}, {USB_DEVICE(0x055f, 0xc440), BS(SPCA533, 0)}, {USB_DEVICE(0x055f, 0xc520), BS(SPCA504, 0)}, {USB_DEVICE(0x055f, 0xc530), BS(SPCA533, 0)}, {USB_DEVICE(0x055f, 0xc540), BS(SPCA533, 0)}, {USB_DEVICE(0x055f, 0xc630), BS(SPCA533, 0)}, {USB_DEVICE(0x055f, 0xc650), BS(SPCA533, 0)}, {USB_DEVICE(0x05da, 0x1018), BS(SPCA504B, 0)}, {USB_DEVICE(0x06d6, 0x0031), BS(SPCA533, 0)}, {USB_DEVICE(0x06d6, 0x0041), BS(SPCA504B, 0)}, {USB_DEVICE(0x0733, 0x1311), BS(SPCA533, 0)}, {USB_DEVICE(0x0733, 0x1314), BS(SPCA533, 0)}, {USB_DEVICE(0x0733, 0x2211), BS(SPCA533, 0)}, {USB_DEVICE(0x0733, 0x2221), BS(SPCA533, 0)}, {USB_DEVICE(0x0733, 0x3261), BS(SPCA536, 0)}, {USB_DEVICE(0x0733, 0x3281), BS(SPCA536, 0)}, {USB_DEVICE(0x08ca, 0x0104), BS(SPCA533, 0)}, {USB_DEVICE(0x08ca, 0x0106), BS(SPCA533, 0)}, {USB_DEVICE(0x08ca, 0x2008), BS(SPCA504B, 0)}, {USB_DEVICE(0x08ca, 0x2010), BS(SPCA533, 0)}, {USB_DEVICE(0x08ca, 0x2016), BS(SPCA504B, 0)}, {USB_DEVICE(0x08ca, 0x2018), BS(SPCA504B, 0)}, {USB_DEVICE(0x08ca, 0x2020), BS(SPCA533, 0)}, {USB_DEVICE(0x08ca, 0x2022), BS(SPCA533, 0)}, {USB_DEVICE(0x08ca, 0x2024), BS(SPCA536, 0)}, {USB_DEVICE(0x08ca, 0x2028), BS(SPCA533, 0)}, {USB_DEVICE(0x08ca, 0x2040), BS(SPCA536, 0)}, {USB_DEVICE(0x08ca, 0x2042), BS(SPCA536, 0)}, {USB_DEVICE(0x08ca, 0x2050), BS(SPCA536, 0)}, {USB_DEVICE(0x08ca, 0x2060), BS(SPCA536, 0)}, {USB_DEVICE(0x0d64, 0x0303), BS(SPCA536, 0)}, {} }; MODULE_DEVICE_TABLE(usb, device_table); /* -- device connect -- */ static int sd_probe(struct usb_interface *intf, const struct usb_device_id *id) { return gspca_dev_probe(intf, id, &sd_desc, sizeof(struct sd), THIS_MODULE); } static struct usb_driver sd_driver = { .name = MODULE_NAME, .id_table = device_table, .probe = sd_probe, .disconnect = gspca_disconnect, #ifdef CONFIG_PM .suspend = gspca_suspend, .resume = gspca_resume, .reset_resume = gspca_resume, #endif }; module_usb_driver(sd_driver); |
| 19 111 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 | /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_MBCACHE_H #define _LINUX_MBCACHE_H #include <linux/hash.h> #include <linux/list_bl.h> #include <linux/list.h> #include <linux/atomic.h> #include <linux/fs.h> struct mb_cache; /* Cache entry flags */ enum { MBE_REFERENCED_B = 0, MBE_REUSABLE_B }; struct mb_cache_entry { /* List of entries in cache - protected by cache->c_list_lock */ struct list_head e_list; /* * Hash table list - protected by hash chain bitlock. The entry is * guaranteed to be hashed while e_refcnt > 0. */ struct hlist_bl_node e_hash_list; /* * Entry refcount. Once it reaches zero, entry is unhashed and freed. * While refcount > 0, the entry is guaranteed to stay in the hash and * e.g. mb_cache_entry_try_delete() will fail. */ atomic_t e_refcnt; /* Key in hash - stable during lifetime of the entry */ u32 e_key; unsigned long e_flags; /* User provided value - stable during lifetime of the entry */ u64 e_value; }; struct mb_cache *mb_cache_create(int bucket_bits); void mb_cache_destroy(struct mb_cache *cache); int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key, u64 value, bool reusable); void __mb_cache_entry_free(struct mb_cache *cache, struct mb_cache_entry *entry); void mb_cache_entry_wait_unused(struct mb_cache_entry *entry); static inline void mb_cache_entry_put(struct mb_cache *cache, struct mb_cache_entry *entry) { unsigned int cnt = atomic_dec_return(&entry->e_refcnt); if (cnt > 0) { if (cnt <= 2) wake_up_var(&entry->e_refcnt); return; } __mb_cache_entry_free(cache, entry); } struct mb_cache_entry *mb_cache_entry_delete_or_get(struct mb_cache *cache, u32 key, u64 value); struct mb_cache_entry *mb_cache_entry_get(struct mb_cache *cache, u32 key, u64 value); struct mb_cache_entry *mb_cache_entry_find_first(struct mb_cache *cache, u32 key); struct mb_cache_entry *mb_cache_entry_find_next(struct mb_cache *cache, struct mb_cache_entry *entry); void mb_cache_entry_touch(struct mb_cache *cache, struct mb_cache_entry *entry); #endif /* _LINUX_MBCACHE_H */ |
| 17 18 18 17 17 1 2 3 2 3 17 17 17 14 3 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 | // SPDX-License-Identifier: GPL-2.0-or-later /* * Cryptographic API. * * SHA-3, as specified in * https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf * * SHA-3 code by Jeff Garzik <jeff@garzik.org> * Ard Biesheuvel <ard.biesheuvel@linaro.org> */ #include <crypto/internal/hash.h> #include <linux/init.h> #include <linux/module.h> #include <linux/types.h> #include <crypto/sha3.h> #include <asm/unaligned.h> /* * On some 32-bit architectures (h8300), GCC ends up using * over 1 KB of stack if we inline the round calculation into the loop * in keccakf(). On the other hand, on 64-bit architectures with plenty * of [64-bit wide] general purpose registers, not inlining it severely * hurts performance. So let's use 64-bitness as a heuristic to decide * whether to inline or not. */ #ifdef CONFIG_64BIT #define SHA3_INLINE inline #else #define SHA3_INLINE noinline #endif #define KECCAK_ROUNDS 24 static const u64 keccakf_rndc[24] = { 0x0000000000000001ULL, 0x0000000000008082ULL, 0x800000000000808aULL, 0x8000000080008000ULL, 0x000000000000808bULL, 0x0000000080000001ULL, 0x8000000080008081ULL, 0x8000000000008009ULL, 0x000000000000008aULL, 0x0000000000000088ULL, 0x0000000080008009ULL, 0x000000008000000aULL, 0x000000008000808bULL, 0x800000000000008bULL, 0x8000000000008089ULL, 0x8000000000008003ULL, 0x8000000000008002ULL, 0x8000000000000080ULL, 0x000000000000800aULL, 0x800000008000000aULL, 0x8000000080008081ULL, 0x8000000000008080ULL, 0x0000000080000001ULL, 0x8000000080008008ULL }; /* update the state with given number of rounds */ static SHA3_INLINE void keccakf_round(u64 st[25]) { u64 t[5], tt, bc[5]; /* Theta */ bc[0] = st[0] ^ st[5] ^ st[10] ^ st[15] ^ st[20]; bc[1] = st[1] ^ st[6] ^ st[11] ^ st[16] ^ st[21]; bc[2] = st[2] ^ st[7] ^ st[12] ^ st[17] ^ st[22]; bc[3] = st[3] ^ st[8] ^ st[13] ^ st[18] ^ st[23]; bc[4] = st[4] ^ st[9] ^ st[14] ^ st[19] ^ st[24]; t[0] = bc[4] ^ rol64(bc[1], 1); t[1] = bc[0] ^ rol64(bc[2], 1); t[2] = bc[1] ^ rol64(bc[3], 1); t[3] = bc[2] ^ rol64(bc[4], 1); t[4] = bc[3] ^ rol64(bc[0], 1); st[0] ^= t[0]; /* Rho Pi */ tt = st[1]; st[ 1] = rol64(st[ 6] ^ t[1], 44); st[ 6] = rol64(st[ 9] ^ t[4], 20); st[ 9] = rol64(st[22] ^ t[2], 61); st[22] = rol64(st[14] ^ t[4], 39); st[14] = rol64(st[20] ^ t[0], 18); st[20] = rol64(st[ 2] ^ t[2], 62); st[ 2] = rol64(st[12] ^ t[2], 43); st[12] = rol64(st[13] ^ t[3], 25); st[13] = rol64(st[19] ^ t[4], 8); st[19] = rol64(st[23] ^ t[3], 56); st[23] = rol64(st[15] ^ t[0], 41); st[15] = rol64(st[ 4] ^ t[4], 27); st[ 4] = rol64(st[24] ^ t[4], 14); st[24] = rol64(st[21] ^ t[1], 2); st[21] = rol64(st[ 8] ^ t[3], 55); st[ 8] = rol64(st[16] ^ t[1], 45); st[16] = rol64(st[ 5] ^ t[0], 36); st[ 5] = rol64(st[ 3] ^ t[3], 28); st[ 3] = rol64(st[18] ^ t[3], 21); st[18] = rol64(st[17] ^ t[2], 15); st[17] = rol64(st[11] ^ t[1], 10); st[11] = rol64(st[ 7] ^ t[2], 6); st[ 7] = rol64(st[10] ^ t[0], 3); st[10] = rol64( tt ^ t[1], 1); /* Chi */ bc[ 0] = ~st[ 1] & st[ 2]; bc[ 1] = ~st[ 2] & st[ 3]; bc[ 2] = ~st[ 3] & st[ 4]; bc[ 3] = ~st[ 4] & st[ 0]; bc[ 4] = ~st[ 0] & st[ 1]; st[ 0] ^= bc[ 0]; st[ 1] ^= bc[ 1]; st[ 2] ^= bc[ 2]; st[ 3] ^= bc[ 3]; st[ 4] ^= bc[ 4]; bc[ 0] = ~st[ 6] & st[ 7]; bc[ 1] = ~st[ 7] & st[ 8]; bc[ 2] = ~st[ 8] & st[ 9]; bc[ 3] = ~st[ 9] & st[ 5]; bc[ 4] = ~st[ 5] & st[ 6]; st[ 5] ^= bc[ 0]; st[ 6] ^= bc[ 1]; st[ 7] ^= bc[ 2]; st[ 8] ^= bc[ 3]; st[ 9] ^= bc[ 4]; bc[ 0] = ~st[11] & st[12]; bc[ 1] = ~st[12] & st[13]; bc[ 2] = ~st[13] & st[14]; bc[ 3] = ~st[14] & st[10]; bc[ 4] = ~st[10] & st[11]; st[10] ^= bc[ 0]; st[11] ^= bc[ 1]; st[12] ^= bc[ 2]; st[13] ^= bc[ 3]; st[14] ^= bc[ 4]; bc[ 0] = ~st[16] & st[17]; bc[ 1] = ~st[17] & st[18]; bc[ 2] = ~st[18] & st[19]; bc[ 3] = ~st[19] & st[15]; bc[ 4] = ~st[15] & st[16]; st[15] ^= bc[ 0]; st[16] ^= bc[ 1]; st[17] ^= bc[ 2]; st[18] ^= bc[ 3]; st[19] ^= bc[ 4]; bc[ 0] = ~st[21] & st[22]; bc[ 1] = ~st[22] & st[23]; bc[ 2] = ~st[23] & st[24]; bc[ 3] = ~st[24] & st[20]; bc[ 4] = ~st[20] & st[21]; st[20] ^= bc[ 0]; st[21] ^= bc[ 1]; st[22] ^= bc[ 2]; st[23] ^= bc[ 3]; st[24] ^= bc[ 4]; } static void keccakf(u64 st[25]) { int round; for (round = 0; round < KECCAK_ROUNDS; round++) { keccakf_round(st); /* Iota */ st[0] ^= keccakf_rndc[round]; } } int crypto_sha3_init(struct shash_desc *desc) { struct sha3_state *sctx = shash_desc_ctx(desc); unsigned int digest_size = crypto_shash_digestsize(desc->tfm); sctx->rsiz = 200 - 2 * digest_size; sctx->rsizw = sctx->rsiz / 8; sctx->partial = 0; memset(sctx->st, 0, sizeof(sctx->st)); return 0; } EXPORT_SYMBOL(crypto_sha3_init); int crypto_sha3_update(struct shash_desc *desc, const u8 *data, unsigned int len) { struct sha3_state *sctx = shash_desc_ctx(desc); unsigned int done; const u8 *src; done = 0; src = data; if ((sctx->partial + len) > (sctx->rsiz - 1)) { if (sctx->partial) { done = -sctx->partial; memcpy(sctx->buf + sctx->partial, data, done + sctx->rsiz); src = sctx->buf; } do { unsigned int i; for (i = 0; i < sctx->rsizw; i++) sctx->st[i] ^= get_unaligned_le64(src + 8 * i); keccakf(sctx->st); done += sctx->rsiz; src = data + done; } while (done + (sctx->rsiz - 1) < len); sctx->partial = 0; } memcpy(sctx->buf + sctx->partial, src, len - done); sctx->partial += (len - done); return 0; } EXPORT_SYMBOL(crypto_sha3_update); int crypto_sha3_final(struct shash_desc *desc, u8 *out) { struct sha3_state *sctx = shash_desc_ctx(desc); unsigned int i, inlen = sctx->partial; unsigned int digest_size = crypto_shash_digestsize(desc->tfm); __le64 *digest = (__le64 *)out; sctx->buf[inlen++] = 0x06; memset(sctx->buf + inlen, 0, sctx->rsiz - inlen); sctx->buf[sctx->rsiz - 1] |= 0x80; for (i = 0; i < sctx->rsizw; i++) sctx->st[i] ^= get_unaligned_le64(sctx->buf + 8 * i); keccakf(sctx->st); for (i = 0; i < digest_size / 8; i++) put_unaligned_le64(sctx->st[i], digest++); if (digest_size & 4) put_unaligned_le32(sctx->st[i], (__le32 *)digest); memset(sctx, 0, sizeof(*sctx)); return 0; } EXPORT_SYMBOL(crypto_sha3_final); static struct shash_alg algs[] = { { .digestsize = SHA3_224_DIGEST_SIZE, .init = crypto_sha3_init, .update = crypto_sha3_update, .final = crypto_sha3_final, .descsize = sizeof(struct sha3_state), .base.cra_name = "sha3-224", .base.cra_driver_name = "sha3-224-generic", .base.cra_blocksize = SHA3_224_BLOCK_SIZE, .base.cra_module = THIS_MODULE, }, { .digestsize = SHA3_256_DIGEST_SIZE, .init = crypto_sha3_init, .update = crypto_sha3_update, .final = crypto_sha3_final, .descsize = sizeof(struct sha3_state), .base.cra_name = "sha3-256", .base.cra_driver_name = "sha3-256-generic", .base.cra_blocksize = SHA3_256_BLOCK_SIZE, .base.cra_module = THIS_MODULE, }, { .digestsize = SHA3_384_DIGEST_SIZE, .init = crypto_sha3_init, .update = crypto_sha3_update, .final = crypto_sha3_final, .descsize = sizeof(struct sha3_state), .base.cra_name = "sha3-384", .base.cra_driver_name = "sha3-384-generic", .base.cra_blocksize = SHA3_384_BLOCK_SIZE, .base.cra_module = THIS_MODULE, }, { .digestsize = SHA3_512_DIGEST_SIZE, .init = crypto_sha3_init, .update = crypto_sha3_update, .final = crypto_sha3_final, .descsize = sizeof(struct sha3_state), .base.cra_name = "sha3-512", .base.cra_driver_name = "sha3-512-generic", .base.cra_blocksize = SHA3_512_BLOCK_SIZE, .base.cra_module = THIS_MODULE, } }; static int __init sha3_generic_mod_init(void) { return crypto_register_shashes(algs, ARRAY_SIZE(algs)); } static void __exit sha3_generic_mod_fini(void) { crypto_unregister_shashes(algs, ARRAY_SIZE(algs)); } subsys_initcall(sha3_generic_mod_init); module_exit(sha3_generic_mod_fini); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("SHA-3 Secure Hash Algorithm"); MODULE_ALIAS_CRYPTO("sha3-224"); MODULE_ALIAS_CRYPTO("sha3-224-generic"); MODULE_ALIAS_CRYPTO("sha3-256"); MODULE_ALIAS_CRYPTO("sha3-256-generic"); MODULE_ALIAS_CRYPTO("sha3-384"); MODULE_ALIAS_CRYPTO("sha3-384-generic"); MODULE_ALIAS_CRYPTO("sha3-512"); MODULE_ALIAS_CRYPTO("sha3-512-generic"); |
| 2 2 1 1 1 1 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 | /* BNEP implementation for Linux Bluetooth stack (BlueZ). Copyright (C) 2001-2002 Inventel Systemes Written 2001-2002 by Clément Moreau <clement.moreau@inventel.fr> David Libault <david.libault@inventel.fr> Copyright (C) 2002 Maxim Krasnyansky <maxk@qualcomm.com> This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License version 2 as published by the Free Software Foundation; THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER(S) AND AUTHOR(S) BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. ALL LIABILITY, INCLUDING LIABILITY FOR INFRINGEMENT OF ANY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS, RELATING TO USE OF THIS SOFTWARE IS DISCLAIMED. */ #include <linux/module.h> #include <linux/kthread.h> #include <linux/file.h> #include <linux/etherdevice.h> #include <asm/unaligned.h> #include <net/bluetooth/bluetooth.h> #include <net/bluetooth/l2cap.h> #include <net/bluetooth/hci_core.h> #include "bnep.h" #define VERSION "1.3" static bool compress_src = true; static bool compress_dst = true; static LIST_HEAD(bnep_session_list); static DECLARE_RWSEM(bnep_session_sem); static struct bnep_session *__bnep_get_session(u8 *dst) { struct bnep_session *s; BT_DBG(""); list_for_each_entry(s, &bnep_session_list, list) if (ether_addr_equal(dst, s->eh.h_source)) return s; return NULL; } static void __bnep_link_session(struct bnep_session *s) { list_add(&s->list, &bnep_session_list); } static void __bnep_unlink_session(struct bnep_session *s) { list_del(&s->list); } static int bnep_send(struct bnep_session *s, void *data, size_t len) { struct socket *sock = s->sock; struct kvec iv = { data, len }; return kernel_sendmsg(sock, &s->msg, &iv, 1, len); } static int bnep_send_rsp(struct bnep_session *s, u8 ctrl, u16 resp) { struct bnep_control_rsp rsp; rsp.type = BNEP_CONTROL; rsp.ctrl = ctrl; rsp.resp = htons(resp); return bnep_send(s, &rsp, sizeof(rsp)); } #ifdef CONFIG_BT_BNEP_PROTO_FILTER static inline void bnep_set_default_proto_filter(struct bnep_session *s) { /* (IPv4, ARP) */ s->proto_filter[0].start = ETH_P_IP; s->proto_filter[0].end = ETH_P_ARP; /* (RARP, AppleTalk) */ s->proto_filter[1].start = ETH_P_RARP; s->proto_filter[1].end = ETH_P_AARP; /* (IPX, IPv6) */ s->proto_filter[2].start = ETH_P_IPX; s->proto_filter[2].end = ETH_P_IPV6; } #endif static int bnep_ctrl_set_netfilter(struct bnep_session *s, __be16 *data, int len) { int n; if (len < 2) return -EILSEQ; n = get_unaligned_be16(data); data++; len -= 2; if (len < n) return -EILSEQ; BT_DBG("filter len %d", n); #ifdef CONFIG_BT_BNEP_PROTO_FILTER n /= 4; if (n <= BNEP_MAX_PROTO_FILTERS) { struct bnep_proto_filter *f = s->proto_filter; int i; for (i = 0; i < n; i++) { f[i].start = get_unaligned_be16(data++); f[i].end = get_unaligned_be16(data++); BT_DBG("proto filter start %u end %u", f[i].start, f[i].end); } if (i < BNEP_MAX_PROTO_FILTERS) memset(f + i, 0, sizeof(*f)); if (n == 0) bnep_set_default_proto_filter(s); bnep_send_rsp(s, BNEP_FILTER_NET_TYPE_RSP, BNEP_SUCCESS); } else { bnep_send_rsp(s, BNEP_FILTER_NET_TYPE_RSP, BNEP_FILTER_LIMIT_REACHED); } #else bnep_send_rsp(s, BNEP_FILTER_NET_TYPE_RSP, BNEP_FILTER_UNSUPPORTED_REQ); #endif return 0; } static int bnep_ctrl_set_mcfilter(struct bnep_session *s, u8 *data, int len) { int n; if (len < 2) return -EILSEQ; n = get_unaligned_be16(data); data += 2; len -= 2; if (len < n) return -EILSEQ; BT_DBG("filter len %d", n); #ifdef CONFIG_BT_BNEP_MC_FILTER n /= (ETH_ALEN * 2); if (n > 0) { int i; s->mc_filter = 0; /* Always send broadcast */ set_bit(bnep_mc_hash(s->dev->broadcast), (ulong *) &s->mc_filter); /* Add address ranges to the multicast hash */ for (; n > 0; n--) { u8 a1[6], *a2; memcpy(a1, data, ETH_ALEN); data += ETH_ALEN; a2 = data; data += ETH_ALEN; BT_DBG("mc filter %pMR -> %pMR", a1, a2); /* Iterate from a1 to a2 */ set_bit(bnep_mc_hash(a1), (ulong *) &s->mc_filter); while (memcmp(a1, a2, 6) < 0 && s->mc_filter != ~0LL) { /* Increment a1 */ i = 5; while (i >= 0 && ++a1[i--] == 0) ; set_bit(bnep_mc_hash(a1), (ulong *) &s->mc_filter); } } } BT_DBG("mc filter hash 0x%llx", s->mc_filter); bnep_send_rsp(s, BNEP_FILTER_MULTI_ADDR_RSP, BNEP_SUCCESS); #else bnep_send_rsp(s, BNEP_FILTER_MULTI_ADDR_RSP, BNEP_FILTER_UNSUPPORTED_REQ); #endif return 0; } static int bnep_rx_control(struct bnep_session *s, void *data, int len) { u8 cmd = *(u8 *)data; int err = 0; data++; len--; switch (cmd) { case BNEP_CMD_NOT_UNDERSTOOD: case BNEP_SETUP_CONN_RSP: case BNEP_FILTER_NET_TYPE_RSP: case BNEP_FILTER_MULTI_ADDR_RSP: /* Ignore these for now */ break; case BNEP_FILTER_NET_TYPE_SET: err = bnep_ctrl_set_netfilter(s, data, len); break; case BNEP_FILTER_MULTI_ADDR_SET: err = bnep_ctrl_set_mcfilter(s, data, len); break; case BNEP_SETUP_CONN_REQ: /* Successful response should be sent only once */ if (test_bit(BNEP_SETUP_RESPONSE, &s->flags) && !test_and_set_bit(BNEP_SETUP_RSP_SENT, &s->flags)) err = bnep_send_rsp(s, BNEP_SETUP_CONN_RSP, BNEP_SUCCESS); else err = bnep_send_rsp(s, BNEP_SETUP_CONN_RSP, BNEP_CONN_NOT_ALLOWED); break; default: { u8 pkt[3]; pkt[0] = BNEP_CONTROL; pkt[1] = BNEP_CMD_NOT_UNDERSTOOD; pkt[2] = cmd; err = bnep_send(s, pkt, sizeof(pkt)); } break; } return err; } static int bnep_rx_extension(struct bnep_session *s, struct sk_buff *skb) { struct bnep_ext_hdr *h; int err = 0; do { h = (void *) skb->data; if (!skb_pull(skb, sizeof(*h))) { err = -EILSEQ; break; } BT_DBG("type 0x%x len %u", h->type, h->len); switch (h->type & BNEP_TYPE_MASK) { case BNEP_EXT_CONTROL: bnep_rx_control(s, skb->data, skb->len); break; default: /* Unknown extension, skip it. */ break; } if (!skb_pull(skb, h->len)) { err = -EILSEQ; break; } } while (!err && (h->type & BNEP_EXT_HEADER)); return err; } static u8 __bnep_rx_hlen[] = { ETH_HLEN, /* BNEP_GENERAL */ 0, /* BNEP_CONTROL */ 2, /* BNEP_COMPRESSED */ ETH_ALEN + 2, /* BNEP_COMPRESSED_SRC_ONLY */ ETH_ALEN + 2 /* BNEP_COMPRESSED_DST_ONLY */ }; static int bnep_rx_frame(struct bnep_session *s, struct sk_buff *skb) { struct net_device *dev = s->dev; struct sk_buff *nskb; u8 type, ctrl_type; dev->stats.rx_bytes += skb->len; type = *(u8 *) skb->data; skb_pull(skb, 1); ctrl_type = *(u8 *)skb->data; if ((type & BNEP_TYPE_MASK) >= sizeof(__bnep_rx_hlen)) goto badframe; if ((type & BNEP_TYPE_MASK) == BNEP_CONTROL) { if (bnep_rx_control(s, skb->data, skb->len) < 0) { dev->stats.tx_errors++; kfree_skb(skb); return 0; } if (!(type & BNEP_EXT_HEADER)) { kfree_skb(skb); return 0; } /* Verify and pull ctrl message since it's already processed */ switch (ctrl_type) { case BNEP_SETUP_CONN_REQ: /* Pull: ctrl type (1 b), len (1 b), data (len bytes) */ if (!skb_pull(skb, 2 + *(u8 *)(skb->data + 1) * 2)) goto badframe; break; case BNEP_FILTER_MULTI_ADDR_SET: case BNEP_FILTER_NET_TYPE_SET: /* Pull: ctrl type (1 b), len (2 b), data (len bytes) */ if (!skb_pull(skb, 3 + *(u16 *)(skb->data + 1) * 2)) goto badframe; break; default: kfree_skb(skb); return 0; } } else { skb_reset_mac_header(skb); /* Verify and pull out header */ if (!skb_pull(skb, __bnep_rx_hlen[type & BNEP_TYPE_MASK])) goto badframe; s->eh.h_proto = get_unaligned((__be16 *) (skb->data - 2)); } if (type & BNEP_EXT_HEADER) { if (bnep_rx_extension(s, skb) < 0) goto badframe; } /* Strip 802.1p header */ if (ntohs(s->eh.h_proto) == ETH_P_8021Q) { if (!skb_pull(skb, 4)) goto badframe; s->eh.h_proto = get_unaligned((__be16 *) (skb->data - 2)); } /* We have to alloc new skb and copy data here :(. Because original skb * may not be modified and because of the alignment requirements. */ nskb = alloc_skb(2 + ETH_HLEN + skb->len, GFP_KERNEL); if (!nskb) { dev->stats.rx_dropped++; kfree_skb(skb); return -ENOMEM; } skb_reserve(nskb, 2); /* Decompress header and construct ether frame */ switch (type & BNEP_TYPE_MASK) { case BNEP_COMPRESSED: __skb_put_data(nskb, &s->eh, ETH_HLEN); break; case BNEP_COMPRESSED_SRC_ONLY: __skb_put_data(nskb, s->eh.h_dest, ETH_ALEN); __skb_put_data(nskb, skb_mac_header(skb), ETH_ALEN); put_unaligned(s->eh.h_proto, (__be16 *) __skb_put(nskb, 2)); break; case BNEP_COMPRESSED_DST_ONLY: __skb_put_data(nskb, skb_mac_header(skb), ETH_ALEN); __skb_put_data(nskb, s->eh.h_source, ETH_ALEN); put_unaligned(s->eh.h_proto, (__be16 *)__skb_put(nskb, 2)); break; case BNEP_GENERAL: __skb_put_data(nskb, skb_mac_header(skb), ETH_ALEN * 2); put_unaligned(s->eh.h_proto, (__be16 *) __skb_put(nskb, 2)); break; } skb_copy_from_linear_data(skb, __skb_put(nskb, skb->len), skb->len); kfree_skb(skb); dev->stats.rx_packets++; nskb->ip_summed = CHECKSUM_NONE; nskb->protocol = eth_type_trans(nskb, dev); netif_rx(nskb); return 0; badframe: dev->stats.rx_errors++; kfree_skb(skb); return 0; } static u8 __bnep_tx_types[] = { BNEP_GENERAL, BNEP_COMPRESSED_SRC_ONLY, BNEP_COMPRESSED_DST_ONLY, BNEP_COMPRESSED }; static int bnep_tx_frame(struct bnep_session *s, struct sk_buff *skb) { struct ethhdr *eh = (void *) skb->data; struct socket *sock = s->sock; struct kvec iv[3]; int len = 0, il = 0; u8 type = 0; BT_DBG("skb %p dev %p type %u", skb, skb->dev, skb->pkt_type); if (!skb->dev) { /* Control frame sent by us */ goto send; } iv[il++] = (struct kvec) { &type, 1 }; len++; if (compress_src && ether_addr_equal(eh->h_dest, s->eh.h_source)) type |= 0x01; if (compress_dst && ether_addr_equal(eh->h_source, s->eh.h_dest)) type |= 0x02; if (type) skb_pull(skb, ETH_ALEN * 2); type = __bnep_tx_types[type]; switch (type) { case BNEP_COMPRESSED_SRC_ONLY: iv[il++] = (struct kvec) { eh->h_source, ETH_ALEN }; len += ETH_ALEN; break; case BNEP_COMPRESSED_DST_ONLY: iv[il++] = (struct kvec) { eh->h_dest, ETH_ALEN }; len += ETH_ALEN; break; } send: iv[il++] = (struct kvec) { skb->data, skb->len }; len += skb->len; /* FIXME: linearize skb */ { len = kernel_sendmsg(sock, &s->msg, iv, il, len); } kfree_skb(skb); if (len > 0) { s->dev->stats.tx_bytes += len; s->dev->stats.tx_packets++; return 0; } return len; } static int bn |