// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * INET		An implementation of the TCP/IP protocol suite for the LINUX
 *		operating system. INET is implemented using the BSD Socket
 *		interface as the means of communication with the user level.
 *
 *		RAW - implementation of IP "raw" sockets.
 *
 * Authors:	Ross Biro
 *		Fred N. van Kempen, <waltje@uWalt.NL.Mugnet.ORG>
 *
 * Fixes:
 *	Alan Cox	:	verify_area() fixed up
 *	Alan Cox	:	ICMP error handling
 *	Alan Cox	:	EMSGSIZE if you send too big a packet
 *	Alan Cox	:	Now uses generic datagrams and shared
 *				skbuff library. No more peek crashes,
 *				no more backlogs
 *	Alan Cox	:	Checks sk->broadcast.
 *	Alan Cox	:	Uses skb_free_datagram/skb_copy_datagram
 *	Alan Cox	:	Raw passes ip options too
 *	Alan Cox	:	Setsocketopt added
 *	Alan Cox	:	Fixed error return for broadcasts
 *	Alan Cox	:	Removed wake_up calls
 *	Alan Cox	:	Use ttl/tos
 *	Alan Cox	:	Cleaned up old debugging
 *	Alan Cox	:	Use new kernel side addresses
 *	Arnt Gulbrandsen	:	Fixed MSG_DONTROUTE in raw sockets.
 *	Alan Cox	:	BSD style RAW socket demultiplexing.
 *	Alan Cox	:	Beginnings of mrouted support.
 *	Alan Cox	:	Added IP_HDRINCL option.
 *	Alan Cox	:	Skip broadcast check if BSDism set.
 *	David S. Miller	:	New socket lookup architecture.
 */

#include <linux/types.h>
#include <linux/atomic.h>
#include <asm/byteorder.h>
#include <asm/current.h>
#include <linux/uaccess.h>
#include <asm/ioctls.h>
#include <linux/stddef.h>
#include <linux/slab.h>
#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/export.h>
#include <linux/spinlock.h>
#include <linux/sockios.h>
#include <linux/socket.h>
#include <linux/in.h>
#include <linux/mroute.h>
#include <linux/netdevice.h>
#include <linux/in_route.h>
#include <linux/route.h>
#include <linux/skbuff.h>
#include <linux/igmp.h>
#include <net/net_namespace.h>
#include <net/dst.h>
#include <net/sock.h>
#include <linux/ip.h>
#include <linux/net.h>
#include <net/ip.h>
#include <net/icmp.h>
#include <net/udp.h>
#include <net/raw.h>
#include <net/snmp.h>
#include <net/tcp_states.h>
#include <net/inet_common.h>
#include <net/checksum.h>
#include <net/xfrm.h>
#include <linux/rtnetlink.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>
#include <linux/compat.h>
#include <linux/uio.h>

struct raw_frag_vec {
	struct msghdr *msg;
	union {
		struct icmphdr icmph;
		char c[1];
	} hdr;
	int hlen;
};

struct raw_hashinfo raw_v4_hashinfo;
EXPORT_SYMBOL_GPL(raw_v4_hashinfo);

int raw_hash_sk(struct sock *sk)
{
	struct raw_hashinfo *h = sk->sk_prot->h.raw_hash;
	struct hlist_head *hlist;

	hlist = &h->ht[raw_hashfunc(sock_net(sk), inet_sk(sk)->inet_num)];

	spin_lock(&h->lock);
	sk_add_node_rcu(sk, hlist);
	sock_set_flag(sk, SOCK_RCU_FREE);
	spin_unlock(&h->lock);
	sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1);

	return 0;
}
EXPORT_SYMBOL_GPL(raw_hash_sk);

void raw_unhash_sk(struct sock *sk)
{
	struct raw_hashinfo *h = sk->sk_prot->h.raw_hash;

	spin_lock(&h->lock);
	if (sk_del_node_init_rcu(sk))
		sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
	spin_unlock(&h->lock);
}
EXPORT_SYMBOL_GPL(raw_unhash_sk);

bool raw_v4_match(struct net *net, const struct sock *sk, unsigned short num,
		  __be32 raddr, __be32 laddr, int dif, int sdif)
{
	const struct inet_sock *inet = inet_sk(sk);

	if (net_eq(sock_net(sk), net) && inet->inet_num == num &&
	    !(inet->inet_daddr && inet->inet_daddr != raddr) &&
	    !(inet->inet_rcv_saddr && inet->inet_rcv_saddr != laddr) &&
	    raw_sk_bound_dev_eq(net, sk->sk_bound_dev_if, dif, sdif))
		return true;
	return false;
}
EXPORT_SYMBOL_GPL(raw_v4_match);

/*
 *	0 - deliver
 *	1 - block
 */
static int icmp_filter(const struct sock *sk, const struct sk_buff *skb)
{
	struct icmphdr _hdr;
	const struct icmphdr *hdr;

	hdr = skb_header_pointer(skb, skb_transport_offset(skb),
				 sizeof(_hdr), &_hdr);
	if (!hdr)
		return 1;

	if (hdr->type < 32) {
		__u32 data = raw_sk(sk)->filter.data;

		return ((1U << hdr->type) & data) != 0;
	}

	/* Do not block unknown ICMP types */
	return 0;
}

/* IP input processing comes here for RAW socket delivery.
 * Caller owns SKB, so we must make clones.
 *
 * RFC 1122: SHOULD pass TOS value up to the transport layer.
 *   -> It does. And not only TOS, but all IP header.
 */
static int raw_v4_input(struct net *net, struct sk_buff *skb,
			const struct iphdr *iph, int hash)
{
	int sdif = inet_sdif(skb);
	struct hlist_head *hlist;
	int dif = inet_iif(skb);
	int delivered = 0;
	struct sock *sk;

	hlist = &raw_v4_hashinfo.ht[hash];
	rcu_read_lock();
	sk_for_each_rcu(sk, hlist) {
		if (!raw_v4_match(net, sk, iph->protocol,
				  iph->saddr, iph->daddr, dif, sdif))
			continue;

		if (atomic_read(&sk->sk_rmem_alloc) >= READ_ONCE(sk->sk_rcvbuf)) {
			atomic_inc(&sk->sk_drops);
			continue;
		}

		delivered = 1;
		if ((iph->protocol != IPPROTO_ICMP || !icmp_filter(sk, skb)) &&
		    ip_mc_sf_allow(sk, iph->daddr, iph->saddr,
				   skb->dev->ifindex, sdif)) {
			struct sk_buff *clone = skb_clone(skb, GFP_ATOMIC);

			/* Not releasing hash table! */
			if (clone)
				raw_rcv(sk, clone);
		}
	}
	rcu_read_unlock();
	return delivered;
}

int raw_local_deliver(struct sk_buff *skb, int protocol)
{
	struct net *net = dev_net(skb->dev);

	return raw_v4_input(net, skb, ip_hdr(skb),
			    raw_hashfunc(net, protocol));
}

static void raw_err(struct sock *sk, struct sk_buff *skb, u32 info)
{
	struct inet_sock *inet = inet_sk(sk);
	const int type = icmp_hdr(skb)->type;
	const int code = icmp_hdr(skb)->code;
	int harderr = 0;
	bool recverr;
	int err = 0;

	if (type == ICMP_DEST_UNREACH && code == ICMP_FRAG_NEEDED)
		ipv4_sk_update_pmtu(skb, sk, info);
	else if (type == ICMP_REDIRECT) {
		ipv4_sk_redirect(skb, sk);
		return;
	}

	/* Report error on raw socket, if:
	   1. User requested ip_recverr.
	   2. Socket is connected (otherwise the error indication
	      is useless without ip_recverr and error is hard.
	 */
	recverr = inet_test_bit(RECVERR, sk);
	if (!recverr && sk->sk_state != TCP_ESTABLISHED)
		return;

	switch (type) {
	default:
	case ICMP_TIME_EXCEEDED:
		err = EHOSTUNREACH;
		break;
	case ICMP_SOURCE_QUENCH:
		return;
	case ICMP_PARAMETERPROB:
		err = EPROTO;
		harderr = 1;
		break;
	case ICMP_DEST_UNREACH:
		err = EHOSTUNREACH;
		if (code > NR_ICMP_UNREACH)
			break;
		if (code == ICMP_FRAG_NEEDED) {
			harderr = READ_ONCE(inet->pmtudisc) != IP_PMTUDISC_DONT;
			err = EMSGSIZE;
		} else {
			err = icmp_err_convert[code].errno;
			harderr = icmp_err_convert[code].fatal;
		}
	}

	if (recverr) {
		const struct iphdr *iph = (const struct iphdr *)skb->data;
		u8 *payload = skb->data + (iph->ihl << 2);

		if (inet_test_bit(HDRINCL, sk))
			payload = skb->data;
		ip_icmp_error(sk, skb, err, 0, info, payload);
	}

	if (recverr || harderr) {
		sk->sk_err = err;
		sk_error_report(sk);
	}
}

void raw_icmp_error(struct sk_buff *skb, int protocol, u32 info)
{
	struct net *net = dev_net(skb->dev);
	int dif = skb->dev->ifindex;
	int sdif = inet_sdif(skb);
	struct hlist_head *hlist;
	const struct iphdr *iph;
	struct sock *sk;
	int hash;

	hash = raw_hashfunc(net, protocol);
	hlist = &raw_v4_hashinfo.ht[hash];

	rcu_read_lock();
	sk_for_each_rcu(sk, hlist) {
		iph = (const struct iphdr *)skb->data;
		if (!raw_v4_match(net, sk, iph->protocol,
				  iph->daddr, iph->saddr, dif, sdif))
			continue;
		raw_err(sk, skb, info);
	}
	rcu_read_unlock();
}

static int raw_rcv_skb(struct sock *sk, struct sk_buff *skb)
{
	enum skb_drop_reason reason;

	/* Charge it to the socket.
*/ ipv4_pktinfo_prepare(sk, skb, true); if (sock_queue_rcv_skb_reason(sk, skb, &reason) < 0) { sk_skb_reason_drop(sk, skb, reason); return NET_RX_DROP; } return NET_RX_SUCCESS; } int raw_rcv(struct sock *sk, struct sk_buff *skb) { if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb)) { atomic_inc(&sk->sk_drops); sk_skb_reason_drop(sk, skb, SKB_DROP_REASON_XFRM_POLICY); return NET_RX_DROP; } nf_reset_ct(skb); skb_push(skb, -skb_network_offset(skb)); raw_rcv_skb(sk, skb); return 0; } static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4, struct msghdr *msg, size_t length, struct rtable **rtp, unsigned int flags, const struct sockcm_cookie *sockc) { struct inet_sock *inet = inet_sk(sk); struct net *net = sock_net(sk); struct iphdr *iph; struct sk_buff *skb; unsigned int iphlen; int err; struct rtable *rt = *rtp; int hlen, tlen; if (length > rt->dst.dev->mtu) { ip_local_error(sk, EMSGSIZE, fl4->daddr, inet->inet_dport, rt->dst.dev->mtu); return -EMSGSIZE; } if (length < sizeof(struct iphdr)) return -EINVAL; if (flags&MSG_PROBE) goto out; hlen = LL_RESERVED_SPACE(rt->dst.dev); tlen = rt->dst.dev->needed_tailroom; skb = sock_alloc_send_skb(sk, length + hlen + tlen + 15, flags & MSG_DONTWAIT, &err); if (!skb) goto error; skb_reserve(skb, hlen); skb->protocol = htons(ETH_P_IP); skb->priority = sockc->priority; skb->mark = sockc->mark; skb_set_delivery_type_by_clockid(skb, sockc->transmit_time, sk->sk_clockid); skb_dst_set(skb, &rt->dst); *rtp = NULL; skb_reset_network_header(skb); iph = ip_hdr(skb); skb_put(skb, length); skb->ip_summed = CHECKSUM_NONE; skb_setup_tx_timestamp(skb, sockc); if (flags & MSG_CONFIRM) skb_set_dst_pending_confirm(skb, 1); skb->transport_header = skb->network_header; err = -EFAULT; if (memcpy_from_msg(iph, msg, length)) goto error_free; iphlen = iph->ihl * 4; /* * We don't want to modify the ip header, but we do need to * be sure that it won't cause problems later along the network * stack. Specifically we want to make sure that iph->ihl is a * sane value. If ihl points beyond the length of the buffer passed * in, reject the frame as invalid */ err = -EINVAL; if (iphlen > length) goto error_free; if (iphlen >= sizeof(*iph)) { if (!iph->saddr) iph->saddr = fl4->saddr; iph->check = 0; iph->tot_len = htons(length); if (!iph->id) ip_select_ident(net, skb, NULL); iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl); skb->transport_header += iphlen; if (iph->protocol == IPPROTO_ICMP && length >= iphlen + sizeof(struct icmphdr)) icmp_out_count(net, ((struct icmphdr *) skb_transport_header(skb))->type); } err = NF_HOOK(NFPROTO_IPV4, NF_INET_LOCAL_OUT, net, sk, skb, NULL, rt->dst.dev, dst_output); if (err > 0) err = net_xmit_errno(err); if (err) goto error; out: return 0; error_free: kfree_skb(skb); error: IP_INC_STATS(net, IPSTATS_MIB_OUTDISCARDS); if (err == -ENOBUFS && !inet_test_bit(RECVERR, sk)) err = 0; return err; } static int raw_probe_proto_opt(struct raw_frag_vec *rfv, struct flowi4 *fl4) { int err; if (fl4->flowi4_proto != IPPROTO_ICMP) return 0; /* We only need the first two bytes. 
*/ rfv->hlen = 2; err = memcpy_from_msg(rfv->hdr.c, rfv->msg, rfv->hlen); if (err) return err; fl4->fl4_icmp_type = rfv->hdr.icmph.type; fl4->fl4_icmp_code = rfv->hdr.icmph.code; return 0; } static int raw_getfrag(void *from, char *to, int offset, int len, int odd, struct sk_buff *skb) { struct raw_frag_vec *rfv = from; if (offset < rfv->hlen) { int copy = min(rfv->hlen - offset, len); if (skb->ip_summed == CHECKSUM_PARTIAL) memcpy(to, rfv->hdr.c + offset, copy); else skb->csum = csum_block_add( skb->csum, csum_partial_copy_nocheck(rfv->hdr.c + offset, to, copy), odd); odd = 0; offset += copy; to += copy; len -= copy; if (!len) return 0; } offset -= rfv->hlen; return ip_generic_getfrag(rfv->msg, to, offset, len, odd, skb); } static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) { struct inet_sock *inet = inet_sk(sk); struct net *net = sock_net(sk); struct ipcm_cookie ipc; struct rtable *rt = NULL; struct flowi4 fl4; u8 scope; int free = 0; __be32 daddr; __be32 saddr; int uc_index, err; struct ip_options_data opt_copy; struct raw_frag_vec rfv; int hdrincl; err = -EMSGSIZE; if (len > 0xFFFF) goto out; hdrincl = inet_test_bit(HDRINCL, sk); /* * Check the flags. */ err = -EOPNOTSUPP; if (msg->msg_flags & MSG_OOB) /* Mirror BSD error message */ goto out; /* compatibility */ /* * Get and verify the address. */ if (msg->msg_namelen) { DECLARE_SOCKADDR(struct sockaddr_in *, usin, msg->msg_name); err = -EINVAL; if (msg->msg_namelen < sizeof(*usin)) goto out; if (usin->sin_family != AF_INET) { pr_info_once("%s: %s forgot to set AF_INET. Fix it!\n", __func__, current->comm); err = -EAFNOSUPPORT; if (usin->sin_family) goto out; } daddr = usin->sin_addr.s_addr; /* ANK: I did not forget to get protocol from port field. * I just do not know, who uses this weirdness. * IP_HDRINCL is much more convenient. */ } else { err = -EDESTADDRREQ; if (sk->sk_state != TCP_ESTABLISHED) goto out; daddr = inet->inet_daddr; } ipcm_init_sk(&ipc, inet); /* Keep backward compat */ if (hdrincl) ipc.protocol = IPPROTO_RAW; if (msg->msg_controllen) { err = ip_cmsg_send(sk, msg, &ipc, false); if (unlikely(err)) { kfree(ipc.opt); goto out; } if (ipc.opt) free = 1; } saddr = ipc.addr; ipc.addr = daddr; if (!ipc.opt) { struct ip_options_rcu *inet_opt; rcu_read_lock(); inet_opt = rcu_dereference(inet->inet_opt); if (inet_opt) { memcpy(&opt_copy, inet_opt, sizeof(*inet_opt) + inet_opt->opt.optlen); ipc.opt = &opt_copy.opt; } rcu_read_unlock(); } if (ipc.opt) { err = -EINVAL; /* Linux does not mangle headers on raw sockets, * so that IP options + IP_HDRINCL is non-sense. */ if (hdrincl) goto done; if (ipc.opt->opt.srr) { if (!daddr) goto done; daddr = ipc.opt->opt.faddr; } } scope = ip_sendmsg_scope(inet, &ipc, msg); uc_index = READ_ONCE(inet->uc_index); if (ipv4_is_multicast(daddr)) { if (!ipc.oif || netif_index_is_l3_master(sock_net(sk), ipc.oif)) ipc.oif = READ_ONCE(inet->mc_index); if (!saddr) saddr = READ_ONCE(inet->mc_addr); } else if (!ipc.oif) { ipc.oif = uc_index; } else if (ipv4_is_lbcast(daddr) && uc_index) { /* oif is set, packet is to local broadcast * and uc_index is set. oif is most likely set * by sk_bound_dev_if. If uc_index != oif check if the * oif is an L3 master and uc_index is an L3 slave. * If so, we want to allow the send using the uc_index. */ if (ipc.oif != uc_index && ipc.oif == l3mdev_master_ifindex_by_index(sock_net(sk), uc_index)) { ipc.oif = uc_index; } } flowi4_init_output(&fl4, ipc.oif, ipc.sockc.mark, ipc.tos & INET_DSCP_MASK, scope, hdrincl ? 
ipc.protocol : sk->sk_protocol, inet_sk_flowi_flags(sk) | (hdrincl ? FLOWI_FLAG_KNOWN_NH : 0), daddr, saddr, 0, 0, sk->sk_uid); fl4.fl4_icmp_type = 0; fl4.fl4_icmp_code = 0; if (!hdrincl) { rfv.msg = msg; rfv.hlen = 0; err = raw_probe_proto_opt(&rfv, &fl4); if (err) goto done; } security_sk_classify_flow(sk, flowi4_to_flowi_common(&fl4)); rt = ip_route_output_flow(net, &fl4, sk); if (IS_ERR(rt)) { err = PTR_ERR(rt); rt = NULL; goto done; } err = -EACCES; if (rt->rt_flags & RTCF_BROADCAST && !sock_flag(sk, SOCK_BROADCAST)) goto done; if (msg->msg_flags & MSG_CONFIRM) goto do_confirm; back_from_confirm: if (hdrincl) err = raw_send_hdrinc(sk, &fl4, msg, len, &rt, msg->msg_flags, &ipc.sockc); else { if (!ipc.addr) ipc.addr = fl4.daddr; lock_sock(sk); err = ip_append_data(sk, &fl4, raw_getfrag, &rfv, len, 0, &ipc, &rt, msg->msg_flags); if (err) ip_flush_pending_frames(sk); else if (!(msg->msg_flags & MSG_MORE)) { err = ip_push_pending_frames(sk, &fl4); if (err == -ENOBUFS && !inet_test_bit(RECVERR, sk)) err = 0; } release_sock(sk); } done: if (free) kfree(ipc.opt); ip_rt_put(rt); out: if (err < 0) return err; return len; do_confirm: if (msg->msg_flags & MSG_PROBE) dst_confirm_neigh(&rt->dst, &fl4.daddr); if (!(msg->msg_flags & MSG_PROBE) || len) goto back_from_confirm; err = 0; goto done; } static void raw_close(struct sock *sk, long timeout) { /* * Raw sockets may have direct kernel references. Kill them. */ ip_ra_control(sk, 0, NULL); sk_common_release(sk); } static void raw_destroy(struct sock *sk) { lock_sock(sk); ip_flush_pending_frames(sk); release_sock(sk); } /* This gets rid of all the nasties in af_inet. -DaveM */ static int raw_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len) { struct inet_sock *inet = inet_sk(sk); struct sockaddr_in *addr = (struct sockaddr_in *) uaddr; struct net *net = sock_net(sk); u32 tb_id = RT_TABLE_LOCAL; int ret = -EINVAL; int chk_addr_ret; lock_sock(sk); if (sk->sk_state != TCP_CLOSE || addr_len < sizeof(struct sockaddr_in)) goto out; if (sk->sk_bound_dev_if) tb_id = l3mdev_fib_table_by_index(net, sk->sk_bound_dev_if) ? : tb_id; chk_addr_ret = inet_addr_type_table(net, addr->sin_addr.s_addr, tb_id); ret = -EADDRNOTAVAIL; if (!inet_addr_valid_or_nonlocal(net, inet, addr->sin_addr.s_addr, chk_addr_ret)) goto out; inet->inet_rcv_saddr = inet->inet_saddr = addr->sin_addr.s_addr; if (chk_addr_ret == RTN_MULTICAST || chk_addr_ret == RTN_BROADCAST) inet->inet_saddr = 0; /* Use device */ sk_dst_reset(sk); ret = 0; out: release_sock(sk); return ret; } /* * This should be easy, if there is something there * we return it, otherwise we block. */ static int raw_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags, int *addr_len) { struct inet_sock *inet = inet_sk(sk); size_t copied = 0; int err = -EOPNOTSUPP; DECLARE_SOCKADDR(struct sockaddr_in *, sin, msg->msg_name); struct sk_buff *skb; if (flags & MSG_OOB) goto out; if (flags & MSG_ERRQUEUE) { err = ip_recv_error(sk, msg, len, addr_len); goto out; } skb = skb_recv_datagram(sk, flags, &err); if (!skb) goto out; copied = skb->len; if (len < copied) { msg->msg_flags |= MSG_TRUNC; copied = len; } err = skb_copy_datagram_msg(skb, 0, msg, copied); if (err) goto done; sock_recv_cmsgs(msg, sk, skb); /* Copy the address. 
*/ if (sin) { sin->sin_family = AF_INET; sin->sin_addr.s_addr = ip_hdr(skb)->saddr; sin->sin_port = 0; memset(&sin->sin_zero, 0, sizeof(sin->sin_zero)); *addr_len = sizeof(*sin); } if (inet_cmsg_flags(inet)) ip_cmsg_recv(msg, skb); if (flags & MSG_TRUNC) copied = skb->len; done: skb_free_datagram(sk, skb); out: if (err) return err; return copied; } static int raw_sk_init(struct sock *sk) { struct raw_sock *rp = raw_sk(sk); if (inet_sk(sk)->inet_num == IPPROTO_ICMP) memset(&rp->filter, 0, sizeof(rp->filter)); return 0; } static int raw_seticmpfilter(struct sock *sk, sockptr_t optval, int optlen) { if (optlen > sizeof(struct icmp_filter)) optlen = sizeof(struct icmp_filter); if (copy_from_sockptr(&raw_sk(sk)->filter, optval, optlen)) return -EFAULT; return 0; } static int raw_geticmpfilter(struct sock *sk, char __user *optval, int __user *optlen) { int len, ret = -EFAULT; if (get_user(len, optlen)) goto out; ret = -EINVAL; if (len < 0) goto out; if (len > sizeof(struct icmp_filter)) len = sizeof(struct icmp_filter); ret = -EFAULT; if (put_user(len, optlen) || copy_to_user(optval, &raw_sk(sk)->filter, len)) goto out; ret = 0; out: return ret; } static int do_raw_setsockopt(struct sock *sk, int optname, sockptr_t optval, unsigned int optlen) { if (optname == ICMP_FILTER) { if (inet_sk(sk)->inet_num != IPPROTO_ICMP) return -EOPNOTSUPP; else return raw_seticmpfilter(sk, optval, optlen); } return -ENOPROTOOPT; } static int raw_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, unsigned int optlen) { if (level != SOL_RAW) return ip_setsockopt(sk, level, optname, optval, optlen); return do_raw_setsockopt(sk, optname, optval, optlen); } static int do_raw_getsockopt(struct sock *sk, int optname, char __user *optval, int __user *optlen) { if (optname == ICMP_FILTER) { if (inet_sk(sk)->inet_num != IPPROTO_ICMP) return -EOPNOTSUPP; else return raw_geticmpfilter(sk, optval, optlen); } return -ENOPROTOOPT; } static int raw_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen) { if (level != SOL_RAW) return ip_getsockopt(sk, level, optname, optval, optlen); return do_raw_getsockopt(sk, optname, optval, optlen); } static int raw_ioctl(struct sock *sk, int cmd, int *karg) { switch (cmd) { case SIOCOUTQ: { *karg = sk_wmem_alloc_get(sk); return 0; } case SIOCINQ: { struct sk_buff *skb; spin_lock_bh(&sk->sk_receive_queue.lock); skb = skb_peek(&sk->sk_receive_queue); if (skb) *karg = skb->len; else *karg = 0; spin_unlock_bh(&sk->sk_receive_queue.lock); return 0; } default: #ifdef CONFIG_IP_MROUTE return ipmr_ioctl(sk, cmd, karg); #else return -ENOIOCTLCMD; #endif } } #ifdef CONFIG_COMPAT static int compat_raw_ioctl(struct sock *sk, unsigned int cmd, unsigned long arg) { switch (cmd) { case SIOCOUTQ: case SIOCINQ: return -ENOIOCTLCMD; default: #ifdef CONFIG_IP_MROUTE return ipmr_compat_ioctl(sk, cmd, compat_ptr(arg)); #else return -ENOIOCTLCMD; #endif } } #endif int raw_abort(struct sock *sk, int err) { lock_sock(sk); sk->sk_err = err; sk_error_report(sk); __udp_disconnect(sk, 0); release_sock(sk); return 0; } EXPORT_SYMBOL_GPL(raw_abort); struct proto raw_prot = { .name = "RAW", .owner = THIS_MODULE, .close = raw_close, .destroy = raw_destroy, .connect = ip4_datagram_connect, .disconnect = __udp_disconnect, .ioctl = raw_ioctl, .init = raw_sk_init, .setsockopt = raw_setsockopt, .getsockopt = raw_getsockopt, .sendmsg = raw_sendmsg, .recvmsg = raw_recvmsg, .bind = raw_bind, .backlog_rcv = raw_rcv_skb, .release_cb = ip4_datagram_release_cb, .hash = 
raw_hash_sk, .unhash = raw_unhash_sk, .obj_size = sizeof(struct raw_sock), .useroffset = offsetof(struct raw_sock, filter), .usersize = sizeof_field(struct raw_sock, filter), .h.raw_hash = &raw_v4_hashinfo, #ifdef CONFIG_COMPAT .compat_ioctl = compat_raw_ioctl, #endif .diag_destroy = raw_abort, }; #ifdef CONFIG_PROC_FS static struct sock *raw_get_first(struct seq_file *seq, int bucket) { struct raw_hashinfo *h = pde_data(file_inode(seq->file)); struct raw_iter_state *state = raw_seq_private(seq); struct hlist_head *hlist; struct sock *sk; for (state->bucket = bucket; state->bucket < RAW_HTABLE_SIZE; ++state->bucket) { hlist = &h->ht[state->bucket]; sk_for_each(sk, hlist) { if (sock_net(sk) == seq_file_net(seq)) return sk; } } return NULL; } static struct sock *raw_get_next(struct seq_file *seq, struct sock *sk) { struct raw_iter_state *state = raw_seq_private(seq); do { sk = sk_next(sk); } while (sk && sock_net(sk) != seq_file_net(seq)); if (!sk) return raw_get_first(seq, state->bucket + 1); return sk; } static struct sock *raw_get_idx(struct seq_file *seq, loff_t pos) { struct sock *sk = raw_get_first(seq, 0); if (sk) while (pos && (sk = raw_get_next(seq, sk)) != NULL) --pos; return pos ? NULL : sk; } void *raw_seq_start(struct seq_file *seq, loff_t *pos) __acquires(&h->lock) { struct raw_hashinfo *h = pde_data(file_inode(seq->file)); spin_lock(&h->lock); return *pos ? raw_get_idx(seq, *pos - 1) : SEQ_START_TOKEN; } EXPORT_SYMBOL_GPL(raw_seq_start); void *raw_seq_next(struct seq_file *seq, void *v, loff_t *pos) { struct sock *sk; if (v == SEQ_START_TOKEN) sk = raw_get_first(seq, 0); else sk = raw_get_next(seq, v); ++*pos; return sk; } EXPORT_SYMBOL_GPL(raw_seq_next); void raw_seq_stop(struct seq_file *seq, void *v) __releases(&h->lock) { struct raw_hashinfo *h = pde_data(file_inode(seq->file)); spin_unlock(&h->lock); } EXPORT_SYMBOL_GPL(raw_seq_stop); static void raw_sock_seq_show(struct seq_file *seq, struct sock *sp, int i) { struct inet_sock *inet = inet_sk(sp); __be32 dest = inet->inet_daddr, src = inet->inet_rcv_saddr; __u16 destp = 0, srcp = inet->inet_num; seq_printf(seq, "%4d: %08X:%04X %08X:%04X" " %02X %08X:%08X %02X:%08lX %08X %5u %8d %lu %d %pK %u\n", i, src, srcp, dest, destp, sp->sk_state, sk_wmem_alloc_get(sp), sk_rmem_alloc_get(sp), 0, 0L, 0, from_kuid_munged(seq_user_ns(seq), sock_i_uid(sp)), 0, sock_i_ino(sp), refcount_read(&sp->sk_refcnt), sp, atomic_read(&sp->sk_drops)); } static int raw_seq_show(struct seq_file *seq, void *v) { if (v == SEQ_START_TOKEN) seq_printf(seq, " sl local_address rem_address st tx_queue " "rx_queue tr tm->when retrnsmt uid timeout " "inode ref pointer drops\n"); else raw_sock_seq_show(seq, v, raw_seq_private(seq)->bucket); return 0; } static const struct seq_operations raw_seq_ops = { .start = raw_seq_start, .next = raw_seq_next, .stop = raw_seq_stop, .show = raw_seq_show, }; static __net_init int raw_init_net(struct net *net) { if (!proc_create_net_data("raw", 0444, net->proc_net, &raw_seq_ops, sizeof(struct raw_iter_state), &raw_v4_hashinfo)) return -ENOMEM; return 0; } static __net_exit void raw_exit_net(struct net *net) { remove_proc_entry("raw", net->proc_net); } static __net_initdata struct pernet_operations raw_net_ops = { .init = raw_init_net, .exit = raw_exit_net, }; int __init raw_proc_init(void) { return register_pernet_subsys(&raw_net_ops); } void __init raw_proc_exit(void) { unregister_pernet_subsys(&raw_net_ops); } #endif /* CONFIG_PROC_FS */ static void raw_sysctl_init_net(struct net *net) { #ifdef CONFIG_NET_L3_MASTER_DEV 
	net->ipv4.sysctl_raw_l3mdev_accept = 1;
#endif
}

static int __net_init raw_sysctl_init(struct net *net)
{
	raw_sysctl_init_net(net);
	return 0;
}

static struct pernet_operations __net_initdata raw_sysctl_ops = {
	.init	= raw_sysctl_init,
};

void __init raw_init(void)
{
	raw_sysctl_init_net(&init_net);

	if (register_pernet_subsys(&raw_sysctl_ops))
		panic("RAW: failed to init sysctl parameters.\n");
}
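For context, here is a minimal userspace sketch (illustrative only, not part of the kernel sources above) of how the SOL_RAW / ICMP_FILTER option handled by raw_seticmpfilter() and enforced by icmp_filter() is typically exercised: each set bit in icmp_filter.data blocks the corresponding ICMP type, so a ping-style program clears only the ICMP_ECHOREPLY bit. It assumes the standard Linux UAPI headers and CAP_NET_RAW, and keeps error handling minimal.

/* Illustrative userspace example (not kernel code): accept only ICMP
 * echo replies on an IPv4 raw socket by blocking every other type.
 */
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/icmp.h>		/* struct icmp_filter, ICMP_FILTER, ICMP_ECHOREPLY */

int main(void)
{
	struct icmp_filter filt;
	int fd;

	fd = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP);
	if (fd < 0) {
		perror("socket");	/* requires CAP_NET_RAW */
		return 1;
	}

	/* A set bit means "block this ICMP type" (see icmp_filter() above),
	 * so clear only the ICMP_ECHOREPLY bit.
	 */
	filt.data = ~(1U << ICMP_ECHOREPLY);
	if (setsockopt(fd, SOL_RAW, ICMP_FILTER, &filt, sizeof(filt)) < 0)
		perror("setsockopt(ICMP_FILTER)");

	close(fd);
	return 0;
}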
// SPDX-License-Identifier: GPL-2.0-only
/* Connection state tracking for netfilter.  This is separated from,
   but required by, the NAT layer; it can also be used by an iptables
   extension.
 */

/* (C) 1999-2001 Paul `Rusty' Russell
 * (C) 2002-2006 Netfilter Core Team <coreteam@netfilter.org>
 * (C) 2003,2004 USAGI/WIDE Project <http://www.linux-ipv6.org>
 * (C) 2005-2012 Patrick McHardy <kaber@trash.net>
 */

#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/types.h>
#include <linux/netfilter.h>
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/skbuff.h>
#include <linux/proc_fs.h>
#include <linux/vmalloc.h>
#include <linux/stddef.h>
#include <linux/slab.h>
#include <linux/random.h>
#include <linux/siphash.h>
#include <linux/err.h>
#include <linux/percpu.h>
#include <linux/moduleparam.h>
#include <linux/notifier.h>
#include <linux/kernel.h>
#include <linux/netdevice.h>
#include <linux/socket.h>
#include <linux/mm.h>
#include <linux/nsproxy.h>
#include <linux/rculist_nulls.h>

#include <net/netfilter/nf_conntrack.h>
#include <net/netfilter/nf_conntrack_bpf.h>
#include <net/netfilter/nf_conntrack_l4proto.h>
#include <net/netfilter/nf_conntrack_expect.h>
#include <net/netfilter/nf_conntrack_helper.h>
#include <net/netfilter/nf_conntrack_core.h>
#include <net/netfilter/nf_conntrack_extend.h>
#include <net/netfilter/nf_conntrack_acct.h>
#include <net/netfilter/nf_conntrack_ecache.h>
#include <net/netfilter/nf_conntrack_zones.h>
#include <net/netfilter/nf_conntrack_timestamp.h>
#include <net/netfilter/nf_conntrack_timeout.h>
#include <net/netfilter/nf_conntrack_labels.h>
#include <net/netfilter/nf_conntrack_synproxy.h>
#include <net/netfilter/nf_nat.h>
#include <net/netfilter/nf_nat_helper.h>
#include <net/netns/hash.h>
#include <net/ip.h>

#include "nf_internals.h"

__cacheline_aligned_in_smp spinlock_t nf_conntrack_locks[CONNTRACK_LOCKS];
EXPORT_SYMBOL_GPL(nf_conntrack_locks);

__cacheline_aligned_in_smp DEFINE_SPINLOCK(nf_conntrack_expect_lock);
EXPORT_SYMBOL_GPL(nf_conntrack_expect_lock);

struct hlist_nulls_head *nf_conntrack_hash __read_mostly;
EXPORT_SYMBOL_GPL(nf_conntrack_hash);

struct conntrack_gc_work {
	struct delayed_work	dwork;
	u32			next_bucket;
	u32			avg_timeout;
	u32			count;
	u32			start_time;
	bool			exiting;
	bool			early_drop;
};

static __read_mostly struct kmem_cache *nf_conntrack_cachep;
static DEFINE_SPINLOCK(nf_conntrack_locks_all_lock);
static __read_mostly bool nf_conntrack_locks_all;

/* serialize hash resizes and nf_ct_iterate_cleanup */
static DEFINE_MUTEX(nf_conntrack_mutex);

#define GC_SCAN_INTERVAL_MAX	(60ul * HZ)
#define GC_SCAN_INTERVAL_MIN	(1ul * HZ)

/* clamp timeouts to this value (TCP unacked) */
#define GC_SCAN_INTERVAL_CLAMP	(300ul * HZ)

/* Initial bias pretending we have 100 entries at the upper bound so we don't
 * wakeup often just because we have three entries with a 1s timeout while still
 * allowing non-idle machines to wakeup more often when needed.
*/ #define GC_SCAN_INITIAL_COUNT 100 #define GC_SCAN_INTERVAL_INIT GC_SCAN_INTERVAL_MAX #define GC_SCAN_MAX_DURATION msecs_to_jiffies(10) #define GC_SCAN_EXPIRED_MAX (64000u / HZ) #define MIN_CHAINLEN 50u #define MAX_CHAINLEN (80u - MIN_CHAINLEN) static struct conntrack_gc_work conntrack_gc_work; void nf_conntrack_lock(spinlock_t *lock) __acquires(lock) { /* 1) Acquire the lock */ spin_lock(lock); /* 2) read nf_conntrack_locks_all, with ACQUIRE semantics * It pairs with the smp_store_release() in nf_conntrack_all_unlock() */ if (likely(smp_load_acquire(&nf_conntrack_locks_all) == false)) return; /* fast path failed, unlock */ spin_unlock(lock); /* Slow path 1) get global lock */ spin_lock(&nf_conntrack_locks_all_lock); /* Slow path 2) get the lock we want */ spin_lock(lock); /* Slow path 3) release the global lock */ spin_unlock(&nf_conntrack_locks_all_lock); } EXPORT_SYMBOL_GPL(nf_conntrack_lock); static void nf_conntrack_double_unlock(unsigned int h1, unsigned int h2) { h1 %= CONNTRACK_LOCKS; h2 %= CONNTRACK_LOCKS; spin_unlock(&nf_conntrack_locks[h1]); if (h1 != h2) spin_unlock(&nf_conntrack_locks[h2]); } /* return true if we need to recompute hashes (in case hash table was resized) */ static bool nf_conntrack_double_lock(struct net *net, unsigned int h1, unsigned int h2, unsigned int sequence) { h1 %= CONNTRACK_LOCKS; h2 %= CONNTRACK_LOCKS; if (h1 <= h2) { nf_conntrack_lock(&nf_conntrack_locks[h1]); if (h1 != h2) spin_lock_nested(&nf_conntrack_locks[h2], SINGLE_DEPTH_NESTING); } else { nf_conntrack_lock(&nf_conntrack_locks[h2]); spin_lock_nested(&nf_conntrack_locks[h1], SINGLE_DEPTH_NESTING); } if (read_seqcount_retry(&nf_conntrack_generation, sequence)) { nf_conntrack_double_unlock(h1, h2); return true; } return false; } static void nf_conntrack_all_lock(void) __acquires(&nf_conntrack_locks_all_lock) { int i; spin_lock(&nf_conntrack_locks_all_lock); /* For nf_contrack_locks_all, only the latest time when another * CPU will see an update is controlled, by the "release" of the * spin_lock below. * The earliest time is not controlled, an thus KCSAN could detect * a race when nf_conntract_lock() reads the variable. * WRITE_ONCE() is used to ensure the compiler will not * optimize the write. */ WRITE_ONCE(nf_conntrack_locks_all, true); for (i = 0; i < CONNTRACK_LOCKS; i++) { spin_lock(&nf_conntrack_locks[i]); /* This spin_unlock provides the "release" to ensure that * nf_conntrack_locks_all==true is visible to everyone that * acquired spin_lock(&nf_conntrack_locks[]). */ spin_unlock(&nf_conntrack_locks[i]); } } static void nf_conntrack_all_unlock(void) __releases(&nf_conntrack_locks_all_lock) { /* All prior stores must be complete before we clear * 'nf_conntrack_locks_all'. Otherwise nf_conntrack_lock() * might observe the false value but not the entire * critical section. 
* It pairs with the smp_load_acquire() in nf_conntrack_lock() */ smp_store_release(&nf_conntrack_locks_all, false); spin_unlock(&nf_conntrack_locks_all_lock); } unsigned int nf_conntrack_htable_size __read_mostly; EXPORT_SYMBOL_GPL(nf_conntrack_htable_size); unsigned int nf_conntrack_max __read_mostly; EXPORT_SYMBOL_GPL(nf_conntrack_max); seqcount_spinlock_t nf_conntrack_generation __read_mostly; static siphash_aligned_key_t nf_conntrack_hash_rnd; static u32 hash_conntrack_raw(const struct nf_conntrack_tuple *tuple, unsigned int zoneid, const struct net *net) { siphash_key_t key; get_random_once(&nf_conntrack_hash_rnd, sizeof(nf_conntrack_hash_rnd)); key = nf_conntrack_hash_rnd; key.key[0] ^= zoneid; key.key[1] ^= net_hash_mix(net); return siphash((void *)tuple, offsetofend(struct nf_conntrack_tuple, dst.__nfct_hash_offsetend), &key); } static u32 scale_hash(u32 hash) { return reciprocal_scale(hash, nf_conntrack_htable_size); } static u32 __hash_conntrack(const struct net *net, const struct nf_conntrack_tuple *tuple, unsigned int zoneid, unsigned int size) { return reciprocal_scale(hash_conntrack_raw(tuple, zoneid, net), size); } static u32 hash_conntrack(const struct net *net, const struct nf_conntrack_tuple *tuple, unsigned int zoneid) { return scale_hash(hash_conntrack_raw(tuple, zoneid, net)); } static bool nf_ct_get_tuple_ports(const struct sk_buff *skb, unsigned int dataoff, struct nf_conntrack_tuple *tuple) { struct { __be16 sport; __be16 dport; } _inet_hdr, *inet_hdr; /* Actually only need first 4 bytes to get ports. */ inet_hdr = skb_header_pointer(skb, dataoff, sizeof(_inet_hdr), &_inet_hdr); if (!inet_hdr) return false; tuple->src.u.udp.port = inet_hdr->sport; tuple->dst.u.udp.port = inet_hdr->dport; return true; } static bool nf_ct_get_tuple(const struct sk_buff *skb, unsigned int nhoff, unsigned int dataoff, u_int16_t l3num, u_int8_t protonum, struct net *net, struct nf_conntrack_tuple *tuple) { unsigned int size; const __be32 *ap; __be32 _addrs[8]; memset(tuple, 0, sizeof(*tuple)); tuple->src.l3num = l3num; switch (l3num) { case NFPROTO_IPV4: nhoff += offsetof(struct iphdr, saddr); size = 2 * sizeof(__be32); break; case NFPROTO_IPV6: nhoff += offsetof(struct ipv6hdr, saddr); size = sizeof(_addrs); break; default: return true; } ap = skb_header_pointer(skb, nhoff, size, _addrs); if (!ap) return false; switch (l3num) { case NFPROTO_IPV4: tuple->src.u3.ip = ap[0]; tuple->dst.u3.ip = ap[1]; break; case NFPROTO_IPV6: memcpy(tuple->src.u3.ip6, ap, sizeof(tuple->src.u3.ip6)); memcpy(tuple->dst.u3.ip6, ap + 4, sizeof(tuple->dst.u3.ip6)); break; } tuple->dst.protonum = protonum; tuple->dst.dir = IP_CT_DIR_ORIGINAL; switch (protonum) { #if IS_ENABLED(CONFIG_IPV6) case IPPROTO_ICMPV6: return icmpv6_pkt_to_tuple(skb, dataoff, net, tuple); #endif case IPPROTO_ICMP: return icmp_pkt_to_tuple(skb, dataoff, net, tuple); #ifdef CONFIG_NF_CT_PROTO_GRE case IPPROTO_GRE: return gre_pkt_to_tuple(skb, dataoff, net, tuple); #endif case IPPROTO_TCP: case IPPROTO_UDP: #ifdef CONFIG_NF_CT_PROTO_UDPLITE case IPPROTO_UDPLITE: #endif #ifdef CONFIG_NF_CT_PROTO_SCTP case IPPROTO_SCTP: #endif #ifdef CONFIG_NF_CT_PROTO_DCCP case IPPROTO_DCCP: #endif /* fallthrough */ return nf_ct_get_tuple_ports(skb, dataoff, tuple); default: break; } return true; } static int ipv4_get_l4proto(const struct sk_buff *skb, unsigned int nhoff, u_int8_t *protonum) { int dataoff = -1; const struct iphdr *iph; struct iphdr _iph; iph = skb_header_pointer(skb, nhoff, sizeof(_iph), &_iph); if (!iph) return -1; /* Conntrack defragments 
packets, we might still see fragments * inside ICMP packets though. */ if (iph->frag_off & htons(IP_OFFSET)) return -1; dataoff = nhoff + (iph->ihl << 2); *protonum = iph->protocol; /* Check bogus IP headers */ if (dataoff > skb->len) { pr_debug("bogus IPv4 packet: nhoff %u, ihl %u, skblen %u\n", nhoff, iph->ihl << 2, skb->len); return -1; } return dataoff; } #if IS_ENABLED(CONFIG_IPV6) static int ipv6_get_l4proto(const struct sk_buff *skb, unsigned int nhoff, u8 *protonum) { int protoff = -1; unsigned int extoff = nhoff + sizeof(struct ipv6hdr); __be16 frag_off; u8 nexthdr; if (skb_copy_bits(skb, nhoff + offsetof(struct ipv6hdr, nexthdr), &nexthdr, sizeof(nexthdr)) != 0) { pr_debug("can't get nexthdr\n"); return -1; } protoff = ipv6_skip_exthdr(skb, extoff, &nexthdr, &frag_off); /* * (protoff == skb->len) means the packet has not data, just * IPv6 and possibly extensions headers, but it is tracked anyway */ if (protoff < 0 || (frag_off & htons(~0x7)) != 0) { pr_debug("can't find proto in pkt\n"); return -1; } *protonum = nexthdr; return protoff; } #endif static int get_l4proto(const struct sk_buff *skb, unsigned int nhoff, u8 pf, u8 *l4num) { switch (pf) { case NFPROTO_IPV4: return ipv4_get_l4proto(skb, nhoff, l4num); #if IS_ENABLED(CONFIG_IPV6) case NFPROTO_IPV6: return ipv6_get_l4proto(skb, nhoff, l4num); #endif default: *l4num = 0; break; } return -1; } bool nf_ct_get_tuplepr(const struct sk_buff *skb, unsigned int nhoff, u_int16_t l3num, struct net *net, struct nf_conntrack_tuple *tuple) { u8 protonum; int protoff; protoff = get_l4proto(skb, nhoff, l3num, &protonum); if (protoff <= 0) return false; return nf_ct_get_tuple(skb, nhoff, protoff, l3num, protonum, net, tuple); } EXPORT_SYMBOL_GPL(nf_ct_get_tuplepr); bool nf_ct_invert_tuple(struct nf_conntrack_tuple *inverse, const struct nf_conntrack_tuple *orig) { memset(inverse, 0, sizeof(*inverse)); inverse->src.l3num = orig->src.l3num; switch (orig->src.l3num) { case NFPROTO_IPV4: inverse->src.u3.ip = orig->dst.u3.ip; inverse->dst.u3.ip = orig->src.u3.ip; break; case NFPROTO_IPV6: inverse->src.u3.in6 = orig->dst.u3.in6; inverse->dst.u3.in6 = orig->src.u3.in6; break; default: break; } inverse->dst.dir = !orig->dst.dir; inverse->dst.protonum = orig->dst.protonum; switch (orig->dst.protonum) { case IPPROTO_ICMP: return nf_conntrack_invert_icmp_tuple(inverse, orig); #if IS_ENABLED(CONFIG_IPV6) case IPPROTO_ICMPV6: return nf_conntrack_invert_icmpv6_tuple(inverse, orig); #endif } inverse->src.u.all = orig->dst.u.all; inverse->dst.u.all = orig->src.u.all; return true; } EXPORT_SYMBOL_GPL(nf_ct_invert_tuple); /* Generate a almost-unique pseudo-id for a given conntrack. * * intentionally doesn't re-use any of the seeds used for hash * table location, we assume id gets exposed to userspace. * * Following nf_conn items do not change throughout lifetime * of the nf_conn: * * 1. nf_conn address * 2. nf_conn->master address (normally NULL) * 3. the associated net namespace * 4. 
the original direction tuple */ u32 nf_ct_get_id(const struct nf_conn *ct) { static siphash_aligned_key_t ct_id_seed; unsigned long a, b, c, d; net_get_random_once(&ct_id_seed, sizeof(ct_id_seed)); a = (unsigned long)ct; b = (unsigned long)ct->master; c = (unsigned long)nf_ct_net(ct); d = (unsigned long)siphash(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple, sizeof(ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple), &ct_id_seed); #ifdef CONFIG_64BIT return siphash_4u64((u64)a, (u64)b, (u64)c, (u64)d, &ct_id_seed); #else return siphash_4u32((u32)a, (u32)b, (u32)c, (u32)d, &ct_id_seed); #endif } EXPORT_SYMBOL_GPL(nf_ct_get_id); static u32 nf_conntrack_get_id(const struct nf_conntrack *nfct) { return nf_ct_get_id(nf_ct_to_nf_conn(nfct)); } static void clean_from_lists(struct nf_conn *ct) { hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode); hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_REPLY].hnnode); /* Destroy all pending expectations */ nf_ct_remove_expectations(ct); } #define NFCT_ALIGN(len) (((len) + NFCT_INFOMASK) & ~NFCT_INFOMASK) /* Released via nf_ct_destroy() */ struct nf_conn *nf_ct_tmpl_alloc(struct net *net, const struct nf_conntrack_zone *zone, gfp_t flags) { struct nf_conn *tmpl, *p; if (ARCH_KMALLOC_MINALIGN <= NFCT_INFOMASK) { tmpl = kzalloc(sizeof(*tmpl) + NFCT_INFOMASK, flags); if (!tmpl) return NULL; p = tmpl; tmpl = (struct nf_conn *)NFCT_ALIGN((unsigned long)p); if (tmpl != p) tmpl->proto.tmpl_padto = (char *)tmpl - (char *)p; } else { tmpl = kzalloc(sizeof(*tmpl), flags); if (!tmpl) return NULL; } tmpl->status = IPS_TEMPLATE; write_pnet(&tmpl->ct_net, net); nf_ct_zone_add(tmpl, zone); refcount_set(&tmpl->ct_general.use, 1); return tmpl; } EXPORT_SYMBOL_GPL(nf_ct_tmpl_alloc); void nf_ct_tmpl_free(struct nf_conn *tmpl) { kfree(tmpl->ext); if (ARCH_KMALLOC_MINALIGN <= NFCT_INFOMASK) kfree((char *)tmpl - tmpl->proto.tmpl_padto); else kfree(tmpl); } EXPORT_SYMBOL_GPL(nf_ct_tmpl_free); static void destroy_gre_conntrack(struct nf_conn *ct) { #ifdef CONFIG_NF_CT_PROTO_GRE struct nf_conn *master = ct->master; if (master) nf_ct_gre_keymap_destroy(master); #endif } void nf_ct_destroy(struct nf_conntrack *nfct) { struct nf_conn *ct = (struct nf_conn *)nfct; WARN_ON(refcount_read(&nfct->use) != 0); if (unlikely(nf_ct_is_template(ct))) { nf_ct_tmpl_free(ct); return; } if (unlikely(nf_ct_protonum(ct) == IPPROTO_GRE)) destroy_gre_conntrack(ct); /* Expectations will have been removed in clean_from_lists, * except TFTP can create an expectation on the first packet, * before connection is in the list, so we need to clean here, * too. 
*/ nf_ct_remove_expectations(ct); if (ct->master) nf_ct_put(ct->master); nf_conntrack_free(ct); } EXPORT_SYMBOL(nf_ct_destroy); static void __nf_ct_delete_from_lists(struct nf_conn *ct) { struct net *net = nf_ct_net(ct); unsigned int hash, reply_hash; unsigned int sequence; do { sequence = read_seqcount_begin(&nf_conntrack_generation); hash = hash_conntrack(net, &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple, nf_ct_zone_id(nf_ct_zone(ct), IP_CT_DIR_ORIGINAL)); reply_hash = hash_conntrack(net, &ct->tuplehash[IP_CT_DIR_REPLY].tuple, nf_ct_zone_id(nf_ct_zone(ct), IP_CT_DIR_REPLY)); } while (nf_conntrack_double_lock(net, hash, reply_hash, sequence)); clean_from_lists(ct); nf_conntrack_double_unlock(hash, reply_hash); } static void nf_ct_delete_from_lists(struct nf_conn *ct) { nf_ct_helper_destroy(ct); local_bh_disable(); __nf_ct_delete_from_lists(ct); local_bh_enable(); } static void nf_ct_add_to_ecache_list(struct nf_conn *ct) { #ifdef CONFIG_NF_CONNTRACK_EVENTS struct nf_conntrack_net *cnet = nf_ct_pernet(nf_ct_net(ct)); spin_lock(&cnet->ecache.dying_lock); hlist_nulls_add_head_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode, &cnet->ecache.dying_list); spin_unlock(&cnet->ecache.dying_lock); #endif } bool nf_ct_delete(struct nf_conn *ct, u32 portid, int report) { struct nf_conn_tstamp *tstamp; struct net *net; if (test_and_set_bit(IPS_DYING_BIT, &ct->status)) return false; tstamp = nf_conn_tstamp_find(ct); if (tstamp) { s32 timeout = READ_ONCE(ct->timeout) - nfct_time_stamp; tstamp->stop = ktime_get_real_ns(); if (timeout < 0) tstamp->stop -= jiffies_to_nsecs(-timeout); } if (nf_conntrack_event_report(IPCT_DESTROY, ct, portid, report) < 0) { /* destroy event was not delivered. nf_ct_put will * be done by event cache worker on redelivery. */ nf_ct_helper_destroy(ct); local_bh_disable(); __nf_ct_delete_from_lists(ct); nf_ct_add_to_ecache_list(ct); local_bh_enable(); nf_conntrack_ecache_work(nf_ct_net(ct), NFCT_ECACHE_DESTROY_FAIL); return false; } net = nf_ct_net(ct); if (nf_conntrack_ecache_dwork_pending(net)) nf_conntrack_ecache_work(net, NFCT_ECACHE_DESTROY_SENT); nf_ct_delete_from_lists(ct); nf_ct_put(ct); return true; } EXPORT_SYMBOL_GPL(nf_ct_delete); static inline bool nf_ct_key_equal(struct nf_conntrack_tuple_hash *h, const struct nf_conntrack_tuple *tuple, const struct nf_conntrack_zone *zone, const struct net *net) { struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h); /* A conntrack can be recreated with the equal tuple, * so we need to check that the conntrack is confirmed */ return nf_ct_tuple_equal(tuple, &h->tuple) && nf_ct_zone_equal(ct, zone, NF_CT_DIRECTION(h)) && nf_ct_is_confirmed(ct) && net_eq(net, nf_ct_net(ct)); } static inline bool nf_ct_match(const struct nf_conn *ct1, const struct nf_conn *ct2) { return nf_ct_tuple_equal(&ct1->tuplehash[IP_CT_DIR_ORIGINAL].tuple, &ct2->tuplehash[IP_CT_DIR_ORIGINAL].tuple) && nf_ct_tuple_equal(&ct1->tuplehash[IP_CT_DIR_REPLY].tuple, &ct2->tuplehash[IP_CT_DIR_REPLY].tuple) && nf_ct_zone_equal(ct1, nf_ct_zone(ct2), IP_CT_DIR_ORIGINAL) && nf_ct_zone_equal(ct1, nf_ct_zone(ct2), IP_CT_DIR_REPLY) && net_eq(nf_ct_net(ct1), nf_ct_net(ct2)); } /* caller must hold rcu readlock and none of the nf_conntrack_locks */ static void nf_ct_gc_expired(struct nf_conn *ct) { if (!refcount_inc_not_zero(&ct->ct_general.use)) return; /* load ->status after refcount increase */ smp_acquire__after_ctrl_dep(); if (nf_ct_should_gc(ct)) nf_ct_kill(ct); nf_ct_put(ct); } /* * Warning : * - Caller must take a reference on returned object * and recheck nf_ct_tuple_equal(tuple, 
&h->tuple) */ static struct nf_conntrack_tuple_hash * ____nf_conntrack_find(struct net *net, const struct nf_conntrack_zone *zone, const struct nf_conntrack_tuple *tuple, u32 hash) { struct nf_conntrack_tuple_hash *h; struct hlist_nulls_head *ct_hash; struct hlist_nulls_node *n; unsigned int bucket, hsize; begin: nf_conntrack_get_ht(&ct_hash, &hsize); bucket = reciprocal_scale(hash, hsize); hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[bucket], hnnode) { struct nf_conn *ct; ct = nf_ct_tuplehash_to_ctrack(h); if (nf_ct_is_expired(ct)) { nf_ct_gc_expired(ct); continue; } if (nf_ct_key_equal(h, tuple, zone, net)) return h; } /* * if the nulls value we got at the end of this lookup is * not the expected one, we must restart lookup. * We probably met an item that was moved to another chain. */ if (get_nulls_value(n) != bucket) { NF_CT_STAT_INC_ATOMIC(net, search_restart); goto begin; } return NULL; } /* Find a connection corresponding to a tuple. */ static struct nf_conntrack_tuple_hash * __nf_conntrack_find_get(struct net *net, const struct nf_conntrack_zone *zone, const struct nf_conntrack_tuple *tuple, u32 hash) { struct nf_conntrack_tuple_hash *h; struct nf_conn *ct; h = ____nf_conntrack_find(net, zone, tuple, hash); if (h) { /* We have a candidate that matches the tuple we're interested * in, try to obtain a reference and re-check tuple */ ct = nf_ct_tuplehash_to_ctrack(h); if (likely(refcount_inc_not_zero(&ct->ct_general.use))) { /* re-check key after refcount */ smp_acquire__after_ctrl_dep(); if (likely(nf_ct_key_equal(h, tuple, zone, net))) return h; /* TYPESAFE_BY_RCU recycled the candidate */ nf_ct_put(ct); } h = NULL; } return h; } struct nf_conntrack_tuple_hash * nf_conntrack_find_get(struct net *net, const struct nf_conntrack_zone *zone, const struct nf_conntrack_tuple *tuple) { unsigned int rid, zone_id = nf_ct_zone_id(zone, IP_CT_DIR_ORIGINAL); struct nf_conntrack_tuple_hash *thash; rcu_read_lock(); thash = __nf_conntrack_find_get(net, zone, tuple, hash_conntrack_raw(tuple, zone_id, net)); if (thash) goto out_unlock; rid = nf_ct_zone_id(zone, IP_CT_DIR_REPLY); if (rid != zone_id) thash = __nf_conntrack_find_get(net, zone, tuple, hash_conntrack_raw(tuple, rid, net)); out_unlock: rcu_read_unlock(); return thash; } EXPORT_SYMBOL_GPL(nf_conntrack_find_get); static void __nf_conntrack_hash_insert(struct nf_conn *ct, unsigned int hash, unsigned int reply_hash) { hlist_nulls_add_head_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode, &nf_conntrack_hash[hash]); hlist_nulls_add_head_rcu(&ct->tuplehash[IP_CT_DIR_REPLY].hnnode, &nf_conntrack_hash[reply_hash]); } static bool nf_ct_ext_valid_pre(const struct nf_ct_ext *ext) { /* if ext->gen_id is not equal to nf_conntrack_ext_genid, some extensions * may contain stale pointers to e.g. helper that has been removed. * * The helper can't clear this because the nf_conn object isn't in * any hash and synchronize_rcu() isn't enough because associated skb * might sit in a queue. */ return !ext || ext->gen_id == atomic_read(&nf_conntrack_ext_genid); } static bool nf_ct_ext_valid_post(struct nf_ct_ext *ext) { if (!ext) return true; if (ext->gen_id != atomic_read(&nf_conntrack_ext_genid)) return false; /* inserted into conntrack table, nf_ct_iterate_cleanup() * will find it. Disable nf_ct_ext_find() id check. 
*/ WRITE_ONCE(ext->gen_id, 0); return true; } int nf_conntrack_hash_check_insert(struct nf_conn *ct) { const struct nf_conntrack_zone *zone; struct net *net = nf_ct_net(ct); unsigned int hash, reply_hash; struct nf_conntrack_tuple_hash *h; struct hlist_nulls_node *n; unsigned int max_chainlen; unsigned int chainlen = 0; unsigned int sequence; int err = -EEXIST; zone = nf_ct_zone(ct); if (!nf_ct_ext_valid_pre(ct->ext)) return -EAGAIN; local_bh_disable(); do { sequence = read_seqcount_begin(&nf_conntrack_generation); hash = hash_conntrack(net, &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple, nf_ct_zone_id(nf_ct_zone(ct), IP_CT_DIR_ORIGINAL)); reply_hash = hash_conntrack(net, &ct->tuplehash[IP_CT_DIR_REPLY].tuple, nf_ct_zone_id(nf_ct_zone(ct), IP_CT_DIR_REPLY)); } while (nf_conntrack_double_lock(net, hash, reply_hash, sequence)); max_chainlen = MIN_CHAINLEN + get_random_u32_below(MAX_CHAINLEN); /* See if there's one in the list already, including reverse */ hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[hash], hnnode) { if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple, zone, net)) goto out; if (chainlen++ > max_chainlen) goto chaintoolong; } chainlen = 0; hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[reply_hash], hnnode) { if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_REPLY].tuple, zone, net)) goto out; if (chainlen++ > max_chainlen) goto chaintoolong; } /* If genid has changed, we can't insert anymore because ct * extensions could have stale pointers and nf_ct_iterate_destroy * might have completed its table scan already. * * Increment of the ext genid right after this check is fine: * nf_ct_iterate_destroy blocks until locks are released. */ if (!nf_ct_ext_valid_post(ct->ext)) { err = -EAGAIN; goto out; } smp_wmb(); /* The caller holds a reference to this object */ refcount_set(&ct->ct_general.use, 2); __nf_conntrack_hash_insert(ct, hash, reply_hash); nf_conntrack_double_unlock(hash, reply_hash); NF_CT_STAT_INC(net, insert); local_bh_enable(); return 0; chaintoolong: NF_CT_STAT_INC(net, chaintoolong); err = -ENOSPC; out: nf_conntrack_double_unlock(hash, reply_hash); local_bh_enable(); return err; } EXPORT_SYMBOL_GPL(nf_conntrack_hash_check_insert); void nf_ct_acct_add(struct nf_conn *ct, u32 dir, unsigned int packets, unsigned int bytes) { struct nf_conn_acct *acct; acct = nf_conn_acct_find(ct); if (acct) { struct nf_conn_counter *counter = acct->counter; atomic64_add(packets, &counter[dir].packets); atomic64_add(bytes, &counter[dir].bytes); } } EXPORT_SYMBOL_GPL(nf_ct_acct_add); static void nf_ct_acct_merge(struct nf_conn *ct, enum ip_conntrack_info ctinfo, const struct nf_conn *loser_ct) { struct nf_conn_acct *acct; acct = nf_conn_acct_find(loser_ct); if (acct) { struct nf_conn_counter *counter = acct->counter; unsigned int bytes; /* u32 should be fine since we must have seen one packet. */ bytes = atomic64_read(&counter[CTINFO2DIR(ctinfo)].bytes); nf_ct_acct_update(ct, CTINFO2DIR(ctinfo), bytes); } } static void __nf_conntrack_insert_prepare(struct nf_conn *ct) { struct nf_conn_tstamp *tstamp; refcount_inc(&ct->ct_general.use); /* set conntrack timestamp, if enabled. */ tstamp = nf_conn_tstamp_find(ct); if (tstamp) tstamp->start = ktime_get_real_ns(); } /** * nf_ct_match_reverse - check if ct1 and ct2 refer to identical flow * @ct1: conntrack in hash table to check against * @ct2: merge candidate * * returns true if ct1 and ct2 happen to refer to the same flow, but * in opposing directions, i.e. * ct1: a:b -> c:d * ct2: c:d -> a:b * for both directions. 
If so, @ct2 should not have been created * as the skb should have been picked up as ESTABLISHED flow. * But ct1 was not yet committed to hash table before skb that created * ct2 had arrived. * * Note we don't compare netns because ct entries in different net * namespace cannot clash to begin with. * * @return: true if ct1 and ct2 are identical when swapping origin/reply. */ static bool nf_ct_match_reverse(const struct nf_conn *ct1, const struct nf_conn *ct2) { u16 id1, id2; if (!nf_ct_tuple_equal(&ct1->tuplehash[IP_CT_DIR_ORIGINAL].tuple, &ct2->tuplehash[IP_CT_DIR_REPLY].tuple)) return false; if (!nf_ct_tuple_equal(&ct1->tuplehash[IP_CT_DIR_REPLY].tuple, &ct2->tuplehash[IP_CT_DIR_ORIGINAL].tuple)) return false; id1 = nf_ct_zone_id(nf_ct_zone(ct1), IP_CT_DIR_ORIGINAL); id2 = nf_ct_zone_id(nf_ct_zone(ct2), IP_CT_DIR_REPLY); if (id1 != id2) return false; id1 = nf_ct_zone_id(nf_ct_zone(ct1), IP_CT_DIR_REPLY); id2 = nf_ct_zone_id(nf_ct_zone(ct2), IP_CT_DIR_ORIGINAL); return id1 == id2; } static int nf_ct_can_merge(const struct nf_conn *ct, const struct nf_conn *loser_ct) { return nf_ct_match(ct, loser_ct) || nf_ct_match_reverse(ct, loser_ct); } /* caller must hold locks to prevent concurrent changes */ static int __nf_ct_resolve_clash(struct sk_buff *skb, struct nf_conntrack_tuple_hash *h) { /* This is the conntrack entry already in hashes that won race. */ struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h); enum ip_conntrack_info ctinfo; struct nf_conn *loser_ct; loser_ct = nf_ct_get(skb, &ctinfo); if (nf_ct_can_merge(ct, loser_ct)) { struct net *net = nf_ct_net(ct); nf_conntrack_get(&ct->ct_general); nf_ct_acct_merge(ct, ctinfo, loser_ct); nf_ct_put(loser_ct); nf_ct_set(skb, ct, ctinfo); NF_CT_STAT_INC(net, clash_resolve); return NF_ACCEPT; } return NF_DROP; } /** * nf_ct_resolve_clash_harder - attempt to insert clashing conntrack entry * * @skb: skb that causes the collision * @repl_idx: hash slot for reply direction * * Called when origin or reply direction had a clash. * The skb can be handled without packet drop provided the reply direction * is unique or there the existing entry has the identical tuple in both * directions. * * Caller must hold conntrack table locks to prevent concurrent updates. * * Returns NF_DROP if the clash could not be handled. */ static int nf_ct_resolve_clash_harder(struct sk_buff *skb, u32 repl_idx) { struct nf_conn *loser_ct = (struct nf_conn *)skb_nfct(skb); const struct nf_conntrack_zone *zone; struct nf_conntrack_tuple_hash *h; struct hlist_nulls_node *n; struct net *net; zone = nf_ct_zone(loser_ct); net = nf_ct_net(loser_ct); /* Reply direction must never result in a clash, unless both origin * and reply tuples are identical. */ hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[repl_idx], hnnode) { if (nf_ct_key_equal(h, &loser_ct->tuplehash[IP_CT_DIR_REPLY].tuple, zone, net)) return __nf_ct_resolve_clash(skb, h); } /* We want the clashing entry to go away real soon: 1 second timeout. */ WRITE_ONCE(loser_ct->timeout, nfct_time_stamp + HZ); /* IPS_NAT_CLASH removes the entry automatically on the first * reply. Also prevents UDP tracker from moving the entry to * ASSURED state, i.e. the entry can always be evicted under * pressure. */ loser_ct->status |= IPS_FIXED_TIMEOUT | IPS_NAT_CLASH; __nf_conntrack_insert_prepare(loser_ct); /* fake add for ORIGINAL dir: we want lookups to only find the entry * already in the table. This also hides the clashing entry from * ctnetlink iteration, i.e. conntrack -L won't show them. 
*/ hlist_nulls_add_fake(&loser_ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode); hlist_nulls_add_head_rcu(&loser_ct->tuplehash[IP_CT_DIR_REPLY].hnnode, &nf_conntrack_hash[repl_idx]); NF_CT_STAT_INC(net, clash_resolve); return NF_ACCEPT; } /** * nf_ct_resolve_clash - attempt to handle clash without packet drop * * @skb: skb that causes the clash * @h: tuplehash of the clashing entry already in table * @reply_hash: hash slot for reply direction * * A conntrack entry can be inserted to the connection tracking table * if there is no existing entry with an identical tuple. * * If there is one, @skb (and the associated, unconfirmed conntrack) has * to be dropped. In case @skb is retransmitted, next conntrack lookup * will find the already-existing entry. * * The major problem with such packet drop is the extra delay added by * the packet loss -- it will take some time for a retransmit to occur * (or the sender to time out when waiting for a reply). * * This function attempts to handle the situation without packet drop. * * If @skb has no NAT transformation or if the colliding entries are * exactly the same, only the to-be-confirmed conntrack entry is discarded * and @skb is associated with the conntrack entry already in the table. * * Failing that, the new, unconfirmed conntrack is still added to the table * provided that the collision only occurs in the ORIGINAL direction. * The new entry will be added only in the non-clashing REPLY direction, * so packets in the ORIGINAL direction will continue to match the existing * entry. The new entry will also have a fixed timeout so it expires -- * due to the collision, it will only see reply traffic. * * Returns NF_DROP if the clash could not be resolved. */ static __cold noinline int nf_ct_resolve_clash(struct sk_buff *skb, struct nf_conntrack_tuple_hash *h, u32 reply_hash) { /* This is the conntrack entry already in hashes that won race. */ struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h); const struct nf_conntrack_l4proto *l4proto; enum ip_conntrack_info ctinfo; struct nf_conn *loser_ct; struct net *net; int ret; loser_ct = nf_ct_get(skb, &ctinfo); net = nf_ct_net(loser_ct); l4proto = nf_ct_l4proto_find(nf_ct_protonum(ct)); if (!l4proto->allow_clash) goto drop; ret = __nf_ct_resolve_clash(skb, h); if (ret == NF_ACCEPT) return ret; ret = nf_ct_resolve_clash_harder(skb, reply_hash); if (ret == NF_ACCEPT) return ret; drop: NF_CT_STAT_INC(net, drop); NF_CT_STAT_INC(net, insert_failed); return NF_DROP; } /* Confirm a connection given skb; places it in hash table */ int __nf_conntrack_confirm(struct sk_buff *skb) { unsigned int chainlen = 0, sequence, max_chainlen; const struct nf_conntrack_zone *zone; unsigned int hash, reply_hash; struct nf_conntrack_tuple_hash *h; struct nf_conn *ct; struct nf_conn_help *help; struct hlist_nulls_node *n; enum ip_conntrack_info ctinfo; struct net *net; int ret = NF_DROP; ct = nf_ct_get(skb, &ctinfo); net = nf_ct_net(ct); /* ipt_REJECT uses nf_conntrack_attach to attach related ICMP/TCP RST packets in other direction. Actual packet which created connection will be IP_CT_NEW or for an expected connection, IP_CT_RELATED. 
*/ if (CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) return NF_ACCEPT; zone = nf_ct_zone(ct); local_bh_disable(); do { sequence = read_seqcount_begin(&nf_conntrack_generation); /* reuse the hash saved before */ hash = *(unsigned long *)&ct->tuplehash[IP_CT_DIR_REPLY].hnnode.pprev; hash = scale_hash(hash); reply_hash = hash_conntrack(net, &ct->tuplehash[IP_CT_DIR_REPLY].tuple, nf_ct_zone_id(nf_ct_zone(ct), IP_CT_DIR_REPLY)); } while (nf_conntrack_double_lock(net, hash, reply_hash, sequence)); /* We're not in hash table, and we refuse to set up related * connections for unconfirmed conns. But packet copies and * REJECT will give spurious warnings here. */ /* Another skb with the same unconfirmed conntrack may * win the race. This may happen for bridge(br_flood) * or broadcast/multicast packets do skb_clone with * unconfirmed conntrack. */ if (unlikely(nf_ct_is_confirmed(ct))) { WARN_ON_ONCE(1); nf_conntrack_double_unlock(hash, reply_hash); local_bh_enable(); return NF_DROP; } if (!nf_ct_ext_valid_pre(ct->ext)) { NF_CT_STAT_INC(net, insert_failed); goto dying; } /* We have to check the DYING flag after unlink to prevent * a race against nf_ct_get_next_corpse() possibly called from * user context, else we insert an already 'dead' hash, blocking * further use of that particular connection -JM. */ ct->status |= IPS_CONFIRMED; if (unlikely(nf_ct_is_dying(ct))) { NF_CT_STAT_INC(net, insert_failed); goto dying; } max_chainlen = MIN_CHAINLEN + get_random_u32_below(MAX_CHAINLEN); /* See if there's one in the list already, including reverse: NAT could have grabbed it without realizing, since we're not in the hash. If there is, we lost race. */ hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[hash], hnnode) { if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple, zone, net)) goto out; if (chainlen++ > max_chainlen) goto chaintoolong; } chainlen = 0; hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[reply_hash], hnnode) { if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_REPLY].tuple, zone, net)) goto out; if (chainlen++ > max_chainlen) { chaintoolong: NF_CT_STAT_INC(net, chaintoolong); NF_CT_STAT_INC(net, insert_failed); ret = NF_DROP; goto dying; } } /* Timer relative to confirmation time, not original setting time, otherwise we'd get timer wrap in weird delay cases. */ ct->timeout += nfct_time_stamp; __nf_conntrack_insert_prepare(ct); /* Since the lookup is lockless, hash insertion must be done after * starting the timer and setting the CONFIRMED bit. The RCU barriers * guarantee that no other CPU can find the conntrack before the above * stores are visible. */ __nf_conntrack_hash_insert(ct, hash, reply_hash); nf_conntrack_double_unlock(hash, reply_hash); local_bh_enable(); /* ext area is still valid (rcu read lock is held, * but will go out of scope soon, we need to remove * this conntrack again. */ if (!nf_ct_ext_valid_post(ct->ext)) { nf_ct_kill(ct); NF_CT_STAT_INC_ATOMIC(net, drop); return NF_DROP; } help = nfct_help(ct); if (help && help->helper) nf_conntrack_event_cache(IPCT_HELPER, ct); nf_conntrack_event_cache(master_ct(ct) ? IPCT_RELATED : IPCT_NEW, ct); return NF_ACCEPT; out: ret = nf_ct_resolve_clash(skb, h, reply_hash); dying: nf_conntrack_double_unlock(hash, reply_hash); local_bh_enable(); return ret; } EXPORT_SYMBOL_GPL(__nf_conntrack_confirm); /* Returns true if a connection corresponds to the tuple (required for NAT). 
*/ int nf_conntrack_tuple_taken(const struct nf_conntrack_tuple *tuple, const struct nf_conn *ignored_conntrack) { struct net *net = nf_ct_net(ignored_conntrack); const struct nf_conntrack_zone *zone; struct nf_conntrack_tuple_hash *h; struct hlist_nulls_head *ct_hash; unsigned int hash, hsize; struct hlist_nulls_node *n; struct nf_conn *ct; zone = nf_ct_zone(ignored_conntrack); rcu_read_lock(); begin: nf_conntrack_get_ht(&ct_hash, &hsize); hash = __hash_conntrack(net, tuple, nf_ct_zone_id(zone, IP_CT_DIR_REPLY), hsize); hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[hash], hnnode) { ct = nf_ct_tuplehash_to_ctrack(h); if (ct == ignored_conntrack) continue; if (nf_ct_is_expired(ct)) { nf_ct_gc_expired(ct); continue; } if (nf_ct_key_equal(h, tuple, zone, net)) { /* Tuple is taken already, so caller will need to find * a new source port to use. * * Only exception: * If the *original tuples* are identical, then both * conntracks refer to the same flow. * This is a rare situation, it can occur e.g. when * more than one UDP packet is sent from same socket * in different threads. * * Let nf_ct_resolve_clash() deal with this later. */ if (nf_ct_tuple_equal(&ignored_conntrack->tuplehash[IP_CT_DIR_ORIGINAL].tuple, &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple) && nf_ct_zone_equal(ct, zone, IP_CT_DIR_ORIGINAL)) continue; NF_CT_STAT_INC_ATOMIC(net, found); rcu_read_unlock(); return 1; } } if (get_nulls_value(n) != hash) { NF_CT_STAT_INC_ATOMIC(net, search_restart); goto begin; } rcu_read_unlock(); return 0; } EXPORT_SYMBOL_GPL(nf_conntrack_tuple_taken); #define NF_CT_EVICTION_RANGE 8 /* There's a small race here where we may free a just-assured connection. Too bad: we're in trouble anyway. */ static unsigned int early_drop_list(struct net *net, struct hlist_nulls_head *head) { struct nf_conntrack_tuple_hash *h; struct hlist_nulls_node *n; unsigned int drops = 0; struct nf_conn *tmp; hlist_nulls_for_each_entry_rcu(h, n, head, hnnode) { tmp = nf_ct_tuplehash_to_ctrack(h); if (nf_ct_is_expired(tmp)) { nf_ct_gc_expired(tmp); continue; } if (test_bit(IPS_ASSURED_BIT, &tmp->status) || !net_eq(nf_ct_net(tmp), net) || nf_ct_is_dying(tmp)) continue; if (!refcount_inc_not_zero(&tmp->ct_general.use)) continue; /* load ->ct_net and ->status after refcount increase */ smp_acquire__after_ctrl_dep(); /* kill only if still in same netns -- might have moved due to * SLAB_TYPESAFE_BY_RCU rules. * * We steal the timer reference. If that fails timer has * already fired or someone else deleted it. Just drop ref * and move to next entry. 
*/ if (net_eq(nf_ct_net(tmp), net) && nf_ct_is_confirmed(tmp) && nf_ct_delete(tmp, 0, 0)) drops++; nf_ct_put(tmp); } return drops; } static noinline int early_drop(struct net *net, unsigned int hash) { unsigned int i, bucket; for (i = 0; i < NF_CT_EVICTION_RANGE; i++) { struct hlist_nulls_head *ct_hash; unsigned int hsize, drops; rcu_read_lock(); nf_conntrack_get_ht(&ct_hash, &hsize); if (!i) bucket = reciprocal_scale(hash, hsize); else bucket = (bucket + 1) % hsize; drops = early_drop_list(net, &ct_hash[bucket]); rcu_read_unlock(); if (drops) { NF_CT_STAT_ADD_ATOMIC(net, early_drop, drops); return true; } } return false; } static bool gc_worker_skip_ct(const struct nf_conn *ct) { return !nf_ct_is_confirmed(ct) || nf_ct_is_dying(ct); } static bool gc_worker_can_early_drop(const struct nf_conn *ct) { const struct nf_conntrack_l4proto *l4proto; u8 protonum = nf_ct_protonum(ct); if (!test_bit(IPS_ASSURED_BIT, &ct->status)) return true; l4proto = nf_ct_l4proto_find(protonum); if (l4proto->can_early_drop && l4proto->can_early_drop(ct)) return true; return false; } static void gc_worker(struct work_struct *work) { unsigned int i, hashsz, nf_conntrack_max95 = 0; u32 end_time, start_time = nfct_time_stamp; struct conntrack_gc_work *gc_work; unsigned int expired_count = 0; unsigned long next_run; s32 delta_time; long count; gc_work = container_of(work, struct conntrack_gc_work, dwork.work); i = gc_work->next_bucket; if (gc_work->early_drop) nf_conntrack_max95 = nf_conntrack_max / 100u * 95u; if (i == 0) { gc_work->avg_timeout = GC_SCAN_INTERVAL_INIT; gc_work->count = GC_SCAN_INITIAL_COUNT; gc_work->start_time = start_time; } next_run = gc_work->avg_timeout; count = gc_work->count; end_time = start_time + GC_SCAN_MAX_DURATION; do { struct nf_conntrack_tuple_hash *h; struct hlist_nulls_head *ct_hash; struct hlist_nulls_node *n; struct nf_conn *tmp; rcu_read_lock(); nf_conntrack_get_ht(&ct_hash, &hashsz); if (i >= hashsz) { rcu_read_unlock(); break; } hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[i], hnnode) { struct nf_conntrack_net *cnet; struct net *net; long expires; tmp = nf_ct_tuplehash_to_ctrack(h); if (expired_count > GC_SCAN_EXPIRED_MAX) { rcu_read_unlock(); gc_work->next_bucket = i; gc_work->avg_timeout = next_run; gc_work->count = count; delta_time = nfct_time_stamp - gc_work->start_time; /* re-sched immediately if total cycle time is exceeded */ next_run = delta_time < (s32)GC_SCAN_INTERVAL_MAX; goto early_exit; } if (nf_ct_is_expired(tmp)) { nf_ct_gc_expired(tmp); expired_count++; continue; } expires = clamp(nf_ct_expires(tmp), GC_SCAN_INTERVAL_MIN, GC_SCAN_INTERVAL_CLAMP); expires = (expires - (long)next_run) / ++count; next_run += expires; if (nf_conntrack_max95 == 0 || gc_worker_skip_ct(tmp)) continue; net = nf_ct_net(tmp); cnet = nf_ct_pernet(net); if (atomic_read(&cnet->count) < nf_conntrack_max95) continue; /* need to take reference to avoid possible races */ if (!refcount_inc_not_zero(&tmp->ct_general.use)) continue; /* load ->status after refcount increase */ smp_acquire__after_ctrl_dep(); if (gc_worker_skip_ct(tmp)) { nf_ct_put(tmp); continue; } if (gc_worker_can_early_drop(tmp)) { nf_ct_kill(tmp); expired_count++; } nf_ct_put(tmp); } /* could check get_nulls_value() here and restart if ct * was moved to another chain. But given gc is best-effort * we will just continue with next hash slot. 
*/ rcu_read_unlock(); cond_resched(); i++; delta_time = nfct_time_stamp - end_time; if (delta_time > 0 && i < hashsz) { gc_work->avg_timeout = next_run; gc_work->count = count; gc_work->next_bucket = i; next_run = 0; goto early_exit; } } while (i < hashsz); gc_work->next_bucket = 0; next_run = clamp(next_run, GC_SCAN_INTERVAL_MIN, GC_SCAN_INTERVAL_MAX); delta_time = max_t(s32, nfct_time_stamp - gc_work->start_time, 1); if (next_run > (unsigned long)delta_time) next_run -= delta_time; else next_run = 1; early_exit: if (gc_work->exiting) return; if (next_run) gc_work->early_drop = false; queue_delayed_work(system_power_efficient_wq, &gc_work->dwork, next_run); } static void conntrack_gc_work_init(struct conntrack_gc_work *gc_work) { INIT_DELAYED_WORK(&gc_work->dwork, gc_worker); gc_work->exiting = false; } static struct nf_conn * __nf_conntrack_alloc(struct net *net, const struct nf_conntrack_zone *zone, const struct nf_conntrack_tuple *orig, const struct nf_conntrack_tuple *repl, gfp_t gfp, u32 hash) { struct nf_conntrack_net *cnet = nf_ct_pernet(net); unsigned int ct_count; struct nf_conn *ct; /* We don't want any race condition at early drop stage */ ct_count = atomic_inc_return(&cnet->count); if (nf_conntrack_max && unlikely(ct_count > nf_conntrack_max)) { if (!early_drop(net, hash)) { if (!conntrack_gc_work.early_drop) conntrack_gc_work.early_drop = true; atomic_dec(&cnet->count); net_warn_ratelimited("nf_conntrack: table full, dropping packet\n"); return ERR_PTR(-ENOMEM); } } /* * Do not use kmem_cache_zalloc(), as this cache uses * SLAB_TYPESAFE_BY_RCU. */ ct = kmem_cache_alloc(nf_conntrack_cachep, gfp); if (ct == NULL) goto out; spin_lock_init(&ct->lock); ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple = *orig; ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode.pprev = NULL; ct->tuplehash[IP_CT_DIR_REPLY].tuple = *repl; /* save hash for reusing when confirming */ *(unsigned long *)(&ct->tuplehash[IP_CT_DIR_REPLY].hnnode.pprev) = hash; ct->status = 0; WRITE_ONCE(ct->timeout, 0); write_pnet(&ct->ct_net, net); memset_after(ct, 0, __nfct_init_offset); nf_ct_zone_add(ct, zone); /* Because we use RCU lookups, we set ct_general.use to zero before * this is inserted in any list. */ refcount_set(&ct->ct_general.use, 0); return ct; out: atomic_dec(&cnet->count); return ERR_PTR(-ENOMEM); } struct nf_conn *nf_conntrack_alloc(struct net *net, const struct nf_conntrack_zone *zone, const struct nf_conntrack_tuple *orig, const struct nf_conntrack_tuple *repl, gfp_t gfp) { return __nf_conntrack_alloc(net, zone, orig, repl, gfp, 0); } EXPORT_SYMBOL_GPL(nf_conntrack_alloc); void nf_conntrack_free(struct nf_conn *ct) { struct net *net = nf_ct_net(ct); struct nf_conntrack_net *cnet; /* A freed object has refcnt == 0, that's * the golden rule for SLAB_TYPESAFE_BY_RCU */ WARN_ON(refcount_read(&ct->ct_general.use) != 0); if (ct->status & IPS_SRC_NAT_DONE) { const struct nf_nat_hook *nat_hook; rcu_read_lock(); nat_hook = rcu_dereference(nf_nat_hook); if (nat_hook) nat_hook->remove_nat_bysrc(ct); rcu_read_unlock(); } kfree(ct->ext); kmem_cache_free(nf_conntrack_cachep, ct); cnet = nf_ct_pernet(net); smp_mb__before_atomic(); atomic_dec(&cnet->count); } EXPORT_SYMBOL_GPL(nf_conntrack_free); /* Allocate a new conntrack: we return -ENOMEM if classification failed due to stress. Otherwise it really is unclassifiable. 
*/ static noinline struct nf_conntrack_tuple_hash * init_conntrack(struct net *net, struct nf_conn *tmpl, const struct nf_conntrack_tuple *tuple, struct sk_buff *skb, unsigned int dataoff, u32 hash) { struct nf_conn *ct; struct nf_conn_help *help; struct nf_conntrack_tuple repl_tuple; #ifdef CONFIG_NF_CONNTRACK_EVENTS struct nf_conntrack_ecache *ecache; #endif struct nf_conntrack_expect *exp = NULL; const struct nf_conntrack_zone *zone; struct nf_conn_timeout *timeout_ext; struct nf_conntrack_zone tmp; struct nf_conntrack_net *cnet; if (!nf_ct_invert_tuple(&repl_tuple, tuple)) return NULL; zone = nf_ct_zone_tmpl(tmpl, skb, &tmp); ct = __nf_conntrack_alloc(net, zone, tuple, &repl_tuple, GFP_ATOMIC, hash); if (IS_ERR(ct)) return ERR_CAST(ct); if (!nf_ct_add_synproxy(ct, tmpl)) { nf_conntrack_free(ct); return ERR_PTR(-ENOMEM); } timeout_ext = tmpl ? nf_ct_timeout_find(tmpl) : NULL; if (timeout_ext) nf_ct_timeout_ext_add(ct, rcu_dereference(timeout_ext->timeout), GFP_ATOMIC); nf_ct_acct_ext_add(ct, GFP_ATOMIC); nf_ct_tstamp_ext_add(ct, GFP_ATOMIC); nf_ct_labels_ext_add(ct); #ifdef CONFIG_NF_CONNTRACK_EVENTS ecache = tmpl ? nf_ct_ecache_find(tmpl) : NULL; if ((ecache || net->ct.sysctl_events) && !nf_ct_ecache_ext_add(ct, ecache ? ecache->ctmask : 0, ecache ? ecache->expmask : 0, GFP_ATOMIC)) { nf_conntrack_free(ct); return ERR_PTR(-ENOMEM); } #endif cnet = nf_ct_pernet(net); if (cnet->expect_count) { spin_lock_bh(&nf_conntrack_expect_lock); exp = nf_ct_find_expectation(net, zone, tuple, !tmpl || nf_ct_is_confirmed(tmpl)); if (exp) { /* Welcome, Mr. Bond. We've been expecting you... */ __set_bit(IPS_EXPECTED_BIT, &ct->status); /* exp->master safe, refcnt bumped in nf_ct_find_expectation */ ct->master = exp->master; if (exp->helper) { help = nf_ct_helper_ext_add(ct, GFP_ATOMIC); if (help) rcu_assign_pointer(help->helper, exp->helper); } #ifdef CONFIG_NF_CONNTRACK_MARK ct->mark = READ_ONCE(exp->master->mark); #endif #ifdef CONFIG_NF_CONNTRACK_SECMARK ct->secmark = exp->master->secmark; #endif NF_CT_STAT_INC(net, expect_new); } spin_unlock_bh(&nf_conntrack_expect_lock); } if (!exp && tmpl) __nf_ct_try_assign_helper(ct, tmpl, GFP_ATOMIC); /* Other CPU might have obtained a pointer to this object before it was * released. Because refcount is 0, refcount_inc_not_zero() will fail. * * After refcount_set(1) it will succeed; ensure that zeroing of * ct->status and the correct ct->net pointer are visible; else other * core might observe CONFIRMED bit which means the entry is valid and * in the hash table, but its not (anymore). */ smp_wmb(); /* Now it is going to be associated with an sk_buff, set refcount to 1. 
*/ refcount_set(&ct->ct_general.use, 1); if (exp) { if (exp->expectfn) exp->expectfn(ct, exp); nf_ct_expect_put(exp); } return &ct->tuplehash[IP_CT_DIR_ORIGINAL]; } /* On success, returns 0, sets skb->_nfct | ctinfo */ static int resolve_normal_ct(struct nf_conn *tmpl, struct sk_buff *skb, unsigned int dataoff, u_int8_t protonum, const struct nf_hook_state *state) { const struct nf_conntrack_zone *zone; struct nf_conntrack_tuple tuple; struct nf_conntrack_tuple_hash *h; enum ip_conntrack_info ctinfo; struct nf_conntrack_zone tmp; u32 hash, zone_id, rid; struct nf_conn *ct; if (!nf_ct_get_tuple(skb, skb_network_offset(skb), dataoff, state->pf, protonum, state->net, &tuple)) return 0; /* look for tuple match */ zone = nf_ct_zone_tmpl(tmpl, skb, &tmp); zone_id = nf_ct_zone_id(zone, IP_CT_DIR_ORIGINAL); hash = hash_conntrack_raw(&tuple, zone_id, state->net); h = __nf_conntrack_find_get(state->net, zone, &tuple, hash); if (!h) { rid = nf_ct_zone_id(zone, IP_CT_DIR_REPLY); if (zone_id != rid) { u32 tmp = hash_conntrack_raw(&tuple, rid, state->net); h = __nf_conntrack_find_get(state->net, zone, &tuple, tmp); } } if (!h) { h = init_conntrack(state->net, tmpl, &tuple, skb, dataoff, hash); if (!h) return 0; if (IS_ERR(h)) return PTR_ERR(h); } ct = nf_ct_tuplehash_to_ctrack(h); /* It exists; we have (non-exclusive) reference. */ if (NF_CT_DIRECTION(h) == IP_CT_DIR_REPLY) { ctinfo = IP_CT_ESTABLISHED_REPLY; } else { unsigned long status = READ_ONCE(ct->status); /* Once we've had two way comms, always ESTABLISHED. */ if (likely(status & IPS_SEEN_REPLY)) ctinfo = IP_CT_ESTABLISHED; else if (status & IPS_EXPECTED) ctinfo = IP_CT_RELATED; else ctinfo = IP_CT_NEW; } nf_ct_set(skb, ct, ctinfo); return 0; } /* * icmp packets need special treatment to handle error messages that are * related to a connection. * * Callers need to check if skb has a conntrack assigned when this * helper returns; in such case skb belongs to an already known connection. */ static unsigned int __cold nf_conntrack_handle_icmp(struct nf_conn *tmpl, struct sk_buff *skb, unsigned int dataoff, u8 protonum, const struct nf_hook_state *state) { int ret; if (state->pf == NFPROTO_IPV4 && protonum == IPPROTO_ICMP) ret = nf_conntrack_icmpv4_error(tmpl, skb, dataoff, state); #if IS_ENABLED(CONFIG_IPV6) else if (state->pf == NFPROTO_IPV6 && protonum == IPPROTO_ICMPV6) ret = nf_conntrack_icmpv6_error(tmpl, skb, dataoff, state); #endif else return NF_ACCEPT; if (ret <= 0) NF_CT_STAT_INC_ATOMIC(state->net, error); return ret; } static int generic_packet(struct nf_conn *ct, struct sk_buff *skb, enum ip_conntrack_info ctinfo) { const unsigned int *timeout = nf_ct_timeout_lookup(ct); if (!timeout) timeout = &nf_generic_pernet(nf_ct_net(ct))->timeout; nf_ct_refresh_acct(ct, ctinfo, skb, *timeout); return NF_ACCEPT; } /* Returns verdict for packet, or -1 for invalid. 
*/ static int nf_conntrack_handle_packet(struct nf_conn *ct, struct sk_buff *skb, unsigned int dataoff, enum ip_conntrack_info ctinfo, const struct nf_hook_state *state) { switch (nf_ct_protonum(ct)) { case IPPROTO_TCP: return nf_conntrack_tcp_packet(ct, skb, dataoff, ctinfo, state); case IPPROTO_UDP: return nf_conntrack_udp_packet(ct, skb, dataoff, ctinfo, state); case IPPROTO_ICMP: return nf_conntrack_icmp_packet(ct, skb, ctinfo, state); #if IS_ENABLED(CONFIG_IPV6) case IPPROTO_ICMPV6: return nf_conntrack_icmpv6_packet(ct, skb, ctinfo, state); #endif #ifdef CONFIG_NF_CT_PROTO_UDPLITE case IPPROTO_UDPLITE: return nf_conntrack_udplite_packet(ct, skb, dataoff, ctinfo, state); #endif #ifdef CONFIG_NF_CT_PROTO_SCTP case IPPROTO_SCTP: return nf_conntrack_sctp_packet(ct, skb, dataoff, ctinfo, state); #endif #ifdef CONFIG_NF_CT_PROTO_DCCP case IPPROTO_DCCP: return nf_conntrack_dccp_packet(ct, skb, dataoff, ctinfo, state); #endif #ifdef CONFIG_NF_CT_PROTO_GRE case IPPROTO_GRE: return nf_conntrack_gre_packet(ct, skb, dataoff, ctinfo, state); #endif } return generic_packet(ct, skb, ctinfo); } unsigned int nf_conntrack_in(struct sk_buff *skb, const struct nf_hook_state *state) { enum ip_conntrack_info ctinfo; struct nf_conn *ct, *tmpl; u_int8_t protonum; int dataoff, ret; tmpl = nf_ct_get(skb, &ctinfo); if (tmpl || ctinfo == IP_CT_UNTRACKED) { /* Previously seen (loopback or untracked)? Ignore. */ if ((tmpl && !nf_ct_is_template(tmpl)) || ctinfo == IP_CT_UNTRACKED) return NF_ACCEPT; skb->_nfct = 0; } /* rcu_read_lock()ed by nf_hook_thresh */ dataoff = get_l4proto(skb, skb_network_offset(skb), state->pf, &protonum); if (dataoff <= 0) { NF_CT_STAT_INC_ATOMIC(state->net, invalid); ret = NF_ACCEPT; goto out; } if (protonum == IPPROTO_ICMP || protonum == IPPROTO_ICMPV6) { ret = nf_conntrack_handle_icmp(tmpl, skb, dataoff, protonum, state); if (ret <= 0) { ret = -ret; goto out; } /* ICMP[v6] protocol trackers may assign one conntrack. */ if (skb->_nfct) goto out; } repeat: ret = resolve_normal_ct(tmpl, skb, dataoff, protonum, state); if (ret < 0) { /* Too stressed to deal. */ NF_CT_STAT_INC_ATOMIC(state->net, drop); ret = NF_DROP; goto out; } ct = nf_ct_get(skb, &ctinfo); if (!ct) { /* Not valid part of a connection */ NF_CT_STAT_INC_ATOMIC(state->net, invalid); ret = NF_ACCEPT; goto out; } ret = nf_conntrack_handle_packet(ct, skb, dataoff, ctinfo, state); if (ret <= 0) { /* Invalid: inverse of the return code tells * the netfilter core what to do */ nf_ct_put(ct); skb->_nfct = 0; /* Special case: TCP tracker reports an attempt to reopen a * closed/aborted connection. We have to go back and create a * fresh conntrack. 
*/ if (ret == -NF_REPEAT) goto repeat; NF_CT_STAT_INC_ATOMIC(state->net, invalid); if (ret == NF_DROP) NF_CT_STAT_INC_ATOMIC(state->net, drop); ret = -ret; goto out; } if (ctinfo == IP_CT_ESTABLISHED_REPLY && !test_and_set_bit(IPS_SEEN_REPLY_BIT, &ct->status)) nf_conntrack_event_cache(IPCT_REPLY, ct); out: if (tmpl) nf_ct_put(tmpl); return ret; } EXPORT_SYMBOL_GPL(nf_conntrack_in); /* Refresh conntrack for this many jiffies and do accounting if do_acct is 1 */ void __nf_ct_refresh_acct(struct nf_conn *ct, enum ip_conntrack_info ctinfo, u32 extra_jiffies, unsigned int bytes) { /* Only update if this is not a fixed timeout */ if (test_bit(IPS_FIXED_TIMEOUT_BIT, &ct->status)) goto acct; /* If not in hash table, timer will not be active yet */ if (nf_ct_is_confirmed(ct)) extra_jiffies += nfct_time_stamp; if (READ_ONCE(ct->timeout) != extra_jiffies) WRITE_ONCE(ct->timeout, extra_jiffies); acct: if (bytes) nf_ct_acct_update(ct, CTINFO2DIR(ctinfo), bytes); } EXPORT_SYMBOL_GPL(__nf_ct_refresh_acct); bool nf_ct_kill_acct(struct nf_conn *ct, enum ip_conntrack_info ctinfo, const struct sk_buff *skb) { nf_ct_acct_update(ct, CTINFO2DIR(ctinfo), skb->len); return nf_ct_delete(ct, 0, 0); } EXPORT_SYMBOL_GPL(nf_ct_kill_acct); #if IS_ENABLED(CONFIG_NF_CT_NETLINK) #include <linux/netfilter/nfnetlink.h> #include <linux/netfilter/nfnetlink_conntrack.h> #include <linux/mutex.h> /* Generic function for tcp/udp/sctp/dccp and alike. */ int nf_ct_port_tuple_to_nlattr(struct sk_buff *skb, const struct nf_conntrack_tuple *tuple) { if (nla_put_be16(skb, CTA_PROTO_SRC_PORT, tuple->src.u.tcp.port) || nla_put_be16(skb, CTA_PROTO_DST_PORT, tuple->dst.u.tcp.port)) goto nla_put_failure; return 0; nla_put_failure: return -1; } EXPORT_SYMBOL_GPL(nf_ct_port_tuple_to_nlattr); const struct nla_policy nf_ct_port_nla_policy[CTA_PROTO_MAX+1] = { [CTA_PROTO_SRC_PORT] = { .type = NLA_U16 }, [CTA_PROTO_DST_PORT] = { .type = NLA_U16 }, }; EXPORT_SYMBOL_GPL(nf_ct_port_nla_policy); int nf_ct_port_nlattr_to_tuple(struct nlattr *tb[], struct nf_conntrack_tuple *t, u_int32_t flags) { if (flags & CTA_FILTER_FLAG(CTA_PROTO_SRC_PORT)) { if (!tb[CTA_PROTO_SRC_PORT]) return -EINVAL; t->src.u.tcp.port = nla_get_be16(tb[CTA_PROTO_SRC_PORT]); } if (flags & CTA_FILTER_FLAG(CTA_PROTO_DST_PORT)) { if (!tb[CTA_PROTO_DST_PORT]) return -EINVAL; t->dst.u.tcp.port = nla_get_be16(tb[CTA_PROTO_DST_PORT]); } return 0; } EXPORT_SYMBOL_GPL(nf_ct_port_nlattr_to_tuple); unsigned int nf_ct_port_nlattr_tuple_size(void) { static unsigned int size __read_mostly; if (!size) size = nla_policy_len(nf_ct_port_nla_policy, CTA_PROTO_MAX + 1); return size; } EXPORT_SYMBOL_GPL(nf_ct_port_nlattr_tuple_size); #endif /* Used by ipt_REJECT and ip6t_REJECT. */ static void nf_conntrack_attach(struct sk_buff *nskb, const struct sk_buff *skb) { struct nf_conn *ct; enum ip_conntrack_info ctinfo; /* This ICMP is in reverse direction to the packet which caused it */ ct = nf_ct_get(skb, &ctinfo); if (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) ctinfo = IP_CT_RELATED_REPLY; else ctinfo = IP_CT_RELATED; /* Attach to new skbuff, and increment count */ nf_ct_set(nskb, ct, ctinfo); nf_conntrack_get(skb_nfct(nskb)); } /* This packet is coming from userspace via nf_queue, complete the packet * processing after the helper invocation in nf_confirm(). 
*/ static int nf_confirm_cthelper(struct sk_buff *skb, struct nf_conn *ct, enum ip_conntrack_info ctinfo) { const struct nf_conntrack_helper *helper; const struct nf_conn_help *help; int protoff; help = nfct_help(ct); if (!help) return NF_ACCEPT; helper = rcu_dereference(help->helper); if (!helper) return NF_ACCEPT; if (!(helper->flags & NF_CT_HELPER_F_USERSPACE)) return NF_ACCEPT; switch (nf_ct_l3num(ct)) { case NFPROTO_IPV4: protoff = skb_network_offset(skb) + ip_hdrlen(skb); break; #if IS_ENABLED(CONFIG_IPV6) case NFPROTO_IPV6: { __be16 frag_off; u8 pnum; pnum = ipv6_hdr(skb)->nexthdr; protoff = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), &pnum, &frag_off); if (protoff < 0 || (frag_off & htons(~0x7)) != 0) return NF_ACCEPT; break; } #endif default: return NF_ACCEPT; } if (test_bit(IPS_SEQ_ADJUST_BIT, &ct->status) && !nf_is_loopback_packet(skb)) { if (!nf_ct_seq_adjust(skb, ct, ctinfo, protoff)) { NF_CT_STAT_INC_ATOMIC(nf_ct_net(ct), drop); return NF_DROP; } } /* We've seen it coming out the other side: confirm it */ return nf_conntrack_confirm(skb); } static int nf_conntrack_update(struct net *net, struct sk_buff *skb) { enum ip_conntrack_info ctinfo; struct nf_conn *ct; ct = nf_ct_get(skb, &ctinfo); if (!ct) return NF_ACCEPT; return nf_confirm_cthelper(skb, ct, ctinfo); } static bool nf_conntrack_get_tuple_skb(struct nf_conntrack_tuple *dst_tuple, const struct sk_buff *skb) { const struct nf_conntrack_tuple *src_tuple; const struct nf_conntrack_tuple_hash *hash; struct nf_conntrack_tuple srctuple; enum ip_conntrack_info ctinfo; struct nf_conn *ct; ct = nf_ct_get(skb, &ctinfo); if (ct) { src_tuple = nf_ct_tuple(ct, CTINFO2DIR(ctinfo)); memcpy(dst_tuple, src_tuple, sizeof(*dst_tuple)); return true; } if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), NFPROTO_IPV4, dev_net(skb->dev), &srctuple)) return false; hash = nf_conntrack_find_get(dev_net(skb->dev), &nf_ct_zone_dflt, &srctuple); if (!hash) return false; ct = nf_ct_tuplehash_to_ctrack(hash); src_tuple = nf_ct_tuple(ct, !hash->tuple.dst.dir); memcpy(dst_tuple, src_tuple, sizeof(*dst_tuple)); nf_ct_put(ct); return true; } /* Bring out ya dead! */ static struct nf_conn * get_next_corpse(int (*iter)(struct nf_conn *i, void *data), const struct nf_ct_iter_data *iter_data, unsigned int *bucket) { struct nf_conntrack_tuple_hash *h; struct nf_conn *ct; struct hlist_nulls_node *n; spinlock_t *lockp; for (; *bucket < nf_conntrack_htable_size; (*bucket)++) { struct hlist_nulls_head *hslot = &nf_conntrack_hash[*bucket]; if (hlist_nulls_empty(hslot)) continue; lockp = &nf_conntrack_locks[*bucket % CONNTRACK_LOCKS]; local_bh_disable(); nf_conntrack_lock(lockp); hlist_nulls_for_each_entry(h, n, hslot, hnnode) { if (NF_CT_DIRECTION(h) != IP_CT_DIR_REPLY) continue; /* All nf_conn objects are added to hash table twice, one * for original direction tuple, once for the reply tuple. * * Exception: In the IPS_NAT_CLASH case, only the reply * tuple is added (the original tuple already existed for * a different object). * * We only need to call the iterator once for each * conntrack, so we just use the 'reply' direction * tuple while iterating. 
*/ ct = nf_ct_tuplehash_to_ctrack(h); if (iter_data->net && !net_eq(iter_data->net, nf_ct_net(ct))) continue; if (iter(ct, iter_data->data)) goto found; } spin_unlock(lockp); local_bh_enable(); cond_resched(); } return NULL; found: refcount_inc(&ct->ct_general.use); spin_unlock(lockp); local_bh_enable(); return ct; } static void nf_ct_iterate_cleanup(int (*iter)(struct nf_conn *i, void *data), const struct nf_ct_iter_data *iter_data) { unsigned int bucket = 0; struct nf_conn *ct; might_sleep(); mutex_lock(&nf_conntrack_mutex); while ((ct = get_next_corpse(iter, iter_data, &bucket)) != NULL) { /* Time to push up daises... */ nf_ct_delete(ct, iter_data->portid, iter_data->report); nf_ct_put(ct); cond_resched(); } mutex_unlock(&nf_conntrack_mutex); } void nf_ct_iterate_cleanup_net(int (*iter)(struct nf_conn *i, void *data), const struct nf_ct_iter_data *iter_data) { struct net *net = iter_data->net; struct nf_conntrack_net *cnet = nf_ct_pernet(net); might_sleep(); if (atomic_read(&cnet->count) == 0) return; nf_ct_iterate_cleanup(iter, iter_data); } EXPORT_SYMBOL_GPL(nf_ct_iterate_cleanup_net); /** * nf_ct_iterate_destroy - destroy unconfirmed conntracks and iterate table * @iter: callback to invoke for each conntrack * @data: data to pass to @iter * * Like nf_ct_iterate_cleanup, but first marks conntracks on the * unconfirmed list as dying (so they will not be inserted into * main table). * * Can only be called in module exit path. */ void nf_ct_iterate_destroy(int (*iter)(struct nf_conn *i, void *data), void *data) { struct nf_ct_iter_data iter_data = {}; struct net *net; down_read(&net_rwsem); for_each_net(net) { struct nf_conntrack_net *cnet = nf_ct_pernet(net); if (atomic_read(&cnet->count) == 0) continue; nf_queue_nf_hook_drop(net); } up_read(&net_rwsem); /* Need to wait for netns cleanup worker to finish, if its * running -- it might have deleted a net namespace from * the global list, so hook drop above might not have * affected all namespaces. */ net_ns_barrier(); /* a skb w. unconfirmed conntrack could have been reinjected just * before we called nf_queue_nf_hook_drop(). * * This makes sure its inserted into conntrack table. */ synchronize_net(); nf_ct_ext_bump_genid(); iter_data.data = data; nf_ct_iterate_cleanup(iter, &iter_data); /* Another cpu might be in a rcu read section with * rcu protected pointer cleared in iter callback * or hidden via nf_ct_ext_bump_genid() above. * * Wait until those are done. */ synchronize_rcu(); } EXPORT_SYMBOL_GPL(nf_ct_iterate_destroy); static int kill_all(struct nf_conn *i, void *data) { return 1; } void nf_conntrack_cleanup_start(void) { cleanup_nf_conntrack_bpf(); conntrack_gc_work.exiting = true; } void nf_conntrack_cleanup_end(void) { RCU_INIT_POINTER(nf_ct_hook, NULL); cancel_delayed_work_sync(&conntrack_gc_work.dwork); kvfree(nf_conntrack_hash); nf_conntrack_proto_fini(); nf_conntrack_helper_fini(); nf_conntrack_expect_fini(); kmem_cache_destroy(nf_conntrack_cachep); } /* * Mishearing the voices in his head, our hero wonders how he's * supposed to kill the mall. */ void nf_conntrack_cleanup_net(struct net *net) { LIST_HEAD(single); list_add(&net->exit_list, &single); nf_conntrack_cleanup_net_list(&single); } void nf_conntrack_cleanup_net_list(struct list_head *net_exit_list) { struct nf_ct_iter_data iter_data = {}; struct net *net; int busy; /* * This makes sure all current packets have passed through * netfilter framework. Roll on, two-stage module * delete... 
*/ synchronize_rcu_expedited(); i_see_dead_people: busy = 0; list_for_each_entry(net, net_exit_list, exit_list) { struct nf_conntrack_net *cnet = nf_ct_pernet(net); iter_data.net = net; nf_ct_iterate_cleanup_net(kill_all, &iter_data); if (atomic_read(&cnet->count) != 0) busy = 1; } if (busy) { schedule(); goto i_see_dead_people; } list_for_each_entry(net, net_exit_list, exit_list) { nf_conntrack_ecache_pernet_fini(net); nf_conntrack_expect_pernet_fini(net); free_percpu(net->ct.stat); } } void *nf_ct_alloc_hashtable(unsigned int *sizep, int nulls) { struct hlist_nulls_head *hash; unsigned int nr_slots, i; if (*sizep > (INT_MAX / sizeof(struct hlist_nulls_head))) return NULL; BUILD_BUG_ON(sizeof(struct hlist_nulls_head) != sizeof(struct hlist_head)); nr_slots = *sizep = roundup(*sizep, PAGE_SIZE / sizeof(struct hlist_nulls_head)); if (nr_slots > (INT_MAX / sizeof(struct hlist_nulls_head))) return NULL; hash = kvcalloc(nr_slots, sizeof(struct hlist_nulls_head), GFP_KERNEL); if (hash && nulls) for (i = 0; i < nr_slots; i++) INIT_HLIST_NULLS_HEAD(&hash[i], i); return hash; } EXPORT_SYMBOL_GPL(nf_ct_alloc_hashtable); int nf_conntrack_hash_resize(unsigned int hashsize) { int i, bucket; unsigned int old_size; struct hlist_nulls_head *hash, *old_hash; struct nf_conntrack_tuple_hash *h; struct nf_conn *ct; if (!hashsize) return -EINVAL; hash = nf_ct_alloc_hashtable(&hashsize, 1); if (!hash) return -ENOMEM; mutex_lock(&nf_conntrack_mutex); old_size = nf_conntrack_htable_size; if (old_size == hashsize) { mutex_unlock(&nf_conntrack_mutex); kvfree(hash); return 0; } local_bh_disable(); nf_conntrack_all_lock(); write_seqcount_begin(&nf_conntrack_generation); /* Lookups in the old hash might happen in parallel, which means we * might get false negatives during connection lookup. New connections * created because of a false negative won't make it into the hash * though since that required taking the locks. */ for (i = 0; i < nf_conntrack_htable_size; i++) { while (!hlist_nulls_empty(&nf_conntrack_hash[i])) { unsigned int zone_id; h = hlist_nulls_entry(nf_conntrack_hash[i].first, struct nf_conntrack_tuple_hash, hnnode); ct = nf_ct_tuplehash_to_ctrack(h); hlist_nulls_del_rcu(&h->hnnode); zone_id = nf_ct_zone_id(nf_ct_zone(ct), NF_CT_DIRECTION(h)); bucket = __hash_conntrack(nf_ct_net(ct), &h->tuple, zone_id, hashsize); hlist_nulls_add_head_rcu(&h->hnnode, &hash[bucket]); } } old_hash = nf_conntrack_hash; nf_conntrack_hash = hash; nf_conntrack_htable_size = hashsize; write_seqcount_end(&nf_conntrack_generation); nf_conntrack_all_unlock(); local_bh_enable(); mutex_unlock(&nf_conntrack_mutex); synchronize_net(); kvfree(old_hash); return 0; } int nf_conntrack_set_hashsize(const char *val, const struct kernel_param *kp) { unsigned int hashsize; int rc; if (current->nsproxy->net_ns != &init_net) return -EOPNOTSUPP; /* On boot, we can set this without any fancy locking. 
*/ if (!nf_conntrack_hash) return param_set_uint(val, kp); rc = kstrtouint(val, 0, &hashsize); if (rc) return rc; return nf_conntrack_hash_resize(hashsize); } int nf_conntrack_init_start(void) { unsigned long nr_pages = totalram_pages(); int max_factor = 8; int ret = -ENOMEM; int i; seqcount_spinlock_init(&nf_conntrack_generation, &nf_conntrack_locks_all_lock); for (i = 0; i < CONNTRACK_LOCKS; i++) spin_lock_init(&nf_conntrack_locks[i]); if (!nf_conntrack_htable_size) { nf_conntrack_htable_size = (((nr_pages << PAGE_SHIFT) / 16384) / sizeof(struct hlist_head)); if (BITS_PER_LONG >= 64 && nr_pages > (4 * (1024 * 1024 * 1024 / PAGE_SIZE))) nf_conntrack_htable_size = 262144; else if (nr_pages > (1024 * 1024 * 1024 / PAGE_SIZE)) nf_conntrack_htable_size = 65536; if (nf_conntrack_htable_size < 1024) nf_conntrack_htable_size = 1024; /* Use a max. factor of one by default to keep the average * hash chain length at 2 entries. Each entry has to be added * twice (once for original direction, once for reply). * When a table size is given we use the old value of 8 to * avoid implicit reduction of the max entries setting. */ max_factor = 1; } nf_conntrack_hash = nf_ct_alloc_hashtable(&nf_conntrack_htable_size, 1); if (!nf_conntrack_hash) return -ENOMEM; nf_conntrack_max = max_factor * nf_conntrack_htable_size; nf_conntrack_cachep = kmem_cache_create("nf_conntrack", sizeof(struct nf_conn), NFCT_INFOMASK + 1, SLAB_TYPESAFE_BY_RCU | SLAB_HWCACHE_ALIGN, NULL); if (!nf_conntrack_cachep) goto err_cachep; ret = nf_conntrack_expect_init(); if (ret < 0) goto err_expect; ret = nf_conntrack_helper_init(); if (ret < 0) goto err_helper; ret = nf_conntrack_proto_init(); if (ret < 0) goto err_proto; conntrack_gc_work_init(&conntrack_gc_work); queue_delayed_work(system_power_efficient_wq, &conntrack_gc_work.dwork, HZ); ret = register_nf_conntrack_bpf(); if (ret < 0) goto err_kfunc; return 0; err_kfunc: cancel_delayed_work_sync(&conntrack_gc_work.dwork); nf_conntrack_proto_fini(); err_proto: nf_conntrack_helper_fini(); err_helper: nf_conntrack_expect_fini(); err_expect: kmem_cache_destroy(nf_conntrack_cachep); err_cachep: kvfree(nf_conntrack_hash); return ret; } static void nf_conntrack_set_closing(struct nf_conntrack *nfct) { struct nf_conn *ct = nf_ct_to_nf_conn(nfct); switch (nf_ct_protonum(ct)) { case IPPROTO_TCP: nf_conntrack_tcp_set_closing(ct); break; } } static const struct nf_ct_hook nf_conntrack_hook = { .update = nf_conntrack_update, .destroy = nf_ct_destroy, .get_tuple_skb = nf_conntrack_get_tuple_skb, .attach = nf_conntrack_attach, .set_closing = nf_conntrack_set_closing, .confirm = __nf_conntrack_confirm, .get_id = nf_conntrack_get_id, }; void nf_conntrack_init_end(void) { RCU_INIT_POINTER(nf_ct_hook, &nf_conntrack_hook); } /* * We need to use special "null" values, not used in hash table */ #define UNCONFIRMED_NULLS_VAL ((1<<30)+0) int nf_conntrack_init_net(struct net *net) { struct nf_conntrack_net *cnet = nf_ct_pernet(net); int ret = -ENOMEM; BUILD_BUG_ON(IP_CT_UNTRACKED == IP_CT_NUMBER); BUILD_BUG_ON_NOT_POWER_OF_2(CONNTRACK_LOCKS); atomic_set(&cnet->count, 0); net->ct.stat = alloc_percpu(struct ip_conntrack_stat); if (!net->ct.stat) return ret; ret = nf_conntrack_expect_pernet_init(net); if (ret < 0) goto err_expect; nf_conntrack_acct_pernet_init(net); nf_conntrack_tstamp_pernet_init(net); nf_conntrack_ecache_pernet_init(net); nf_conntrack_proto_pernet_init(net); return 0; err_expect: free_percpu(net->ct.stat); return ret; } /* ctnetlink code shared by both ctnetlink and nf_conntrack_bpf */ int 
__nf_ct_change_timeout(struct nf_conn *ct, u64 timeout) { if (test_bit(IPS_FIXED_TIMEOUT_BIT, &ct->status)) return -EPERM; __nf_ct_set_timeout(ct, timeout); if (test_bit(IPS_DYING_BIT, &ct->status)) return -ETIME; return 0; } EXPORT_SYMBOL_GPL(__nf_ct_change_timeout); void __nf_ct_change_status(struct nf_conn *ct, unsigned long on, unsigned long off) { unsigned int bit; /* Ignore these unchangeable bits */ on &= ~IPS_UNCHANGEABLE_MASK; off &= ~IPS_UNCHANGEABLE_MASK; for (bit = 0; bit < __IPS_MAX_BIT; bit++) { if (on & (1 << bit)) set_bit(bit, &ct->status); else if (off & (1 << bit)) clear_bit(bit, &ct->status); } } EXPORT_SYMBOL_GPL(__nf_ct_change_status); int nf_ct_change_status_common(struct nf_conn *ct, unsigned int status) { unsigned long d; d = ct->status ^ status; if (d & (IPS_EXPECTED|IPS_CONFIRMED|IPS_DYING)) /* unchangeable */ return -EBUSY; if (d & IPS_SEEN_REPLY && !(status & IPS_SEEN_REPLY)) /* SEEN_REPLY bit can only be set */ return -EBUSY; if (d & IPS_ASSURED && !(status & IPS_ASSURED)) /* ASSURED bit can only be set */ return -EBUSY; __nf_ct_change_status(ct, status, 0); return 0; } EXPORT_SYMBOL_GPL(nf_ct_change_status_common);
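/* Illustrative sketch (not part of the original file): how a ctnetlink-style
 * caller could use the two helpers above to update an existing entry. The
 * function name and the 300 second timeout are hypothetical, and the caller
 * is assumed to already hold a reference to @ct.
 */
static int __maybe_unused example_update_ct(struct nf_conn *ct)
{
	int err;

	/* Timeout is given in jiffies; entries with IPS_FIXED_TIMEOUT
	 * (e.g. an IPS_NAT_CLASH clash entry) reject this with -EPERM.
	 */
	err = __nf_ct_change_timeout(ct, 300 * HZ);
	if (err)
		return err;

	/* Request IPS_ASSURED in the new status word. Clearing ASSURED or
	 * SEEN_REPLY, or touching EXPECTED/CONFIRMED/DYING, would instead
	 * be refused with -EBUSY by nf_ct_change_status_common().
	 */
	return nf_ct_change_status_common(ct, ct->status | IPS_ASSURED);
}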
// SPDX-License-Identifier: GPL-2.0-or-later /* * inet fragments management * * Authors: Pavel Emelyanov <xemul@openvz.org> * Started as consolidation of ipv4/ip_fragment.c, * ipv6/reassembly.c and ipv6 nf conntrack reassembly */ #include <linux/list.h> #include <linux/spinlock.h> #include <linux/module.h> #include <linux/timer.h> #include <linux/mm.h> #include <linux/random.h> #include <linux/skbuff.h> #include <linux/rtnetlink.h> #include <linux/slab.h> #include <linux/rhashtable.h> #include <net/sock.h> #include <net/inet_frag.h> #include <net/inet_ecn.h> #include <net/ip.h> #include <net/ipv6.h> #include "../core/sock_destructor.h" /* Use skb->cb to track consecutive/adjacent fragments coming at * the end of the queue. Nodes in the rb-tree queue will * contain "runs" of one or more adjacent fragments.
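* For illustration: three adjacent fragments received in order form a single run; the first skb is the run head (and the rb-tree node), its frag_run_len is the sum of the three fragment lengths, and the last skb's next_frag is NULL.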
* * Invariants: * - next_frag is NULL at the tail of a "run"; * - the head of a "run" has the sum of all fragment lengths in frag_run_len. */ struct ipfrag_skb_cb { union { struct inet_skb_parm h4; struct inet6_skb_parm h6; }; struct sk_buff *next_frag; int frag_run_len; int ip_defrag_offset; }; #define FRAG_CB(skb) ((struct ipfrag_skb_cb *)((skb)->cb)) static void fragcb_clear(struct sk_buff *skb) { RB_CLEAR_NODE(&skb->rbnode); FRAG_CB(skb)->next_frag = NULL; FRAG_CB(skb)->frag_run_len = skb->len; } /* Append skb to the last "run". */ static void fragrun_append_to_last(struct inet_frag_queue *q, struct sk_buff *skb) { fragcb_clear(skb); FRAG_CB(q->last_run_head)->frag_run_len += skb->len; FRAG_CB(q->fragments_tail)->next_frag = skb; q->fragments_tail = skb; } /* Create a new "run" with the skb. */ static void fragrun_create(struct inet_frag_queue *q, struct sk_buff *skb) { BUILD_BUG_ON(sizeof(struct ipfrag_skb_cb) > sizeof(skb->cb)); fragcb_clear(skb); if (q->last_run_head) rb_link_node(&skb->rbnode, &q->last_run_head->rbnode, &q->last_run_head->rbnode.rb_right); else rb_link_node(&skb->rbnode, NULL, &q->rb_fragments.rb_node); rb_insert_color(&skb->rbnode, &q->rb_fragments); q->fragments_tail = skb; q->last_run_head = skb; } /* Given the OR values of all fragments, apply RFC 3168 5.3 requirements * Value : 0xff if frame should be dropped. * 0 or INET_ECN_CE value, to be ORed in to final iph->tos field */ const u8 ip_frag_ecn_table[16] = { /* at least one fragment had CE, and others ECT_0 or ECT_1 */ [IPFRAG_ECN_CE | IPFRAG_ECN_ECT_0] = INET_ECN_CE, [IPFRAG_ECN_CE | IPFRAG_ECN_ECT_1] = INET_ECN_CE, [IPFRAG_ECN_CE | IPFRAG_ECN_ECT_0 | IPFRAG_ECN_ECT_1] = INET_ECN_CE, /* invalid combinations : drop frame */ [IPFRAG_ECN_NOT_ECT | IPFRAG_ECN_CE] = 0xff, [IPFRAG_ECN_NOT_ECT | IPFRAG_ECN_ECT_0] = 0xff, [IPFRAG_ECN_NOT_ECT | IPFRAG_ECN_ECT_1] = 0xff, [IPFRAG_ECN_NOT_ECT | IPFRAG_ECN_ECT_0 | IPFRAG_ECN_ECT_1] = 0xff, [IPFRAG_ECN_NOT_ECT | IPFRAG_ECN_CE | IPFRAG_ECN_ECT_0] = 0xff, [IPFRAG_ECN_NOT_ECT | IPFRAG_ECN_CE | IPFRAG_ECN_ECT_1] = 0xff, [IPFRAG_ECN_NOT_ECT | IPFRAG_ECN_CE | IPFRAG_ECN_ECT_0 | IPFRAG_ECN_ECT_1] = 0xff, }; EXPORT_SYMBOL(ip_frag_ecn_table); int inet_frags_init(struct inet_frags *f) { f->frags_cachep = kmem_cache_create(f->frags_cache_name, f->qsize, 0, 0, NULL); if (!f->frags_cachep) return -ENOMEM; refcount_set(&f->refcnt, 1); init_completion(&f->completion); return 0; } EXPORT_SYMBOL(inet_frags_init); void inet_frags_fini(struct inet_frags *f) { if (refcount_dec_and_test(&f->refcnt)) complete(&f->completion); wait_for_completion(&f->completion); kmem_cache_destroy(f->frags_cachep); f->frags_cachep = NULL; } EXPORT_SYMBOL(inet_frags_fini); /* called from rhashtable_free_and_destroy() at netns_frags dismantle */ static void inet_frags_free_cb(void *ptr, void *arg) { struct inet_frag_queue *fq = ptr; int count; count = timer_delete_sync(&fq->timer) ? 
1 : 0; spin_lock_bh(&fq->lock); fq->flags |= INET_FRAG_DROP; if (!(fq->flags & INET_FRAG_COMPLETE)) { fq->flags |= INET_FRAG_COMPLETE; count++; } else if (fq->flags & INET_FRAG_HASH_DEAD) { count++; } spin_unlock_bh(&fq->lock); inet_frag_putn(fq, count); } static LLIST_HEAD(fqdir_free_list); static void fqdir_free_fn(struct work_struct *work) { struct llist_node *kill_list; struct fqdir *fqdir, *tmp; struct inet_frags *f; /* Atomically snapshot the list of fqdirs to free */ kill_list = llist_del_all(&fqdir_free_list); /* We need to make sure all ongoing call_rcu(..., inet_frag_destroy_rcu) * have completed, since they need to dereference fqdir. * Would it not be nice to have kfree_rcu_barrier() ? :) */ rcu_barrier(); llist_for_each_entry_safe(fqdir, tmp, kill_list, free_list) { f = fqdir->f; if (refcount_dec_and_test(&f->refcnt)) complete(&f->completion); kfree(fqdir); } } static DECLARE_DELAYED_WORK(fqdir_free_work, fqdir_free_fn); static void fqdir_work_fn(struct work_struct *work) { struct fqdir *fqdir = container_of(work, struct fqdir, destroy_work); rhashtable_free_and_destroy(&fqdir->rhashtable, inet_frags_free_cb, NULL); if (llist_add(&fqdir->free_list, &fqdir_free_list)) queue_delayed_work(system_wq, &fqdir_free_work, HZ); } int fqdir_init(struct fqdir **fqdirp, struct inet_frags *f, struct net *net) { struct fqdir *fqdir = kzalloc(sizeof(*fqdir), GFP_KERNEL); int res; if (!fqdir) return -ENOMEM; fqdir->f = f; fqdir->net = net; res = rhashtable_init(&fqdir->rhashtable, &fqdir->f->rhash_params); if (res < 0) { kfree(fqdir); return res; } refcount_inc(&f->refcnt); *fqdirp = fqdir; return 0; } EXPORT_SYMBOL(fqdir_init); static struct workqueue_struct *inet_frag_wq; static int __init inet_frag_wq_init(void) { inet_frag_wq = create_workqueue("inet_frag_wq"); if (!inet_frag_wq) panic("Could not create inet frag workq"); return 0; } pure_initcall(inet_frag_wq_init); void fqdir_exit(struct fqdir *fqdir) { INIT_WORK(&fqdir->destroy_work, fqdir_work_fn); queue_work(inet_frag_wq, &fqdir->destroy_work); } EXPORT_SYMBOL(fqdir_exit); void inet_frag_kill(struct inet_frag_queue *fq, int *refs) { if (timer_delete(&fq->timer)) (*refs)++; if (!(fq->flags & INET_FRAG_COMPLETE)) { struct fqdir *fqdir = fq->fqdir; fq->flags |= INET_FRAG_COMPLETE; rcu_read_lock(); /* The RCU read lock provides a memory barrier * guaranteeing that if fqdir->dead is false then * the hash table destruction will not start until * after we unlock. Paired with fqdir_pre_exit(). 
*/ if (!READ_ONCE(fqdir->dead)) { rhashtable_remove_fast(&fqdir->rhashtable, &fq->node, fqdir->f->rhash_params); (*refs)++; } else { fq->flags |= INET_FRAG_HASH_DEAD; } rcu_read_unlock(); } } EXPORT_SYMBOL(inet_frag_kill); static void inet_frag_destroy_rcu(struct rcu_head *head) { struct inet_frag_queue *q = container_of(head, struct inet_frag_queue, rcu); struct inet_frags *f = q->fqdir->f; if (f->destructor) f->destructor(q); kmem_cache_free(f->frags_cachep, q); } unsigned int inet_frag_rbtree_purge(struct rb_root *root, enum skb_drop_reason reason) { struct rb_node *p = rb_first(root); unsigned int sum = 0; while (p) { struct sk_buff *skb = rb_entry(p, struct sk_buff, rbnode); p = rb_next(p); rb_erase(&skb->rbnode, root); while (skb) { struct sk_buff *next = FRAG_CB(skb)->next_frag; sum += skb->truesize; kfree_skb_reason(skb, reason); skb = next; } } return sum; } EXPORT_SYMBOL(inet_frag_rbtree_purge); void inet_frag_destroy(struct inet_frag_queue *q) { unsigned int sum, sum_truesize = 0; enum skb_drop_reason reason; struct inet_frags *f; struct fqdir *fqdir; WARN_ON(!(q->flags & INET_FRAG_COMPLETE)); reason = (q->flags & INET_FRAG_DROP) ? SKB_DROP_REASON_FRAG_REASM_TIMEOUT : SKB_CONSUMED; WARN_ON(timer_delete(&q->timer) != 0); /* Release all fragment data. */ fqdir = q->fqdir; f = fqdir->f; sum_truesize = inet_frag_rbtree_purge(&q->rb_fragments, reason); sum = sum_truesize + f->qsize; call_rcu(&q->rcu, inet_frag_destroy_rcu); sub_frag_mem_limit(fqdir, sum); } EXPORT_SYMBOL(inet_frag_destroy); static struct inet_frag_queue *inet_frag_alloc(struct fqdir *fqdir, struct inet_frags *f, void *arg) { struct inet_frag_queue *q; q = kmem_cache_zalloc(f->frags_cachep, GFP_ATOMIC); if (!q) return NULL; q->fqdir = fqdir; f->constructor(q, arg); add_frag_mem_limit(fqdir, f->qsize); timer_setup(&q->timer, f->frag_expire, 0); spin_lock_init(&q->lock); /* One reference for the timer, one for the hash table. */ refcount_set(&q->refcnt, 2); return q; } static struct inet_frag_queue *inet_frag_create(struct fqdir *fqdir, void *arg, struct inet_frag_queue **prev) { struct inet_frags *f = fqdir->f; struct inet_frag_queue *q; q = inet_frag_alloc(fqdir, f, arg); if (!q) { *prev = ERR_PTR(-ENOMEM); return NULL; } mod_timer(&q->timer, jiffies + fqdir->timeout); *prev = rhashtable_lookup_get_insert_key(&fqdir->rhashtable, &q->key, &q->node, f->rhash_params); if (*prev) { /* We could not insert in the hash table, * we need to cancel what inet_frag_alloc() * anticipated. */ int refs = 1; q->flags |= INET_FRAG_COMPLETE; inet_frag_kill(q, &refs); inet_frag_putn(q, refs); return NULL; } return q; } struct inet_frag_queue *inet_frag_find(struct fqdir *fqdir, void *key) { /* This pairs with WRITE_ONCE() in fqdir_pre_exit(). */ long high_thresh = READ_ONCE(fqdir->high_thresh); struct inet_frag_queue *fq = NULL, *prev; if (!high_thresh || frag_mem_limit(fqdir) > high_thresh) return NULL; prev = rhashtable_lookup(&fqdir->rhashtable, key, fqdir->f->rhash_params); if (!prev) fq = inet_frag_create(fqdir, key, &prev); if (!IS_ERR_OR_NULL(prev)) fq = prev; return fq; } EXPORT_SYMBOL(inet_frag_find); int inet_frag_queue_insert(struct inet_frag_queue *q, struct sk_buff *skb, int offset, int end) { struct sk_buff *last = q->fragments_tail; /* RFC5722, Section 4, amended by Errata ID : 3089 * When reassembling an IPv6 datagram, if * one or more its constituent fragments is determined to be an * overlapping fragment, the entire datagram (and any constituent * fragments) MUST be silently discarded. 
* * Duplicates, however, should be ignored (i.e. skb dropped, but the * queue/fragments kept for later reassembly). */ if (!last) fragrun_create(q, skb); /* First fragment. */ else if (FRAG_CB(last)->ip_defrag_offset + last->len < end) { /* This is the common case: skb goes to the end. */ /* Detect and discard overlaps. */ if (offset < FRAG_CB(last)->ip_defrag_offset + last->len) return IPFRAG_OVERLAP; if (offset == FRAG_CB(last)->ip_defrag_offset + last->len) fragrun_append_to_last(q, skb); else fragrun_create(q, skb); } else { /* Binary search. Note that skb can become the first fragment, * but not the last (covered above). */ struct rb_node **rbn, *parent; rbn = &q->rb_fragments.rb_node; do { struct sk_buff *curr; int curr_run_end; parent = *rbn; curr = rb_to_skb(parent); curr_run_end = FRAG_CB(curr)->ip_defrag_offset + FRAG_CB(curr)->frag_run_len; if (end <= FRAG_CB(curr)->ip_defrag_offset) rbn = &parent->rb_left; else if (offset >= curr_run_end) rbn = &parent->rb_right; else if (offset >= FRAG_CB(curr)->ip_defrag_offset && end <= curr_run_end) return IPFRAG_DUP; else return IPFRAG_OVERLAP; } while (*rbn); /* Here we have parent properly set, and rbn pointing to * one of its NULL left/right children. Insert skb. */ fragcb_clear(skb); rb_link_node(&skb->rbnode, parent, rbn); rb_insert_color(&skb->rbnode, &q->rb_fragments); } FRAG_CB(skb)->ip_defrag_offset = offset; return IPFRAG_OK; } EXPORT_SYMBOL(inet_frag_queue_insert); void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb, struct sk_buff *parent) { struct sk_buff *fp, *head = skb_rb_first(&q->rb_fragments); void (*destructor)(struct sk_buff *); unsigned int orig_truesize = 0; struct sk_buff **nextp = NULL; struct sock *sk = skb->sk; int delta; if (sk && is_skb_wmem(skb)) { /* TX: skb->sk might have been passed as argument to * dst->output and must remain valid until tx completes. * * Move sk to reassembled skb and fix up wmem accounting. */ orig_truesize = skb->truesize; destructor = skb->destructor; } if (head != skb) { fp = skb_clone(skb, GFP_ATOMIC); if (!fp) { head = skb; goto out_restore_sk; } FRAG_CB(fp)->next_frag = FRAG_CB(skb)->next_frag; if (RB_EMPTY_NODE(&skb->rbnode)) FRAG_CB(parent)->next_frag = fp; else rb_replace_node(&skb->rbnode, &fp->rbnode, &q->rb_fragments); if (q->fragments_tail == skb) q->fragments_tail = fp; if (orig_truesize) { /* prevent skb_morph from releasing sk */ skb->sk = NULL; skb->destructor = NULL; } skb_morph(skb, head); FRAG_CB(skb)->next_frag = FRAG_CB(head)->next_frag; rb_replace_node(&head->rbnode, &skb->rbnode, &q->rb_fragments); consume_skb(head); head = skb; } WARN_ON(FRAG_CB(head)->ip_defrag_offset != 0); delta = -head->truesize; /* Head of list must not be cloned. */ if (skb_unclone(head, GFP_ATOMIC)) goto out_restore_sk; delta += head->truesize; if (delta) add_frag_mem_limit(q->fqdir, delta); /* If the first fragment is fragmented itself, we split * it to two chunks: the first with data and paged part * and the second, holding only fragments. 
*/ if (skb_has_frag_list(head)) { struct sk_buff *clone; int i, plen = 0; clone = alloc_skb(0, GFP_ATOMIC); if (!clone) goto out_restore_sk; skb_shinfo(clone)->frag_list = skb_shinfo(head)->frag_list; skb_frag_list_init(head); for (i = 0; i < skb_shinfo(head)->nr_frags; i++) plen += skb_frag_size(&skb_shinfo(head)->frags[i]); clone->data_len = head->data_len - plen; clone->len = clone->data_len; head->truesize += clone->truesize; clone->csum = 0; clone->ip_summed = head->ip_summed; add_frag_mem_limit(q->fqdir, clone->truesize); skb_shinfo(head)->frag_list = clone; nextp = &clone->next; } else { nextp = &skb_shinfo(head)->frag_list; } out_restore_sk: if (orig_truesize) { int ts_delta = head->truesize - orig_truesize; /* if this reassembled skb is fragmented later, * fraglist skbs will get skb->sk assigned from head->sk, * and each frag skb will be released via sock_wfree. * * Update sk_wmem_alloc. */ head->sk = sk; head->destructor = destructor; refcount_add(ts_delta, &sk->sk_wmem_alloc); } return nextp; } EXPORT_SYMBOL(inet_frag_reasm_prepare); void inet_frag_reasm_finish(struct inet_frag_queue *q, struct sk_buff *head, void *reasm_data, bool try_coalesce) { struct sock *sk = is_skb_wmem(head) ? head->sk : NULL; const unsigned int head_truesize = head->truesize; struct sk_buff **nextp = reasm_data; struct rb_node *rbn; struct sk_buff *fp; int sum_truesize; skb_push(head, head->data - skb_network_header(head)); /* Traverse the tree in order, to build frag_list. */ fp = FRAG_CB(head)->next_frag; rbn = rb_next(&head->rbnode); rb_erase(&head->rbnode, &q->rb_fragments); sum_truesize = head->truesize; while (rbn || fp) { /* fp points to the next sk_buff in the current run; * rbn points to the next run. */ /* Go through the current run. */ while (fp) { struct sk_buff *next_frag = FRAG_CB(fp)->next_frag; bool stolen; int delta; sum_truesize += fp->truesize; if (head->ip_summed != fp->ip_summed) head->ip_summed = CHECKSUM_NONE; else if (head->ip_summed == CHECKSUM_COMPLETE) head->csum = csum_add(head->csum, fp->csum); if (try_coalesce && skb_try_coalesce(head, fp, &stolen, &delta)) { kfree_skb_partial(fp, stolen); } else { fp->prev = NULL; memset(&fp->rbnode, 0, sizeof(fp->rbnode)); fp->sk = NULL; head->data_len += fp->len; head->len += fp->len; head->truesize += fp->truesize; *nextp = fp; nextp = &fp->next; } fp = next_frag; } /* Move to the next run. */ if (rbn) { struct rb_node *rbnext = rb_next(rbn); fp = rb_to_skb(rbn); rb_erase(rbn, &q->rb_fragments); rbn = rbnext; } } sub_frag_mem_limit(q->fqdir, sum_truesize); *nextp = NULL; skb_mark_not_on_list(head); head->prev = NULL; head->tstamp = q->stamp; head->tstamp_type = q->tstamp_type; if (sk) refcount_add(sum_truesize - head_truesize, &sk->sk_wmem_alloc); } EXPORT_SYMBOL(inet_frag_reasm_finish); struct sk_buff *inet_frag_pull_head(struct inet_frag_queue *q) { struct sk_buff *head, *skb; head = skb_rb_first(&q->rb_fragments); if (!head) return NULL; skb = FRAG_CB(head)->next_frag; if (skb) rb_replace_node(&head->rbnode, &skb->rbnode, &q->rb_fragments); else rb_erase(&head->rbnode, &q->rb_fragments); memset(&head->rbnode, 0, sizeof(head->rbnode)); barrier(); if (head == q->fragments_tail) q->fragments_tail = NULL; sub_frag_mem_limit(q->fqdir, head->truesize); return head; } EXPORT_SYMBOL(inet_frag_pull_head); |
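/*
 * Illustrative example (not part of the kernel sources above): a minimal,
 * stand-alone C sketch of the interval check inet_frag_queue_insert()
 * performs at each rb-tree node while walking the fragment "runs": a new
 * fragment [offset, end) either belongs left or right of a run, is a full
 * duplicate, or overlaps and poisons the whole queue.  The names here are
 * illustrative stand-ins for the kernel's IPFRAG_* verdicts and FRAG_CB()
 * state.
 */
#include <stdio.h>

enum frag_verdict { GO_LEFT, GO_RIGHT, FRAG_DUP, FRAG_OVERLAP };

struct frag_run {
	int offset;	/* first byte covered by the run */
	int run_len;	/* sum of the lengths of all fragments in the run */
};

static enum frag_verdict classify(const struct frag_run *run, int offset, int end)
{
	int run_end = run->offset + run->run_len;

	if (end <= run->offset)
		return GO_LEFT;		/* entirely before this run */
	if (offset >= run_end)
		return GO_RIGHT;	/* entirely after this run */
	if (offset >= run->offset && end <= run_end)
		return FRAG_DUP;	/* fully covered already: drop just the skb */
	return FRAG_OVERLAP;		/* partial overlap: RFC 5722 says drop the queue */
}

int main(void)
{
	struct frag_run run = { .offset = 1000, .run_len = 2000 };	/* covers [1000, 3000) */

	printf("%d %d %d %d\n",
	       classify(&run, 0, 1000),		/* GO_LEFT */
	       classify(&run, 3000, 3500),	/* GO_RIGHT */
	       classify(&run, 1500, 2000),	/* FRAG_DUP */
	       classify(&run, 2500, 3200));	/* FRAG_OVERLAP */
	return 0;
}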
/*
 * Copyright (c) 2006 Oracle. All rights reserved.
 *
 * This software is available to you under a choice of one of two
 * licenses. You may choose to be licensed under the terms of the GNU
 * General Public License (GPL) Version 2, available from the file
 * COPYING in the main directory of this source tree, or the
 * OpenIB.org BSD license below:
 *
 *     Redistribution and use in source and binary forms, with or
 *     without modification, are permitted provided that the following
 *     conditions are met:
 *
 *      - Redistributions of source code must retain the above
 *        copyright notice, this list of conditions and the following
 *        disclaimer.
 *
 *      - Redistributions in binary form must reproduce the above
 *        copyright notice, this list of conditions and the following
 *        disclaimer in the documentation and/or other materials
 *        provided with the distribution.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
 * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
 * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
 * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 * SOFTWARE.
 *
 */
#include <linux/percpu.h>
#include <linux/seq_file.h>
#include <linux/proc_fs.h>

#include "rds.h"
#include "ib.h"

DEFINE_PER_CPU_SHARED_ALIGNED(struct rds_ib_statistics, rds_ib_stats);

static const char *const rds_ib_stat_names[] = {
	"ib_connect_raced",
	"ib_listen_closed_stale",
	"ib_evt_handler_call",
	"ib_tasklet_call",
	"ib_tx_cq_event",
	"ib_tx_ring_full",
	"ib_tx_throttle",
	"ib_tx_sg_mapping_failure",
	"ib_tx_stalled",
	"ib_tx_credit_updates",
	"ib_rx_cq_event",
	"ib_rx_ring_empty",
	"ib_rx_refill_from_cq",
	"ib_rx_refill_from_thread",
	"ib_rx_alloc_limit",
	"ib_rx_total_frags",
	"ib_rx_total_incs",
	"ib_rx_credit_updates",
	"ib_ack_sent",
	"ib_ack_send_failure",
	"ib_ack_send_delayed",
	"ib_ack_send_piggybacked",
	"ib_ack_received",
	"ib_rdma_mr_8k_alloc",
	"ib_rdma_mr_8k_free",
	"ib_rdma_mr_8k_used",
	"ib_rdma_mr_8k_pool_flush",
	"ib_rdma_mr_8k_pool_wait",
	"ib_rdma_mr_8k_pool_depleted",
	"ib_rdma_mr_1m_alloc",
	"ib_rdma_mr_1m_free",
	"ib_rdma_mr_1m_used",
	"ib_rdma_mr_1m_pool_flush",
	"ib_rdma_mr_1m_pool_wait",
	"ib_rdma_mr_1m_pool_depleted",
	"ib_rdma_mr_8k_reused",
	"ib_rdma_mr_1m_reused",
	"ib_atomic_cswp",
	"ib_atomic_fadd",
};

unsigned int rds_ib_stats_info_copy(struct rds_info_iterator *iter,
				    unsigned int avail)
{
	struct rds_ib_statistics stats = {0, };
	uint64_t *src;
	uint64_t *sum;
	size_t i;
	int cpu;

	if (avail < ARRAY_SIZE(rds_ib_stat_names))
		goto out;

	for_each_online_cpu(cpu) {
		src = (uint64_t *)&(per_cpu(rds_ib_stats, cpu));
		sum = (uint64_t *)&stats;
		for (i = 0; i < sizeof(stats) / sizeof(uint64_t); i++)
			*(sum++) += *(src++);
	}

	rds_stats_info_copy(iter, (uint64_t *)&stats, rds_ib_stat_names,
			    ARRAY_SIZE(rds_ib_stat_names));
out:
	return ARRAY_SIZE(rds_ib_stat_names);
}
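/*
 * Illustrative example (not part of the kernel sources above): a minimal,
 * stand-alone C sketch of the summation idiom used by rds_ib_stats_info_copy().
 * Because the statistics structure consists purely of 64-bit counters, each
 * per-CPU copy can be folded into a running total by walking both structures
 * as flat uint64_t arrays.  The struct layout and NCPUS value below are
 * illustrative, not the RDS definitions.
 */
#include <stdint.h>
#include <stdio.h>

#define NCPUS 4

struct demo_stats {
	uint64_t connect_raced;
	uint64_t tx_ring_full;
	uint64_t rx_ring_empty;
};

int main(void)
{
	struct demo_stats percpu[NCPUS] = {
		{ 1, 10, 100 }, { 2, 20, 200 }, { 3, 30, 300 }, { 4, 40, 400 },
	};
	struct demo_stats total = { 0 };
	size_t nfields = sizeof(total) / sizeof(uint64_t);

	for (int cpu = 0; cpu < NCPUS; cpu++) {
		const uint64_t *src = (const uint64_t *)&percpu[cpu];
		uint64_t *sum = (uint64_t *)&total;

		for (size_t i = 0; i < nfields; i++)
			*(sum++) += *(src++);
	}

	/* Prints "10 100 1000": every field summed across the per-CPU copies. */
	printf("%llu %llu %llu\n",
	       (unsigned long long)total.connect_raced,
	       (unsigned long long)total.tx_ring_full,
	       (unsigned long long)total.rx_ring_empty);
	return 0;
}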
// SPDX-License-Identifier: GPL-2.0
/*
 * NHPoly1305 - ε-almost-∆-universal hash function for Adiantum
 *
 * Copyright 2018 Google LLC
 */

/*
 * "NHPoly1305" is the main component of Adiantum hashing.
 * Specifically, it is the calculation
 *
 *	H_L ← Poly1305_{K_L}(NH_{K_N}(pad_{128}(L)))
 *
 * from the procedure in section 6.4 of the Adiantum paper [1]. It is an
 * ε-almost-∆-universal (ε-∆U) hash function for equal-length inputs over
 * Z/(2^{128}Z), where the "∆" operation is addition. It hashes 1024-byte
 * chunks of the input with the NH hash function [2], reducing the input length
 * by 32x. The resulting NH digests are evaluated as a polynomial in
 * GF(2^{130}-5), like in the Poly1305 MAC [3]. Note that the polynomial
 * evaluation by itself would suffice to achieve the ε-∆U property; NH is used
 * for performance since it's over twice as fast as Poly1305.
 *
 * This is *not* a cryptographic hash function; do not use it as such!
* * [1] Adiantum: length-preserving encryption for entry-level processors * (https://eprint.iacr.org/2018/720.pdf) * [2] UMAC: Fast and Secure Message Authentication * (https://fastcrypto.org/umac/umac_proc.pdf) * [3] The Poly1305-AES message-authentication code * (https://cr.yp.to/mac/poly1305-20050329.pdf) */ #include <linux/unaligned.h> #include <crypto/algapi.h> #include <crypto/internal/hash.h> #include <crypto/internal/poly1305.h> #include <crypto/nhpoly1305.h> #include <linux/crypto.h> #include <linux/kernel.h> #include <linux/module.h> static void nh_generic(const u32 *key, const u8 *message, size_t message_len, __le64 hash[NH_NUM_PASSES]) { u64 sums[4] = { 0, 0, 0, 0 }; BUILD_BUG_ON(NH_PAIR_STRIDE != 2); BUILD_BUG_ON(NH_NUM_PASSES != 4); while (message_len) { u32 m0 = get_unaligned_le32(message + 0); u32 m1 = get_unaligned_le32(message + 4); u32 m2 = get_unaligned_le32(message + 8); u32 m3 = get_unaligned_le32(message + 12); sums[0] += (u64)(u32)(m0 + key[ 0]) * (u32)(m2 + key[ 2]); sums[1] += (u64)(u32)(m0 + key[ 4]) * (u32)(m2 + key[ 6]); sums[2] += (u64)(u32)(m0 + key[ 8]) * (u32)(m2 + key[10]); sums[3] += (u64)(u32)(m0 + key[12]) * (u32)(m2 + key[14]); sums[0] += (u64)(u32)(m1 + key[ 1]) * (u32)(m3 + key[ 3]); sums[1] += (u64)(u32)(m1 + key[ 5]) * (u32)(m3 + key[ 7]); sums[2] += (u64)(u32)(m1 + key[ 9]) * (u32)(m3 + key[11]); sums[3] += (u64)(u32)(m1 + key[13]) * (u32)(m3 + key[15]); key += NH_MESSAGE_UNIT / sizeof(key[0]); message += NH_MESSAGE_UNIT; message_len -= NH_MESSAGE_UNIT; } hash[0] = cpu_to_le64(sums[0]); hash[1] = cpu_to_le64(sums[1]); hash[2] = cpu_to_le64(sums[2]); hash[3] = cpu_to_le64(sums[3]); } /* Pass the next NH hash value through Poly1305 */ static void process_nh_hash_value(struct nhpoly1305_state *state, const struct nhpoly1305_key *key) { BUILD_BUG_ON(NH_HASH_BYTES % POLY1305_BLOCK_SIZE != 0); poly1305_core_blocks(&state->poly_state, &key->poly_key, state->nh_hash, NH_HASH_BYTES / POLY1305_BLOCK_SIZE, 1); } /* * Feed the next portion of the source data, as a whole number of 16-byte * "NH message units", through NH and Poly1305. Each NH hash is taken over * 1024 bytes, except possibly the final one which is taken over a multiple of * 16 bytes up to 1024. Also, in the case where data is passed in misaligned * chunks, we combine partial hashes; the end result is the same either way. 
*/ static void nhpoly1305_units(struct nhpoly1305_state *state, const struct nhpoly1305_key *key, const u8 *src, unsigned int srclen, nh_t nh_fn) { do { unsigned int bytes; if (state->nh_remaining == 0) { /* Starting a new NH message */ bytes = min_t(unsigned int, srclen, NH_MESSAGE_BYTES); nh_fn(key->nh_key, src, bytes, state->nh_hash); state->nh_remaining = NH_MESSAGE_BYTES - bytes; } else { /* Continuing a previous NH message */ __le64 tmp_hash[NH_NUM_PASSES]; unsigned int pos; int i; pos = NH_MESSAGE_BYTES - state->nh_remaining; bytes = min(srclen, state->nh_remaining); nh_fn(&key->nh_key[pos / 4], src, bytes, tmp_hash); for (i = 0; i < NH_NUM_PASSES; i++) le64_add_cpu(&state->nh_hash[i], le64_to_cpu(tmp_hash[i])); state->nh_remaining -= bytes; } if (state->nh_remaining == 0) process_nh_hash_value(state, key); src += bytes; srclen -= bytes; } while (srclen); } int crypto_nhpoly1305_setkey(struct crypto_shash *tfm, const u8 *key, unsigned int keylen) { struct nhpoly1305_key *ctx = crypto_shash_ctx(tfm); int i; if (keylen != NHPOLY1305_KEY_SIZE) return -EINVAL; poly1305_core_setkey(&ctx->poly_key, key); key += POLY1305_BLOCK_SIZE; for (i = 0; i < NH_KEY_WORDS; i++) ctx->nh_key[i] = get_unaligned_le32(key + i * sizeof(u32)); return 0; } EXPORT_SYMBOL(crypto_nhpoly1305_setkey); int crypto_nhpoly1305_init(struct shash_desc *desc) { struct nhpoly1305_state *state = shash_desc_ctx(desc); poly1305_core_init(&state->poly_state); state->buflen = 0; state->nh_remaining = 0; return 0; } EXPORT_SYMBOL(crypto_nhpoly1305_init); int crypto_nhpoly1305_update_helper(struct shash_desc *desc, const u8 *src, unsigned int srclen, nh_t nh_fn) { struct nhpoly1305_state *state = shash_desc_ctx(desc); const struct nhpoly1305_key *key = crypto_shash_ctx(desc->tfm); unsigned int bytes; if (state->buflen) { bytes = min(srclen, (int)NH_MESSAGE_UNIT - state->buflen); memcpy(&state->buffer[state->buflen], src, bytes); state->buflen += bytes; if (state->buflen < NH_MESSAGE_UNIT) return 0; nhpoly1305_units(state, key, state->buffer, NH_MESSAGE_UNIT, nh_fn); state->buflen = 0; src += bytes; srclen -= bytes; } if (srclen >= NH_MESSAGE_UNIT) { bytes = round_down(srclen, NH_MESSAGE_UNIT); nhpoly1305_units(state, key, src, bytes, nh_fn); src += bytes; srclen -= bytes; } if (srclen) { memcpy(state->buffer, src, srclen); state->buflen = srclen; } return 0; } EXPORT_SYMBOL(crypto_nhpoly1305_update_helper); int crypto_nhpoly1305_update(struct shash_desc *desc, const u8 *src, unsigned int srclen) { return crypto_nhpoly1305_update_helper(desc, src, srclen, nh_generic); } EXPORT_SYMBOL(crypto_nhpoly1305_update); int crypto_nhpoly1305_final_helper(struct shash_desc *desc, u8 *dst, nh_t nh_fn) { struct nhpoly1305_state *state = shash_desc_ctx(desc); const struct nhpoly1305_key *key = crypto_shash_ctx(desc->tfm); if (state->buflen) { memset(&state->buffer[state->buflen], 0, NH_MESSAGE_UNIT - state->buflen); nhpoly1305_units(state, key, state->buffer, NH_MESSAGE_UNIT, nh_fn); } if (state->nh_remaining) process_nh_hash_value(state, key); poly1305_core_emit(&state->poly_state, NULL, dst); return 0; } EXPORT_SYMBOL(crypto_nhpoly1305_final_helper); int crypto_nhpoly1305_final(struct shash_desc *desc, u8 *dst) { return crypto_nhpoly1305_final_helper(desc, dst, nh_generic); } EXPORT_SYMBOL(crypto_nhpoly1305_final); static struct shash_alg nhpoly1305_alg = { .base.cra_name = "nhpoly1305", .base.cra_driver_name = "nhpoly1305-generic", .base.cra_priority = 100, .base.cra_ctxsize = sizeof(struct nhpoly1305_key), .base.cra_module = THIS_MODULE, 
.digestsize = POLY1305_DIGEST_SIZE, .init = crypto_nhpoly1305_init, .update = crypto_nhpoly1305_update, .final = crypto_nhpoly1305_final, .setkey = crypto_nhpoly1305_setkey, .descsize = sizeof(struct nhpoly1305_state), }; static int __init nhpoly1305_mod_init(void) { return crypto_register_shash(&nhpoly1305_alg); } static void __exit nhpoly1305_mod_exit(void) { crypto_unregister_shash(&nhpoly1305_alg); } module_init(nhpoly1305_mod_init); module_exit(nhpoly1305_mod_exit); MODULE_DESCRIPTION("NHPoly1305 ε-almost-∆-universal hash function"); MODULE_LICENSE("GPL v2"); MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>"); MODULE_ALIAS_CRYPTO("nhpoly1305"); MODULE_ALIAS_CRYPTO("nhpoly1305-generic"); |
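/*
 * Illustrative example (not part of the kernel sources above): a minimal,
 * stand-alone C sketch of how one 16-byte NH "message unit" is folded into
 * the four pass accumulators, mirroring the multiply-accumulate structure of
 * nh_generic().  Only a single unit and a 16-word key segment are handled;
 * the key and message values in main() are arbitrary test inputs, not vectors
 * from the Adiantum specification.
 */
#include <stdint.h>
#include <stdio.h>

static void nh_unit(const uint32_t key[16], const uint32_t m[4], uint64_t sums[4])
{
	for (int pass = 0; pass < 4; pass++) {
		/* Add key words mod 2^32, then two 32x32->64 products per pass. */
		uint32_t a = m[0] + key[4 * pass + 0];
		uint32_t b = m[1] + key[4 * pass + 1];
		uint32_t c = m[2] + key[4 * pass + 2];
		uint32_t d = m[3] + key[4 * pass + 3];

		sums[pass] += (uint64_t)a * c + (uint64_t)b * d;
	}
}

int main(void)
{
	uint32_t key[16];
	uint32_t m[4] = { 0x01020304, 0x05060708, 0x090a0b0c, 0x0d0e0f10 };
	uint64_t sums[4] = { 0, 0, 0, 0 };

	for (int i = 0; i < 16; i++)
		key[i] = 0x9e3779b9u * (uint32_t)(i + 1);	/* arbitrary test key */

	nh_unit(key, m, sums);
	for (int i = 0; i < 4; i++)
		printf("pass %d: 0x%016llx\n", i, (unsigned long long)sums[i]);
	return 0;
}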
// SPDX-License-Identifier: GPL-2.0
/* Copyright 2011-2014 Autronica Fire and Security AS
 *
 * Author(s):
 *	2011-2014 Arvid Brodin, arvid.brodin@alten.se
 *
 * Routines for handling Netlink messages for HSR and PRP.
 */

#include "hsr_netlink.h"
#include <linux/kernel.h>
#include <net/rtnetlink.h>
#include <net/genetlink.h>
#include "hsr_main.h"
#include "hsr_device.h"
#include "hsr_framereg.h"

static const struct nla_policy hsr_policy[IFLA_HSR_MAX + 1] = {
	[IFLA_HSR_SLAVE1]		= { .type = NLA_U32 },
	[IFLA_HSR_SLAVE2]		= { .type = NLA_U32 },
	[IFLA_HSR_MULTICAST_SPEC]	= { .type = NLA_U8 },
	[IFLA_HSR_VERSION]		= { .type = NLA_U8 },
	[IFLA_HSR_SUPERVISION_ADDR]	= { .len = ETH_ALEN },
	[IFLA_HSR_SEQ_NR]		= { .type = NLA_U16 },
	[IFLA_HSR_PROTOCOL]		= { .type = NLA_U8 },
	[IFLA_HSR_INTERLINK]		= { .type = NLA_U32 },
};

/* Here, it seems a netdevice has already been allocated for us, and the
 * hsr_dev_setup routine has been executed. Nice!
*/ static int hsr_newlink(struct net_device *dev, struct rtnl_newlink_params *params, struct netlink_ext_ack *extack) { struct net *link_net = rtnl_newlink_link_net(params); struct nlattr **data = params->data; enum hsr_version proto_version; unsigned char multicast_spec; u8 proto = HSR_PROTOCOL_HSR; struct net_device *link[2], *interlink = NULL; if (!data) { NL_SET_ERR_MSG_MOD(extack, "No slave devices specified"); return -EINVAL; } if (!data[IFLA_HSR_SLAVE1]) { NL_SET_ERR_MSG_MOD(extack, "Slave1 device not specified"); return -EINVAL; } link[0] = __dev_get_by_index(link_net, nla_get_u32(data[IFLA_HSR_SLAVE1])); if (!link[0]) { NL_SET_ERR_MSG_MOD(extack, "Slave1 does not exist"); return -EINVAL; } if (!data[IFLA_HSR_SLAVE2]) { NL_SET_ERR_MSG_MOD(extack, "Slave2 device not specified"); return -EINVAL; } link[1] = __dev_get_by_index(link_net, nla_get_u32(data[IFLA_HSR_SLAVE2])); if (!link[1]) { NL_SET_ERR_MSG_MOD(extack, "Slave2 does not exist"); return -EINVAL; } if (link[0] == link[1]) { NL_SET_ERR_MSG_MOD(extack, "Slave1 and Slave2 are same"); return -EINVAL; } if (data[IFLA_HSR_INTERLINK]) interlink = __dev_get_by_index(link_net, nla_get_u32(data[IFLA_HSR_INTERLINK])); if (interlink && interlink == link[0]) { NL_SET_ERR_MSG_MOD(extack, "Interlink and Slave1 are the same"); return -EINVAL; } if (interlink && interlink == link[1]) { NL_SET_ERR_MSG_MOD(extack, "Interlink and Slave2 are the same"); return -EINVAL; } multicast_spec = nla_get_u8_default(data[IFLA_HSR_MULTICAST_SPEC], 0); if (data[IFLA_HSR_PROTOCOL]) proto = nla_get_u8(data[IFLA_HSR_PROTOCOL]); if (proto >= HSR_PROTOCOL_MAX) { NL_SET_ERR_MSG_MOD(extack, "Unsupported protocol"); return -EINVAL; } if (!data[IFLA_HSR_VERSION]) { proto_version = HSR_V0; } else { if (proto == HSR_PROTOCOL_PRP) { NL_SET_ERR_MSG_MOD(extack, "PRP version unsupported"); return -EINVAL; } proto_version = nla_get_u8(data[IFLA_HSR_VERSION]); if (proto_version > HSR_V1) { NL_SET_ERR_MSG_MOD(extack, "Only HSR version 0/1 supported"); return -EINVAL; } } if (proto == HSR_PROTOCOL_PRP) { proto_version = PRP_V1; if (interlink) { NL_SET_ERR_MSG_MOD(extack, "Interlink only works with HSR"); return -EINVAL; } } return hsr_dev_finalize(dev, link, interlink, multicast_spec, proto_version, extack); } static void hsr_dellink(struct net_device *dev, struct list_head *head) { struct hsr_priv *hsr = netdev_priv(dev); timer_delete_sync(&hsr->prune_timer); timer_delete_sync(&hsr->prune_proxy_timer); timer_delete_sync(&hsr->announce_timer); timer_delete_sync(&hsr->announce_proxy_timer); hsr_debugfs_term(hsr); hsr_del_ports(hsr); hsr_del_self_node(hsr); hsr_del_nodes(&hsr->node_db); hsr_del_nodes(&hsr->proxy_node_db); unregister_netdevice_queue(dev, head); } static int hsr_fill_info(struct sk_buff *skb, const struct net_device *dev) { struct hsr_priv *hsr = netdev_priv(dev); u8 proto = HSR_PROTOCOL_HSR; struct hsr_port *port; port = hsr_port_get_hsr(hsr, HSR_PT_SLAVE_A); if (port) { if (nla_put_u32(skb, IFLA_HSR_SLAVE1, port->dev->ifindex)) goto nla_put_failure; } port = hsr_port_get_hsr(hsr, HSR_PT_SLAVE_B); if (port) { if (nla_put_u32(skb, IFLA_HSR_SLAVE2, port->dev->ifindex)) goto nla_put_failure; } if (nla_put(skb, IFLA_HSR_SUPERVISION_ADDR, ETH_ALEN, hsr->sup_multicast_addr) || nla_put_u16(skb, IFLA_HSR_SEQ_NR, hsr->sequence_nr)) goto nla_put_failure; if (hsr->prot_version == PRP_V1) proto = HSR_PROTOCOL_PRP; if (nla_put_u8(skb, IFLA_HSR_PROTOCOL, proto)) goto nla_put_failure; return 0; nla_put_failure: return -EMSGSIZE; } static struct rtnl_link_ops hsr_link_ops 
__read_mostly = { .kind = "hsr", .maxtype = IFLA_HSR_MAX, .policy = hsr_policy, .priv_size = sizeof(struct hsr_priv), .setup = hsr_dev_setup, .newlink = hsr_newlink, .dellink = hsr_dellink, .fill_info = hsr_fill_info, }; /* attribute policy */ static const struct nla_policy hsr_genl_policy[HSR_A_MAX + 1] = { [HSR_A_NODE_ADDR] = { .len = ETH_ALEN }, [HSR_A_NODE_ADDR_B] = { .len = ETH_ALEN }, [HSR_A_IFINDEX] = { .type = NLA_U32 }, [HSR_A_IF1_AGE] = { .type = NLA_U32 }, [HSR_A_IF2_AGE] = { .type = NLA_U32 }, [HSR_A_IF1_SEQ] = { .type = NLA_U16 }, [HSR_A_IF2_SEQ] = { .type = NLA_U16 }, }; static struct genl_family hsr_genl_family; static const struct genl_multicast_group hsr_mcgrps[] = { { .name = "hsr-network", }, }; /* This is called if for some node with MAC address addr, we only get frames * over one of the slave interfaces. This would indicate an open network ring * (i.e. a link has failed somewhere). */ void hsr_nl_ringerror(struct hsr_priv *hsr, unsigned char addr[ETH_ALEN], struct hsr_port *port) { struct sk_buff *skb; void *msg_head; struct hsr_port *master; int res; skb = genlmsg_new(NLMSG_GOODSIZE, GFP_ATOMIC); if (!skb) goto fail; msg_head = genlmsg_put(skb, 0, 0, &hsr_genl_family, 0, HSR_C_RING_ERROR); if (!msg_head) goto nla_put_failure; res = nla_put(skb, HSR_A_NODE_ADDR, ETH_ALEN, addr); if (res < 0) goto nla_put_failure; res = nla_put_u32(skb, HSR_A_IFINDEX, port->dev->ifindex); if (res < 0) goto nla_put_failure; genlmsg_end(skb, msg_head); genlmsg_multicast(&hsr_genl_family, skb, 0, 0, GFP_ATOMIC); return; nla_put_failure: kfree_skb(skb); fail: rcu_read_lock(); master = hsr_port_get_hsr(hsr, HSR_PT_MASTER); netdev_warn(master->dev, "Could not send HSR ring error message\n"); rcu_read_unlock(); } /* This is called when we haven't heard from the node with MAC address addr for * some time (just before the node is removed from the node table/list). */ void hsr_nl_nodedown(struct hsr_priv *hsr, unsigned char addr[ETH_ALEN]) { struct sk_buff *skb; void *msg_head; struct hsr_port *master; int res; skb = genlmsg_new(NLMSG_GOODSIZE, GFP_ATOMIC); if (!skb) goto fail; msg_head = genlmsg_put(skb, 0, 0, &hsr_genl_family, 0, HSR_C_NODE_DOWN); if (!msg_head) goto nla_put_failure; res = nla_put(skb, HSR_A_NODE_ADDR, ETH_ALEN, addr); if (res < 0) goto nla_put_failure; genlmsg_end(skb, msg_head); genlmsg_multicast(&hsr_genl_family, skb, 0, 0, GFP_ATOMIC); return; nla_put_failure: kfree_skb(skb); fail: rcu_read_lock(); master = hsr_port_get_hsr(hsr, HSR_PT_MASTER); netdev_warn(master->dev, "Could not send HSR node down\n"); rcu_read_unlock(); } /* HSR_C_GET_NODE_STATUS lets userspace query the internal HSR node table * about the status of a specific node in the network, defined by its MAC * address. 
* * Input: hsr ifindex, node mac address * Output: hsr ifindex, node mac address (copied from request), * age of latest frame from node over slave 1, slave 2 [ms] */ static int hsr_get_node_status(struct sk_buff *skb_in, struct genl_info *info) { /* For receiving */ struct nlattr *na; struct net_device *hsr_dev; /* For sending */ struct sk_buff *skb_out; void *msg_head; struct hsr_priv *hsr; struct hsr_port *port; unsigned char hsr_node_addr_b[ETH_ALEN]; int hsr_node_if1_age; u16 hsr_node_if1_seq; int hsr_node_if2_age; u16 hsr_node_if2_seq; int addr_b_ifindex; int res; if (!info) goto invalid; na = info->attrs[HSR_A_IFINDEX]; if (!na) goto invalid; na = info->attrs[HSR_A_NODE_ADDR]; if (!na) goto invalid; rcu_read_lock(); hsr_dev = dev_get_by_index_rcu(genl_info_net(info), nla_get_u32(info->attrs[HSR_A_IFINDEX])); if (!hsr_dev) goto rcu_unlock; if (!is_hsr_master(hsr_dev)) goto rcu_unlock; /* Send reply */ skb_out = genlmsg_new(NLMSG_GOODSIZE, GFP_ATOMIC); if (!skb_out) { res = -ENOMEM; goto fail; } msg_head = genlmsg_put(skb_out, NETLINK_CB(skb_in).portid, info->snd_seq, &hsr_genl_family, 0, HSR_C_SET_NODE_STATUS); if (!msg_head) { res = -ENOMEM; goto nla_put_failure; } res = nla_put_u32(skb_out, HSR_A_IFINDEX, hsr_dev->ifindex); if (res < 0) goto nla_put_failure; hsr = netdev_priv(hsr_dev); res = hsr_get_node_data(hsr, (unsigned char *) nla_data(info->attrs[HSR_A_NODE_ADDR]), hsr_node_addr_b, &addr_b_ifindex, &hsr_node_if1_age, &hsr_node_if1_seq, &hsr_node_if2_age, &hsr_node_if2_seq); if (res < 0) goto nla_put_failure; res = nla_put(skb_out, HSR_A_NODE_ADDR, ETH_ALEN, nla_data(info->attrs[HSR_A_NODE_ADDR])); if (res < 0) goto nla_put_failure; if (addr_b_ifindex > -1) { res = nla_put(skb_out, HSR_A_NODE_ADDR_B, ETH_ALEN, hsr_node_addr_b); if (res < 0) goto nla_put_failure; res = nla_put_u32(skb_out, HSR_A_ADDR_B_IFINDEX, addr_b_ifindex); if (res < 0) goto nla_put_failure; } res = nla_put_u32(skb_out, HSR_A_IF1_AGE, hsr_node_if1_age); if (res < 0) goto nla_put_failure; res = nla_put_u16(skb_out, HSR_A_IF1_SEQ, hsr_node_if1_seq); if (res < 0) goto nla_put_failure; port = hsr_port_get_hsr(hsr, HSR_PT_SLAVE_A); if (port) res = nla_put_u32(skb_out, HSR_A_IF1_IFINDEX, port->dev->ifindex); if (res < 0) goto nla_put_failure; res = nla_put_u32(skb_out, HSR_A_IF2_AGE, hsr_node_if2_age); if (res < 0) goto nla_put_failure; res = nla_put_u16(skb_out, HSR_A_IF2_SEQ, hsr_node_if2_seq); if (res < 0) goto nla_put_failure; port = hsr_port_get_hsr(hsr, HSR_PT_SLAVE_B); if (port) res = nla_put_u32(skb_out, HSR_A_IF2_IFINDEX, port->dev->ifindex); if (res < 0) goto nla_put_failure; rcu_read_unlock(); genlmsg_end(skb_out, msg_head); genlmsg_unicast(genl_info_net(info), skb_out, info->snd_portid); return 0; rcu_unlock: rcu_read_unlock(); invalid: netlink_ack(skb_in, nlmsg_hdr(skb_in), -EINVAL, NULL); return 0; nla_put_failure: kfree_skb(skb_out); /* Fall through */ fail: rcu_read_unlock(); return res; } /* Get a list of MacAddressA of all nodes known to this node (including self). 
*/ static int hsr_get_node_list(struct sk_buff *skb_in, struct genl_info *info) { unsigned char addr[ETH_ALEN]; struct net_device *hsr_dev; struct sk_buff *skb_out; struct hsr_priv *hsr; bool restart = false; struct nlattr *na; void *pos = NULL; void *msg_head; int res; if (!info) goto invalid; na = info->attrs[HSR_A_IFINDEX]; if (!na) goto invalid; rcu_read_lock(); hsr_dev = dev_get_by_index_rcu(genl_info_net(info), nla_get_u32(info->attrs[HSR_A_IFINDEX])); if (!hsr_dev) goto rcu_unlock; if (!is_hsr_master(hsr_dev)) goto rcu_unlock; restart: /* Send reply */ skb_out = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_ATOMIC); if (!skb_out) { res = -ENOMEM; goto fail; } msg_head = genlmsg_put(skb_out, NETLINK_CB(skb_in).portid, info->snd_seq, &hsr_genl_family, 0, HSR_C_SET_NODE_LIST); if (!msg_head) { res = -ENOMEM; goto nla_put_failure; } if (!restart) { res = nla_put_u32(skb_out, HSR_A_IFINDEX, hsr_dev->ifindex); if (res < 0) goto nla_put_failure; } hsr = netdev_priv(hsr_dev); if (!pos) pos = hsr_get_next_node(hsr, NULL, addr); while (pos) { res = nla_put(skb_out, HSR_A_NODE_ADDR, ETH_ALEN, addr); if (res < 0) { if (res == -EMSGSIZE) { genlmsg_end(skb_out, msg_head); genlmsg_unicast(genl_info_net(info), skb_out, info->snd_portid); restart = true; goto restart; } goto nla_put_failure; } pos = hsr_get_next_node(hsr, pos, addr); } rcu_read_unlock(); genlmsg_end(skb_out, msg_head); genlmsg_unicast(genl_info_net(info), skb_out, info->snd_portid); return 0; rcu_unlock: rcu_read_unlock(); invalid: netlink_ack(skb_in, nlmsg_hdr(skb_in), -EINVAL, NULL); return 0; nla_put_failure: nlmsg_free(skb_out); /* Fall through */ fail: rcu_read_unlock(); return res; } static const struct genl_small_ops hsr_ops[] = { { .cmd = HSR_C_GET_NODE_STATUS, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .flags = 0, .doit = hsr_get_node_status, .dumpit = NULL, }, { .cmd = HSR_C_GET_NODE_LIST, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .flags = 0, .doit = hsr_get_node_list, .dumpit = NULL, }, }; static struct genl_family hsr_genl_family __ro_after_init = { .hdrsize = 0, .name = "HSR", .version = 1, .maxattr = HSR_A_MAX, .policy = hsr_genl_policy, .netnsok = true, .module = THIS_MODULE, .small_ops = hsr_ops, .n_small_ops = ARRAY_SIZE(hsr_ops), .resv_start_op = HSR_C_SET_NODE_LIST + 1, .mcgrps = hsr_mcgrps, .n_mcgrps = ARRAY_SIZE(hsr_mcgrps), }; int __init hsr_netlink_init(void) { int rc; rc = rtnl_link_register(&hsr_link_ops); if (rc) goto fail_rtnl_link_register; rc = genl_register_family(&hsr_genl_family); if (rc) goto fail_genl_register_family; hsr_debugfs_create_root(); return 0; fail_genl_register_family: rtnl_link_unregister(&hsr_link_ops); fail_rtnl_link_register: return rc; } void __exit hsr_netlink_exit(void) { genl_unregister_family(&hsr_genl_family); rtnl_link_unregister(&hsr_link_ops); } MODULE_ALIAS_RTNL_LINK("hsr"); |
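/*
 * Illustrative example (not part of the kernel sources above): a minimal,
 * stand-alone C sketch of how hsr_newlink() resolves the IFLA_HSR_PROTOCOL
 * and IFLA_HSR_VERSION attributes into one protocol version: PRP rejects an
 * explicit HSR version, HSR without a version defaults to v0, and only HSR
 * v0/v1 are accepted.  The DEMO_* enums stand in for the kernel's values.
 */
#include <stdio.h>

enum { DEMO_HSR_V0, DEMO_HSR_V1, DEMO_PRP_V1 };			/* protocol versions */
enum { DEMO_PROTO_HSR, DEMO_PROTO_PRP, DEMO_PROTO_MAX };	/* IFLA_HSR_PROTOCOL */

/* @has_version: whether IFLA_HSR_VERSION was supplied.  Returns version or -1. */
static int resolve_proto_version(int proto, int has_version, int version)
{
	int proto_version;

	if (proto >= DEMO_PROTO_MAX)
		return -1;			/* unsupported protocol */

	if (!has_version) {
		proto_version = DEMO_HSR_V0;
	} else {
		if (proto == DEMO_PROTO_PRP)
			return -1;		/* PRP takes no HSR version */
		proto_version = version;
		if (proto_version > DEMO_HSR_V1)
			return -1;		/* only HSR version 0/1 supported */
	}

	if (proto == DEMO_PROTO_PRP)
		proto_version = DEMO_PRP_V1;
	return proto_version;
}

int main(void)
{
	printf("%d\n", resolve_proto_version(DEMO_PROTO_HSR, 0, 0));	/* 0: HSR v0 */
	printf("%d\n", resolve_proto_version(DEMO_PROTO_HSR, 1, 1));	/* 1: HSR v1 */
	printf("%d\n", resolve_proto_version(DEMO_PROTO_PRP, 0, 0));	/* 2: PRP v1 */
	printf("%d\n", resolve_proto_version(DEMO_PROTO_PRP, 1, 1));	/* -1: rejected */
	return 0;
}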
/* SPDX-License-Identifier: GPL-2.0-only */
/*
 * Fast and scalable bitmaps.
 *
 * Copyright (C) 2016 Facebook
 * Copyright (C) 2013-2014 Jens Axboe
 */

#ifndef __LINUX_SCALE_BITMAP_H
#define __LINUX_SCALE_BITMAP_H

#include <linux/atomic.h>
#include <linux/bitops.h>
#include <linux/cache.h>
#include <linux/list.h>
#include <linux/log2.h>
#include <linux/minmax.h>
#include <linux/percpu.h>
#include <linux/slab.h>
#include <linux/smp.h>
#include <linux/types.h>
#include <linux/wait.h>

struct seq_file;

/**
 * struct sbitmap_word - Word in a &struct sbitmap.
 */
struct sbitmap_word {
	/**
	 * @word: word holding free bits
	 */
	unsigned long word;

	/**
	 * @cleared: word holding cleared bits
	 */
	unsigned long cleared ____cacheline_aligned_in_smp;

	/**
	 * @swap_lock: serializes simultaneous updates of ->word and ->cleared
	 */
	raw_spinlock_t swap_lock;
} ____cacheline_aligned_in_smp;

/**
 * struct sbitmap - Scalable bitmap.
 *
 * A &struct sbitmap is spread over multiple cachelines to avoid ping-pong.
This * trades off higher memory usage for better scalability. */ struct sbitmap { /** * @depth: Number of bits used in the whole bitmap. */ unsigned int depth; /** * @shift: log2(number of bits used per word) */ unsigned int shift; /** * @map_nr: Number of words (cachelines) being used for the bitmap. */ unsigned int map_nr; /** * @round_robin: Allocate bits in strict round-robin order. */ bool round_robin; /** * @map: Allocated bitmap. */ struct sbitmap_word *map; /* * @alloc_hint: Cache of last successfully allocated or freed bit. * * This is per-cpu, which allows multiple users to stick to different * cachelines until the map is exhausted. */ unsigned int __percpu *alloc_hint; }; #define SBQ_WAIT_QUEUES 8 #define SBQ_WAKE_BATCH 8 /** * struct sbq_wait_state - Wait queue in a &struct sbitmap_queue. */ struct sbq_wait_state { /** * @wait: Wait queue. */ wait_queue_head_t wait; } ____cacheline_aligned_in_smp; /** * struct sbitmap_queue - Scalable bitmap with the added ability to wait on free * bits. * * A &struct sbitmap_queue uses multiple wait queues and rolling wakeups to * avoid contention on the wait queue spinlock. This ensures that we don't hit a * scalability wall when we run out of free bits and have to start putting tasks * to sleep. */ struct sbitmap_queue { /** * @sb: Scalable bitmap. */ struct sbitmap sb; /** * @wake_batch: Number of bits which must be freed before we wake up any * waiters. */ unsigned int wake_batch; /** * @wake_index: Next wait queue in @ws to wake up. */ atomic_t wake_index; /** * @ws: Wait queues. */ struct sbq_wait_state *ws; /* * @ws_active: count of currently active ws waitqueues */ atomic_t ws_active; /** * @min_shallow_depth: The minimum shallow depth which may be passed to * sbitmap_queue_get_shallow() */ unsigned int min_shallow_depth; /** * @completion_cnt: Number of bits cleared passed to the * wakeup function. */ atomic_t completion_cnt; /** * @wakeup_cnt: Number of thread wake ups issued. */ atomic_t wakeup_cnt; }; /** * sbitmap_init_node() - Initialize a &struct sbitmap on a specific memory node. * @sb: Bitmap to initialize. * @depth: Number of bits to allocate. * @shift: Use 2^@shift bits per word in the bitmap; if a negative number if * given, a good default is chosen. * @flags: Allocation flags. * @node: Memory node to allocate on. * @round_robin: If true, be stricter about allocation order; always allocate * starting from the last allocated bit. This is less efficient * than the default behavior (false). * @alloc_hint: If true, apply percpu hint for where to start searching for * a free bit. * * Return: Zero on success or negative errno on failure. */ int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift, gfp_t flags, int node, bool round_robin, bool alloc_hint); /* sbitmap internal helper */ static inline unsigned int __map_depth(const struct sbitmap *sb, int index) { if (index == sb->map_nr - 1) return sb->depth - (index << sb->shift); return 1U << sb->shift; } /** * sbitmap_free() - Free memory used by a &struct sbitmap. * @sb: Bitmap to free. */ static inline void sbitmap_free(struct sbitmap *sb) { free_percpu(sb->alloc_hint); kvfree(sb->map); sb->map = NULL; } /** * sbitmap_resize() - Resize a &struct sbitmap. * @sb: Bitmap to resize. * @depth: New number of bits to resize to. * * Doesn't reallocate anything. It's up to the caller to ensure that the new * depth doesn't exceed the depth that the sb was initialized with. 
*/ void sbitmap_resize(struct sbitmap *sb, unsigned int depth); /** * sbitmap_get() - Try to allocate a free bit from a &struct sbitmap. * @sb: Bitmap to allocate from. * * This operation provides acquire barrier semantics if it succeeds. * * Return: Non-negative allocated bit number if successful, -1 otherwise. */ int sbitmap_get(struct sbitmap *sb); /** * sbitmap_get_shallow() - Try to allocate a free bit from a &struct sbitmap, * limiting the depth used from each word. * @sb: Bitmap to allocate from. * @shallow_depth: The maximum number of bits to allocate from a single word. * * This rather specific operation allows for having multiple users with * different allocation limits. E.g., there can be a high-priority class that * uses sbitmap_get() and a low-priority class that uses sbitmap_get_shallow() * with a @shallow_depth of (1 << (@sb->shift - 1)). Then, the low-priority * class can only allocate half of the total bits in the bitmap, preventing it * from starving out the high-priority class. * * Return: Non-negative allocated bit number if successful, -1 otherwise. */ int sbitmap_get_shallow(struct sbitmap *sb, unsigned long shallow_depth); /** * sbitmap_any_bit_set() - Check for a set bit in a &struct sbitmap. * @sb: Bitmap to check. * * Return: true if any bit in the bitmap is set, false otherwise. */ bool sbitmap_any_bit_set(const struct sbitmap *sb); #define SB_NR_TO_INDEX(sb, bitnr) ((bitnr) >> (sb)->shift) #define SB_NR_TO_BIT(sb, bitnr) ((bitnr) & ((1U << (sb)->shift) - 1U)) typedef bool (*sb_for_each_fn)(struct sbitmap *, unsigned int, void *); /** * __sbitmap_for_each_set() - Iterate over each set bit in a &struct sbitmap. * @start: Where to start the iteration. * @sb: Bitmap to iterate over. * @fn: Callback. Should return true to continue or false to break early. * @data: Pointer to pass to callback. * * This is inline even though it's non-trivial so that the function calls to the * callback will hopefully get optimized away. */ static inline void __sbitmap_for_each_set(struct sbitmap *sb, unsigned int start, sb_for_each_fn fn, void *data) { unsigned int index; unsigned int nr; unsigned int scanned = 0; if (start >= sb->depth) start = 0; index = SB_NR_TO_INDEX(sb, start); nr = SB_NR_TO_BIT(sb, start); while (scanned < sb->depth) { unsigned long word; unsigned int depth = min_t(unsigned int, __map_depth(sb, index) - nr, sb->depth - scanned); scanned += depth; word = sb->map[index].word & ~sb->map[index].cleared; if (!word) goto next; /* * On the first iteration of the outer loop, we need to add the * bit offset back to the size of the word for find_next_bit(). * On all other iterations, nr is zero, so this is a noop. */ depth += nr; while (1) { nr = find_next_bit(&word, depth, nr); if (nr >= depth) break; if (!fn(sb, (index << sb->shift) + nr, data)) return; nr++; } next: nr = 0; if (++index >= sb->map_nr) index = 0; } } /** * sbitmap_for_each_set() - Iterate over each set bit in a &struct sbitmap. * @sb: Bitmap to iterate over. * @fn: Callback. Should return true to continue or false to break early. * @data: Pointer to pass to callback. 
*/ static inline void sbitmap_for_each_set(struct sbitmap *sb, sb_for_each_fn fn, void *data) { __sbitmap_for_each_set(sb, 0, fn, data); } static inline unsigned long *__sbitmap_word(struct sbitmap *sb, unsigned int bitnr) { return &sb->map[SB_NR_TO_INDEX(sb, bitnr)].word; } /* Helpers equivalent to the operations in asm/bitops.h and linux/bitmap.h */ static inline void sbitmap_set_bit(struct sbitmap *sb, unsigned int bitnr) { set_bit(SB_NR_TO_BIT(sb, bitnr), __sbitmap_word(sb, bitnr)); } static inline void sbitmap_clear_bit(struct sbitmap *sb, unsigned int bitnr) { clear_bit(SB_NR_TO_BIT(sb, bitnr), __sbitmap_word(sb, bitnr)); } /* * This one is special, since it doesn't actually clear the bit, rather it * sets the corresponding bit in the ->cleared mask instead. Paired with * the caller doing sbitmap_deferred_clear() if a given index is full, which * will clear the previously freed entries in the corresponding ->word. */ static inline void sbitmap_deferred_clear_bit(struct sbitmap *sb, unsigned int bitnr) { unsigned long *addr = &sb->map[SB_NR_TO_INDEX(sb, bitnr)].cleared; set_bit(SB_NR_TO_BIT(sb, bitnr), addr); } /* * Pair of sbitmap_get, and this one applies both cleared bit and * allocation hint. */ static inline void sbitmap_put(struct sbitmap *sb, unsigned int bitnr) { sbitmap_deferred_clear_bit(sb, bitnr); if (likely(sb->alloc_hint && !sb->round_robin && bitnr < sb->depth)) *raw_cpu_ptr(sb->alloc_hint) = bitnr; } static inline int sbitmap_test_bit(struct sbitmap *sb, unsigned int bitnr) { return test_bit(SB_NR_TO_BIT(sb, bitnr), __sbitmap_word(sb, bitnr)); } static inline int sbitmap_calculate_shift(unsigned int depth) { int shift = ilog2(BITS_PER_LONG); /* * If the bitmap is small, shrink the number of bits per word so * we spread over a few cachelines, at least. If less than 4 * bits, just forget about it, it's not going to work optimally * anyway. */ if (depth >= 4) { while ((4U << shift) > depth) shift--; } return shift; } /** * sbitmap_show() - Dump &struct sbitmap information to a &struct seq_file. * @sb: Bitmap to show. * @m: struct seq_file to write to. * * This is intended for debugging. The format may change at any time. */ void sbitmap_show(struct sbitmap *sb, struct seq_file *m); /** * sbitmap_weight() - Return how many set and not cleared bits in a &struct * sbitmap. * @sb: Bitmap to check. * * Return: How many set and not cleared bits set */ unsigned int sbitmap_weight(const struct sbitmap *sb); /** * sbitmap_bitmap_show() - Write a hex dump of a &struct sbitmap to a &struct * seq_file. * @sb: Bitmap to show. * @m: struct seq_file to write to. * * This is intended for debugging. The output isn't guaranteed to be internally * consistent. */ void sbitmap_bitmap_show(struct sbitmap *sb, struct seq_file *m); /** * sbitmap_queue_init_node() - Initialize a &struct sbitmap_queue on a specific * memory node. * @sbq: Bitmap queue to initialize. * @depth: See sbitmap_init_node(). * @shift: See sbitmap_init_node(). * @round_robin: See sbitmap_get(). * @flags: Allocation flags. * @node: Memory node to allocate on. * * Return: Zero on success or negative errno on failure. */ int sbitmap_queue_init_node(struct sbitmap_queue *sbq, unsigned int depth, int shift, bool round_robin, gfp_t flags, int node); /** * sbitmap_queue_free() - Free memory used by a &struct sbitmap_queue. * * @sbq: Bitmap queue to free. 
*/ static inline void sbitmap_queue_free(struct sbitmap_queue *sbq) { kfree(sbq->ws); sbitmap_free(&sbq->sb); } /** * sbitmap_queue_recalculate_wake_batch() - Recalculate wake batch * @sbq: Bitmap queue to recalculate wake batch. * @users: Number of shares. * * Like sbitmap_queue_update_wake_batch(), this will calculate wake batch * by depth. This interface is for HCTX shared tags or queue shared tags. */ void sbitmap_queue_recalculate_wake_batch(struct sbitmap_queue *sbq, unsigned int users); /** * sbitmap_queue_resize() - Resize a &struct sbitmap_queue. * @sbq: Bitmap queue to resize. * @depth: New number of bits to resize to. * * Like sbitmap_resize(), this doesn't reallocate anything. It has to do * some extra work on the &struct sbitmap_queue, so it's not safe to just * resize the underlying &struct sbitmap. */ void sbitmap_queue_resize(struct sbitmap_queue *sbq, unsigned int depth); /** * __sbitmap_queue_get() - Try to allocate a free bit from a &struct * sbitmap_queue with preemption already disabled. * @sbq: Bitmap queue to allocate from. * * Return: Non-negative allocated bit number if successful, -1 otherwise. */ int __sbitmap_queue_get(struct sbitmap_queue *sbq); /** * __sbitmap_queue_get_batch() - Try to allocate a batch of free bits * @sbq: Bitmap queue to allocate from. * @nr_tags: number of tags requested * @offset: offset to add to returned bits * * Return: Mask of allocated tags, 0 if none are found. Each tag allocated is * a bit in the mask returned, and the caller must add @offset to the value to * get the absolute tag value. */ unsigned long __sbitmap_queue_get_batch(struct sbitmap_queue *sbq, int nr_tags, unsigned int *offset); /** * sbitmap_queue_get_shallow() - Try to allocate a free bit from a &struct * sbitmap_queue, limiting the depth used from each word, with preemption * already disabled. * @sbq: Bitmap queue to allocate from. * @shallow_depth: The maximum number of bits to allocate from a single word. * See sbitmap_get_shallow(). * * If you call this, make sure to call sbitmap_queue_min_shallow_depth() after * initializing @sbq. * * Return: Non-negative allocated bit number if successful, -1 otherwise. */ int sbitmap_queue_get_shallow(struct sbitmap_queue *sbq, unsigned int shallow_depth); /** * sbitmap_queue_get() - Try to allocate a free bit from a &struct * sbitmap_queue. * @sbq: Bitmap queue to allocate from. * @cpu: Output parameter; will contain the CPU we ran on (e.g., to be passed to * sbitmap_queue_clear()). * * Return: Non-negative allocated bit number if successful, -1 otherwise. */ static inline int sbitmap_queue_get(struct sbitmap_queue *sbq, unsigned int *cpu) { int nr; *cpu = get_cpu(); nr = __sbitmap_queue_get(sbq); put_cpu(); return nr; } /** * sbitmap_queue_min_shallow_depth() - Inform a &struct sbitmap_queue of the * minimum shallow depth that will be used. * @sbq: Bitmap queue in question. * @min_shallow_depth: The minimum shallow depth that will be passed to * sbitmap_queue_get_shallow() or __sbitmap_queue_get_shallow(). * * sbitmap_queue_clear() batches wakeups as an optimization. The batch size * depends on the depth of the bitmap. Since the shallow allocation functions * effectively operate with a different depth, the shallow depth must be taken * into account when calculating the batch size. This function must be called * with the minimum shallow depth that will be used. Failure to do so can result * in missed wakeups. 
*/ void sbitmap_queue_min_shallow_depth(struct sbitmap_queue *sbq, unsigned int min_shallow_depth); /** * sbitmap_queue_clear() - Free an allocated bit and wake up waiters on a * &struct sbitmap_queue. * @sbq: Bitmap to free from. * @nr: Bit number to free. * @cpu: CPU the bit was allocated on. */ void sbitmap_queue_clear(struct sbitmap_queue *sbq, unsigned int nr, unsigned int cpu); /** * sbitmap_queue_clear_batch() - Free a batch of allocated bits * &struct sbitmap_queue. * @sbq: Bitmap to free from. * @offset: offset for each tag in array * @tags: array of tags * @nr_tags: number of tags in array */ void sbitmap_queue_clear_batch(struct sbitmap_queue *sbq, int offset, int *tags, int nr_tags); static inline int sbq_index_inc(int index) { return (index + 1) & (SBQ_WAIT_QUEUES - 1); } static inline void sbq_index_atomic_inc(atomic_t *index) { int old = atomic_read(index); int new = sbq_index_inc(old); atomic_cmpxchg(index, old, new); } /** * sbq_wait_ptr() - Get the next wait queue to use for a &struct * sbitmap_queue. * @sbq: Bitmap queue to wait on. * @wait_index: A counter per "user" of @sbq. */ static inline struct sbq_wait_state *sbq_wait_ptr(struct sbitmap_queue *sbq, atomic_t *wait_index) { struct sbq_wait_state *ws; ws = &sbq->ws[atomic_read(wait_index)]; sbq_index_atomic_inc(wait_index); return ws; } /** * sbitmap_queue_wake_all() - Wake up everything waiting on a &struct * sbitmap_queue. * @sbq: Bitmap queue to wake up. */ void sbitmap_queue_wake_all(struct sbitmap_queue *sbq); /** * sbitmap_queue_wake_up() - Wake up some of waiters in one waitqueue * on a &struct sbitmap_queue. * @sbq: Bitmap queue to wake up. * @nr: Number of bits cleared. */ void sbitmap_queue_wake_up(struct sbitmap_queue *sbq, int nr); /** * sbitmap_queue_show() - Dump &struct sbitmap_queue information to a &struct * seq_file. * @sbq: Bitmap queue to show. * @m: struct seq_file to write to. * * This is intended for debugging. The format may change at any time. */ void sbitmap_queue_show(struct sbitmap_queue *sbq, struct seq_file *m); struct sbq_wait { struct sbitmap_queue *sbq; /* if set, sbq_wait is accounted */ struct wait_queue_entry wait; }; #define DEFINE_SBQ_WAIT(name) \ struct sbq_wait name = { \ .sbq = NULL, \ .wait = { \ .private = current, \ .func = autoremove_wake_function, \ .entry = LIST_HEAD_INIT((name).wait.entry), \ } \ } /* * Wrapper around prepare_to_wait_exclusive(), which maintains some extra * internal state. */ void sbitmap_prepare_to_wait(struct sbitmap_queue *sbq, struct sbq_wait_state *ws, struct sbq_wait *sbq_wait, int state); /* * Must be paired with sbitmap_prepare_to_wait(). */ void sbitmap_finish_wait(struct sbitmap_queue *sbq, struct sbq_wait_state *ws, struct sbq_wait *sbq_wait); /* * Wrapper around add_wait_queue(), which maintains some extra internal state */ void sbitmap_add_wait_queue(struct sbitmap_queue *sbq, struct sbq_wait_state *ws, struct sbq_wait *sbq_wait); /* * Must be paired with sbitmap_add_wait_queue() */ void sbitmap_del_wait_queue(struct sbq_wait *sbq_wait); #endif /* __LINUX_SCALE_BITMAP_H */ |
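/*
 * Illustrative usage sketch (not part of the header above): how a driver
 * might hand out and return tags with the sbitmap_queue API declared in
 * this file.  The "example_" names are hypothetical; the sbitmap_queue_*
 * calls match the declarations above.  A negative @shift is assumed to let
 * the implementation pick a sensible bits-per-word value from @depth.
 */
struct example_tag_set {
	struct sbitmap_queue sbq;
};

static int example_tag_set_init(struct example_tag_set *set, unsigned int depth)
{
	return sbitmap_queue_init_node(&set->sbq, depth, -1, false,
				       GFP_KERNEL, NUMA_NO_NODE);
}

static int example_get_tag(struct example_tag_set *set, unsigned int *cpu)
{
	/* Returns a free bit number, or -1 if the queue is currently full. */
	return sbitmap_queue_get(&set->sbq, cpu);
}

static void example_put_tag(struct example_tag_set *set, unsigned int tag,
			    unsigned int cpu)
{
	/* Frees the bit and wakes a batch of waiters when appropriate. */
	sbitmap_queue_clear(&set->sbq, tag, cpu);
}

static void example_tag_set_exit(struct example_tag_set *set)
{
	sbitmap_queue_free(&set->sbq);
}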
/* SPDX-License-Identifier: GPL-2.0 */ /* * Internal header to deal with irq_desc->status which will be renamed * to irq_desc->settings. */ enum { _IRQ_DEFAULT_INIT_FLAGS = IRQ_DEFAULT_INIT_FLAGS, _IRQ_PER_CPU = IRQ_PER_CPU, _IRQ_LEVEL = IRQ_LEVEL, _IRQ_NOPROBE = IRQ_NOPROBE, _IRQ_NOREQUEST = IRQ_NOREQUEST, _IRQ_NOTHREAD = IRQ_NOTHREAD, _IRQ_NOAUTOEN = IRQ_NOAUTOEN, _IRQ_NO_BALANCING = IRQ_NO_BALANCING, _IRQ_NESTED_THREAD = IRQ_NESTED_THREAD, _IRQ_PER_CPU_DEVID = IRQ_PER_CPU_DEVID, _IRQ_IS_POLLED = IRQ_IS_POLLED, _IRQ_DISABLE_UNLAZY = IRQ_DISABLE_UNLAZY, _IRQ_HIDDEN = IRQ_HIDDEN, _IRQ_NO_DEBUG = IRQ_NO_DEBUG, _IRQF_MODIFY_MASK = IRQF_MODIFY_MASK, }; #define IRQ_PER_CPU GOT_YOU_MORON #define IRQ_NO_BALANCING GOT_YOU_MORON #define IRQ_LEVEL GOT_YOU_MORON #define IRQ_NOPROBE GOT_YOU_MORON #define IRQ_NOREQUEST GOT_YOU_MORON #define IRQ_NOTHREAD GOT_YOU_MORON #define IRQ_NOAUTOEN GOT_YOU_MORON #define IRQ_NESTED_THREAD GOT_YOU_MORON #define IRQ_PER_CPU_DEVID GOT_YOU_MORON #define IRQ_IS_POLLED GOT_YOU_MORON #define IRQ_DISABLE_UNLAZY GOT_YOU_MORON #define IRQ_HIDDEN GOT_YOU_MORON #define IRQ_NO_DEBUG GOT_YOU_MORON #undef IRQF_MODIFY_MASK #define IRQF_MODIFY_MASK GOT_YOU_MORON static inline void irq_settings_clr_and_set(struct irq_desc *desc, u32 clr, u32 set) { desc->status_use_accessors &= ~(clr & _IRQF_MODIFY_MASK); desc->status_use_accessors |= (set & _IRQF_MODIFY_MASK); } static inline bool irq_settings_is_per_cpu(struct irq_desc *desc) { return desc->status_use_accessors & _IRQ_PER_CPU; } static inline bool irq_settings_is_per_cpu_devid(struct irq_desc *desc) { return desc->status_use_accessors & _IRQ_PER_CPU_DEVID; } static inline void irq_settings_set_per_cpu(struct irq_desc *desc) { desc->status_use_accessors |= _IRQ_PER_CPU; } static inline void irq_settings_set_no_balancing(struct irq_desc *desc) { desc->status_use_accessors |= _IRQ_NO_BALANCING; } static inline bool irq_settings_has_no_balance_set(struct irq_desc *desc) { return desc->status_use_accessors & _IRQ_NO_BALANCING; } static inline u32 irq_settings_get_trigger_mask(struct irq_desc *desc) { return desc->status_use_accessors & IRQ_TYPE_SENSE_MASK; } static inline void irq_settings_set_trigger_mask(struct irq_desc *desc, u32 mask) { desc->status_use_accessors &= ~IRQ_TYPE_SENSE_MASK; desc->status_use_accessors |= mask & IRQ_TYPE_SENSE_MASK; } static inline bool irq_settings_is_level(struct irq_desc *desc) { return desc->status_use_accessors & _IRQ_LEVEL; } static inline void irq_settings_clr_level(struct irq_desc *desc) { desc->status_use_accessors &= ~_IRQ_LEVEL; } static inline void irq_settings_set_level(struct irq_desc *desc) { desc->status_use_accessors |= _IRQ_LEVEL; } static inline bool irq_settings_can_request(struct irq_desc *desc) { return !(desc->status_use_accessors & _IRQ_NOREQUEST); } static inline void irq_settings_clr_norequest(struct irq_desc *desc) { desc->status_use_accessors &=
~_IRQ_NOREQUEST; } static inline void irq_settings_set_norequest(struct irq_desc *desc) { desc->status_use_accessors |= _IRQ_NOREQUEST; } static inline bool irq_settings_can_thread(struct irq_desc *desc) { return !(desc->status_use_accessors & _IRQ_NOTHREAD); } static inline void irq_settings_clr_nothread(struct irq_desc *desc) { desc->status_use_accessors &= ~_IRQ_NOTHREAD; } static inline void irq_settings_set_nothread(struct irq_desc *desc) { desc->status_use_accessors |= _IRQ_NOTHREAD; } static inline bool irq_settings_can_probe(struct irq_desc *desc) { return !(desc->status_use_accessors & _IRQ_NOPROBE); } static inline void irq_settings_clr_noprobe(struct irq_desc *desc) { desc->status_use_accessors &= ~_IRQ_NOPROBE; } static inline void irq_settings_set_noprobe(struct irq_desc *desc) { desc->status_use_accessors |= _IRQ_NOPROBE; } static inline bool irq_settings_can_autoenable(struct irq_desc *desc) { return !(desc->status_use_accessors & _IRQ_NOAUTOEN); } static inline bool irq_settings_is_nested_thread(struct irq_desc *desc) { return desc->status_use_accessors & _IRQ_NESTED_THREAD; } static inline bool irq_settings_is_polled(struct irq_desc *desc) { return desc->status_use_accessors & _IRQ_IS_POLLED; } static inline bool irq_settings_disable_unlazy(struct irq_desc *desc) { return desc->status_use_accessors & _IRQ_DISABLE_UNLAZY; } static inline void irq_settings_clr_disable_unlazy(struct irq_desc *desc) { desc->status_use_accessors &= ~_IRQ_DISABLE_UNLAZY; } static inline bool irq_settings_is_hidden(struct irq_desc *desc) { return desc->status_use_accessors & _IRQ_HIDDEN; } static inline void irq_settings_set_no_debug(struct irq_desc *desc) { desc->status_use_accessors |= _IRQ_NO_DEBUG; } static inline bool irq_settings_no_debug(struct irq_desc *desc) { return desc->status_use_accessors & _IRQ_NO_DEBUG; } |
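/*
 * Illustrative sketch (not part of this header): with the raw IRQ_* flag
 * names poisoned to GOT_YOU_MORON above, core genirq code can only reach
 * desc->status_use_accessors through the helpers in this file.  The
 * function below is hypothetical and only demonstrates that pattern.
 */
static inline void example_mark_level_requestable(struct irq_desc *desc)
{
	irq_settings_set_level(desc);		/* level triggered */
	irq_settings_clr_norequest(desc);	/* request_irq() is allowed */

	if (irq_settings_is_level(desc) && irq_settings_can_request(desc)) {
		/* a level-type flow handler would be installed here */
	}
}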
// SPDX-License-Identifier: GPL-2.0 /* * Copyright (C) 2016 Thomas Gleixner. * Copyright (C) 2016-2017 Christoph Hellwig.
*/ #include <linux/kernel.h> #include <linux/slab.h> #include <linux/cpu.h> #include <linux/sort.h> #include <linux/group_cpus.h> #ifdef CONFIG_SMP static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk, unsigned int cpus_per_grp) { const struct cpumask *siblmsk; int cpu, sibl; for ( ; cpus_per_grp > 0; ) { cpu = cpumask_first(nmsk); /* Should not happen, but I'm too lazy to think about it */ if (cpu >= nr_cpu_ids) return; cpumask_clear_cpu(cpu, nmsk); cpumask_set_cpu(cpu, irqmsk); cpus_per_grp--; /* If the cpu has siblings, use them first */ siblmsk = topology_sibling_cpumask(cpu); for (sibl = -1; cpus_per_grp > 0; ) { sibl = cpumask_next(sibl, siblmsk); if (sibl >= nr_cpu_ids) break; if (!cpumask_test_and_clear_cpu(sibl, nmsk)) continue; cpumask_set_cpu(sibl, irqmsk); cpus_per_grp--; } } } static cpumask_var_t *alloc_node_to_cpumask(void) { cpumask_var_t *masks; int node; masks = kcalloc(nr_node_ids, sizeof(cpumask_var_t), GFP_KERNEL); if (!masks) return NULL; for (node = 0; node < nr_node_ids; node++) { if (!zalloc_cpumask_var(&masks[node], GFP_KERNEL)) goto out_unwind; } return masks; out_unwind: while (--node >= 0) free_cpumask_var(masks[node]); kfree(masks); return NULL; } static void free_node_to_cpumask(cpumask_var_t *masks) { int node; for (node = 0; node < nr_node_ids; node++) free_cpumask_var(masks[node]); kfree(masks); } static void build_node_to_cpumask(cpumask_var_t *masks) { int cpu; for_each_possible_cpu(cpu) cpumask_set_cpu(cpu, masks[cpu_to_node(cpu)]); } static int get_nodes_in_cpumask(cpumask_var_t *node_to_cpumask, const struct cpumask *mask, nodemask_t *nodemsk) { int n, nodes = 0; /* Calculate the number of nodes in the supplied affinity mask */ for_each_node(n) { if (cpumask_intersects(mask, node_to_cpumask[n])) { node_set(n, *nodemsk); nodes++; } } return nodes; } struct node_groups { unsigned id; union { unsigned ngroups; unsigned ncpus; }; }; static int ncpus_cmp_func(const void *l, const void *r) { const struct node_groups *ln = l; const struct node_groups *rn = r; return ln->ncpus - rn->ncpus; } /* * Allocate group number for each node, so that for each node: * * 1) the allocated number is >= 1 * * 2) the allocated number is <= active CPU number of this node * * The actual allocated total groups may be less than @numgrps when * active total CPU number is less than @numgrps. * * Active CPUs means the CPUs in '@cpu_mask AND @node_to_cpumask[]' * for each node. */ static void alloc_nodes_groups(unsigned int numgrps, cpumask_var_t *node_to_cpumask, const struct cpumask *cpu_mask, const nodemask_t nodemsk, struct cpumask *nmsk, struct node_groups *node_groups) { unsigned n, remaining_ncpus = 0; for (n = 0; n < nr_node_ids; n++) { node_groups[n].id = n; node_groups[n].ncpus = UINT_MAX; } for_each_node_mask(n, nodemsk) { unsigned ncpus; cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]); ncpus = cpumask_weight(nmsk); if (!ncpus) continue; remaining_ncpus += ncpus; node_groups[n].ncpus = ncpus; } numgrps = min_t(unsigned, remaining_ncpus, numgrps); sort(node_groups, nr_node_ids, sizeof(node_groups[0]), ncpus_cmp_func, NULL); /* * Allocate groups for each node according to the ratio of this * node's nr_cpus to remaining un-assigned ncpus. 'numgrps' is * bigger than number of active numa nodes. Always start the * allocation from the node with minimized nr_cpus. 
* * This way guarantees that each active node gets allocated at * least one group, and the theory is simple: over-allocation * is only done when this node is assigned by one group, so * other nodes will be allocated >= 1 groups, since 'numgrps' is * bigger than number of numa nodes. * * One perfect invariant is that number of allocated groups for * each node is <= CPU count of this node: * * 1) suppose there are two nodes: A and B * ncpu(X) is CPU count of node X * grps(X) is the group count allocated to node X via this * algorithm * * ncpu(A) <= ncpu(B) * ncpu(A) + ncpu(B) = N * grps(A) + grps(B) = G * * grps(A) = max(1, round_down(G * ncpu(A) / N)) * grps(B) = G - grps(A) * * both N and G are integer, and 2 <= G <= N, suppose * G = N - delta, and 0 <= delta <= N - 2 * * 2) obviously grps(A) <= ncpu(A) because: * * if grps(A) is 1, then grps(A) <= ncpu(A) given * ncpu(A) >= 1 * * otherwise, * grps(A) <= G * ncpu(A) / N <= ncpu(A), given G <= N * * 3) prove how grps(B) <= ncpu(B): * * if round_down(G * ncpu(A) / N) == 0, vecs(B) won't be * over-allocated, so grps(B) <= ncpu(B), * * otherwise: * * grps(A) = * round_down(G * ncpu(A) / N) = * round_down((N - delta) * ncpu(A) / N) = * round_down((N * ncpu(A) - delta * ncpu(A)) / N) >= * round_down((N * ncpu(A) - delta * N) / N) = * cpu(A) - delta * * then: * * grps(A) - G >= ncpu(A) - delta - G * => * G - grps(A) <= G + delta - ncpu(A) * => * grps(B) <= N - ncpu(A) * => * grps(B) <= cpu(B) * * For nodes >= 3, it can be thought as one node and another big * node given that is exactly what this algorithm is implemented, * and we always re-calculate 'remaining_ncpus' & 'numgrps', and * finally for each node X: grps(X) <= ncpu(X). * */ for (n = 0; n < nr_node_ids; n++) { unsigned ngroups, ncpus; if (node_groups[n].ncpus == UINT_MAX) continue; WARN_ON_ONCE(numgrps == 0); ncpus = node_groups[n].ncpus; ngroups = max_t(unsigned, 1, numgrps * ncpus / remaining_ncpus); WARN_ON_ONCE(ngroups > ncpus); node_groups[n].ngroups = ngroups; remaining_ncpus -= ncpus; numgrps -= ngroups; } } static int __group_cpus_evenly(unsigned int startgrp, unsigned int numgrps, cpumask_var_t *node_to_cpumask, const struct cpumask *cpu_mask, struct cpumask *nmsk, struct cpumask *masks) { unsigned int i, n, nodes, cpus_per_grp, extra_grps, done = 0; unsigned int last_grp = numgrps; unsigned int curgrp = startgrp; nodemask_t nodemsk = NODE_MASK_NONE; struct node_groups *node_groups; if (cpumask_empty(cpu_mask)) return 0; nodes = get_nodes_in_cpumask(node_to_cpumask, cpu_mask, &nodemsk); /* * If the number of nodes in the mask is greater than or equal the * number of groups we just spread the groups across the nodes. 
*/ if (numgrps <= nodes) { for_each_node_mask(n, nodemsk) { /* Ensure that only CPUs which are in both masks are set */ cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]); cpumask_or(&masks[curgrp], &masks[curgrp], nmsk); if (++curgrp == last_grp) curgrp = 0; } return numgrps; } node_groups = kcalloc(nr_node_ids, sizeof(struct node_groups), GFP_KERNEL); if (!node_groups) return -ENOMEM; /* allocate group number for each node */ alloc_nodes_groups(numgrps, node_to_cpumask, cpu_mask, nodemsk, nmsk, node_groups); for (i = 0; i < nr_node_ids; i++) { unsigned int ncpus, v; struct node_groups *nv = &node_groups[i]; if (nv->ngroups == UINT_MAX) continue; /* Get the cpus on this node which are in the mask */ cpumask_and(nmsk, cpu_mask, node_to_cpumask[nv->id]); ncpus = cpumask_weight(nmsk); if (!ncpus) continue; WARN_ON_ONCE(nv->ngroups > ncpus); /* Account for rounding errors */ extra_grps = ncpus - nv->ngroups * (ncpus / nv->ngroups); /* Spread allocated groups on CPUs of the current node */ for (v = 0; v < nv->ngroups; v++, curgrp++) { cpus_per_grp = ncpus / nv->ngroups; /* Account for extra groups to compensate rounding errors */ if (extra_grps) { cpus_per_grp++; --extra_grps; } /* * wrapping has to be considered given 'startgrp' * may start anywhere */ if (curgrp >= last_grp) curgrp = 0; grp_spread_init_one(&masks[curgrp], nmsk, cpus_per_grp); } done += nv->ngroups; } kfree(node_groups); return done; } /** * group_cpus_evenly - Group all CPUs evenly per NUMA/CPU locality * @numgrps: number of groups * * Return: cpumask array if successful, NULL otherwise. And each element * includes CPUs assigned to this group * * Try to put close CPUs from viewpoint of CPU and NUMA locality into * same group, and run two-stage grouping: * 1) allocate present CPUs on these groups evenly first * 2) allocate other possible CPUs on these groups evenly * * We guarantee in the resulted grouping that all CPUs are covered, and * no same CPU is assigned to multiple groups */ struct cpumask *group_cpus_evenly(unsigned int numgrps) { unsigned int curgrp = 0, nr_present = 0, nr_others = 0; cpumask_var_t *node_to_cpumask; cpumask_var_t nmsk, npresmsk; int ret = -ENOMEM; struct cpumask *masks = NULL; if (!zalloc_cpumask_var(&nmsk, GFP_KERNEL)) return NULL; if (!zalloc_cpumask_var(&npresmsk, GFP_KERNEL)) goto fail_nmsk; node_to_cpumask = alloc_node_to_cpumask(); if (!node_to_cpumask) goto fail_npresmsk; masks = kcalloc(numgrps, sizeof(*masks), GFP_KERNEL); if (!masks) goto fail_node_to_cpumask; build_node_to_cpumask(node_to_cpumask); /* * Make a local cache of 'cpu_present_mask', so the two stages * spread can observe consistent 'cpu_present_mask' without holding * cpu hotplug lock, then we can reduce deadlock risk with cpu * hotplug code. * * Here CPU hotplug may happen when reading `cpu_present_mask`, and * we can live with the case because it only affects that hotplug * CPU is handled in the 1st or 2nd stage, and either way is correct * from API user viewpoint since 2-stage spread is sort of * optimization. */ cpumask_copy(npresmsk, data_race(cpu_present_mask)); /* grouping present CPUs first */ ret = __group_cpus_evenly(curgrp, numgrps, node_to_cpumask, npresmsk, nmsk, masks); if (ret < 0) goto fail_build_affinity; nr_present = ret; /* * Allocate non present CPUs starting from the next group to be * handled. If the grouping of present CPUs already exhausted the * group space, assign the non present CPUs to the already * allocated out groups. 
*/ if (nr_present >= numgrps) curgrp = 0; else curgrp = nr_present; cpumask_andnot(npresmsk, cpu_possible_mask, npresmsk); ret = __group_cpus_evenly(curgrp, numgrps, node_to_cpumask, npresmsk, nmsk, masks); if (ret >= 0) nr_others = ret; fail_build_affinity: if (ret >= 0) WARN_ON(nr_present + nr_others < numgrps); fail_node_to_cpumask: free_node_to_cpumask(node_to_cpumask); fail_npresmsk: free_cpumask_var(npresmsk); fail_nmsk: free_cpumask_var(nmsk); if (ret < 0) { kfree(masks); return NULL; } return masks; } #else /* CONFIG_SMP */ struct cpumask *group_cpus_evenly(unsigned int numgrps) { struct cpumask *masks = kcalloc(numgrps, sizeof(*masks), GFP_KERNEL); if (!masks) return NULL; /* assign all CPUs(cpu 0) to the 1st group only */ cpumask_copy(&masks[0], cpu_possible_mask); return masks; } #endif /* CONFIG_SMP */ EXPORT_SYMBOL_GPL(group_cpus_evenly); |
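/*
 * Illustrative sketch (not part of this file): a caller that wants one CPU
 * group per queue.  group_cpus_evenly() returns a kcalloc()'d array of
 * @numgrps cpumasks (or NULL), which the caller owns and must kfree().
 * As a worked example of alloc_nodes_groups(): with two nodes holding 3
 * and 5 CPUs and numgrps = 4, the smaller node gets max(1, 4 * 3 / 8) = 1
 * group and the larger node the remaining 3.  The "example_" names below
 * are hypothetical.
 */
static int example_setup_queues(unsigned int nr_queues)
{
	struct cpumask *masks;
	unsigned int i;

	masks = group_cpus_evenly(nr_queues);
	if (!masks)
		return -ENOMEM;

	for (i = 0; i < nr_queues; i++) {
		/*
		 * Bind queue i to the CPUs in masks[i], e.g. as an IRQ
		 * affinity hint; every CPU shows up in exactly one group.
		 */
		pr_debug("queue %u -> %*pbl\n", i, cpumask_pr_args(&masks[i]));
	}

	kfree(masks);
	return 0;
}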
// SPDX-License-Identifier: GPL-2.0-or-later /* * (C) 2012 Pablo Neira Ayuso <pablo@netfilter.org> * * This software has been sponsored by Vyatta Inc.
<http://www.vyatta.com> */ #include <linux/init.h> #include <linux/module.h> #include <linux/kernel.h> #include <linux/skbuff.h> #include <linux/netlink.h> #include <linux/rculist.h> #include <linux/slab.h> #include <linux/types.h> #include <linux/list.h> #include <linux/errno.h> #include <linux/capability.h> #include <net/netlink.h> #include <net/sock.h> #include <net/netfilter/nf_conntrack_helper.h> #include <net/netfilter/nf_conntrack_expect.h> #include <net/netfilter/nf_conntrack_ecache.h> #include <linux/netfilter/nfnetlink.h> #include <linux/netfilter/nfnetlink_conntrack.h> #include <linux/netfilter/nfnetlink_cthelper.h> MODULE_LICENSE("GPL"); MODULE_AUTHOR("Pablo Neira Ayuso <pablo@netfilter.org>"); MODULE_DESCRIPTION("nfnl_cthelper: User-space connection tracking helpers"); struct nfnl_cthelper { struct list_head list; struct nf_conntrack_helper helper; }; static LIST_HEAD(nfnl_cthelper_list); static int nfnl_userspace_cthelper(struct sk_buff *skb, unsigned int protoff, struct nf_conn *ct, enum ip_conntrack_info ctinfo) { const struct nf_conn_help *help; struct nf_conntrack_helper *helper; help = nfct_help(ct); if (help == NULL) return NF_DROP; /* rcu_read_lock()ed by nf_hook_thresh */ helper = rcu_dereference(help->helper); if (helper == NULL) return NF_DROP; /* This is a user-space helper not yet configured, skip. */ if ((helper->flags & (NF_CT_HELPER_F_USERSPACE | NF_CT_HELPER_F_CONFIGURED)) == NF_CT_HELPER_F_USERSPACE) return NF_ACCEPT; /* If the user-space helper is not available, don't block traffic. */ return NF_QUEUE_NR(helper->queue_num) | NF_VERDICT_FLAG_QUEUE_BYPASS; } static const struct nla_policy nfnl_cthelper_tuple_pol[NFCTH_TUPLE_MAX+1] = { [NFCTH_TUPLE_L3PROTONUM] = { .type = NLA_U16, }, [NFCTH_TUPLE_L4PROTONUM] = { .type = NLA_U8, }, }; static int nfnl_cthelper_parse_tuple(struct nf_conntrack_tuple *tuple, const struct nlattr *attr) { int err; struct nlattr *tb[NFCTH_TUPLE_MAX+1]; err = nla_parse_nested_deprecated(tb, NFCTH_TUPLE_MAX, attr, nfnl_cthelper_tuple_pol, NULL); if (err < 0) return err; if (!tb[NFCTH_TUPLE_L3PROTONUM] || !tb[NFCTH_TUPLE_L4PROTONUM]) return -EINVAL; /* Not all fields are initialized so first zero the tuple */ memset(tuple, 0, sizeof(struct nf_conntrack_tuple)); tuple->src.l3num = ntohs(nla_get_be16(tb[NFCTH_TUPLE_L3PROTONUM])); tuple->dst.protonum = nla_get_u8(tb[NFCTH_TUPLE_L4PROTONUM]); return 0; } static int nfnl_cthelper_from_nlattr(struct nlattr *attr, struct nf_conn *ct) { struct nf_conn_help *help = nfct_help(ct); const struct nf_conntrack_helper *helper; if (attr == NULL) return -EINVAL; helper = rcu_dereference(help->helper); if (!helper || helper->data_len == 0) return -EINVAL; nla_memcpy(help->data, attr, sizeof(help->data)); return 0; } static int nfnl_cthelper_to_nlattr(struct sk_buff *skb, const struct nf_conn *ct) { const struct nf_conn_help *help = nfct_help(ct); const struct nf_conntrack_helper *helper; helper = rcu_dereference(help->helper); if (helper && helper->data_len && nla_put(skb, CTA_HELP_INFO, helper->data_len, &help->data)) goto nla_put_failure; return 0; nla_put_failure: return -ENOSPC; } static const struct nla_policy nfnl_cthelper_expect_pol[NFCTH_POLICY_MAX+1] = { [NFCTH_POLICY_NAME] = { .type = NLA_NUL_STRING, .len = NF_CT_HELPER_NAME_LEN-1 }, [NFCTH_POLICY_EXPECT_MAX] = { .type = NLA_U32, }, [NFCTH_POLICY_EXPECT_TIMEOUT] = { .type = NLA_U32, }, }; static int nfnl_cthelper_expect_policy(struct nf_conntrack_expect_policy *expect_policy, const struct nlattr *attr) { int err; struct nlattr 
*tb[NFCTH_POLICY_MAX+1]; err = nla_parse_nested_deprecated(tb, NFCTH_POLICY_MAX, attr, nfnl_cthelper_expect_pol, NULL); if (err < 0) return err; if (!tb[NFCTH_POLICY_NAME] || !tb[NFCTH_POLICY_EXPECT_MAX] || !tb[NFCTH_POLICY_EXPECT_TIMEOUT]) return -EINVAL; nla_strscpy(expect_policy->name, tb[NFCTH_POLICY_NAME], NF_CT_HELPER_NAME_LEN); expect_policy->max_expected = ntohl(nla_get_be32(tb[NFCTH_POLICY_EXPECT_MAX])); if (expect_policy->max_expected > NF_CT_EXPECT_MAX_CNT) return -EINVAL; expect_policy->timeout = ntohl(nla_get_be32(tb[NFCTH_POLICY_EXPECT_TIMEOUT])); return 0; } static const struct nla_policy nfnl_cthelper_expect_policy_set[NFCTH_POLICY_SET_MAX+1] = { [NFCTH_POLICY_SET_NUM] = { .type = NLA_U32, }, }; static int nfnl_cthelper_parse_expect_policy(struct nf_conntrack_helper *helper, const struct nlattr *attr) { int i, ret; struct nf_conntrack_expect_policy *expect_policy; struct nlattr *tb[NFCTH_POLICY_SET_MAX+1]; unsigned int class_max; ret = nla_parse_nested_deprecated(tb, NFCTH_POLICY_SET_MAX, attr, nfnl_cthelper_expect_policy_set, NULL); if (ret < 0) return ret; if (!tb[NFCTH_POLICY_SET_NUM]) return -EINVAL; class_max = ntohl(nla_get_be32(tb[NFCTH_POLICY_SET_NUM])); if (class_max == 0) return -EINVAL; if (class_max > NF_CT_MAX_EXPECT_CLASSES) return -EOVERFLOW; expect_policy = kcalloc(class_max, sizeof(struct nf_conntrack_expect_policy), GFP_KERNEL); if (expect_policy == NULL) return -ENOMEM; for (i = 0; i < class_max; i++) { if (!tb[NFCTH_POLICY_SET+i]) goto err; ret = nfnl_cthelper_expect_policy(&expect_policy[i], tb[NFCTH_POLICY_SET+i]); if (ret < 0) goto err; } helper->expect_class_max = class_max - 1; helper->expect_policy = expect_policy; return 0; err: kfree(expect_policy); return -EINVAL; } static int nfnl_cthelper_create(const struct nlattr * const tb[], struct nf_conntrack_tuple *tuple) { struct nf_conntrack_helper *helper; struct nfnl_cthelper *nfcth; unsigned int size; int ret; if (!tb[NFCTH_TUPLE] || !tb[NFCTH_POLICY] || !tb[NFCTH_PRIV_DATA_LEN]) return -EINVAL; nfcth = kzalloc(sizeof(*nfcth), GFP_KERNEL); if (nfcth == NULL) return -ENOMEM; helper = &nfcth->helper; ret = nfnl_cthelper_parse_expect_policy(helper, tb[NFCTH_POLICY]); if (ret < 0) goto err1; nla_strscpy(helper->name, tb[NFCTH_NAME], NF_CT_HELPER_NAME_LEN); size = ntohl(nla_get_be32(tb[NFCTH_PRIV_DATA_LEN])); if (size > sizeof_field(struct nf_conn_help, data)) { ret = -ENOMEM; goto err2; } helper->data_len = size; helper->flags |= NF_CT_HELPER_F_USERSPACE; memcpy(&helper->tuple, tuple, sizeof(struct nf_conntrack_tuple)); helper->me = THIS_MODULE; helper->help = nfnl_userspace_cthelper; helper->from_nlattr = nfnl_cthelper_from_nlattr; helper->to_nlattr = nfnl_cthelper_to_nlattr; /* Default to queue number zero, this can be updated at any time. 
*/ if (tb[NFCTH_QUEUE_NUM]) helper->queue_num = ntohl(nla_get_be32(tb[NFCTH_QUEUE_NUM])); if (tb[NFCTH_STATUS]) { int status = ntohl(nla_get_be32(tb[NFCTH_STATUS])); switch(status) { case NFCT_HELPER_STATUS_ENABLED: helper->flags |= NF_CT_HELPER_F_CONFIGURED; break; case NFCT_HELPER_STATUS_DISABLED: helper->flags &= ~NF_CT_HELPER_F_CONFIGURED; break; } } ret = nf_conntrack_helper_register(helper); if (ret < 0) goto err2; list_add_tail(&nfcth->list, &nfnl_cthelper_list); return 0; err2: kfree(helper->expect_policy); err1: kfree(nfcth); return ret; } static int nfnl_cthelper_update_policy_one(const struct nf_conntrack_expect_policy *policy, struct nf_conntrack_expect_policy *new_policy, const struct nlattr *attr) { struct nlattr *tb[NFCTH_POLICY_MAX + 1]; int err; err = nla_parse_nested_deprecated(tb, NFCTH_POLICY_MAX, attr, nfnl_cthelper_expect_pol, NULL); if (err < 0) return err; if (!tb[NFCTH_POLICY_NAME] || !tb[NFCTH_POLICY_EXPECT_MAX] || !tb[NFCTH_POLICY_EXPECT_TIMEOUT]) return -EINVAL; if (nla_strcmp(tb[NFCTH_POLICY_NAME], policy->name)) return -EBUSY; new_policy->max_expected = ntohl(nla_get_be32(tb[NFCTH_POLICY_EXPECT_MAX])); if (new_policy->max_expected > NF_CT_EXPECT_MAX_CNT) return -EINVAL; new_policy->timeout = ntohl(nla_get_be32(tb[NFCTH_POLICY_EXPECT_TIMEOUT])); return 0; } static int nfnl_cthelper_update_policy_all(struct nlattr *tb[], struct nf_conntrack_helper *helper) { struct nf_conntrack_expect_policy *new_policy; struct nf_conntrack_expect_policy *policy; int i, ret = 0; new_policy = kmalloc_array(helper->expect_class_max + 1, sizeof(*new_policy), GFP_KERNEL); if (!new_policy) return -ENOMEM; /* Check first that all policy attributes are well-formed, so we don't * leave things in inconsistent state on errors. */ for (i = 0; i < helper->expect_class_max + 1; i++) { if (!tb[NFCTH_POLICY_SET + i]) { ret = -EINVAL; goto err; } ret = nfnl_cthelper_update_policy_one(&helper->expect_policy[i], &new_policy[i], tb[NFCTH_POLICY_SET + i]); if (ret < 0) goto err; } /* Now we can safely update them. 
*/ for (i = 0; i < helper->expect_class_max + 1; i++) { policy = (struct nf_conntrack_expect_policy *) &helper->expect_policy[i]; policy->max_expected = new_policy->max_expected; policy->timeout = new_policy->timeout; } err: kfree(new_policy); return ret; } static int nfnl_cthelper_update_policy(struct nf_conntrack_helper *helper, const struct nlattr *attr) { struct nlattr *tb[NFCTH_POLICY_SET_MAX + 1]; unsigned int class_max; int err; err = nla_parse_nested_deprecated(tb, NFCTH_POLICY_SET_MAX, attr, nfnl_cthelper_expect_policy_set, NULL); if (err < 0) return err; if (!tb[NFCTH_POLICY_SET_NUM]) return -EINVAL; class_max = ntohl(nla_get_be32(tb[NFCTH_POLICY_SET_NUM])); if (helper->expect_class_max + 1 != class_max) return -EBUSY; return nfnl_cthelper_update_policy_all(tb, helper); } static int nfnl_cthelper_update(const struct nlattr * const tb[], struct nf_conntrack_helper *helper) { u32 size; int ret; if (tb[NFCTH_PRIV_DATA_LEN]) { size = ntohl(nla_get_be32(tb[NFCTH_PRIV_DATA_LEN])); if (size != helper->data_len) return -EBUSY; } if (tb[NFCTH_POLICY]) { ret = nfnl_cthelper_update_policy(helper, tb[NFCTH_POLICY]); if (ret < 0) return ret; } if (tb[NFCTH_QUEUE_NUM]) helper->queue_num = ntohl(nla_get_be32(tb[NFCTH_QUEUE_NUM])); if (tb[NFCTH_STATUS]) { int status = ntohl(nla_get_be32(tb[NFCTH_STATUS])); switch(status) { case NFCT_HELPER_STATUS_ENABLED: helper->flags |= NF_CT_HELPER_F_CONFIGURED; break; case NFCT_HELPER_STATUS_DISABLED: helper->flags &= ~NF_CT_HELPER_F_CONFIGURED; break; } } return 0; } static int nfnl_cthelper_new(struct sk_buff *skb, const struct nfnl_info *info, const struct nlattr * const tb[]) { const char *helper_name; struct nf_conntrack_helper *cur, *helper = NULL; struct nf_conntrack_tuple tuple; struct nfnl_cthelper *nlcth; int ret = 0; if (!capable(CAP_NET_ADMIN)) return -EPERM; if (!tb[NFCTH_NAME] || !tb[NFCTH_TUPLE]) return -EINVAL; helper_name = nla_data(tb[NFCTH_NAME]); ret = nfnl_cthelper_parse_tuple(&tuple, tb[NFCTH_TUPLE]); if (ret < 0) return ret; list_for_each_entry(nlcth, &nfnl_cthelper_list, list) { cur = &nlcth->helper; if (strncmp(cur->name, helper_name, NF_CT_HELPER_NAME_LEN)) continue; if ((tuple.src.l3num != cur->tuple.src.l3num || tuple.dst.protonum != cur->tuple.dst.protonum)) continue; if (info->nlh->nlmsg_flags & NLM_F_EXCL) return -EEXIST; helper = cur; break; } if (helper == NULL) ret = nfnl_cthelper_create(tb, &tuple); else ret = nfnl_cthelper_update(tb, helper); return ret; } static int nfnl_cthelper_dump_tuple(struct sk_buff *skb, struct nf_conntrack_helper *helper) { struct nlattr *nest_parms; nest_parms = nla_nest_start(skb, NFCTH_TUPLE); if (nest_parms == NULL) goto nla_put_failure; if (nla_put_be16(skb, NFCTH_TUPLE_L3PROTONUM, htons(helper->tuple.src.l3num))) goto nla_put_failure; if (nla_put_u8(skb, NFCTH_TUPLE_L4PROTONUM, helper->tuple.dst.protonum)) goto nla_put_failure; nla_nest_end(skb, nest_parms); return 0; nla_put_failure: return -1; } static int nfnl_cthelper_dump_policy(struct sk_buff *skb, struct nf_conntrack_helper *helper) { int i; struct nlattr *nest_parms1, *nest_parms2; nest_parms1 = nla_nest_start(skb, NFCTH_POLICY); if (nest_parms1 == NULL) goto nla_put_failure; if (nla_put_be32(skb, NFCTH_POLICY_SET_NUM, htonl(helper->expect_class_max + 1))) goto nla_put_failure; for (i = 0; i < helper->expect_class_max + 1; i++) { nest_parms2 = nla_nest_start(skb, (NFCTH_POLICY_SET + i)); if (nest_parms2 == NULL) goto nla_put_failure; if (nla_put_string(skb, NFCTH_POLICY_NAME, helper->expect_policy[i].name)) goto nla_put_failure; if 
(nla_put_be32(skb, NFCTH_POLICY_EXPECT_MAX, htonl(helper->expect_policy[i].max_expected))) goto nla_put_failure; if (nla_put_be32(skb, NFCTH_POLICY_EXPECT_TIMEOUT, htonl(helper->expect_policy[i].timeout))) goto nla_put_failure; nla_nest_end(skb, nest_parms2); } nla_nest_end(skb, nest_parms1); return 0; nla_put_failure: return -1; } static int nfnl_cthelper_fill_info(struct sk_buff *skb, u32 portid, u32 seq, u32 type, int event, struct nf_conntrack_helper *helper) { struct nlmsghdr *nlh; unsigned int flags = portid ? NLM_F_MULTI : 0; int status; event = nfnl_msg_type(NFNL_SUBSYS_CTHELPER, event); nlh = nfnl_msg_put(skb, portid, seq, event, flags, AF_UNSPEC, NFNETLINK_V0, 0); if (!nlh) goto nlmsg_failure; if (nla_put_string(skb, NFCTH_NAME, helper->name)) goto nla_put_failure; if (nla_put_be32(skb, NFCTH_QUEUE_NUM, htonl(helper->queue_num))) goto nla_put_failure; if (nfnl_cthelper_dump_tuple(skb, helper) < 0) goto nla_put_failure; if (nfnl_cthelper_dump_policy(skb, helper) < 0) goto nla_put_failure; if (nla_put_be32(skb, NFCTH_PRIV_DATA_LEN, htonl(helper->data_len))) goto nla_put_failure; if (helper->flags & NF_CT_HELPER_F_CONFIGURED) status = NFCT_HELPER_STATUS_ENABLED; else status = NFCT_HELPER_STATUS_DISABLED; if (nla_put_be32(skb, NFCTH_STATUS, htonl(status))) goto nla_put_failure; nlmsg_end(skb, nlh); return skb->len; nlmsg_failure: nla_put_failure: nlmsg_cancel(skb, nlh); return -1; } static int nfnl_cthelper_dump_table(struct sk_buff *skb, struct netlink_callback *cb) { struct nf_conntrack_helper *cur, *last; rcu_read_lock(); last = (struct nf_conntrack_helper *)cb->args[1]; for (; cb->args[0] < nf_ct_helper_hsize; cb->args[0]++) { restart: hlist_for_each_entry_rcu(cur, &nf_ct_helper_hash[cb->args[0]], hnode) { /* skip non-userspace conntrack helpers. 
*/ if (!(cur->flags & NF_CT_HELPER_F_USERSPACE)) continue; if (cb->args[1]) { if (cur != last) continue; cb->args[1] = 0; } if (nfnl_cthelper_fill_info(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, NFNL_MSG_TYPE(cb->nlh->nlmsg_type), NFNL_MSG_CTHELPER_NEW, cur) < 0) { cb->args[1] = (unsigned long)cur; goto out; } } } if (cb->args[1]) { cb->args[1] = 0; goto restart; } out: rcu_read_unlock(); return skb->len; } static int nfnl_cthelper_get(struct sk_buff *skb, const struct nfnl_info *info, const struct nlattr * const tb[]) { int ret = -ENOENT; struct nf_conntrack_helper *cur; struct sk_buff *skb2; char *helper_name = NULL; struct nf_conntrack_tuple tuple; struct nfnl_cthelper *nlcth; bool tuple_set = false; if (!capable(CAP_NET_ADMIN)) return -EPERM; if (info->nlh->nlmsg_flags & NLM_F_DUMP) { struct netlink_dump_control c = { .dump = nfnl_cthelper_dump_table, }; return netlink_dump_start(info->sk, skb, info->nlh, &c); } if (tb[NFCTH_NAME]) helper_name = nla_data(tb[NFCTH_NAME]); if (tb[NFCTH_TUPLE]) { ret = nfnl_cthelper_parse_tuple(&tuple, tb[NFCTH_TUPLE]); if (ret < 0) return ret; tuple_set = true; } list_for_each_entry(nlcth, &nfnl_cthelper_list, list) { cur = &nlcth->helper; if (helper_name && strncmp(cur->name, helper_name, NF_CT_HELPER_NAME_LEN)) continue; if (tuple_set && (tuple.src.l3num != cur->tuple.src.l3num || tuple.dst.protonum != cur->tuple.dst.protonum)) continue; skb2 = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); if (skb2 == NULL) { ret = -ENOMEM; break; } ret = nfnl_cthelper_fill_info(skb2, NETLINK_CB(skb).portid, info->nlh->nlmsg_seq, NFNL_MSG_TYPE(info->nlh->nlmsg_type), NFNL_MSG_CTHELPER_NEW, cur); if (ret <= 0) { kfree_skb(skb2); break; } ret = nfnetlink_unicast(skb2, info->net, NETLINK_CB(skb).portid); break; } return ret; } static int nfnl_cthelper_del(struct sk_buff *skb, const struct nfnl_info *info, const struct nlattr * const tb[]) { char *helper_name = NULL; struct nf_conntrack_helper *cur; struct nf_conntrack_tuple tuple; bool tuple_set = false, found = false; struct nfnl_cthelper *nlcth, *n; int j = 0, ret; if (!capable(CAP_NET_ADMIN)) return -EPERM; if (tb[NFCTH_NAME]) helper_name = nla_data(tb[NFCTH_NAME]); if (tb[NFCTH_TUPLE]) { ret = nfnl_cthelper_parse_tuple(&tuple, tb[NFCTH_TUPLE]); if (ret < 0) return ret; tuple_set = true; } ret = -ENOENT; list_for_each_entry_safe(nlcth, n, &nfnl_cthelper_list, list) { cur = &nlcth->helper; j++; if (helper_name && strncmp(cur->name, helper_name, NF_CT_HELPER_NAME_LEN)) continue; if (tuple_set && (tuple.src.l3num != cur->tuple.src.l3num || tuple.dst.protonum != cur->tuple.dst.protonum)) continue; if (refcount_dec_if_one(&cur->refcnt)) { found = true; nf_conntrack_helper_unregister(cur); kfree(cur->expect_policy); list_del(&nlcth->list); kfree(nlcth); } else { ret = -EBUSY; } } /* Make sure we return success if we flush and there is no helpers */ return (found || j == 0) ? 
0 : ret; } static const struct nla_policy nfnl_cthelper_policy[NFCTH_MAX+1] = { [NFCTH_NAME] = { .type = NLA_NUL_STRING, .len = NF_CT_HELPER_NAME_LEN-1 }, [NFCTH_QUEUE_NUM] = { .type = NLA_U32, }, [NFCTH_PRIV_DATA_LEN] = { .type = NLA_U32, }, [NFCTH_STATUS] = { .type = NLA_U32, }, }; static const struct nfnl_callback nfnl_cthelper_cb[NFNL_MSG_CTHELPER_MAX] = { [NFNL_MSG_CTHELPER_NEW] = { .call = nfnl_cthelper_new, .type = NFNL_CB_MUTEX, .attr_count = NFCTH_MAX, .policy = nfnl_cthelper_policy }, [NFNL_MSG_CTHELPER_GET] = { .call = nfnl_cthelper_get, .type = NFNL_CB_MUTEX, .attr_count = NFCTH_MAX, .policy = nfnl_cthelper_policy }, [NFNL_MSG_CTHELPER_DEL] = { .call = nfnl_cthelper_del, .type = NFNL_CB_MUTEX, .attr_count = NFCTH_MAX, .policy = nfnl_cthelper_policy }, }; static const struct nfnetlink_subsystem nfnl_cthelper_subsys = { .name = "cthelper", .subsys_id = NFNL_SUBSYS_CTHELPER, .cb_count = NFNL_MSG_CTHELPER_MAX, .cb = nfnl_cthelper_cb, }; MODULE_ALIAS_NFNL_SUBSYS(NFNL_SUBSYS_CTHELPER); static int __init nfnl_cthelper_init(void) { int ret; ret = nfnetlink_subsys_register(&nfnl_cthelper_subsys); if (ret < 0) { pr_err("nfnl_cthelper: cannot register with nfnetlink.\n"); goto err_out; } return 0; err_out: return ret; } static void __exit nfnl_cthelper_exit(void) { struct nf_conntrack_helper *cur; struct nfnl_cthelper *nlcth, *n; nfnetlink_subsys_unregister(&nfnl_cthelper_subsys); list_for_each_entry_safe(nlcth, n, &nfnl_cthelper_list, list) { cur = &nlcth->helper; nf_conntrack_helper_unregister(cur); kfree(cur->expect_policy); kfree(nlcth); } } module_init(nfnl_cthelper_init); module_exit(nfnl_cthelper_exit); |
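/*
 * Illustrative sketch (not part of this file): the nested-attribute pattern
 * used by nfnl_cthelper_parse_tuple() and nfnl_cthelper_expect_policy()
 * above, reduced to its skeleton - validate against a policy first, then
 * insist on the attributes the request cannot do without.  The EXAMPLE_*
 * names and example_parse() are hypothetical; the nla_* calls match the
 * ones used in this file.
 */
enum {
	EXAMPLE_ATTR_UNSPEC,
	EXAMPLE_ATTR_VALUE,
	__EXAMPLE_ATTR_MAX,
};
#define EXAMPLE_ATTR_MAX (__EXAMPLE_ATTR_MAX - 1)

static const struct nla_policy example_policy[EXAMPLE_ATTR_MAX + 1] = {
	[EXAMPLE_ATTR_VALUE] = { .type = NLA_U32 },
};

static int example_parse(u32 *value, const struct nlattr *attr)
{
	struct nlattr *tb[EXAMPLE_ATTR_MAX + 1];
	int err;

	err = nla_parse_nested_deprecated(tb, EXAMPLE_ATTR_MAX, attr,
					  example_policy, NULL);
	if (err < 0)
		return err;

	if (!tb[EXAMPLE_ATTR_VALUE])
		return -EINVAL;

	*value = ntohl(nla_get_be32(tb[EXAMPLE_ATTR_VALUE]));
	return 0;
}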
/* * Copyright (C) 2017 Netronome Systems, Inc. * * This software is licensed under the GNU General License Version 2, * June 1991 as shown in the file COPYING in the top-level directory of this * source tree. * * THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" * WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, * BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS * FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE * OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME * THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
*/ #include <linux/debugfs.h> #include <linux/etherdevice.h> #include <linux/ethtool_netlink.h> #include <linux/kernel.h> #include <linux/module.h> #include <linux/netdevice.h> #include <linux/slab.h> #include <net/netdev_queues.h> #include <net/netdev_rx_queue.h> #include <net/page_pool/helpers.h> #include <net/netlink.h> #include <net/net_shaper.h> #include <net/netdev_lock.h> #include <net/pkt_cls.h> #include <net/rtnetlink.h> #include <net/udp_tunnel.h> #include <net/busy_poll.h> #include "netdevsim.h" MODULE_IMPORT_NS("NETDEV_INTERNAL"); #define NSIM_RING_SIZE 256 static int nsim_napi_rx(struct nsim_rq *rq, struct sk_buff *skb) { if (skb_queue_len(&rq->skb_queue) > NSIM_RING_SIZE) { dev_kfree_skb_any(skb); return NET_RX_DROP; } skb_queue_tail(&rq->skb_queue, skb); return NET_RX_SUCCESS; } static int nsim_forward_skb(struct net_device *dev, struct sk_buff *skb, struct nsim_rq *rq) { return __dev_forward_skb(dev, skb) ?: nsim_napi_rx(rq, skb); } static netdev_tx_t nsim_start_xmit(struct sk_buff *skb, struct net_device *dev) { struct netdevsim *ns = netdev_priv(dev); struct net_device *peer_dev; unsigned int len = skb->len; struct netdevsim *peer_ns; struct netdev_config *cfg; struct nsim_rq *rq; int rxq; rcu_read_lock(); if (!nsim_ipsec_tx(ns, skb)) goto out_drop_free; peer_ns = rcu_dereference(ns->peer); if (!peer_ns) goto out_drop_free; peer_dev = peer_ns->netdev; rxq = skb_get_queue_mapping(skb); if (rxq >= peer_dev->num_rx_queues) rxq = rxq % peer_dev->num_rx_queues; rq = peer_ns->rq[rxq]; cfg = peer_dev->cfg; if (skb_is_nonlinear(skb) && (cfg->hds_config != ETHTOOL_TCP_DATA_SPLIT_ENABLED || (cfg->hds_config == ETHTOOL_TCP_DATA_SPLIT_ENABLED && cfg->hds_thresh > len))) skb_linearize(skb); skb_tx_timestamp(skb); if (unlikely(nsim_forward_skb(peer_dev, skb, rq) == NET_RX_DROP)) goto out_drop_cnt; if (!hrtimer_active(&rq->napi_timer)) hrtimer_start(&rq->napi_timer, us_to_ktime(5), HRTIMER_MODE_REL); rcu_read_unlock(); u64_stats_update_begin(&ns->syncp); ns->tx_packets++; ns->tx_bytes += len; u64_stats_update_end(&ns->syncp); return NETDEV_TX_OK; out_drop_free: dev_kfree_skb(skb); out_drop_cnt: rcu_read_unlock(); u64_stats_update_begin(&ns->syncp); ns->tx_dropped++; u64_stats_update_end(&ns->syncp); return NETDEV_TX_OK; } static void nsim_set_rx_mode(struct net_device *dev) { } static int nsim_change_mtu(struct net_device *dev, int new_mtu) { struct netdevsim *ns = netdev_priv(dev); if (ns->xdp.prog && !ns->xdp.prog->aux->xdp_has_frags && new_mtu > NSIM_XDP_MAX_MTU) return -EBUSY; WRITE_ONCE(dev->mtu, new_mtu); return 0; } static void nsim_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) { struct netdevsim *ns = netdev_priv(dev); unsigned int start; do { start = u64_stats_fetch_begin(&ns->syncp); stats->tx_bytes = ns->tx_bytes; stats->tx_packets = ns->tx_packets; stats->tx_dropped = ns->tx_dropped; } while (u64_stats_fetch_retry(&ns->syncp, start)); } static int nsim_setup_tc_block_cb(enum tc_setup_type type, void *type_data, void *cb_priv) { return nsim_bpf_setup_tc_block_cb(type, type_data, cb_priv); } static int nsim_set_vf_mac(struct net_device *dev, int vf, u8 *mac) { struct netdevsim *ns = netdev_priv(dev); struct nsim_dev *nsim_dev = ns->nsim_dev; /* Only refuse multicast addresses, zero address can mean unset/any. 
*/ if (vf >= nsim_dev_get_vfs(nsim_dev) || is_multicast_ether_addr(mac)) return -EINVAL; memcpy(nsim_dev->vfconfigs[vf].vf_mac, mac, ETH_ALEN); return 0; } static int nsim_set_vf_vlan(struct net_device *dev, int vf, u16 vlan, u8 qos, __be16 vlan_proto) { struct netdevsim *ns = netdev_priv(dev); struct nsim_dev *nsim_dev = ns->nsim_dev; if (vf >= nsim_dev_get_vfs(nsim_dev) || vlan > 4095 || qos > 7) return -EINVAL; nsim_dev->vfconfigs[vf].vlan = vlan; nsim_dev->vfconfigs[vf].qos = qos; nsim_dev->vfconfigs[vf].vlan_proto = vlan_proto; return 0; } static int nsim_set_vf_rate(struct net_device *dev, int vf, int min, int max) { struct netdevsim *ns = netdev_priv(dev); struct nsim_dev *nsim_dev = ns->nsim_dev; if (nsim_esw_mode_is_switchdev(ns->nsim_dev)) { pr_err("Not supported in switchdev mode. Please use devlink API.\n"); return -EOPNOTSUPP; } if (vf >= nsim_dev_get_vfs(nsim_dev)) return -EINVAL; nsim_dev->vfconfigs[vf].min_tx_rate = min; nsim_dev->vfconfigs[vf].max_tx_rate = max; return 0; } static int nsim_set_vf_spoofchk(struct net_device *dev, int vf, bool val) { struct netdevsim *ns = netdev_priv(dev); struct nsim_dev *nsim_dev = ns->nsim_dev; if (vf >= nsim_dev_get_vfs(nsim_dev)) return -EINVAL; nsim_dev->vfconfigs[vf].spoofchk_enabled = val; return 0; } static int nsim_set_vf_rss_query_en(struct net_device *dev, int vf, bool val) { struct netdevsim *ns = netdev_priv(dev); struct nsim_dev *nsim_dev = ns->nsim_dev; if (vf >= nsim_dev_get_vfs(nsim_dev)) return -EINVAL; nsim_dev->vfconfigs[vf].rss_query_enabled = val; return 0; } static int nsim_set_vf_trust(struct net_device *dev, int vf, bool val) { struct netdevsim *ns = netdev_priv(dev); struct nsim_dev *nsim_dev = ns->nsim_dev; if (vf >= nsim_dev_get_vfs(nsim_dev)) return -EINVAL; nsim_dev->vfconfigs[vf].trusted = val; return 0; } static int nsim_get_vf_config(struct net_device *dev, int vf, struct ifla_vf_info *ivi) { struct netdevsim *ns = netdev_priv(dev); struct nsim_dev *nsim_dev = ns->nsim_dev; if (vf >= nsim_dev_get_vfs(nsim_dev)) return -EINVAL; ivi->vf = vf; ivi->linkstate = nsim_dev->vfconfigs[vf].link_state; ivi->min_tx_rate = nsim_dev->vfconfigs[vf].min_tx_rate; ivi->max_tx_rate = nsim_dev->vfconfigs[vf].max_tx_rate; ivi->vlan = nsim_dev->vfconfigs[vf].vlan; ivi->vlan_proto = nsim_dev->vfconfigs[vf].vlan_proto; ivi->qos = nsim_dev->vfconfigs[vf].qos; memcpy(&ivi->mac, nsim_dev->vfconfigs[vf].vf_mac, ETH_ALEN); ivi->spoofchk = nsim_dev->vfconfigs[vf].spoofchk_enabled; ivi->trusted = nsim_dev->vfconfigs[vf].trusted; ivi->rss_query_en = nsim_dev->vfconfigs[vf].rss_query_enabled; return 0; } static int nsim_set_vf_link_state(struct net_device *dev, int vf, int state) { struct netdevsim *ns = netdev_priv(dev); struct nsim_dev *nsim_dev = ns->nsim_dev; if (vf >= nsim_dev_get_vfs(nsim_dev)) return -EINVAL; switch (state) { case IFLA_VF_LINK_STATE_AUTO: case IFLA_VF_LINK_STATE_ENABLE: case IFLA_VF_LINK_STATE_DISABLE: break; default: return -EINVAL; } nsim_dev->vfconfigs[vf].link_state = state; return 0; } static void nsim_taprio_stats(struct tc_taprio_qopt_stats *stats) { stats->window_drops = 0; stats->tx_overruns = 0; } static int nsim_setup_tc_taprio(struct net_device *dev, struct tc_taprio_qopt_offload *offload) { int err = 0; switch (offload->cmd) { case TAPRIO_CMD_REPLACE: case TAPRIO_CMD_DESTROY: break; case TAPRIO_CMD_STATS: nsim_taprio_stats(&offload->stats); break; default: err = -EOPNOTSUPP; } return err; } static LIST_HEAD(nsim_block_cb_list); static int nsim_setup_tc(struct net_device *dev, enum tc_setup_type type, 
void *type_data) { struct netdevsim *ns = netdev_priv(dev); switch (type) { case TC_SETUP_QDISC_TAPRIO: return nsim_setup_tc_taprio(dev, type_data); case TC_SETUP_BLOCK: return flow_block_cb_setup_simple(type_data, &nsim_block_cb_list, nsim_setup_tc_block_cb, ns, ns, true); default: return -EOPNOTSUPP; } } static int nsim_set_features(struct net_device *dev, netdev_features_t features) { struct netdevsim *ns = netdev_priv(dev); if ((dev->features & NETIF_F_HW_TC) > (features & NETIF_F_HW_TC)) return nsim_bpf_disable_tc(ns); return 0; } static int nsim_get_iflink(const struct net_device *dev) { struct netdevsim *nsim, *peer; int iflink; nsim = netdev_priv(dev); rcu_read_lock(); peer = rcu_dereference(nsim->peer); iflink = peer ? READ_ONCE(peer->netdev->ifindex) : READ_ONCE(dev->ifindex); rcu_read_unlock(); return iflink; } static int nsim_rcv(struct nsim_rq *rq, int budget) { struct sk_buff *skb; int i; for (i = 0; i < budget; i++) { if (skb_queue_empty(&rq->skb_queue)) break; skb = skb_dequeue(&rq->skb_queue); skb_mark_napi_id(skb, &rq->napi); netif_receive_skb(skb); } return i; } static int nsim_poll(struct napi_struct *napi, int budget) { struct nsim_rq *rq = container_of(napi, struct nsim_rq, napi); int done; done = nsim_rcv(rq, budget); napi_complete(napi); return done; } static int nsim_create_page_pool(struct page_pool **p, struct napi_struct *napi) { struct page_pool_params params = { .order = 0, .pool_size = NSIM_RING_SIZE, .nid = NUMA_NO_NODE, .dev = &napi->dev->dev, .napi = napi, .dma_dir = DMA_BIDIRECTIONAL, .netdev = napi->dev, }; struct page_pool *pool; pool = page_pool_create(¶ms); if (IS_ERR(pool)) return PTR_ERR(pool); *p = pool; return 0; } static int nsim_init_napi(struct netdevsim *ns) { struct net_device *dev = ns->netdev; struct nsim_rq *rq; int err, i; for (i = 0; i < dev->num_rx_queues; i++) { rq = ns->rq[i]; netif_napi_add_config_locked(dev, &rq->napi, nsim_poll, i); } for (i = 0; i < dev->num_rx_queues; i++) { rq = ns->rq[i]; err = nsim_create_page_pool(&rq->page_pool, &rq->napi); if (err) goto err_pp_destroy; } return 0; err_pp_destroy: while (i--) { page_pool_destroy(ns->rq[i]->page_pool); ns->rq[i]->page_pool = NULL; } for (i = 0; i < dev->num_rx_queues; i++) __netif_napi_del_locked(&ns->rq[i]->napi); return err; } static enum hrtimer_restart nsim_napi_schedule(struct hrtimer *timer) { struct nsim_rq *rq; rq = container_of(timer, struct nsim_rq, napi_timer); napi_schedule(&rq->napi); return HRTIMER_NORESTART; } static void nsim_rq_timer_init(struct nsim_rq *rq) { hrtimer_setup(&rq->napi_timer, nsim_napi_schedule, CLOCK_MONOTONIC, HRTIMER_MODE_REL); } static void nsim_enable_napi(struct netdevsim *ns) { struct net_device *dev = ns->netdev; int i; for (i = 0; i < dev->num_rx_queues; i++) { struct nsim_rq *rq = ns->rq[i]; netif_queue_set_napi(dev, i, NETDEV_QUEUE_TYPE_RX, &rq->napi); napi_enable_locked(&rq->napi); } } static int nsim_open(struct net_device *dev) { struct netdevsim *ns = netdev_priv(dev); int err; netdev_assert_locked(dev); err = nsim_init_napi(ns); if (err) return err; nsim_enable_napi(ns); return 0; } static void nsim_del_napi(struct netdevsim *ns) { struct net_device *dev = ns->netdev; int i; for (i = 0; i < dev->num_rx_queues; i++) { struct nsim_rq *rq = ns->rq[i]; napi_disable_locked(&rq->napi); __netif_napi_del_locked(&rq->napi); } synchronize_net(); for (i = 0; i < dev->num_rx_queues; i++) { page_pool_destroy(ns->rq[i]->page_pool); ns->rq[i]->page_pool = NULL; } } static int nsim_stop(struct net_device *dev) { struct netdevsim *ns = 
netdev_priv(dev); struct netdevsim *peer; netdev_assert_locked(dev); netif_carrier_off(dev); peer = rtnl_dereference(ns->peer); if (peer) netif_carrier_off(peer->netdev); nsim_del_napi(ns); return 0; } static int nsim_shaper_set(struct net_shaper_binding *binding, const struct net_shaper *shaper, struct netlink_ext_ack *extack) { return 0; } static int nsim_shaper_del(struct net_shaper_binding *binding, const struct net_shaper_handle *handle, struct netlink_ext_ack *extack) { return 0; } static int nsim_shaper_group(struct net_shaper_binding *binding, int leaves_count, const struct net_shaper *leaves, const struct net_shaper *root, struct netlink_ext_ack *extack) { return 0; } static void nsim_shaper_cap(struct net_shaper_binding *binding, enum net_shaper_scope scope, unsigned long *flags) { *flags = ULONG_MAX; } static const struct net_shaper_ops nsim_shaper_ops = { .set = nsim_shaper_set, .delete = nsim_shaper_del, .group = nsim_shaper_group, .capabilities = nsim_shaper_cap, }; static const struct net_device_ops nsim_netdev_ops = { .ndo_start_xmit = nsim_start_xmit, .ndo_set_rx_mode = nsim_set_rx_mode, .ndo_set_mac_address = eth_mac_addr, .ndo_validate_addr = eth_validate_addr, .ndo_change_mtu = nsim_change_mtu, .ndo_get_stats64 = nsim_get_stats64, .ndo_set_vf_mac = nsim_set_vf_mac, .ndo_set_vf_vlan = nsim_set_vf_vlan, .ndo_set_vf_rate = nsim_set_vf_rate, .ndo_set_vf_spoofchk = nsim_set_vf_spoofchk, .ndo_set_vf_trust = nsim_set_vf_trust, .ndo_get_vf_config = nsim_get_vf_config, .ndo_set_vf_link_state = nsim_set_vf_link_state, .ndo_set_vf_rss_query_en = nsim_set_vf_rss_query_en, .ndo_setup_tc = nsim_setup_tc, .ndo_set_features = nsim_set_features, .ndo_get_iflink = nsim_get_iflink, .ndo_bpf = nsim_bpf, .ndo_open = nsim_open, .ndo_stop = nsim_stop, .net_shaper_ops = &nsim_shaper_ops, }; static const struct net_device_ops nsim_vf_netdev_ops = { .ndo_start_xmit = nsim_start_xmit, .ndo_set_rx_mode = nsim_set_rx_mode, .ndo_set_mac_address = eth_mac_addr, .ndo_validate_addr = eth_validate_addr, .ndo_change_mtu = nsim_change_mtu, .ndo_get_stats64 = nsim_get_stats64, .ndo_setup_tc = nsim_setup_tc, .ndo_set_features = nsim_set_features, }; /* We don't have true per-queue stats, yet, so do some random fakery here. * Only report stuff for queue 0. 
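 * Queue 0 reports the device packet totals minus one; the base stats report that one packet back (and no bytes), so per-queue plus base counters still add up to the device totals.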
*/ static void nsim_get_queue_stats_rx(struct net_device *dev, int idx, struct netdev_queue_stats_rx *stats) { struct rtnl_link_stats64 rtstats = {}; if (!idx) nsim_get_stats64(dev, &rtstats); stats->packets = rtstats.rx_packets - !!rtstats.rx_packets; stats->bytes = rtstats.rx_bytes; } static void nsim_get_queue_stats_tx(struct net_device *dev, int idx, struct netdev_queue_stats_tx *stats) { struct rtnl_link_stats64 rtstats = {}; if (!idx) nsim_get_stats64(dev, &rtstats); stats->packets = rtstats.tx_packets - !!rtstats.tx_packets; stats->bytes = rtstats.tx_bytes; } static void nsim_get_base_stats(struct net_device *dev, struct netdev_queue_stats_rx *rx, struct netdev_queue_stats_tx *tx) { struct rtnl_link_stats64 rtstats = {}; nsim_get_stats64(dev, &rtstats); rx->packets = !!rtstats.rx_packets; rx->bytes = 0; tx->packets = !!rtstats.tx_packets; tx->bytes = 0; } static const struct netdev_stat_ops nsim_stat_ops = { .get_queue_stats_tx = nsim_get_queue_stats_tx, .get_queue_stats_rx = nsim_get_queue_stats_rx, .get_base_stats = nsim_get_base_stats, }; static struct nsim_rq *nsim_queue_alloc(void) { struct nsim_rq *rq; rq = kzalloc(sizeof(*rq), GFP_KERNEL_ACCOUNT); if (!rq) return NULL; skb_queue_head_init(&rq->skb_queue); nsim_rq_timer_init(rq); return rq; } static void nsim_queue_free(struct nsim_rq *rq) { hrtimer_cancel(&rq->napi_timer); skb_queue_purge_reason(&rq->skb_queue, SKB_DROP_REASON_QUEUE_PURGE); kfree(rq); } /* Queue reset mode is controlled by ns->rq_reset_mode. * - normal - new NAPI new pool (old NAPI enabled when new added) * - mode 1 - allocate new pool (NAPI is only disabled / enabled) * - mode 2 - new NAPI new pool (old NAPI removed before new added) * - mode 3 - new NAPI new pool (old NAPI disabled when new added) */ struct nsim_queue_mem { struct nsim_rq *rq; struct page_pool *pp; }; static int nsim_queue_mem_alloc(struct net_device *dev, void *per_queue_mem, int idx) { struct nsim_queue_mem *qmem = per_queue_mem; struct netdevsim *ns = netdev_priv(dev); int err; if (ns->rq_reset_mode > 3) return -EINVAL; if (ns->rq_reset_mode == 1) { if (!netif_running(ns->netdev)) return -ENETDOWN; return nsim_create_page_pool(&qmem->pp, &ns->rq[idx]->napi); } qmem->rq = nsim_queue_alloc(); if (!qmem->rq) return -ENOMEM; err = nsim_create_page_pool(&qmem->rq->page_pool, &qmem->rq->napi); if (err) goto err_free; if (!ns->rq_reset_mode) netif_napi_add_config_locked(dev, &qmem->rq->napi, nsim_poll, idx); return 0; err_free: nsim_queue_free(qmem->rq); return err; } static void nsim_queue_mem_free(struct net_device *dev, void *per_queue_mem) { struct nsim_queue_mem *qmem = per_queue_mem; struct netdevsim *ns = netdev_priv(dev); page_pool_destroy(qmem->pp); if (qmem->rq) { if (!ns->rq_reset_mode) netif_napi_del_locked(&qmem->rq->napi); page_pool_destroy(qmem->rq->page_pool); nsim_queue_free(qmem->rq); } } static int nsim_queue_start(struct net_device *dev, void *per_queue_mem, int idx) { struct nsim_queue_mem *qmem = per_queue_mem; struct netdevsim *ns = netdev_priv(dev); netdev_assert_locked(dev); if (ns->rq_reset_mode == 1) { ns->rq[idx]->page_pool = qmem->pp; napi_enable_locked(&ns->rq[idx]->napi); return 0; } /* netif_napi_add()/_del() should normally be called from alloc/free, * here we want to test various call orders. 
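 * Mode 2 removes the old NAPI instance before adding the new one; mode 3 adds the new one first and only then removes the old.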
*/ if (ns->rq_reset_mode == 2) { netif_napi_del_locked(&ns->rq[idx]->napi); netif_napi_add_config_locked(dev, &qmem->rq->napi, nsim_poll, idx); } else if (ns->rq_reset_mode == 3) { netif_napi_add_config_locked(dev, &qmem->rq->napi, nsim_poll, idx); netif_napi_del_locked(&ns->rq[idx]->napi); } ns->rq[idx] = qmem->rq; napi_enable_locked(&ns->rq[idx]->napi); return 0; } static int nsim_queue_stop(struct net_device *dev, void *per_queue_mem, int idx) { struct nsim_queue_mem *qmem = per_queue_mem; struct netdevsim *ns = netdev_priv(dev); netdev_assert_locked(dev); napi_disable_locked(&ns->rq[idx]->napi); if (ns->rq_reset_mode == 1) { qmem->pp = ns->rq[idx]->page_pool; page_pool_disable_direct_recycling(qmem->pp); } else { qmem->rq = ns->rq[idx]; } return 0; } static const struct netdev_queue_mgmt_ops nsim_queue_mgmt_ops = { .ndo_queue_mem_size = sizeof(struct nsim_queue_mem), .ndo_queue_mem_alloc = nsim_queue_mem_alloc, .ndo_queue_mem_free = nsim_queue_mem_free, .ndo_queue_start = nsim_queue_start, .ndo_queue_stop = nsim_queue_stop, }; static ssize_t nsim_qreset_write(struct file *file, const char __user *data, size_t count, loff_t *ppos) { struct netdevsim *ns = file->private_data; unsigned int queue, mode; char buf[32]; ssize_t ret; if (count >= sizeof(buf)) return -EINVAL; if (copy_from_user(buf, data, count)) return -EFAULT; buf[count] = '\0'; ret = sscanf(buf, "%u %u", &queue, &mode); if (ret != 2) return -EINVAL; netdev_lock(ns->netdev); if (queue >= ns->netdev->real_num_rx_queues) { ret = -EINVAL; goto exit_unlock; } ns->rq_reset_mode = mode; ret = netdev_rx_queue_restart(ns->netdev, queue); ns->rq_reset_mode = 0; if (ret) goto exit_unlock; ret = count; exit_unlock: netdev_unlock(ns->netdev); return ret; } static const struct file_operations nsim_qreset_fops = { .open = simple_open, .write = nsim_qreset_write, .owner = THIS_MODULE, }; static ssize_t nsim_pp_hold_read(struct file *file, char __user *data, size_t count, loff_t *ppos) { struct netdevsim *ns = file->private_data; char buf[3] = "n\n"; if (ns->page) buf[0] = 'y'; return simple_read_from_buffer(data, count, ppos, buf, 2); } static ssize_t nsim_pp_hold_write(struct file *file, const char __user *data, size_t count, loff_t *ppos) { struct netdevsim *ns = file->private_data; ssize_t ret; bool val; ret = kstrtobool_from_user(data, count, &val); if (ret) return ret; rtnl_lock(); ret = count; if (val == !!ns->page) goto exit; if (!netif_running(ns->netdev) && val) { ret = -ENETDOWN; } else if (val) { ns->page = page_pool_dev_alloc_pages(ns->rq[0]->page_pool); if (!ns->page) ret = -ENOMEM; } else { page_pool_put_full_page(ns->page->pp, ns->page, false); ns->page = NULL; } exit: rtnl_unlock(); return ret; } static const struct file_operations nsim_pp_hold_fops = { .open = simple_open, .read = nsim_pp_hold_read, .write = nsim_pp_hold_write, .llseek = generic_file_llseek, .owner = THIS_MODULE, }; static void nsim_setup(struct net_device *dev) { ether_setup(dev); eth_hw_addr_random(dev); dev->tx_queue_len = 0; dev->flags &= ~IFF_MULTICAST; dev->priv_flags |= IFF_LIVE_ADDR_CHANGE | IFF_NO_QUEUE; dev->features |= NETIF_F_HIGHDMA | NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HW_CSUM | NETIF_F_LRO | NETIF_F_TSO; dev->hw_features |= NETIF_F_HW_TC | NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HW_CSUM | NETIF_F_LRO | NETIF_F_TSO; dev->max_mtu = ETH_MAX_MTU; dev->xdp_features = NETDEV_XDP_ACT_HW_OFFLOAD; } static int nsim_queue_init(struct netdevsim *ns) { struct net_device *dev = ns->netdev; int i; ns->rq = kcalloc(dev->num_rx_queues, sizeof(*ns->rq), 
GFP_KERNEL_ACCOUNT); if (!ns->rq) return -ENOMEM; for (i = 0; i < dev->num_rx_queues; i++) { ns->rq[i] = nsim_queue_alloc(); if (!ns->rq[i]) goto err_free_prev; } return 0; err_free_prev: while (i--) kfree(ns->rq[i]); kfree(ns->rq); return -ENOMEM; } static void nsim_queue_uninit(struct netdevsim *ns) { struct net_device *dev = ns->netdev; int i; for (i = 0; i < dev->num_rx_queues; i++) nsim_queue_free(ns->rq[i]); kfree(ns->rq); ns->rq = NULL; } static int nsim_init_netdevsim(struct netdevsim *ns) { struct mock_phc *phc; int err; phc = mock_phc_create(&ns->nsim_bus_dev->dev); if (IS_ERR(phc)) return PTR_ERR(phc); ns->phc = phc; ns->netdev->netdev_ops = &nsim_netdev_ops; ns->netdev->stat_ops = &nsim_stat_ops; ns->netdev->queue_mgmt_ops = &nsim_queue_mgmt_ops; netdev_lockdep_set_classes(ns->netdev); err = nsim_udp_tunnels_info_create(ns->nsim_dev, ns->netdev); if (err) goto err_phc_destroy; rtnl_lock(); err = nsim_queue_init(ns); if (err) goto err_utn_destroy; err = nsim_bpf_init(ns); if (err) goto err_rq_destroy; nsim_macsec_init(ns); nsim_ipsec_init(ns); err = register_netdevice(ns->netdev); if (err) goto err_ipsec_teardown; rtnl_unlock(); if (IS_ENABLED(CONFIG_DEBUG_NET)) { ns->nb.notifier_call = netdev_debug_event; if (register_netdevice_notifier_dev_net(ns->netdev, &ns->nb, &ns->nn)) ns->nb.notifier_call = NULL; } return 0; err_ipsec_teardown: nsim_ipsec_teardown(ns); nsim_macsec_teardown(ns); nsim_bpf_uninit(ns); err_rq_destroy: nsim_queue_uninit(ns); err_utn_destroy: rtnl_unlock(); nsim_udp_tunnels_info_destroy(ns->netdev); err_phc_destroy: mock_phc_destroy(ns->phc); return err; } static int nsim_init_netdevsim_vf(struct netdevsim *ns) { int err; ns->netdev->netdev_ops = &nsim_vf_netdev_ops; rtnl_lock(); err = register_netdevice(ns->netdev); rtnl_unlock(); return err; } static void nsim_exit_netdevsim(struct netdevsim *ns) { nsim_udp_tunnels_info_destroy(ns->netdev); mock_phc_destroy(ns->phc); } struct netdevsim * nsim_create(struct nsim_dev *nsim_dev, struct nsim_dev_port *nsim_dev_port) { struct net_device *dev; struct netdevsim *ns; int err; dev = alloc_netdev_mq(sizeof(*ns), "eth%d", NET_NAME_UNKNOWN, nsim_setup, nsim_dev->nsim_bus_dev->num_queues); if (!dev) return ERR_PTR(-ENOMEM); dev_net_set(dev, nsim_dev_net(nsim_dev)); ns = netdev_priv(dev); ns->netdev = dev; u64_stats_init(&ns->syncp); ns->nsim_dev = nsim_dev; ns->nsim_dev_port = nsim_dev_port; ns->nsim_bus_dev = nsim_dev->nsim_bus_dev; SET_NETDEV_DEV(dev, &ns->nsim_bus_dev->dev); SET_NETDEV_DEVLINK_PORT(dev, &nsim_dev_port->devlink_port); nsim_ethtool_init(ns); if (nsim_dev_port_is_pf(nsim_dev_port)) err = nsim_init_netdevsim(ns); else err = nsim_init_netdevsim_vf(ns); if (err) goto err_free_netdev; ns->pp_dfs = debugfs_create_file("pp_hold", 0600, nsim_dev_port->ddir, ns, &nsim_pp_hold_fops); ns->qr_dfs = debugfs_create_file("queue_reset", 0200, nsim_dev_port->ddir, ns, &nsim_qreset_fops); return ns; err_free_netdev: free_netdev(dev); return ERR_PTR(err); } void nsim_destroy(struct netdevsim *ns) { struct net_device *dev = ns->netdev; struct netdevsim *peer; debugfs_remove(ns->qr_dfs); debugfs_remove(ns->pp_dfs); if (ns->nb.notifier_call) unregister_netdevice_notifier_dev_net(ns->netdev, &ns->nb, &ns->nn); rtnl_lock(); peer = rtnl_dereference(ns->peer); if (peer) RCU_INIT_POINTER(peer->peer, NULL); RCU_INIT_POINTER(ns->peer, NULL); unregister_netdevice(dev); if (nsim_dev_port_is_pf(ns->nsim_dev_port)) { nsim_macsec_teardown(ns); nsim_ipsec_teardown(ns); nsim_bpf_uninit(ns); nsim_queue_uninit(ns); } rtnl_unlock(); if 
(nsim_dev_port_is_pf(ns->nsim_dev_port)) nsim_exit_netdevsim(ns); /* Put this intentionally late to exercise the orphaning path */ if (ns->page) { page_pool_put_full_page(ns->page->pp, ns->page, false); ns->page = NULL; } free_netdev(dev); } bool netdev_is_nsim(struct net_device *dev) { return dev->netdev_ops == &nsim_netdev_ops; } static int nsim_validate(struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { NL_SET_ERR_MSG_MOD(extack, "Please use: echo \"[ID] [PORT_COUNT] [NUM_QUEUES]\" > /sys/bus/netdevsim/new_device"); return -EOPNOTSUPP; } static struct rtnl_link_ops nsim_link_ops __read_mostly = { .kind = DRV_NAME, .validate = nsim_validate, }; static int __init nsim_module_init(void) { int err; err = nsim_dev_init(); if (err) return err; err = nsim_bus_init(); if (err) goto err_dev_exit; err = rtnl_link_register(&nsim_link_ops); if (err) goto err_bus_exit; return 0; err_bus_exit: nsim_bus_exit(); err_dev_exit: nsim_dev_exit(); return err; } static void __exit nsim_module_exit(void) { rtnl_link_unregister(&nsim_link_ops); nsim_bus_exit(); nsim_dev_exit(); } module_init(nsim_module_init); module_exit(nsim_module_exit); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("Simulated networking device for testing"); MODULE_ALIAS_RTNL_LINK(DRV_NAME); |
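/*
 * Usage sketch (illustrative note, not part of the driver): devices are created from
 * user space with
 *   echo "10 1 4" > /sys/bus/netdevsim/new_device
 * (device ID 10, 1 port, 4 queues), matching the hint printed by nsim_validate()
 * above. The per-port debugfs file "queue_reset" created in nsim_create() takes
 * "<queue> <mode>", e.g. writing "0 2" restarts RX queue 0 using reset mode 2.
 */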
// SPDX-License-Identifier: GPL-2.0 /* * This file contains functions which manage clock event devices.
* * Copyright(C) 2005-2006, Thomas Gleixner <tglx@linutronix.de> * Copyright(C) 2005-2007, Red Hat, Inc., Ingo Molnar * Copyright(C) 2006-2007, Timesys Corp., Thomas Gleixner */ #include <linux/clockchips.h> #include <linux/hrtimer.h> #include <linux/init.h> #include <linux/module.h> #include <linux/smp.h> #include <linux/device.h> #include "tick-internal.h" /* The registered clock event devices */ static LIST_HEAD(clockevent_devices); static LIST_HEAD(clockevents_released); /* Protection for the above */ static DEFINE_RAW_SPINLOCK(clockevents_lock); /* Protection for unbind operations */ static DEFINE_MUTEX(clockevents_mutex); struct ce_unbind { struct clock_event_device *ce; int res; }; static u64 cev_delta2ns(unsigned long latch, struct clock_event_device *evt, bool ismax) { u64 clc = (u64) latch << evt->shift; u64 rnd; if (WARN_ON(!evt->mult)) evt->mult = 1; rnd = (u64) evt->mult - 1; /* * Upper bound sanity check. If the backwards conversion is * not equal latch, we know that the above shift overflowed. */ if ((clc >> evt->shift) != (u64)latch) clc = ~0ULL; /* * Scaled math oddities: * * For mult <= (1 << shift) we can safely add mult - 1 to * prevent integer rounding loss. So the backwards conversion * from nsec to device ticks will be correct. * * For mult > (1 << shift), i.e. device frequency is > 1GHz we * need to be careful. Adding mult - 1 will result in a value * which when converted back to device ticks can be larger * than latch by up to (mult - 1) >> shift. For the min_delta * calculation we still want to apply this in order to stay * above the minimum device ticks limit. For the upper limit * we would end up with a latch value larger than the upper * limit of the device, so we omit the add to stay below the * device upper boundary. * * Also omit the add if it would overflow the u64 boundary. */ if ((~0ULL - clc > rnd) && (!ismax || evt->mult <= (1ULL << evt->shift))) clc += rnd; do_div(clc, evt->mult); /* Deltas less than 1usec are pointless noise */ return clc > 1000 ? clc : 1000; } /** * clockevent_delta2ns - Convert a latch value (device ticks) to nanoseconds * @latch: value to convert * @evt: pointer to clock event device descriptor * * Math helper, returns latch value converted to nanoseconds (bound checked) */ u64 clockevent_delta2ns(unsigned long latch, struct clock_event_device *evt) { return cev_delta2ns(latch, evt, false); } EXPORT_SYMBOL_GPL(clockevent_delta2ns); static int __clockevents_switch_state(struct clock_event_device *dev, enum clock_event_state state) { if (dev->features & CLOCK_EVT_FEAT_DUMMY) return 0; /* Transition with new state-specific callbacks */ switch (state) { case CLOCK_EVT_STATE_DETACHED: /* The clockevent device is getting replaced. Shut it down. 
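 * DETACHED is handled exactly like SHUTDOWN below (the case labels share the same handling).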
*/ case CLOCK_EVT_STATE_SHUTDOWN: if (dev->set_state_shutdown) return dev->set_state_shutdown(dev); return 0; case CLOCK_EVT_STATE_PERIODIC: /* Core internal bug */ if (!(dev->features & CLOCK_EVT_FEAT_PERIODIC)) return -ENOSYS; if (dev->set_state_periodic) return dev->set_state_periodic(dev); return 0; case CLOCK_EVT_STATE_ONESHOT: /* Core internal bug */ if (!(dev->features & CLOCK_EVT_FEAT_ONESHOT)) return -ENOSYS; if (dev->set_state_oneshot) return dev->set_state_oneshot(dev); return 0; case CLOCK_EVT_STATE_ONESHOT_STOPPED: /* Core internal bug */ if (WARN_ONCE(!clockevent_state_oneshot(dev), "Current state: %d\n", clockevent_get_state(dev))) return -EINVAL; if (dev->set_state_oneshot_stopped) return dev->set_state_oneshot_stopped(dev); else return -ENOSYS; default: return -ENOSYS; } } /** * clockevents_switch_state - set the operating state of a clock event device * @dev: device to modify * @state: new state * * Must be called with interrupts disabled ! */ void clockevents_switch_state(struct clock_event_device *dev, enum clock_event_state state) { if (clockevent_get_state(dev) != state) { if (__clockevents_switch_state(dev, state)) return; clockevent_set_state(dev, state); /* * A nsec2cyc multiplicator of 0 is invalid and we'd crash * on it, so fix it up and emit a warning: */ if (clockevent_state_oneshot(dev)) { if (WARN_ON(!dev->mult)) dev->mult = 1; } } } /** * clockevents_shutdown - shutdown the device and clear next_event * @dev: device to shutdown */ void clockevents_shutdown(struct clock_event_device *dev) { clockevents_switch_state(dev, CLOCK_EVT_STATE_SHUTDOWN); dev->next_event = KTIME_MAX; } /** * clockevents_tick_resume - Resume the tick device before using it again * @dev: device to resume */ int clockevents_tick_resume(struct clock_event_device *dev) { int ret = 0; if (dev->tick_resume) ret = dev->tick_resume(dev); return ret; } #ifdef CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST /* Limit min_delta to a jiffy */ #define MIN_DELTA_LIMIT (NSEC_PER_SEC / HZ) /** * clockevents_increase_min_delta - raise minimum delta of a clock event device * @dev: device to increase the minimum delta * * Returns 0 on success, -ETIME when the minimum delta reached the limit. */ static int clockevents_increase_min_delta(struct clock_event_device *dev) { /* Nothing to do if we already reached the limit */ if (dev->min_delta_ns >= MIN_DELTA_LIMIT) { printk_deferred(KERN_WARNING "CE: Reprogramming failure. Giving up\n"); dev->next_event = KTIME_MAX; return -ETIME; } if (dev->min_delta_ns < 5000) dev->min_delta_ns = 5000; else dev->min_delta_ns += dev->min_delta_ns >> 1; if (dev->min_delta_ns > MIN_DELTA_LIMIT) dev->min_delta_ns = MIN_DELTA_LIMIT; printk_deferred(KERN_WARNING "CE: %s increased min_delta_ns to %llu nsec\n", dev->name ? dev->name : "?", (unsigned long long) dev->min_delta_ns); return 0; } /** * clockevents_program_min_delta - Set clock event device to the minimum delay. * @dev: device to program * * Returns 0 on success, -ETIME when the retry loop failed. */ static int clockevents_program_min_delta(struct clock_event_device *dev) { unsigned long long clc; int64_t delta; int i; for (i = 0;;) { delta = dev->min_delta_ns; dev->next_event = ktime_add_ns(ktime_get(), delta); if (clockevent_state_shutdown(dev)) return 0; dev->retries++; clc = ((unsigned long long) delta * dev->mult) >> dev->shift; if (dev->set_next_event((unsigned long) clc, dev) == 0) return 0; if (++i > 2) { /* * We tried 3 times to program the device with the * given min_delta_ns. 
Try to increase the minimum * delta, if that fails as well get out of here. */ if (clockevents_increase_min_delta(dev)) return -ETIME; i = 0; } } } #else /* CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST */ /** * clockevents_program_min_delta - Set clock event device to the minimum delay. * @dev: device to program * * Returns 0 on success, -ETIME when the retry loop failed. */ static int clockevents_program_min_delta(struct clock_event_device *dev) { unsigned long long clc; int64_t delta = 0; int i; for (i = 0; i < 10; i++) { delta += dev->min_delta_ns; dev->next_event = ktime_add_ns(ktime_get(), delta); if (clockevent_state_shutdown(dev)) return 0; dev->retries++; clc = ((unsigned long long) delta * dev->mult) >> dev->shift; if (dev->set_next_event((unsigned long) clc, dev) == 0) return 0; } return -ETIME; } #endif /* CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST */ /** * clockevents_program_event - Reprogram the clock event device. * @dev: device to program * @expires: absolute expiry time (monotonic clock) * @force: program minimum delay if expires can not be set * * Returns 0 on success, -ETIME when the event is in the past. */ int clockevents_program_event(struct clock_event_device *dev, ktime_t expires, bool force) { unsigned long long clc; int64_t delta; int rc; if (WARN_ON_ONCE(expires < 0)) return -ETIME; dev->next_event = expires; if (clockevent_state_shutdown(dev)) return 0; /* We must be in ONESHOT state here */ WARN_ONCE(!clockevent_state_oneshot(dev), "Current state: %d\n", clockevent_get_state(dev)); /* Shortcut for clockevent devices that can deal with ktime. */ if (dev->features & CLOCK_EVT_FEAT_KTIME) return dev->set_next_ktime(expires, dev); delta = ktime_to_ns(ktime_sub(expires, ktime_get())); if (delta <= 0) return force ? clockevents_program_min_delta(dev) : -ETIME; delta = min(delta, (int64_t) dev->max_delta_ns); delta = max(delta, (int64_t) dev->min_delta_ns); clc = ((unsigned long long) delta * dev->mult) >> dev->shift; rc = dev->set_next_event((unsigned long) clc, dev); return (rc && force) ? clockevents_program_min_delta(dev) : rc; } /* * Called after a clockevent has been added which might * have replaced a current regular or broadcast device. A * released normal device might be a suitable replacement * for the current broadcast device. Similarly a released * broadcast device might be a suitable replacement for a * normal device. */ static void clockevents_notify_released(void) { struct clock_event_device *dev; /* * Keep iterating as long as tick_check_new_device() * replaces a device. */ while (!list_empty(&clockevents_released)) { dev = list_entry(clockevents_released.next, struct clock_event_device, list); list_move(&dev->list, &clockevent_devices); tick_check_new_device(dev); } } /* * Try to install a replacement clock event device */ static int clockevents_replace(struct clock_event_device *ced) { struct clock_event_device *dev, *newdev = NULL; list_for_each_entry(dev, &clockevent_devices, list) { if (dev == ced || !clockevent_state_detached(dev)) continue; if (!tick_check_replacement(newdev, dev)) continue; if (!try_module_get(dev->owner)) continue; if (newdev) module_put(newdev->owner); newdev = dev; } if (newdev) { tick_install_replacement(newdev); list_del_init(&ced->list); } return newdev ? 0 : -EBUSY; } /* * Called with clockevents_mutex and clockevents_lock held */ static int __clockevents_try_unbind(struct clock_event_device *ced, int cpu) { /* Fast track. 
Device is unused */ if (clockevent_state_detached(ced)) { list_del_init(&ced->list); return 0; } return ced == per_cpu(tick_cpu_device, cpu).evtdev ? -EAGAIN : -EBUSY; } /* * SMP function call to unbind a device */ static void __clockevents_unbind(void *arg) { struct ce_unbind *cu = arg; int res; raw_spin_lock(&clockevents_lock); res = __clockevents_try_unbind(cu->ce, smp_processor_id()); if (res == -EAGAIN) res = clockevents_replace(cu->ce); cu->res = res; raw_spin_unlock(&clockevents_lock); } /* * Issues smp function call to unbind a per cpu device. Called with * clockevents_mutex held. */ static int clockevents_unbind(struct clock_event_device *ced, int cpu) { struct ce_unbind cu = { .ce = ced, .res = -ENODEV }; smp_call_function_single(cpu, __clockevents_unbind, &cu, 1); return cu.res; } /* * Unbind a clockevents device. */ int clockevents_unbind_device(struct clock_event_device *ced, int cpu) { int ret; mutex_lock(&clockevents_mutex); ret = clockevents_unbind(ced, cpu); mutex_unlock(&clockevents_mutex); return ret; } EXPORT_SYMBOL_GPL(clockevents_unbind_device); /** * clockevents_register_device - register a clock event device * @dev: device to register */ void clockevents_register_device(struct clock_event_device *dev) { unsigned long flags; /* Initialize state to DETACHED */ clockevent_set_state(dev, CLOCK_EVT_STATE_DETACHED); if (!dev->cpumask) { WARN_ON(num_possible_cpus() > 1); dev->cpumask = cpumask_of(smp_processor_id()); } if (dev->cpumask == cpu_all_mask) { WARN(1, "%s cpumask == cpu_all_mask, using cpu_possible_mask instead\n", dev->name); dev->cpumask = cpu_possible_mask; } raw_spin_lock_irqsave(&clockevents_lock, flags); list_add(&dev->list, &clockevent_devices); tick_check_new_device(dev); clockevents_notify_released(); raw_spin_unlock_irqrestore(&clockevents_lock, flags); } EXPORT_SYMBOL_GPL(clockevents_register_device); static void clockevents_config(struct clock_event_device *dev, u32 freq) { u64 sec; if (!(dev->features & CLOCK_EVT_FEAT_ONESHOT)) return; /* * Calculate the maximum number of seconds we can sleep. Limit * to 10 minutes for hardware which can program more than * 32bit ticks so we still get reasonable conversion values. */ sec = dev->max_delta_ticks; do_div(sec, freq); if (!sec) sec = 1; else if (sec > 600 && dev->max_delta_ticks > UINT_MAX) sec = 600; clockevents_calc_mult_shift(dev, freq, sec); dev->min_delta_ns = cev_delta2ns(dev->min_delta_ticks, dev, false); dev->max_delta_ns = cev_delta2ns(dev->max_delta_ticks, dev, true); } /** * clockevents_config_and_register - Configure and register a clock event device * @dev: device to register * @freq: The clock frequency * @min_delta: The minimum clock ticks to program in oneshot mode * @max_delta: The maximum clock ticks to program in oneshot mode * * min/max_delta can be 0 for devices which do not support oneshot mode. 
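 * For oneshot capable devices clockevents_config() converts the tick limits to min_delta_ns/max_delta_ns using the mult/shift pair derived from @freq.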
*/ void clockevents_config_and_register(struct clock_event_device *dev, u32 freq, unsigned long min_delta, unsigned long max_delta) { dev->min_delta_ticks = min_delta; dev->max_delta_ticks = max_delta; clockevents_config(dev, freq); clockevents_register_device(dev); } EXPORT_SYMBOL_GPL(clockevents_config_and_register); int __clockevents_update_freq(struct clock_event_device *dev, u32 freq) { clockevents_config(dev, freq); if (clockevent_state_oneshot(dev)) return clockevents_program_event(dev, dev->next_event, false); if (clockevent_state_periodic(dev)) return __clockevents_switch_state(dev, CLOCK_EVT_STATE_PERIODIC); return 0; } /** * clockevents_update_freq - Update frequency and reprogram a clock event device. * @dev: device to modify * @freq: new device frequency * * Reconfigure and reprogram a clock event device in oneshot * mode. Must be called on the cpu for which the device delivers per * cpu timer events. If called for the broadcast device the core takes * care of serialization. * * Returns 0 on success, -ETIME when the event is in the past. */ int clockevents_update_freq(struct clock_event_device *dev, u32 freq) { unsigned long flags; int ret; local_irq_save(flags); ret = tick_broadcast_update_freq(dev, freq); if (ret == -ENODEV) ret = __clockevents_update_freq(dev, freq); local_irq_restore(flags); return ret; } /* * Noop handler when we shut down an event device */ void clockevents_handle_noop(struct clock_event_device *dev) { } /** * clockevents_exchange_device - release and request clock devices * @old: device to release (can be NULL) * @new: device to request (can be NULL) * * Called from various tick functions with clockevents_lock held and * interrupts disabled. */ void clockevents_exchange_device(struct clock_event_device *old, struct clock_event_device *new) { /* * Caller releases a clock event device. We queue it into the * released list and do a notify add later. */ if (old) { module_put(old->owner); clockevents_switch_state(old, CLOCK_EVT_STATE_DETACHED); list_move(&old->list, &clockevents_released); } if (new) { BUG_ON(!clockevent_state_detached(new)); clockevents_shutdown(new); } } /** * clockevents_suspend - suspend clock devices */ void clockevents_suspend(void) { struct clock_event_device *dev; list_for_each_entry_reverse(dev, &clockevent_devices, list) if (dev->suspend && !clockevent_state_detached(dev)) dev->suspend(dev); } /** * clockevents_resume - resume clock devices */ void clockevents_resume(void) { struct clock_event_device *dev; list_for_each_entry(dev, &clockevent_devices, list) if (dev->resume && !clockevent_state_detached(dev)) dev->resume(dev); } #ifdef CONFIG_HOTPLUG_CPU /** * tick_offline_cpu - Shutdown all clock events related * to this CPU and take it out of the * broadcast mechanism. * @cpu: The outgoing CPU * * Called by the dying CPU during teardown. */ void tick_offline_cpu(unsigned int cpu) { struct clock_event_device *dev, *tmp; raw_spin_lock(&clockevents_lock); tick_broadcast_offline(cpu); tick_shutdown(cpu); /* * Unregister the clock event devices which were * released above. 
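 * (clockevents_exchange_device() moved them from clockevent_devices onto the clockevents_released list during the tick shutdown calls above).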
*/ list_for_each_entry_safe(dev, tmp, &clockevents_released, list) list_del(&dev->list); /* * Now check whether the CPU has left unused per cpu devices */ list_for_each_entry_safe(dev, tmp, &clockevent_devices, list) { if (cpumask_test_cpu(cpu, dev->cpumask) && cpumask_weight(dev->cpumask) == 1 && !tick_is_broadcast_device(dev)) { BUG_ON(!clockevent_state_detached(dev)); list_del(&dev->list); } } raw_spin_unlock(&clockevents_lock); } #endif #ifdef CONFIG_SYSFS static const struct bus_type clockevents_subsys = { .name = "clockevents", .dev_name = "clockevent", }; static DEFINE_PER_CPU(struct device, tick_percpu_dev); static struct tick_device *tick_get_tick_dev(struct device *dev); static ssize_t current_device_show(struct device *dev, struct device_attribute *attr, char *buf) { struct tick_device *td; ssize_t count = 0; raw_spin_lock_irq(&clockevents_lock); td = tick_get_tick_dev(dev); if (td && td->evtdev) count = sysfs_emit(buf, "%s\n", td->evtdev->name); raw_spin_unlock_irq(&clockevents_lock); return count; } static DEVICE_ATTR_RO(current_device); /* We don't support the abomination of removable broadcast devices */ static ssize_t unbind_device_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { char name[CS_NAME_LEN]; ssize_t ret = sysfs_get_uname(buf, name, count); struct clock_event_device *ce = NULL, *iter; if (ret < 0) return ret; ret = -ENODEV; mutex_lock(&clockevents_mutex); raw_spin_lock_irq(&clockevents_lock); list_for_each_entry(iter, &clockevent_devices, list) { if (!strcmp(iter->name, name)) { ret = __clockevents_try_unbind(iter, dev->id); ce = iter; break; } } raw_spin_unlock_irq(&clockevents_lock); /* * We hold clockevents_mutex, so ce can't go away */ if (ret == -EAGAIN) ret = clockevents_unbind(ce, dev->id); mutex_unlock(&clockevents_mutex); return ret ? ret : count; } static DEVICE_ATTR_WO(unbind_device); #ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST static struct device tick_bc_dev = { .init_name = "broadcast", .id = 0, .bus = &clockevents_subsys, }; static struct tick_device *tick_get_tick_dev(struct device *dev) { return dev == &tick_bc_dev ? tick_get_broadcast_device() : &per_cpu(tick_cpu_device, dev->id); } static __init int tick_broadcast_init_sysfs(void) { int err = device_register(&tick_bc_dev); if (!err) err = device_create_file(&tick_bc_dev, &dev_attr_current_device); return err; } #else static struct tick_device *tick_get_tick_dev(struct device *dev) { return &per_cpu(tick_cpu_device, dev->id); } static inline int tick_broadcast_init_sysfs(void) { return 0; } #endif static int __init tick_init_sysfs(void) { int cpu; for_each_possible_cpu(cpu) { struct device *dev = &per_cpu(tick_percpu_dev, cpu); int err; dev->id = cpu; dev->bus = &clockevents_subsys; err = device_register(dev); if (!err) err = device_create_file(dev, &dev_attr_current_device); if (!err) err = device_create_file(dev, &dev_attr_unbind_device); if (err) return err; } return tick_broadcast_init_sysfs(); } static int __init clockevents_init_sysfs(void) { int err = subsys_system_register(&clockevents_subsys, NULL); if (!err) err = tick_init_sysfs(); return err; } device_initcall(clockevents_init_sysfs); #endif /* SYSFS */ |
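/*
 * Sketch of how a hypothetical timer driver would use the API above; the
 * my_timer_* names are illustrative, only the clockevents calls and struct
 * fields are real.
 */
static int my_timer_set_next_event(unsigned long ticks,
				   struct clock_event_device *ce)
{
	/* program the hardware comparator 'ticks' device ticks into the future */
	return 0;
}

static struct clock_event_device my_timer_clockevent = {
	.name		= "my-timer",
	.features	= CLOCK_EVT_FEAT_ONESHOT,
	.rating		= 300,
	.set_next_event	= my_timer_set_next_event,
};

static void my_timer_init(u32 freq)
{
	my_timer_clockevent.cpumask = cpumask_of(smp_processor_id());
	/*
	 * Minimum 2 ticks, maximum a 32-bit counter; mult/shift and the
	 * min/max_delta_ns limits are derived from freq by clockevents_config().
	 */
	clockevents_config_and_register(&my_timer_clockevent, freq, 2, 0xffffffff);
}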
// SPDX-License-Identifier: GPL-2.0+ /* * Driver for Freecom USB/IDE adaptor * * Freecom v0.1: * * First release * * Current development and maintenance by: * (C) 2000 David Brown <usb-storage@davidb.org> * * This driver was developed with information provided in FREECOM's USB * Programmers Reference Guide. For further information contact Freecom * (https://www.freecom.de/) */ #include <linux/module.h> #include <scsi/scsi.h> #include <scsi/scsi_cmnd.h> #include "usb.h" #include "transport.h" #include "protocol.h" #include "debug.h" #include "scsiglue.h" #define DRV_NAME "ums-freecom" MODULE_DESCRIPTION("Driver for Freecom USB/IDE adaptor"); MODULE_AUTHOR("David Brown <usb-storage@davidb.org>"); MODULE_LICENSE("GPL"); MODULE_IMPORT_NS("USB_STORAGE"); #ifdef CONFIG_USB_STORAGE_DEBUG static void pdump(struct us_data *us, void *ibuffer, int length); #endif /* Bits of HD_STATUS */ #define ERR_STAT 0x01 #define DRQ_STAT 0x08 /* All of the outgoing packets are 64 bytes long. */ struct freecom_cb_wrap { u8 Type; /* Command type. */ u8 Timeout; /* Timeout in seconds. */ u8 Atapi[12]; /* An ATAPI packet. */ u8 Filler[50]; /* Padding Data. */ }; struct freecom_xfer_wrap { u8 Type; /* Command type. */ u8 Timeout; /* Timeout in seconds. */ __le32 Count; /* Number of bytes to transfer.
*/ u8 Pad[58]; } __attribute__ ((packed)); struct freecom_ide_out { u8 Type; /* Type + IDE register. */ u8 Pad; __le16 Value; /* Value to write. */ u8 Pad2[60]; }; struct freecom_ide_in { u8 Type; /* Type | IDE register. */ u8 Pad[63]; }; struct freecom_status { u8 Status; u8 Reason; __le16 Count; u8 Pad[60]; }; /* * Freecom stuffs the interrupt status in the INDEX_STAT bit of the ide * register. */ #define FCM_INT_STATUS 0x02 /* INDEX_STAT */ #define FCM_STATUS_BUSY 0x80 /* * These are the packet types. The low bit indicates that this command * should wait for an interrupt. */ #define FCM_PACKET_ATAPI 0x21 #define FCM_PACKET_STATUS 0x20 /* * Receive data from the IDE interface. The ATAPI packet has already * waited, so the data should be immediately available. */ #define FCM_PACKET_INPUT 0x81 /* Send data to the IDE interface. */ #define FCM_PACKET_OUTPUT 0x01 /* * Write a value to an ide register. Or the ide register to write after * munging the address a bit. */ #define FCM_PACKET_IDE_WRITE 0x40 #define FCM_PACKET_IDE_READ 0xC0 /* All packets (except for status) are 64 bytes long. */ #define FCM_PACKET_LENGTH 64 #define FCM_STATUS_PACKET_LENGTH 4 static int init_freecom(struct us_data *us); /* * The table of devices */ #define UNUSUAL_DEV(id_vendor, id_product, bcdDeviceMin, bcdDeviceMax, \ vendorName, productName, useProtocol, useTransport, \ initFunction, flags) \ { USB_DEVICE_VER(id_vendor, id_product, bcdDeviceMin, bcdDeviceMax), \ .driver_info = (flags) } static const struct usb_device_id freecom_usb_ids[] = { # include "unusual_freecom.h" { } /* Terminating entry */ }; MODULE_DEVICE_TABLE(usb, freecom_usb_ids); #undef UNUSUAL_DEV /* * The flags table */ #define UNUSUAL_DEV(idVendor, idProduct, bcdDeviceMin, bcdDeviceMax, \ vendor_name, product_name, use_protocol, use_transport, \ init_function, Flags) \ { \ .vendorName = vendor_name, \ .productName = product_name, \ .useProtocol = use_protocol, \ .useTransport = use_transport, \ .initFunction = init_function, \ } static const struct us_unusual_dev freecom_unusual_dev_list[] = { # include "unusual_freecom.h" { } /* Terminating entry */ }; #undef UNUSUAL_DEV static int freecom_readdata (struct scsi_cmnd *srb, struct us_data *us, unsigned int ipipe, unsigned int opipe, int count) { struct freecom_xfer_wrap *fxfr = (struct freecom_xfer_wrap *) us->iobuf; int result; fxfr->Type = FCM_PACKET_INPUT | 0x00; fxfr->Timeout = 0; /* Short timeout for debugging. */ fxfr->Count = cpu_to_le32 (count); memset (fxfr->Pad, 0, sizeof (fxfr->Pad)); usb_stor_dbg(us, "Read data Freecom! (c=%d)\n", count); /* Issue the transfer command. */ result = usb_stor_bulk_transfer_buf (us, opipe, fxfr, FCM_PACKET_LENGTH, NULL); if (result != USB_STOR_XFER_GOOD) { usb_stor_dbg(us, "Freecom readdata transport error\n"); return USB_STOR_TRANSPORT_ERROR; } /* Now transfer all of our blocks. */ usb_stor_dbg(us, "Start of read\n"); result = usb_stor_bulk_srb(us, ipipe, srb); usb_stor_dbg(us, "freecom_readdata done!\n"); if (result > USB_STOR_XFER_SHORT) return USB_STOR_TRANSPORT_ERROR; return USB_STOR_TRANSPORT_GOOD; } static int freecom_writedata (struct scsi_cmnd *srb, struct us_data *us, int unsigned ipipe, unsigned int opipe, int count) { struct freecom_xfer_wrap *fxfr = (struct freecom_xfer_wrap *) us->iobuf; int result; fxfr->Type = FCM_PACKET_OUTPUT | 0x00; fxfr->Timeout = 0; /* Short timeout for debugging. */ fxfr->Count = cpu_to_le32 (count); memset (fxfr->Pad, 0, sizeof (fxfr->Pad)); usb_stor_dbg(us, "Write data Freecom! 
(c=%d)\n", count); /* Issue the transfer command. */ result = usb_stor_bulk_transfer_buf (us, opipe, fxfr, FCM_PACKET_LENGTH, NULL); if (result != USB_STOR_XFER_GOOD) { usb_stor_dbg(us, "Freecom writedata transport error\n"); return USB_STOR_TRANSPORT_ERROR; } /* Now transfer all of our blocks. */ usb_stor_dbg(us, "Start of write\n"); result = usb_stor_bulk_srb(us, opipe, srb); usb_stor_dbg(us, "freecom_writedata done!\n"); if (result > USB_STOR_XFER_SHORT) return USB_STOR_TRANSPORT_ERROR; return USB_STOR_TRANSPORT_GOOD; } /* * Transport for the Freecom USB/IDE adaptor. * */ static int freecom_transport(struct scsi_cmnd *srb, struct us_data *us) { struct freecom_cb_wrap *fcb; struct freecom_status *fst; unsigned int ipipe, opipe; /* We need both pipes. */ int result; unsigned int partial; int length; fcb = (struct freecom_cb_wrap *) us->iobuf; fst = (struct freecom_status *) us->iobuf; usb_stor_dbg(us, "Freecom TRANSPORT STARTED\n"); /* Get handles for both transports. */ opipe = us->send_bulk_pipe; ipipe = us->recv_bulk_pipe; /* The ATAPI Command always goes out first. */ fcb->Type = FCM_PACKET_ATAPI | 0x00; fcb->Timeout = 0; memcpy (fcb->Atapi, srb->cmnd, 12); memset (fcb->Filler, 0, sizeof (fcb->Filler)); US_DEBUG(pdump(us, srb->cmnd, 12)); /* Send it out. */ result = usb_stor_bulk_transfer_buf (us, opipe, fcb, FCM_PACKET_LENGTH, NULL); /* * The Freecom device will only fail if there is something wrong in * USB land. It returns the status in its own registers, which * come back in the bulk pipe. */ if (result != USB_STOR_XFER_GOOD) { usb_stor_dbg(us, "freecom transport error\n"); return USB_STOR_TRANSPORT_ERROR; } /* * There are times we can optimize out this status read, but it * doesn't hurt us to always do it now. */ result = usb_stor_bulk_transfer_buf (us, ipipe, fst, FCM_STATUS_PACKET_LENGTH, &partial); usb_stor_dbg(us, "foo Status result %d %u\n", result, partial); if (result != USB_STOR_XFER_GOOD) return USB_STOR_TRANSPORT_ERROR; US_DEBUG(pdump(us, (void *)fst, partial)); /* * The firmware will time-out commands after 20 seconds. Some commands * can legitimately take longer than this, so we use a different * command that only waits for the interrupt and then sends status, * without having to send a new ATAPI command to the device. * * NOTE: There is some indication that a data transfer after a timeout * may not work, but that is a condition that should never happen. */ while (fst->Status & FCM_STATUS_BUSY) { usb_stor_dbg(us, "20 second USB/ATAPI bridge TIMEOUT occurred!\n"); usb_stor_dbg(us, "fst->Status is %x\n", fst->Status); /* Get the status again */ fcb->Type = FCM_PACKET_STATUS; fcb->Timeout = 0; memset (fcb->Atapi, 0, sizeof(fcb->Atapi)); memset (fcb->Filler, 0, sizeof (fcb->Filler)); /* Send it out. */ result = usb_stor_bulk_transfer_buf (us, opipe, fcb, FCM_PACKET_LENGTH, NULL); /* * The Freecom device will only fail if there is something * wrong in USB land. It returns the status in its own * registers, which come back in the bulk pipe. 
*/ if (result != USB_STOR_XFER_GOOD) { usb_stor_dbg(us, "freecom transport error\n"); return USB_STOR_TRANSPORT_ERROR; } /* get the data */ result = usb_stor_bulk_transfer_buf (us, ipipe, fst, FCM_STATUS_PACKET_LENGTH, &partial); usb_stor_dbg(us, "bar Status result %d %u\n", result, partial); if (result != USB_STOR_XFER_GOOD) return USB_STOR_TRANSPORT_ERROR; US_DEBUG(pdump(us, (void *)fst, partial)); } if (partial != 4) return USB_STOR_TRANSPORT_ERROR; if ((fst->Status & 1) != 0) { usb_stor_dbg(us, "operation failed\n"); return USB_STOR_TRANSPORT_FAILED; } /* * The device might not have as much data available as we * requested. If you ask for more than the device has, this reads * and such will hang. */ usb_stor_dbg(us, "Device indicates that it has %d bytes available\n", le16_to_cpu(fst->Count)); usb_stor_dbg(us, "SCSI requested %d\n", scsi_bufflen(srb)); /* Find the length we desire to read. */ switch (srb->cmnd[0]) { case INQUIRY: case REQUEST_SENSE: /* 16 or 18 bytes? spec says 18, lots of devices only have 16 */ case MODE_SENSE: case MODE_SENSE_10: length = le16_to_cpu(fst->Count); break; default: length = scsi_bufflen(srb); } /* verify that this amount is legal */ if (length > scsi_bufflen(srb)) { length = scsi_bufflen(srb); usb_stor_dbg(us, "Truncating request to match buffer length: %d\n", length); } /* * What we do now depends on what direction the data is supposed to * move in. */ switch (us->srb->sc_data_direction) { case DMA_FROM_DEVICE: /* catch bogus "read 0 length" case */ if (!length) break; /* * Make sure that the status indicates that the device * wants data as well. */ if ((fst->Status & DRQ_STAT) == 0 || (fst->Reason & 3) != 2) { usb_stor_dbg(us, "SCSI wants data, drive doesn't have any\n"); return USB_STOR_TRANSPORT_FAILED; } result = freecom_readdata (srb, us, ipipe, opipe, length); if (result != USB_STOR_TRANSPORT_GOOD) return result; usb_stor_dbg(us, "Waiting for status\n"); result = usb_stor_bulk_transfer_buf (us, ipipe, fst, FCM_PACKET_LENGTH, &partial); US_DEBUG(pdump(us, (void *)fst, partial)); if (partial != 4 || result > USB_STOR_XFER_SHORT) return USB_STOR_TRANSPORT_ERROR; if ((fst->Status & ERR_STAT) != 0) { usb_stor_dbg(us, "operation failed\n"); return USB_STOR_TRANSPORT_FAILED; } if ((fst->Reason & 3) != 3) { usb_stor_dbg(us, "Drive seems still hungry\n"); return USB_STOR_TRANSPORT_FAILED; } usb_stor_dbg(us, "Transfer happy\n"); break; case DMA_TO_DEVICE: /* catch bogus "write 0 length" case */ if (!length) break; /* * Make sure the status indicates that the device wants to * send us data. */ /* !!IMPLEMENT!! */ result = freecom_writedata (srb, us, ipipe, opipe, length); if (result != USB_STOR_TRANSPORT_GOOD) return result; usb_stor_dbg(us, "Waiting for status\n"); result = usb_stor_bulk_transfer_buf (us, ipipe, fst, FCM_PACKET_LENGTH, &partial); if (partial != 4 || result > USB_STOR_XFER_SHORT) return USB_STOR_TRANSPORT_ERROR; if ((fst->Status & ERR_STAT) != 0) { usb_stor_dbg(us, "operation failed\n"); return USB_STOR_TRANSPORT_FAILED; } if ((fst->Reason & 3) != 3) { usb_stor_dbg(us, "Drive seems still hungry\n"); return USB_STOR_TRANSPORT_FAILED; } usb_stor_dbg(us, "Transfer happy\n"); break; case DMA_NONE: /* Easy, do nothing. */ break; default: /* should never hit here -- filtered in usb.c */ usb_stor_dbg(us, "freecom unimplemented direction: %d\n", us->srb->sc_data_direction); /* Return fail, SCSI seems to handle this better. 
*/ return USB_STOR_TRANSPORT_FAILED; } return USB_STOR_TRANSPORT_GOOD; } static int init_freecom(struct us_data *us) { int result; char *buffer = us->iobuf; /* * The DMA-mapped I/O buffer is 64 bytes long, just right for * all our packets. No need to allocate any extra buffer space. */ result = usb_stor_control_msg(us, us->recv_ctrl_pipe, 0x4c, 0xc0, 0x4346, 0x0, buffer, 0x20, 3*HZ); buffer[32] = '\0'; usb_stor_dbg(us, "String returned from FC init is: %s\n", buffer); /* * Special thanks to the people at Freecom for providing me with * this "magic sequence", which they use in their Windows and MacOS * drivers to make sure that all the attached perhiperals are * properly reset. */ /* send reset */ result = usb_stor_control_msg(us, us->send_ctrl_pipe, 0x4d, 0x40, 0x24d8, 0x0, NULL, 0x0, 3*HZ); usb_stor_dbg(us, "result from activate reset is %d\n", result); /* wait 250ms */ msleep(250); /* clear reset */ result = usb_stor_control_msg(us, us->send_ctrl_pipe, 0x4d, 0x40, 0x24f8, 0x0, NULL, 0x0, 3*HZ); usb_stor_dbg(us, "result from clear reset is %d\n", result); /* wait 3 seconds */ msleep(3 * 1000); return USB_STOR_TRANSPORT_GOOD; } static int usb_stor_freecom_reset(struct us_data *us) { printk (KERN_CRIT "freecom reset called\n"); /* We don't really have this feature. */ return FAILED; } #ifdef CONFIG_USB_STORAGE_DEBUG static void pdump(struct us_data *us, void *ibuffer, int length) { static char line[80]; int offset = 0; unsigned char *buffer = (unsigned char *) ibuffer; int i, j; int from, base; offset = 0; for (i = 0; i < length; i++) { if ((i & 15) == 0) { if (i > 0) { offset += sprintf (line+offset, " - "); for (j = i - 16; j < i; j++) { if (buffer[j] >= 32 && buffer[j] <= 126) line[offset++] = buffer[j]; else line[offset++] = '.'; } line[offset] = 0; usb_stor_dbg(us, "%s\n", line); offset = 0; } offset += sprintf (line+offset, "%08x:", i); } else if ((i & 7) == 0) { offset += sprintf (line+offset, " -"); } offset += sprintf (line+offset, " %02x", buffer[i] & 0xff); } /* Add the last "chunk" of data. */ from = (length - 1) % 16; base = ((length - 1) / 16) * 16; for (i = from + 1; i < 16; i++) offset += sprintf (line+offset, " "); if (from < 8) offset += sprintf (line+offset, " "); offset += sprintf (line+offset, " - "); for (i = 0; i <= from; i++) { if (buffer[base+i] >= 32 && buffer[base+i] <= 126) line[offset++] = buffer[base+i]; else line[offset++] = '.'; } line[offset] = 0; usb_stor_dbg(us, "%s\n", line); } #endif static struct scsi_host_template freecom_host_template; static int freecom_probe(struct usb_interface *intf, const struct usb_device_id *id) { struct us_data *us; int result; result = usb_stor_probe1(&us, intf, id, (id - freecom_usb_ids) + freecom_unusual_dev_list, &freecom_host_template); if (result) return result; us->transport_name = "Freecom"; us->transport = freecom_transport; us->transport_reset = usb_stor_freecom_reset; us->max_lun = 0; result = usb_stor_probe2(us); return result; } static struct usb_driver freecom_driver = { .name = DRV_NAME, .probe = freecom_probe, .disconnect = usb_stor_disconnect, .suspend = usb_stor_suspend, .resume = usb_stor_resume, .reset_resume = usb_stor_reset_resume, .pre_reset = usb_stor_pre_reset, .post_reset = usb_stor_post_reset, .id_table = freecom_usb_ids, .soft_unbind = 1, .no_dynamic_id = 1, }; module_usb_stor_driver(freecom_driver, freecom_host_template, DRV_NAME); |
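/*
 * Protocol summary (as implemented above): each SCSI command goes out as a
 * 64-byte FCM_PACKET_ATAPI wrapper on the bulk-out pipe, a 4-byte
 * freecom_status is read back on bulk-in, and FCM_PACKET_INPUT/OUTPUT packets
 * then drive the data phase, followed by another status read.
 */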
// SPDX-License-Identifier: GPL-2.0-or-later /* * Driver for USB ethernet port of Conexant CX82310-based ADSL routers * Copyright (C) 2010 by Ondrej Zary * some parts inspired by the cxacru driver */ #include <linux/module.h> #include <linux/netdevice.h> #include <linux/etherdevice.h> #include <linux/ethtool.h> #include <linux/workqueue.h> #include <linux/mii.h> #include <linux/usb.h> #include <linux/usb/usbnet.h> enum cx82310_cmd { CMD_START = 0x84, /* no effect? */ CMD_STOP = 0x85, /* no effect? */ CMD_GET_STATUS = 0x90, /* returns nothing?
*/ CMD_GET_MAC_ADDR = 0x91, /* read MAC address */ CMD_GET_LINK_STATUS = 0x92, /* not useful, link is always up */ CMD_ETHERNET_MODE = 0x99, /* unknown, needed during init */ }; enum cx82310_status { STATUS_UNDEFINED, STATUS_SUCCESS, STATUS_ERROR, STATUS_UNSUPPORTED, STATUS_UNIMPLEMENTED, STATUS_PARAMETER_ERROR, STATUS_DBG_LOOPBACK, }; #define CMD_PACKET_SIZE 64 #define CMD_TIMEOUT 100 #define CMD_REPLY_RETRY 5 #define CX82310_MTU 1514 #define CMD_EP 0x01 struct cx82310_priv { struct work_struct reenable_work; struct usbnet *dev; }; /* * execute control command * - optionally send some data (command parameters) * - optionally wait for the reply * - optionally read some data from the reply */ static int cx82310_cmd(struct usbnet *dev, enum cx82310_cmd cmd, bool reply, u8 *wdata, int wlen, u8 *rdata, int rlen) { int actual_len, retries, ret; struct usb_device *udev = dev->udev; u8 *buf = kzalloc(CMD_PACKET_SIZE, GFP_KERNEL); if (!buf) return -ENOMEM; /* create command packet */ buf[0] = cmd; if (wdata) memcpy(buf + 4, wdata, min_t(int, wlen, CMD_PACKET_SIZE - 4)); /* send command packet */ ret = usb_bulk_msg(udev, usb_sndbulkpipe(udev, CMD_EP), buf, CMD_PACKET_SIZE, &actual_len, CMD_TIMEOUT); if (ret < 0) { if (cmd != CMD_GET_LINK_STATUS) netdev_err(dev->net, "send command %#x: error %d\n", cmd, ret); goto end; } if (reply) { /* wait for reply, retry if it's empty */ for (retries = 0; retries < CMD_REPLY_RETRY; retries++) { ret = usb_bulk_msg(udev, usb_rcvbulkpipe(udev, CMD_EP), buf, CMD_PACKET_SIZE, &actual_len, CMD_TIMEOUT); if (ret < 0) { if (cmd != CMD_GET_LINK_STATUS) netdev_err(dev->net, "reply receive error %d\n", ret); goto end; } if (actual_len > 0) break; } if (actual_len == 0) { netdev_err(dev->net, "no reply to command %#x\n", cmd); ret = -EIO; goto end; } if (buf[0] != cmd) { netdev_err(dev->net, "got reply to command %#x, expected: %#x\n", buf[0], cmd); ret = -EIO; goto end; } if (buf[1] != STATUS_SUCCESS) { netdev_err(dev->net, "command %#x failed: %#x\n", cmd, buf[1]); ret = -EIO; goto end; } if (rdata) memcpy(rdata, buf + 4, min_t(int, rlen, CMD_PACKET_SIZE - 4)); } end: kfree(buf); return ret; } static int cx82310_enable_ethernet(struct usbnet *dev) { int ret = cx82310_cmd(dev, CMD_ETHERNET_MODE, true, "\x01", 1, NULL, 0); if (ret) netdev_err(dev->net, "unable to enable ethernet mode: %d\n", ret); return ret; } static void cx82310_reenable_work(struct work_struct *work) { struct cx82310_priv *priv = container_of(work, struct cx82310_priv, reenable_work); cx82310_enable_ethernet(priv->dev); } #define partial_len data[0] /* length of partial packet data */ #define partial_rem data[1] /* remaining (missing) data length */ #define partial_data data[2] /* partial packet data */ static int cx82310_bind(struct usbnet *dev, struct usb_interface *intf) { int ret; char buf[15]; struct usb_device *udev = dev->udev; u8 link[3]; int timeout = 50; struct cx82310_priv *priv; u8 addr[ETH_ALEN]; /* avoid ADSL modems - continue only if iProduct is "USB NET CARD" */ if (usb_string(udev, udev->descriptor.iProduct, buf, sizeof(buf)) > 0 && strcmp(buf, "USB NET CARD")) { dev_info(&udev->dev, "ignoring: probably an ADSL modem\n"); return -ENODEV; } ret = usbnet_get_endpoints(dev, intf); if (ret) return ret; /* * this must not include ethernet header as the device can send partial * packets with no header (and sometimes even empty URBs) */ dev->net->hard_header_len = 0; /* we can send at most 1514 bytes of data (+ 2-byte header) per URB */ dev->hard_mtu = CX82310_MTU + 2; /* we can receive URBs up 
to 4KB from the device */ dev->rx_urb_size = 4096; dev->partial_data = (unsigned long) kmalloc(dev->hard_mtu, GFP_KERNEL); if (!dev->partial_data) return -ENOMEM; priv = kzalloc(sizeof(*priv), GFP_KERNEL); if (!priv) { ret = -ENOMEM; goto err_partial; } dev->driver_priv = priv; INIT_WORK(&priv->reenable_work, cx82310_reenable_work); priv->dev = dev; /* wait for firmware to become ready (indicated by the link being up) */ while (--timeout) { ret = cx82310_cmd(dev, CMD_GET_LINK_STATUS, true, NULL, 0, link, sizeof(link)); /* the command can time out during boot - it's not an error */ if (!ret && link[0] == 1 && link[2] == 1) break; msleep(500); } if (!timeout) { netdev_err(dev->net, "firmware not ready in time\n"); ret = -ETIMEDOUT; goto err; } /* enable ethernet mode (?) */ ret = cx82310_enable_ethernet(dev); if (ret) goto err; /* get the MAC address */ ret = cx82310_cmd(dev, CMD_GET_MAC_ADDR, true, NULL, 0, addr, ETH_ALEN); if (ret) { netdev_err(dev->net, "unable to read MAC address: %d\n", ret); goto err; } eth_hw_addr_set(dev->net, addr); /* start (does not seem to have any effect?) */ ret = cx82310_cmd(dev, CMD_START, false, NULL, 0, NULL, 0); if (ret) goto err; return 0; err: kfree(dev->driver_priv); err_partial: kfree((void *)dev->partial_data); return ret; } static void cx82310_unbind(struct usbnet *dev, struct usb_interface *intf) { struct cx82310_priv *priv = dev->driver_priv; kfree((void *)dev->partial_data); cancel_work_sync(&priv->reenable_work); kfree(dev->driver_priv); } /* * RX is NOT easy - we can receive multiple packets per skb, each having 2-byte * packet length at the beginning. * The last packet might be incomplete (when it crosses the 4KB URB size), * continuing in the next skb (without any headers). * If a packet has odd length, there is one extra byte at the end (before next * packet or at the end of the URB). */ static int cx82310_rx_fixup(struct usbnet *dev, struct sk_buff *skb) { int len; struct sk_buff *skb2; struct cx82310_priv *priv = dev->driver_priv; /* * If the last skb ended with an incomplete packet, this skb contains * end of that packet at the beginning. 
*/ if (dev->partial_rem) { len = dev->partial_len + dev->partial_rem; skb2 = alloc_skb(len, GFP_ATOMIC); if (!skb2) return 0; skb_put(skb2, len); memcpy(skb2->data, (void *)dev->partial_data, dev->partial_len); memcpy(skb2->data + dev->partial_len, skb->data, dev->partial_rem); usbnet_skb_return(dev, skb2); skb_pull(skb, (dev->partial_rem + 1) & ~1); dev->partial_rem = 0; if (skb->len < 2) return 1; } /* a skb can contain multiple packets */ while (skb->len > 1) { /* first two bytes are packet length */ len = skb->data[0] | (skb->data[1] << 8); skb_pull(skb, 2); /* if last packet in the skb, let usbnet to process it */ if (len == skb->len || len + 1 == skb->len) { skb_trim(skb, len); break; } if (len == 0xffff) { netdev_info(dev->net, "router was rebooted, re-enabling ethernet mode"); schedule_work(&priv->reenable_work); } else if (len > CX82310_MTU) { netdev_err(dev->net, "RX packet too long: %d B\n", len); return 0; } /* incomplete packet, save it for the next skb */ if (len > skb->len) { dev->partial_len = skb->len; dev->partial_rem = len - skb->len; memcpy((void *)dev->partial_data, skb->data, dev->partial_len); skb_pull(skb, skb->len); break; } skb2 = alloc_skb(len, GFP_ATOMIC); if (!skb2) return 0; skb_put(skb2, len); memcpy(skb2->data, skb->data, len); /* process the packet */ usbnet_skb_return(dev, skb2); skb_pull(skb, (len + 1) & ~1); } /* let usbnet process the last packet */ return 1; } /* TX is easy, just add 2 bytes of length at the beginning */ static struct sk_buff *cx82310_tx_fixup(struct usbnet *dev, struct sk_buff *skb, gfp_t flags) { int len = skb->len; if (skb_cow_head(skb, 2)) { dev_kfree_skb_any(skb); return NULL; } skb_push(skb, 2); skb->data[0] = len; skb->data[1] = len >> 8; return skb; } static const struct driver_info cx82310_info = { .description = "Conexant CX82310 USB ethernet", .flags = FLAG_ETHER, .bind = cx82310_bind, .unbind = cx82310_unbind, .rx_fixup = cx82310_rx_fixup, .tx_fixup = cx82310_tx_fixup, }; #define USB_DEVICE_CLASS(vend, prod, cl, sc, pr) \ .match_flags = USB_DEVICE_ID_MATCH_DEVICE | \ USB_DEVICE_ID_MATCH_DEV_INFO, \ .idVendor = (vend), \ .idProduct = (prod), \ .bDeviceClass = (cl), \ .bDeviceSubClass = (sc), \ .bDeviceProtocol = (pr) static const struct usb_device_id products[] = { { USB_DEVICE_CLASS(0x0572, 0xcb01, 0xff, 0, 0), .driver_info = (unsigned long) &cx82310_info }, { }, }; MODULE_DEVICE_TABLE(usb, products); static struct usb_driver cx82310_driver = { .name = "cx82310_eth", .id_table = products, .probe = usbnet_probe, .disconnect = usbnet_disconnect, .suspend = usbnet_suspend, .resume = usbnet_resume, .disable_hub_initiated_lpm = 1, }; module_usb_driver(cx82310_driver); MODULE_AUTHOR("Ondrej Zary"); MODULE_DESCRIPTION("Conexant CX82310-based ADSL router USB ethernet driver"); MODULE_LICENSE("GPL"); |
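The RX path above demultiplexes each URB into packets framed by a 2-byte little-endian length, padded to an even boundary, with a length of 0xffff signalling that the device rebooted. Below is a minimal standalone sketch of that framing in plain userspace C; parse_rx_frames and the sample buffer are invented for illustration, and the carry-over of a partial packet into the next URB is omitted.

/*
 * Standalone sketch (not part of the driver) of the RX framing described
 * above: each packet is preceded by a 2-byte little-endian length, odd
 * lengths are padded to an even boundary, and 0xffff marks a device reboot.
 * Partial-packet handling across URBs is omitted for brevity.
 */
#include <stdint.h>
#include <stdio.h>

static void parse_rx_frames(const uint8_t *buf, size_t len)
{
	size_t off = 0;

	while (off + 2 <= len) {
		uint16_t plen = buf[off] | (buf[off + 1] << 8);

		off += 2;
		if (plen == 0xffff) {
			printf("device rebooted, re-enable ethernet mode\n");
			break;
		}
		if (off + plen > len) {
			printf("partial packet: %u of %u bytes, wait for next URB\n",
			       (unsigned)(len - off), (unsigned)plen);
			break;
		}
		printf("packet of %u bytes\n", (unsigned)plen);
		/* packets are padded to an even length */
		off += (plen + 1) & ~1u;
	}
}

int main(void)
{
	/* two frames: 3 data bytes (padded to 4) followed by 2 data bytes */
	const uint8_t urb[] = { 3, 0, 0xaa, 0xbb, 0xcc, 0x00, 2, 0, 0x11, 0x22 };

	parse_rx_frames(urb, sizeof(urb));
	return 0;
}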
/* SPDX-License-Identifier: GPL-2.0 */
#include <linux/pagemap.h>
#include <linux/blkdev.h>
#include "../blk.h"

/*
 * add_gd_partition adds a partition's details to the device's partition
 * description.
 */
struct parsed_partitions {
	struct gendisk *disk;
	char name[BDEVNAME_SIZE];
	struct {
		sector_t from;
		sector_t size;
		int flags;
		bool has_info;
		struct partition_meta_info info;
	} *parts;
	int next;
	int limit;
	bool access_beyond_eod;
	char *pp_buf;
};

typedef struct {
	struct folio *v;
} Sector;

void *read_part_sector(struct parsed_partitions *state, sector_t n, Sector *p);
static inline void put_dev_sector(Sector p)
{
	folio_put(p.v);
}

static inline void
put_partition(struct parsed_partitions *p, int n, sector_t from, sector_t size)
{
	if (n < p->limit) {
		char tmp[1 + BDEVNAME_SIZE + 10 + 1];

		p->parts[n].from = from;
		p->parts[n].size = size;
		snprintf(tmp, sizeof(tmp), " %s%d", p->name, n);
		strlcat(p->pp_buf, tmp, PAGE_SIZE);
	}
}

/* detection routines go here in alphabetical order: */
int adfspart_check_ADFS(struct parsed_partitions *state);
int adfspart_check_CUMANA(struct parsed_partitions *state);
int adfspart_check_EESOX(struct parsed_partitions *state);
int adfspart_check_ICS(struct parsed_partitions *state);
int adfspart_check_POWERTEC(struct parsed_partitions *state);
int aix_partition(struct parsed_partitions *state);
int amiga_partition(struct parsed_partitions *state);
int atari_partition(struct parsed_partitions *state);
int cmdline_partition(struct parsed_partitions *state);
int efi_partition(struct parsed_partitions *state);
int ibm_partition(struct parsed_partitions *);
int karma_partition(struct parsed_partitions *state);
int ldm_partition(struct parsed_partitions *state);
int mac_partition(struct parsed_partitions *state);
int msdos_partition(struct parsed_partitions *state);
int of_partition(struct parsed_partitions *state);
int osf_partition(struct parsed_partitions *state);
int sgi_partition(struct parsed_partitions *state);
int sun_partition(struct parsed_partitions *state);
int sysv68_partition(struct parsed_partitions *state);
int ultrix_partition(struct parsed_partitions *state);
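A hedged sketch of how a detection routine might use the helpers declared above (read_part_sector, put_dev_sector, put_partition). demo_partition, the magic bytes, and the sector numbers are invented here; this is not an in-tree parser, and the return-value convention (1 found, 0 not recognised, negative on error) is assumed from existing parsers rather than stated in this header.

/*
 * Hedged sketch only: shape of a minimal detection routine built on the
 * helpers above - read sector 0, check a (made-up) signature, report one
 * partition, release the sector.
 */
static int demo_partition(struct parsed_partitions *state)
{
	Sector sect;
	unsigned char *data;

	data = read_part_sector(state, 0, &sect);
	if (!data)
		return -1;

	if (data[510] != 0x55 || data[511] != 0xaa) {	/* made-up check */
		put_dev_sector(sect);
		return 0;
	}

	/* report one partition starting at sector 2048, 1 GiB long */
	put_partition(state, 1, 2048, 2097152);
	strlcat(state->pp_buf, "\n", PAGE_SIZE);

	put_dev_sector(sect);
	return 1;
}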
888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 | // SPDX-License-Identifier: GPL-2.0-or-later /* * Dummy soundcard * Copyright (c) by Jaroslav Kysela <perex@perex.cz> */ #include <linux/init.h> #include <linux/err.h> #include <linux/platform_device.h> #include <linux/jiffies.h> #include <linux/slab.h> #include <linux/time.h> #include <linux/wait.h> #include <linux/hrtimer.h> #include <linux/math64.h> #include <linux/module.h> #include <sound/core.h> #include <sound/control.h> #include <sound/tlv.h> #include <sound/pcm.h> #include <sound/rawmidi.h> #include <sound/info.h> #include <sound/initval.h> MODULE_AUTHOR("Jaroslav Kysela <perex@perex.cz>"); MODULE_DESCRIPTION("Dummy soundcard (/dev/null)"); MODULE_LICENSE("GPL"); #define MAX_PCM_DEVICES 4 #define MAX_PCM_SUBSTREAMS 128 #define MAX_MIDI_DEVICES 2 /* defaults */ #define MAX_BUFFER_SIZE (64*1024) #define MIN_PERIOD_SIZE 64 #define MAX_PERIOD_SIZE MAX_BUFFER_SIZE #define USE_FORMATS (SNDRV_PCM_FMTBIT_U8 | SNDRV_PCM_FMTBIT_S16_LE) #define USE_RATE SNDRV_PCM_RATE_CONTINUOUS | SNDRV_PCM_RATE_8000_48000 #define USE_RATE_MIN 5500 #define USE_RATE_MAX 48000 #define USE_CHANNELS_MIN 1 #define USE_CHANNELS_MAX 2 #define USE_PERIODS_MIN 1 #define USE_PERIODS_MAX 1024 #define USE_MIXER_VOLUME_LEVEL_MIN -50 #define USE_MIXER_VOLUME_LEVEL_MAX 100 static int index[SNDRV_CARDS] = SNDRV_DEFAULT_IDX; /* Index 0-MAX */ static char *id[SNDRV_CARDS] = SNDRV_DEFAULT_STR; /* ID for this card */ static bool enable[SNDRV_CARDS] = {1, [1 ... (SNDRV_CARDS - 1)] = 0}; static char *model[SNDRV_CARDS] = {[0 ... (SNDRV_CARDS - 1)] = NULL}; static int pcm_devs[SNDRV_CARDS] = {[0 ... (SNDRV_CARDS - 1)] = 1}; static int pcm_substreams[SNDRV_CARDS] = {[0 ... (SNDRV_CARDS - 1)] = 8}; //static int midi_devs[SNDRV_CARDS] = {[0 ... 
(SNDRV_CARDS - 1)] = 2}; static int mixer_volume_level_min = USE_MIXER_VOLUME_LEVEL_MIN; static int mixer_volume_level_max = USE_MIXER_VOLUME_LEVEL_MAX; #ifdef CONFIG_HIGH_RES_TIMERS static bool hrtimer = 1; #endif static bool fake_buffer = 1; module_param_array(index, int, NULL, 0444); MODULE_PARM_DESC(index, "Index value for dummy soundcard."); module_param_array(id, charp, NULL, 0444); MODULE_PARM_DESC(id, "ID string for dummy soundcard."); module_param_array(enable, bool, NULL, 0444); MODULE_PARM_DESC(enable, "Enable this dummy soundcard."); module_param_array(model, charp, NULL, 0444); MODULE_PARM_DESC(model, "Soundcard model."); module_param_array(pcm_devs, int, NULL, 0444); MODULE_PARM_DESC(pcm_devs, "PCM devices # (0-4) for dummy driver."); module_param_array(pcm_substreams, int, NULL, 0444); MODULE_PARM_DESC(pcm_substreams, "PCM substreams # (1-128) for dummy driver."); //module_param_array(midi_devs, int, NULL, 0444); //MODULE_PARM_DESC(midi_devs, "MIDI devices # (0-2) for dummy driver."); module_param(mixer_volume_level_min, int, 0444); MODULE_PARM_DESC(mixer_volume_level_min, "Minimum mixer volume level for dummy driver. Default: -50"); module_param(mixer_volume_level_max, int, 0444); MODULE_PARM_DESC(mixer_volume_level_max, "Maximum mixer volume level for dummy driver. Default: 100"); module_param(fake_buffer, bool, 0444); MODULE_PARM_DESC(fake_buffer, "Fake buffer allocations."); #ifdef CONFIG_HIGH_RES_TIMERS module_param(hrtimer, bool, 0644); MODULE_PARM_DESC(hrtimer, "Use hrtimer as the timer source."); #endif static struct platform_device *devices[SNDRV_CARDS]; #define MIXER_ADDR_MASTER 0 #define MIXER_ADDR_LINE 1 #define MIXER_ADDR_MIC 2 #define MIXER_ADDR_SYNTH 3 #define MIXER_ADDR_CD 4 #define MIXER_ADDR_LAST 4 struct dummy_timer_ops { int (*create)(struct snd_pcm_substream *); void (*free)(struct snd_pcm_substream *); int (*prepare)(struct snd_pcm_substream *); int (*start)(struct snd_pcm_substream *); int (*stop)(struct snd_pcm_substream *); snd_pcm_uframes_t (*pointer)(struct snd_pcm_substream *); }; #define get_dummy_ops(substream) \ (*(const struct dummy_timer_ops **)(substream)->runtime->private_data) struct dummy_model { const char *name; int (*playback_constraints)(struct snd_pcm_runtime *runtime); int (*capture_constraints)(struct snd_pcm_runtime *runtime); u64 formats; size_t buffer_bytes_max; size_t period_bytes_min; size_t period_bytes_max; unsigned int periods_min; unsigned int periods_max; unsigned int rates; unsigned int rate_min; unsigned int rate_max; unsigned int channels_min; unsigned int channels_max; }; struct snd_dummy { struct snd_card *card; const struct dummy_model *model; struct snd_pcm *pcm; struct snd_pcm_hardware pcm_hw; spinlock_t mixer_lock; int mixer_volume[MIXER_ADDR_LAST+1][2]; int capture_source[MIXER_ADDR_LAST+1][2]; int iobox; struct snd_kcontrol *cd_volume_ctl; struct snd_kcontrol *cd_switch_ctl; }; /* * card models */ static int emu10k1_playback_constraints(struct snd_pcm_runtime *runtime) { int err; err = snd_pcm_hw_constraint_integer(runtime, SNDRV_PCM_HW_PARAM_PERIODS); if (err < 0) return err; err = snd_pcm_hw_constraint_minmax(runtime, SNDRV_PCM_HW_PARAM_BUFFER_BYTES, 256, UINT_MAX); if (err < 0) return err; return 0; } static const struct dummy_model model_emu10k1 = { .name = "emu10k1", .playback_constraints = emu10k1_playback_constraints, .buffer_bytes_max = 128 * 1024, }; static const struct dummy_model model_rme9652 = { .name = "rme9652", .buffer_bytes_max = 26 * 64 * 1024, .formats = SNDRV_PCM_FMTBIT_S32_LE, .channels_min 
= 26, .channels_max = 26, .periods_min = 2, .periods_max = 2, }; static const struct dummy_model model_ice1712 = { .name = "ice1712", .buffer_bytes_max = 256 * 1024, .formats = SNDRV_PCM_FMTBIT_S32_LE, .channels_min = 10, .channels_max = 10, .periods_min = 1, .periods_max = 1024, }; static const struct dummy_model model_uda1341 = { .name = "uda1341", .buffer_bytes_max = 16380, .formats = SNDRV_PCM_FMTBIT_S16_LE, .channels_min = 2, .channels_max = 2, .periods_min = 2, .periods_max = 255, }; static const struct dummy_model model_ac97 = { .name = "ac97", .formats = SNDRV_PCM_FMTBIT_S16_LE, .channels_min = 2, .channels_max = 2, .rates = SNDRV_PCM_RATE_48000, .rate_min = 48000, .rate_max = 48000, }; static const struct dummy_model model_ca0106 = { .name = "ca0106", .formats = SNDRV_PCM_FMTBIT_S16_LE, .buffer_bytes_max = ((65536-64)*8), .period_bytes_max = (65536-64), .periods_min = 2, .periods_max = 8, .channels_min = 2, .channels_max = 2, .rates = SNDRV_PCM_RATE_48000|SNDRV_PCM_RATE_96000|SNDRV_PCM_RATE_192000, .rate_min = 48000, .rate_max = 192000, }; static const struct dummy_model *dummy_models[] = { &model_emu10k1, &model_rme9652, &model_ice1712, &model_uda1341, &model_ac97, &model_ca0106, NULL }; /* * system timer interface */ struct dummy_systimer_pcm { /* ops must be the first item */ const struct dummy_timer_ops *timer_ops; spinlock_t lock; struct timer_list timer; unsigned long base_time; unsigned int frac_pos; /* fractional sample position (based HZ) */ unsigned int frac_period_rest; unsigned int frac_buffer_size; /* buffer_size * HZ */ unsigned int frac_period_size; /* period_size * HZ */ unsigned int rate; int elapsed; struct snd_pcm_substream *substream; }; static void dummy_systimer_rearm(struct dummy_systimer_pcm *dpcm) { mod_timer(&dpcm->timer, jiffies + DIV_ROUND_UP(dpcm->frac_period_rest, dpcm->rate)); } static void dummy_systimer_update(struct dummy_systimer_pcm *dpcm) { unsigned long delta; delta = jiffies - dpcm->base_time; if (!delta) return; dpcm->base_time += delta; delta *= dpcm->rate; dpcm->frac_pos += delta; while (dpcm->frac_pos >= dpcm->frac_buffer_size) dpcm->frac_pos -= dpcm->frac_buffer_size; while (dpcm->frac_period_rest <= delta) { dpcm->elapsed++; dpcm->frac_period_rest += dpcm->frac_period_size; } dpcm->frac_period_rest -= delta; } static int dummy_systimer_start(struct snd_pcm_substream *substream) { struct dummy_systimer_pcm *dpcm = substream->runtime->private_data; spin_lock(&dpcm->lock); dpcm->base_time = jiffies; dummy_systimer_rearm(dpcm); spin_unlock(&dpcm->lock); return 0; } static int dummy_systimer_stop(struct snd_pcm_substream *substream) { struct dummy_systimer_pcm *dpcm = substream->runtime->private_data; spin_lock(&dpcm->lock); timer_delete(&dpcm->timer); spin_unlock(&dpcm->lock); return 0; } static int dummy_systimer_prepare(struct snd_pcm_substream *substream) { struct snd_pcm_runtime *runtime = substream->runtime; struct dummy_systimer_pcm *dpcm = runtime->private_data; dpcm->frac_pos = 0; dpcm->rate = runtime->rate; dpcm->frac_buffer_size = runtime->buffer_size * HZ; dpcm->frac_period_size = runtime->period_size * HZ; dpcm->frac_period_rest = dpcm->frac_period_size; dpcm->elapsed = 0; return 0; } static void dummy_systimer_callback(struct timer_list *t) { struct dummy_systimer_pcm *dpcm = timer_container_of(dpcm, t, timer); unsigned long flags; int elapsed = 0; spin_lock_irqsave(&dpcm->lock, flags); dummy_systimer_update(dpcm); dummy_systimer_rearm(dpcm); elapsed = dpcm->elapsed; dpcm->elapsed = 0; spin_unlock_irqrestore(&dpcm->lock, 
flags); if (elapsed) snd_pcm_period_elapsed(dpcm->substream); } static snd_pcm_uframes_t dummy_systimer_pointer(struct snd_pcm_substream *substream) { struct dummy_systimer_pcm *dpcm = substream->runtime->private_data; snd_pcm_uframes_t pos; spin_lock(&dpcm->lock); dummy_systimer_update(dpcm); pos = dpcm->frac_pos / HZ; spin_unlock(&dpcm->lock); return pos; } static int dummy_systimer_create(struct snd_pcm_substream *substream) { struct dummy_systimer_pcm *dpcm; dpcm = kzalloc(sizeof(*dpcm), GFP_KERNEL); if (!dpcm) return -ENOMEM; substream->runtime->private_data = dpcm; timer_setup(&dpcm->timer, dummy_systimer_callback, 0); spin_lock_init(&dpcm->lock); dpcm->substream = substream; return 0; } static void dummy_systimer_free(struct snd_pcm_substream *substream) { kfree(substream->runtime->private_data); } static const struct dummy_timer_ops dummy_systimer_ops = { .create = dummy_systimer_create, .free = dummy_systimer_free, .prepare = dummy_systimer_prepare, .start = dummy_systimer_start, .stop = dummy_systimer_stop, .pointer = dummy_systimer_pointer, }; #ifdef CONFIG_HIGH_RES_TIMERS /* * hrtimer interface */ struct dummy_hrtimer_pcm { /* ops must be the first item */ const struct dummy_timer_ops *timer_ops; ktime_t base_time; ktime_t period_time; atomic_t running; struct hrtimer timer; struct snd_pcm_substream *substream; }; static enum hrtimer_restart dummy_hrtimer_callback(struct hrtimer *timer) { struct dummy_hrtimer_pcm *dpcm; dpcm = container_of(timer, struct dummy_hrtimer_pcm, timer); if (!atomic_read(&dpcm->running)) return HRTIMER_NORESTART; /* * In cases of XRUN and draining, this calls .trigger to stop PCM * substream. */ snd_pcm_period_elapsed(dpcm->substream); if (!atomic_read(&dpcm->running)) return HRTIMER_NORESTART; hrtimer_forward_now(timer, dpcm->period_time); return HRTIMER_RESTART; } static int dummy_hrtimer_start(struct snd_pcm_substream *substream) { struct dummy_hrtimer_pcm *dpcm = substream->runtime->private_data; dpcm->base_time = hrtimer_cb_get_time(&dpcm->timer); hrtimer_start(&dpcm->timer, dpcm->period_time, HRTIMER_MODE_REL_SOFT); atomic_set(&dpcm->running, 1); return 0; } static int dummy_hrtimer_stop(struct snd_pcm_substream *substream) { struct dummy_hrtimer_pcm *dpcm = substream->runtime->private_data; atomic_set(&dpcm->running, 0); if (!hrtimer_callback_running(&dpcm->timer)) hrtimer_cancel(&dpcm->timer); return 0; } static inline void dummy_hrtimer_sync(struct dummy_hrtimer_pcm *dpcm) { hrtimer_cancel(&dpcm->timer); } static snd_pcm_uframes_t dummy_hrtimer_pointer(struct snd_pcm_substream *substream) { struct snd_pcm_runtime *runtime = substream->runtime; struct dummy_hrtimer_pcm *dpcm = runtime->private_data; u64 delta; u32 pos; delta = ktime_us_delta(hrtimer_cb_get_time(&dpcm->timer), dpcm->base_time); delta = div_u64(delta * runtime->rate + 999999, 1000000); div_u64_rem(delta, runtime->buffer_size, &pos); return pos; } static int dummy_hrtimer_prepare(struct snd_pcm_substream *substream) { struct snd_pcm_runtime *runtime = substream->runtime; struct dummy_hrtimer_pcm *dpcm = runtime->private_data; unsigned int period, rate; long sec; unsigned long nsecs; dummy_hrtimer_sync(dpcm); period = runtime->period_size; rate = runtime->rate; sec = period / rate; period %= rate; nsecs = div_u64((u64)period * 1000000000UL + rate - 1, rate); dpcm->period_time = ktime_set(sec, nsecs); return 0; } static int dummy_hrtimer_create(struct snd_pcm_substream *substream) { struct dummy_hrtimer_pcm *dpcm; dpcm = kzalloc(sizeof(*dpcm), GFP_KERNEL); if (!dpcm) return -ENOMEM; 
substream->runtime->private_data = dpcm; hrtimer_setup(&dpcm->timer, dummy_hrtimer_callback, CLOCK_MONOTONIC, HRTIMER_MODE_REL_SOFT); dpcm->substream = substream; atomic_set(&dpcm->running, 0); return 0; } static void dummy_hrtimer_free(struct snd_pcm_substream *substream) { struct dummy_hrtimer_pcm *dpcm = substream->runtime->private_data; dummy_hrtimer_sync(dpcm); kfree(dpcm); } static const struct dummy_timer_ops dummy_hrtimer_ops = { .create = dummy_hrtimer_create, .free = dummy_hrtimer_free, .prepare = dummy_hrtimer_prepare, .start = dummy_hrtimer_start, .stop = dummy_hrtimer_stop, .pointer = dummy_hrtimer_pointer, }; #endif /* CONFIG_HIGH_RES_TIMERS */ /* * PCM interface */ static int dummy_pcm_trigger(struct snd_pcm_substream *substream, int cmd) { switch (cmd) { case SNDRV_PCM_TRIGGER_START: case SNDRV_PCM_TRIGGER_RESUME: return get_dummy_ops(substream)->start(substream); case SNDRV_PCM_TRIGGER_STOP: case SNDRV_PCM_TRIGGER_SUSPEND: return get_dummy_ops(substream)->stop(substream); } return -EINVAL; } static int dummy_pcm_prepare(struct snd_pcm_substream *substream) { return get_dummy_ops(substream)->prepare(substream); } static snd_pcm_uframes_t dummy_pcm_pointer(struct snd_pcm_substream *substream) { return get_dummy_ops(substream)->pointer(substream); } static const struct snd_pcm_hardware dummy_pcm_hardware = { .info = (SNDRV_PCM_INFO_MMAP | SNDRV_PCM_INFO_INTERLEAVED | SNDRV_PCM_INFO_RESUME | SNDRV_PCM_INFO_MMAP_VALID), .formats = USE_FORMATS, .rates = USE_RATE, .rate_min = USE_RATE_MIN, .rate_max = USE_RATE_MAX, .channels_min = USE_CHANNELS_MIN, .channels_max = USE_CHANNELS_MAX, .buffer_bytes_max = MAX_BUFFER_SIZE, .period_bytes_min = MIN_PERIOD_SIZE, .period_bytes_max = MAX_PERIOD_SIZE, .periods_min = USE_PERIODS_MIN, .periods_max = USE_PERIODS_MAX, .fifo_size = 0, }; static int dummy_pcm_hw_params(struct snd_pcm_substream *substream, struct snd_pcm_hw_params *hw_params) { if (fake_buffer) { /* runtime->dma_bytes has to be set manually to allow mmap */ substream->runtime->dma_bytes = params_buffer_bytes(hw_params); return 0; } return 0; } static int dummy_pcm_open(struct snd_pcm_substream *substream) { struct snd_dummy *dummy = snd_pcm_substream_chip(substream); const struct dummy_model *model = dummy->model; struct snd_pcm_runtime *runtime = substream->runtime; const struct dummy_timer_ops *ops; int err; ops = &dummy_systimer_ops; #ifdef CONFIG_HIGH_RES_TIMERS if (hrtimer) ops = &dummy_hrtimer_ops; #endif err = ops->create(substream); if (err < 0) return err; get_dummy_ops(substream) = ops; runtime->hw = dummy->pcm_hw; if (substream->pcm->device & 1) { runtime->hw.info &= ~SNDRV_PCM_INFO_INTERLEAVED; runtime->hw.info |= SNDRV_PCM_INFO_NONINTERLEAVED; } if (substream->pcm->device & 2) runtime->hw.info &= ~(SNDRV_PCM_INFO_MMAP | SNDRV_PCM_INFO_MMAP_VALID); if (model == NULL) return 0; if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK) { if (model->playback_constraints) err = model->playback_constraints(substream->runtime); } else { if (model->capture_constraints) err = model->capture_constraints(substream->runtime); } if (err < 0) { get_dummy_ops(substream)->free(substream); return err; } return 0; } static int dummy_pcm_close(struct snd_pcm_substream *substream) { get_dummy_ops(substream)->free(substream); return 0; } /* * dummy buffer handling */ static void *dummy_page[2]; static void free_fake_buffer(void) { if (fake_buffer) { int i; for (i = 0; i < 2; i++) if (dummy_page[i]) { free_page((unsigned long)dummy_page[i]); dummy_page[i] = NULL; } } } static int 
alloc_fake_buffer(void) { int i; if (!fake_buffer) return 0; for (i = 0; i < 2; i++) { dummy_page[i] = (void *)get_zeroed_page(GFP_KERNEL); if (!dummy_page[i]) { free_fake_buffer(); return -ENOMEM; } } return 0; } static int dummy_pcm_copy(struct snd_pcm_substream *substream, int channel, unsigned long pos, struct iov_iter *iter, unsigned long bytes) { return 0; /* do nothing */ } static int dummy_pcm_silence(struct snd_pcm_substream *substream, int channel, unsigned long pos, unsigned long bytes) { return 0; /* do nothing */ } static struct page *dummy_pcm_page(struct snd_pcm_substream *substream, unsigned long offset) { return virt_to_page(dummy_page[substream->stream]); /* the same page */ } static const struct snd_pcm_ops dummy_pcm_ops = { .open = dummy_pcm_open, .close = dummy_pcm_close, .hw_params = dummy_pcm_hw_params, .prepare = dummy_pcm_prepare, .trigger = dummy_pcm_trigger, .pointer = dummy_pcm_pointer, }; static const struct snd_pcm_ops dummy_pcm_ops_no_buf = { .open = dummy_pcm_open, .close = dummy_pcm_close, .hw_params = dummy_pcm_hw_params, .prepare = dummy_pcm_prepare, .trigger = dummy_pcm_trigger, .pointer = dummy_pcm_pointer, .copy = dummy_pcm_copy, .fill_silence = dummy_pcm_silence, .page = dummy_pcm_page, }; static int snd_card_dummy_pcm(struct snd_dummy *dummy, int device, int substreams) { struct snd_pcm *pcm; const struct snd_pcm_ops *ops; int err; err = snd_pcm_new(dummy->card, "Dummy PCM", device, substreams, substreams, &pcm); if (err < 0) return err; dummy->pcm = pcm; if (fake_buffer) ops = &dummy_pcm_ops_no_buf; else ops = &dummy_pcm_ops; snd_pcm_set_ops(pcm, SNDRV_PCM_STREAM_PLAYBACK, ops); snd_pcm_set_ops(pcm, SNDRV_PCM_STREAM_CAPTURE, ops); pcm->private_data = dummy; pcm->info_flags = 0; strcpy(pcm->name, "Dummy PCM"); if (!fake_buffer) { snd_pcm_set_managed_buffer_all(pcm, SNDRV_DMA_TYPE_CONTINUOUS, NULL, 0, 64*1024); } return 0; } /* * mixer interface */ #define DUMMY_VOLUME(xname, xindex, addr) \ { .iface = SNDRV_CTL_ELEM_IFACE_MIXER, \ .access = SNDRV_CTL_ELEM_ACCESS_READWRITE | SNDRV_CTL_ELEM_ACCESS_TLV_READ, \ .name = xname, .index = xindex, \ .info = snd_dummy_volume_info, \ .get = snd_dummy_volume_get, .put = snd_dummy_volume_put, \ .private_value = addr, \ .tlv = { .p = db_scale_dummy } } static int snd_dummy_volume_info(struct snd_kcontrol *kcontrol, struct snd_ctl_elem_info *uinfo) { uinfo->type = SNDRV_CTL_ELEM_TYPE_INTEGER; uinfo->count = 2; uinfo->value.integer.min = mixer_volume_level_min; uinfo->value.integer.max = mixer_volume_level_max; return 0; } static int snd_dummy_volume_get(struct snd_kcontrol *kcontrol, struct snd_ctl_elem_value *ucontrol) { struct snd_dummy *dummy = snd_kcontrol_chip(kcontrol); int addr = kcontrol->private_value; spin_lock_irq(&dummy->mixer_lock); ucontrol->value.integer.value[0] = dummy->mixer_volume[addr][0]; ucontrol->value.integer.value[1] = dummy->mixer_volume[addr][1]; spin_unlock_irq(&dummy->mixer_lock); return 0; } static int snd_dummy_volume_put(struct snd_kcontrol *kcontrol, struct snd_ctl_elem_value *ucontrol) { struct snd_dummy *dummy = snd_kcontrol_chip(kcontrol); int change, addr = kcontrol->private_value; int left, right; left = ucontrol->value.integer.value[0]; if (left < mixer_volume_level_min) left = mixer_volume_level_min; if (left > mixer_volume_level_max) left = mixer_volume_level_max; right = ucontrol->value.integer.value[1]; if (right < mixer_volume_level_min) right = mixer_volume_level_min; if (right > mixer_volume_level_max) right = mixer_volume_level_max; spin_lock_irq(&dummy->mixer_lock); 
change = dummy->mixer_volume[addr][0] != left || dummy->mixer_volume[addr][1] != right; dummy->mixer_volume[addr][0] = left; dummy->mixer_volume[addr][1] = right; spin_unlock_irq(&dummy->mixer_lock); return change; } static const DECLARE_TLV_DB_SCALE(db_scale_dummy, -4500, 30, 0); #define DUMMY_CAPSRC(xname, xindex, addr) \ { .iface = SNDRV_CTL_ELEM_IFACE_MIXER, .name = xname, .index = xindex, \ .info = snd_dummy_capsrc_info, \ .get = snd_dummy_capsrc_get, .put = snd_dummy_capsrc_put, \ .private_value = addr } #define snd_dummy_capsrc_info snd_ctl_boolean_stereo_info static int snd_dummy_capsrc_get(struct snd_kcontrol *kcontrol, struct snd_ctl_elem_value *ucontrol) { struct snd_dummy *dummy = snd_kcontrol_chip(kcontrol); int addr = kcontrol->private_value; spin_lock_irq(&dummy->mixer_lock); ucontrol->value.integer.value[0] = dummy->capture_source[addr][0]; ucontrol->value.integer.value[1] = dummy->capture_source[addr][1]; spin_unlock_irq(&dummy->mixer_lock); return 0; } static int snd_dummy_capsrc_put(struct snd_kcontrol *kcontrol, struct snd_ctl_elem_value *ucontrol) { struct snd_dummy *dummy = snd_kcontrol_chip(kcontrol); int change, addr = kcontrol->private_value; int left, right; left = ucontrol->value.integer.value[0] & 1; right = ucontrol->value.integer.value[1] & 1; spin_lock_irq(&dummy->mixer_lock); change = dummy->capture_source[addr][0] != left && dummy->capture_source[addr][1] != right; dummy->capture_source[addr][0] = left; dummy->capture_source[addr][1] = right; spin_unlock_irq(&dummy->mixer_lock); return change; } static int snd_dummy_iobox_info(struct snd_kcontrol *kcontrol, struct snd_ctl_elem_info *info) { static const char *const names[] = { "None", "CD Player" }; return snd_ctl_enum_info(info, 1, 2, names); } static int snd_dummy_iobox_get(struct snd_kcontrol *kcontrol, struct snd_ctl_elem_value *value) { struct snd_dummy *dummy = snd_kcontrol_chip(kcontrol); value->value.enumerated.item[0] = dummy->iobox; return 0; } static int snd_dummy_iobox_put(struct snd_kcontrol *kcontrol, struct snd_ctl_elem_value *value) { struct snd_dummy *dummy = snd_kcontrol_chip(kcontrol); int changed; if (value->value.enumerated.item[0] > 1) return -EINVAL; changed = value->value.enumerated.item[0] != dummy->iobox; if (changed) { dummy->iobox = value->value.enumerated.item[0]; if (dummy->iobox) { dummy->cd_volume_ctl->vd[0].access &= ~SNDRV_CTL_ELEM_ACCESS_INACTIVE; dummy->cd_switch_ctl->vd[0].access &= ~SNDRV_CTL_ELEM_ACCESS_INACTIVE; } else { dummy->cd_volume_ctl->vd[0].access |= SNDRV_CTL_ELEM_ACCESS_INACTIVE; dummy->cd_switch_ctl->vd[0].access |= SNDRV_CTL_ELEM_ACCESS_INACTIVE; } snd_ctl_notify(dummy->card, SNDRV_CTL_EVENT_MASK_INFO, &dummy->cd_volume_ctl->id); snd_ctl_notify(dummy->card, SNDRV_CTL_EVENT_MASK_INFO, &dummy->cd_switch_ctl->id); } return changed; } static const struct snd_kcontrol_new snd_dummy_controls[] = { DUMMY_VOLUME("Master Volume", 0, MIXER_ADDR_MASTER), DUMMY_CAPSRC("Master Capture Switch", 0, MIXER_ADDR_MASTER), DUMMY_VOLUME("Synth Volume", 0, MIXER_ADDR_SYNTH), DUMMY_CAPSRC("Synth Capture Switch", 0, MIXER_ADDR_SYNTH), DUMMY_VOLUME("Line Volume", 0, MIXER_ADDR_LINE), DUMMY_CAPSRC("Line Capture Switch", 0, MIXER_ADDR_LINE), DUMMY_VOLUME("Mic Volume", 0, MIXER_ADDR_MIC), DUMMY_CAPSRC("Mic Capture Switch", 0, MIXER_ADDR_MIC), DUMMY_VOLUME("CD Volume", 0, MIXER_ADDR_CD), DUMMY_CAPSRC("CD Capture Switch", 0, MIXER_ADDR_CD), { .iface = SNDRV_CTL_ELEM_IFACE_MIXER, .name = "External I/O Box", .info = snd_dummy_iobox_info, .get = snd_dummy_iobox_get, .put = 
snd_dummy_iobox_put, }, }; static int snd_card_dummy_new_mixer(struct snd_dummy *dummy) { struct snd_card *card = dummy->card; struct snd_kcontrol *kcontrol; unsigned int idx; int err; spin_lock_init(&dummy->mixer_lock); strcpy(card->mixername, "Dummy Mixer"); dummy->iobox = 1; for (idx = 0; idx < ARRAY_SIZE(snd_dummy_controls); idx++) { kcontrol = snd_ctl_new1(&snd_dummy_controls[idx], dummy); err = snd_ctl_add(card, kcontrol); if (err < 0) return err; if (!strcmp(kcontrol->id.name, "CD Volume")) dummy->cd_volume_ctl = kcontrol; else if (!strcmp(kcontrol->id.name, "CD Capture Switch")) dummy->cd_switch_ctl = kcontrol; } return 0; } #if defined(CONFIG_SND_DEBUG) && defined(CONFIG_SND_PROC_FS) /* * proc interface */ static void print_formats(struct snd_dummy *dummy, struct snd_info_buffer *buffer) { snd_pcm_format_t i; pcm_for_each_format(i) { if (dummy->pcm_hw.formats & pcm_format_to_bits(i)) snd_iprintf(buffer, " %s", snd_pcm_format_name(i)); } } static void print_rates(struct snd_dummy *dummy, struct snd_info_buffer *buffer) { static const int rates[] = { 5512, 8000, 11025, 16000, 22050, 32000, 44100, 48000, 64000, 88200, 96000, 176400, 192000, }; int i; if (dummy->pcm_hw.rates & SNDRV_PCM_RATE_CONTINUOUS) snd_iprintf(buffer, " continuous"); if (dummy->pcm_hw.rates & SNDRV_PCM_RATE_KNOT) snd_iprintf(buffer, " knot"); for (i = 0; i < ARRAY_SIZE(rates); i++) if (dummy->pcm_hw.rates & (1 << i)) snd_iprintf(buffer, " %d", rates[i]); } #define get_dummy_int_ptr(dummy, ofs) \ (unsigned int *)((char *)&((dummy)->pcm_hw) + (ofs)) #define get_dummy_ll_ptr(dummy, ofs) \ (unsigned long long *)((char *)&((dummy)->pcm_hw) + (ofs)) struct dummy_hw_field { const char *name; const char *format; unsigned int offset; unsigned int size; }; #define FIELD_ENTRY(item, fmt) { \ .name = #item, \ .format = fmt, \ .offset = offsetof(struct snd_pcm_hardware, item), \ .size = sizeof(dummy_pcm_hardware.item) } static const struct dummy_hw_field fields[] = { FIELD_ENTRY(formats, "%#llx"), FIELD_ENTRY(rates, "%#x"), FIELD_ENTRY(rate_min, "%d"), FIELD_ENTRY(rate_max, "%d"), FIELD_ENTRY(channels_min, "%d"), FIELD_ENTRY(channels_max, "%d"), FIELD_ENTRY(buffer_bytes_max, "%ld"), FIELD_ENTRY(period_bytes_min, "%ld"), FIELD_ENTRY(period_bytes_max, "%ld"), FIELD_ENTRY(periods_min, "%d"), FIELD_ENTRY(periods_max, "%d"), }; static void dummy_proc_read(struct snd_info_entry *entry, struct snd_info_buffer *buffer) { struct snd_dummy *dummy = entry->private_data; int i; for (i = 0; i < ARRAY_SIZE(fields); i++) { snd_iprintf(buffer, "%s ", fields[i].name); if (fields[i].size == sizeof(int)) snd_iprintf(buffer, fields[i].format, *get_dummy_int_ptr(dummy, fields[i].offset)); else snd_iprintf(buffer, fields[i].format, *get_dummy_ll_ptr(dummy, fields[i].offset)); if (!strcmp(fields[i].name, "formats")) print_formats(dummy, buffer); else if (!strcmp(fields[i].name, "rates")) print_rates(dummy, buffer); snd_iprintf(buffer, "\n"); } } static void dummy_proc_write(struct snd_info_entry *entry, struct snd_info_buffer *buffer) { struct snd_dummy *dummy = entry->private_data; char line[64]; while (!snd_info_get_line(buffer, line, sizeof(line))) { char item[20]; const char *ptr; unsigned long long val; int i; ptr = snd_info_get_str(item, line, sizeof(item)); for (i = 0; i < ARRAY_SIZE(fields); i++) { if (!strcmp(item, fields[i].name)) break; } if (i >= ARRAY_SIZE(fields)) continue; snd_info_get_str(item, ptr, sizeof(item)); if (kstrtoull(item, 0, &val)) continue; if (fields[i].size == sizeof(int)) *get_dummy_int_ptr(dummy, fields[i].offset) = 
val; else *get_dummy_ll_ptr(dummy, fields[i].offset) = val; } } static void dummy_proc_init(struct snd_dummy *chip) { snd_card_rw_proc_new(chip->card, "dummy_pcm", chip, dummy_proc_read, dummy_proc_write); } #else #define dummy_proc_init(x) #endif /* CONFIG_SND_DEBUG && CONFIG_SND_PROC_FS */ static int snd_dummy_probe(struct platform_device *devptr) { struct snd_card *card; struct snd_dummy *dummy; const struct dummy_model *m = NULL, **mdl; int idx, err; int dev = devptr->id; err = snd_devm_card_new(&devptr->dev, index[dev], id[dev], THIS_MODULE, sizeof(struct snd_dummy), &card); if (err < 0) return err; dummy = card->private_data; dummy->card = card; for (mdl = dummy_models; *mdl && model[dev]; mdl++) { if (strcmp(model[dev], (*mdl)->name) == 0) { pr_info("snd-dummy: Using model '%s' for card %i\n", (*mdl)->name, card->number); m = dummy->model = *mdl; break; } } for (idx = 0; idx < MAX_PCM_DEVICES && idx < pcm_devs[dev]; idx++) { if (pcm_substreams[dev] < 1) pcm_substreams[dev] = 1; if (pcm_substreams[dev] > MAX_PCM_SUBSTREAMS) pcm_substreams[dev] = MAX_PCM_SUBSTREAMS; err = snd_card_dummy_pcm(dummy, idx, pcm_substreams[dev]); if (err < 0) return err; } dummy->pcm_hw = dummy_pcm_hardware; if (m) { if (m->formats) dummy->pcm_hw.formats = m->formats; if (m->buffer_bytes_max) dummy->pcm_hw.buffer_bytes_max = m->buffer_bytes_max; if (m->period_bytes_min) dummy->pcm_hw.period_bytes_min = m->period_bytes_min; if (m->period_bytes_max) dummy->pcm_hw.period_bytes_max = m->period_bytes_max; if (m->periods_min) dummy->pcm_hw.periods_min = m->periods_min; if (m->periods_max) dummy->pcm_hw.periods_max = m->periods_max; if (m->rates) dummy->pcm_hw.rates = m->rates; if (m->rate_min) dummy->pcm_hw.rate_min = m->rate_min; if (m->rate_max) dummy->pcm_hw.rate_max = m->rate_max; if (m->channels_min) dummy->pcm_hw.channels_min = m->channels_min; if (m->channels_max) dummy->pcm_hw.channels_max = m->channels_max; } if (mixer_volume_level_min > mixer_volume_level_max) { pr_warn("snd-dummy: Invalid mixer volume level: min=%d, max=%d. 
Fall back to default value.\n", mixer_volume_level_min, mixer_volume_level_max); mixer_volume_level_min = USE_MIXER_VOLUME_LEVEL_MIN; mixer_volume_level_max = USE_MIXER_VOLUME_LEVEL_MAX; } err = snd_card_dummy_new_mixer(dummy); if (err < 0) return err; strcpy(card->driver, "Dummy"); strcpy(card->shortname, "Dummy"); sprintf(card->longname, "Dummy %i", dev + 1); dummy_proc_init(dummy); err = snd_card_register(card); if (err < 0) return err; platform_set_drvdata(devptr, card); return 0; } static int snd_dummy_suspend(struct device *pdev) { struct snd_card *card = dev_get_drvdata(pdev); snd_power_change_state(card, SNDRV_CTL_POWER_D3hot); return 0; } static int snd_dummy_resume(struct device *pdev) { struct snd_card *card = dev_get_drvdata(pdev); snd_power_change_state(card, SNDRV_CTL_POWER_D0); return 0; } static DEFINE_SIMPLE_DEV_PM_OPS(snd_dummy_pm, snd_dummy_suspend, snd_dummy_resume); #define SND_DUMMY_DRIVER "snd_dummy" static struct platform_driver snd_dummy_driver = { .probe = snd_dummy_probe, .driver = { .name = SND_DUMMY_DRIVER, .pm = &snd_dummy_pm, }, }; static void snd_dummy_unregister_all(void) { int i; for (i = 0; i < ARRAY_SIZE(devices); ++i) platform_device_unregister(devices[i]); platform_driver_unregister(&snd_dummy_driver); free_fake_buffer(); } static int __init alsa_card_dummy_init(void) { int i, cards, err; err = platform_driver_register(&snd_dummy_driver); if (err < 0) return err; err = alloc_fake_buffer(); if (err < 0) { platform_driver_unregister(&snd_dummy_driver); return err; } cards = 0; for (i = 0; i < SNDRV_CARDS; i++) { struct platform_device *device; if (! enable[i]) continue; device = platform_device_register_simple(SND_DUMMY_DRIVER, i, NULL, 0); if (IS_ERR(device)) continue; if (!platform_get_drvdata(device)) { platform_device_unregister(device); continue; } devices[i] = device; cards++; } if (!cards) { #ifdef MODULE pr_err("Dummy soundcard not found or device busy\n"); #endif snd_dummy_unregister_all(); return -ENODEV; } return 0; } static void __exit alsa_card_dummy_exit(void) { snd_dummy_unregister_all(); } module_init(alsa_card_dummy_init) module_exit(alsa_card_dummy_exit) |
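The hrtimer backend above derives the stream position from elapsed wall-clock time: microseconds are converted to frames with round-up and the result is wrapped to the ring-buffer size. A small standalone sketch of that arithmetic follows; hrtimer_pos and the sample values are made up for illustration and are not part of the driver.

/*
 * Standalone sketch of the pointer arithmetic used by the hrtimer backend:
 * frames = ceil(elapsed_us * rate / 1e6), wrapped to buffer_size.
 */
#include <stdint.h>
#include <stdio.h>

static unsigned int hrtimer_pos(uint64_t elapsed_us, unsigned int rate,
				unsigned int buffer_size)
{
	/* same rounding as the driver: (delta * rate + 999999) / 1000000 */
	uint64_t frames = (elapsed_us * rate + 999999) / 1000000;

	return (unsigned int)(frames % buffer_size);
}

int main(void)
{
	/* 21.333 ms at 48000 Hz in a 16384-frame buffer -> 1024 frames */
	printf("%u\n", hrtimer_pos(21333, 48000, 16384));
	/* a full buffer's worth of time wraps back to 0 */
	printf("%u\n", hrtimer_pos(341333, 48000, 16384));
	return 0;
}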
| 17 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 | /* SPDX-License-Identifier: GPL-2.0-only */ /* * async.h: Asynchronous function calls for boot performance * * (C) Copyright 2009 Intel Corporation * Author: Arjan van de Ven <arjan@linux.intel.com> */ #ifndef __ASYNC_H__ #define __ASYNC_H__ #include <linux/types.h> #include <linux/list.h> #include <linux/numa.h> #include <linux/device.h> typedef u64 async_cookie_t; typedef void (*async_func_t) (void *data, async_cookie_t cookie); struct async_domain { struct list_head pending; unsigned registered:1; }; /* * domain participates in global async_synchronize_full */ #define ASYNC_DOMAIN(_name) \ struct async_domain _name = { .pending = LIST_HEAD_INIT(_name.pending), \ .registered = 1 } /* * domain is free to go out of scope as soon as all pending work is * complete, this domain does not participate in async_synchronize_full */ #define ASYNC_DOMAIN_EXCLUSIVE(_name) \ struct async_domain _name = { .pending = LIST_HEAD_INIT(_name.pending), \ .registered = 0 } async_cookie_t async_schedule_node(async_func_t func, void *data, int node); async_cookie_t async_schedule_node_domain(async_func_t func, void *data, int node, struct async_domain *domain); /** * async_schedule - schedule a function for asynchronous execution * @func: function to execute asynchronously * @data: data pointer to pass to the function * * Returns an async_cookie_t that may be used for checkpointing later. * Note: This function may be called from atomic or non-atomic contexts. */ static inline async_cookie_t async_schedule(async_func_t func, void *data) { return async_schedule_node(func, data, NUMA_NO_NODE); } /** * async_schedule_domain - schedule a function for asynchronous execution within a certain domain * @func: function to execute asynchronously * @data: data pointer to pass to the function * @domain: the domain * * Returns an async_cookie_t that may be used for checkpointing later. * @domain may be used in the async_synchronize_*_domain() functions to * wait within a certain synchronization domain rather than globally. * Note: This function may be called from atomic or non-atomic contexts. */ static inline async_cookie_t async_schedule_domain(async_func_t func, void *data, struct async_domain *domain) { return async_schedule_node_domain(func, data, NUMA_NO_NODE, domain); } /** * async_schedule_dev - A device specific version of async_schedule * @func: function to execute asynchronously * @dev: device argument to be passed to function * * Returns an async_cookie_t that may be used for checkpointing later. * @dev is used as both the argument for the function and to provide NUMA * context for where to run the function. By doing this we can try to * provide for the best possible outcome by operating on the device on the * CPUs closest to the device. * Note: This function may be called from atomic or non-atomic contexts. 
*/ static inline async_cookie_t async_schedule_dev(async_func_t func, struct device *dev) { return async_schedule_node(func, dev, dev_to_node(dev)); } bool async_schedule_dev_nocall(async_func_t func, struct device *dev); /** * async_schedule_dev_domain - A device specific version of async_schedule_domain * @func: function to execute asynchronously * @dev: device argument to be passed to function * @domain: the domain * * Returns an async_cookie_t that may be used for checkpointing later. * @dev is used as both the argument for the function and to provide NUMA * context for where to run the function. By doing this we can try to * provide for the best possible outcome by operating on the device on the * CPUs closest to the device. * @domain may be used in the async_synchronize_*_domain() functions to * wait within a certain synchronization domain rather than globally. * Note: This function may be called from atomic or non-atomic contexts. */ static inline async_cookie_t async_schedule_dev_domain(async_func_t func, struct device *dev, struct async_domain *domain) { return async_schedule_node_domain(func, dev, dev_to_node(dev), domain); } extern void async_synchronize_full(void); extern void async_synchronize_full_domain(struct async_domain *domain); extern void async_synchronize_cookie(async_cookie_t cookie); extern void async_synchronize_cookie_domain(async_cookie_t cookie, struct async_domain *domain); extern bool current_is_async(void); extern void async_init(void); #endif |
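A hedged usage sketch of the API declared above: a throwaway module that schedules two slow initialisation steps with async_schedule() and waits for both with async_synchronize_full() before module init returns. The demo names are invented; this is not an in-tree module.

/*
 * Usage sketch only: schedule work through the async machinery declared
 * above and synchronize before init completes.
 */
#include <linux/module.h>
#include <linux/async.h>
#include <linux/delay.h>

static void demo_slow_init(void *data, async_cookie_t cookie)
{
	/* pretend to probe some slow hardware */
	msleep(100);
	pr_info("async demo: step %ld done (cookie %llu)\n",
		(long)data, (unsigned long long)cookie);
}

static int __init async_demo_init(void)
{
	async_schedule(demo_slow_init, (void *)1);
	async_schedule(demo_slow_init, (void *)2);

	/* both calls above may run in parallel; wait for them here */
	async_synchronize_full();
	return 0;
}

static void __exit async_demo_exit(void)
{
}

module_init(async_demo_init);
module_exit(async_demo_exit);
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("async_schedule() usage sketch");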
834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 | // SPDX-License-Identifier: GPL-2.0 /* * SUCS NET3: * * Generic datagram handling routines. These are generic for all * protocols. Possibly a generic IP version on top of these would * make sense. Not tonight however 8-). * This is used because UDP, RAW, PACKET, DDP, IPX, AX.25 and * NetROM layer all have identical poll code and mostly * identical recvmsg() code. So we share it here. The poll was * shared before but buried in udp.c so I moved it. * * Authors: Alan Cox <alan@lxorguk.ukuu.org.uk>. (datagram_poll() from old * udp.c code) * * Fixes: * Alan Cox : NULL return from skb_peek_copy() * understood * Alan Cox : Rewrote skb_read_datagram to avoid the * skb_peek_copy stuff. * Alan Cox : Added support for SOCK_SEQPACKET. * IPX can no longer use the SO_TYPE hack * but AX.25 now works right, and SPX is * feasible. * Alan Cox : Fixed write poll of non IP protocol * crash. * Florian La Roche: Changed for my new skbuff handling. * Darryl Miles : Fixed non-blocking SOCK_SEQPACKET. * Linus Torvalds : BSD semantic fixes. * Alan Cox : Datagram iovec handling * Darryl Miles : Fixed non-blocking SOCK_STREAM. * Alan Cox : POSIXisms * Pete Wyckoff : Unconnected accept() fix. * */ #include <linux/module.h> #include <linux/types.h> #include <linux/kernel.h> #include <linux/uaccess.h> #include <linux/mm.h> #include <linux/interrupt.h> #include <linux/errno.h> #include <linux/sched.h> #include <linux/inet.h> #include <linux/netdevice.h> #include <linux/rtnetlink.h> #include <linux/poll.h> #include <linux/highmem.h> #include <linux/spinlock.h> #include <linux/slab.h> #include <linux/pagemap.h> #include <linux/iov_iter.h> #include <linux/indirect_call_wrapper.h> #include <linux/crc32.h> #include <net/protocol.h> #include <linux/skbuff.h> #include <net/checksum.h> #include <net/sock.h> #include <net/tcp_states.h> #include <trace/events/skb.h> #include <net/busy_poll.h> #include "devmem.h" /* * Is a socket 'connection oriented' ? */ static inline int connection_based(struct sock *sk) { return sk->sk_type == SOCK_SEQPACKET || sk->sk_type == SOCK_STREAM; } static int receiver_wake_function(wait_queue_entry_t *wait, unsigned int mode, int sync, void *key) { /* * Avoid a wakeup if event not interesting for us */ if (key && !(key_to_poll(key) & (EPOLLIN | EPOLLERR))) return 0; return autoremove_wake_function(wait, mode, sync, key); } /* * Wait for the last received packet to be different from skb */ int __skb_wait_for_more_packets(struct sock *sk, struct sk_buff_head *queue, int *err, long *timeo_p, const struct sk_buff *skb) { int error; DEFINE_WAIT_FUNC(wait, receiver_wake_function); prepare_to_wait_exclusive(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE); /* Socket errors? */ error = sock_error(sk); if (error) goto out_err; if (READ_ONCE(queue->prev) != skb) goto out; /* Socket shut down? */ if (sk->sk_shutdown & RCV_SHUTDOWN) goto out_noerr; /* Sequenced packets can come disconnected. 
* If so we report the problem */ error = -ENOTCONN; if (connection_based(sk) && !(sk->sk_state == TCP_ESTABLISHED || sk->sk_state == TCP_LISTEN)) goto out_err; /* handle signals */ if (signal_pending(current)) goto interrupted; error = 0; *timeo_p = schedule_timeout(*timeo_p); out: finish_wait(sk_sleep(sk), &wait); return error; interrupted: error = sock_intr_errno(*timeo_p); out_err: *err = error; goto out; out_noerr: *err = 0; error = 1; goto out; } EXPORT_SYMBOL(__skb_wait_for_more_packets); static struct sk_buff *skb_set_peeked(struct sk_buff *skb) { struct sk_buff *nskb; if (skb->peeked) return skb; /* We have to unshare an skb before modifying it. */ if (!skb_shared(skb)) goto done; nskb = skb_clone(skb, GFP_ATOMIC); if (!nskb) return ERR_PTR(-ENOMEM); skb->prev->next = nskb; skb->next->prev = nskb; nskb->prev = skb->prev; nskb->next = skb->next; consume_skb(skb); skb = nskb; done: skb->peeked = 1; return skb; } struct sk_buff *__skb_try_recv_from_queue(struct sk_buff_head *queue, unsigned int flags, int *off, int *err, struct sk_buff **last) { bool peek_at_off = false; struct sk_buff *skb; int _off = 0; if (unlikely(flags & MSG_PEEK && *off >= 0)) { peek_at_off = true; _off = *off; } *last = queue->prev; skb_queue_walk(queue, skb) { if (flags & MSG_PEEK) { if (peek_at_off && _off >= skb->len && (_off || skb->peeked)) { _off -= skb->len; continue; } if (!skb->len) { skb = skb_set_peeked(skb); if (IS_ERR(skb)) { *err = PTR_ERR(skb); return NULL; } } refcount_inc(&skb->users); } else { __skb_unlink(skb, queue); } *off = _off; return skb; } return NULL; } /** * __skb_try_recv_datagram - Receive a datagram skbuff * @sk: socket * @queue: socket queue from which to receive * @flags: MSG\_ flags * @off: an offset in bytes to peek skb from. Returns an offset * within an skb where data actually starts * @err: error code returned * @last: set to last peeked message to inform the wait function * what to look for when peeking * * Get a datagram skbuff, understands the peeking, nonblocking wakeups * and possible races. This replaces identical code in packet, raw and * udp, as well as the IPX AX.25 and Appletalk. It also finally fixes * the long standing peek and read race for datagram sockets. If you * alter this routine remember it must be re-entrant. * * This function will lock the socket if a skb is returned, so * the caller needs to unlock the socket in that case (usually by * calling skb_free_datagram). Returns NULL with @err set to * -EAGAIN if no data was available or to some other value if an * error was detected. * * * It does not lock socket since today. This function is * * free of race conditions. This measure should/can improve * * significantly datagram socket latencies at high loads, * * when data copying to user space takes lots of time. * * (BTW I've just killed the last cli() in IP/IPv6/core/netlink/packet * * 8) Great win.) * * --ANK (980729) * * The order of the tests when we find no data waiting are specified * quite explicitly by POSIX 1003.1g, don't change them without having * the standard around please. */ struct sk_buff *__skb_try_recv_datagram(struct sock *sk, struct sk_buff_head *queue, unsigned int flags, int *off, int *err, struct sk_buff **last) { struct sk_buff *skb; unsigned long cpu_flags; /* * Caller is allowed not to check sk->sk_err before skb_recv_datagram() */ int error = sock_error(sk); if (error) goto no_packet; do { /* Again only user level code calls this function, so nothing * interrupt level will suddenly eat the receive_queue. 
* * Look at current nfs client by the way... * However, this function was correct in any case. 8) */ spin_lock_irqsave(&queue->lock, cpu_flags); skb = __skb_try_recv_from_queue(queue, flags, off, &error, last); spin_unlock_irqrestore(&queue->lock, cpu_flags); if (error) goto no_packet; if (skb) return skb; if (!sk_can_busy_loop(sk)) break; sk_busy_loop(sk, flags & MSG_DONTWAIT); } while (READ_ONCE(queue->prev) != *last); error = -EAGAIN; no_packet: *err = error; return NULL; } EXPORT_SYMBOL(__skb_try_recv_datagram); struct sk_buff *__skb_recv_datagram(struct sock *sk, struct sk_buff_head *sk_queue, unsigned int flags, int *off, int *err) { struct sk_buff *skb, *last; long timeo; timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT); do { skb = __skb_try_recv_datagram(sk, sk_queue, flags, off, err, &last); if (skb) return skb; if (*err != -EAGAIN) break; } while (timeo && !__skb_wait_for_more_packets(sk, sk_queue, err, &timeo, last)); return NULL; } EXPORT_SYMBOL(__skb_recv_datagram); struct sk_buff *skb_recv_datagram(struct sock *sk, unsigned int flags, int *err) { int off = 0; return __skb_recv_datagram(sk, &sk->sk_receive_queue, flags, &off, err); } EXPORT_SYMBOL(skb_recv_datagram); void skb_free_datagram(struct sock *sk, struct sk_buff *skb) { consume_skb(skb); } EXPORT_SYMBOL(skb_free_datagram); int __sk_queue_drop_skb(struct sock *sk, struct sk_buff_head *sk_queue, struct sk_buff *skb, unsigned int flags, void (*destructor)(struct sock *sk, struct sk_buff *skb)) { int err = 0; if (flags & MSG_PEEK) { err = -ENOENT; spin_lock_bh(&sk_queue->lock); if (skb->next) { __skb_unlink(skb, sk_queue); refcount_dec(&skb->users); if (destructor) destructor(sk, skb); err = 0; } spin_unlock_bh(&sk_queue->lock); } atomic_inc(&sk->sk_drops); return err; } EXPORT_SYMBOL(__sk_queue_drop_skb); /** * skb_kill_datagram - Free a datagram skbuff forcibly * @sk: socket * @skb: datagram skbuff * @flags: MSG\_ flags * * This function frees a datagram skbuff that was received by * skb_recv_datagram. The flags argument must match the one * used for skb_recv_datagram. * * If the MSG_PEEK flag is set, and the packet is still on the * receive queue of the socket, it will be taken off the queue * before it is freed. * * This function currently only disables BH when acquiring the * sk_receive_queue lock. Therefore it must not be used in a * context where that lock is acquired in an IRQ context. * * It returns 0 if the packet was removed by us. */ int skb_kill_datagram(struct sock *sk, struct sk_buff *skb, unsigned int flags) { int err = __sk_queue_drop_skb(sk, &sk->sk_receive_queue, skb, flags, NULL); kfree_skb(skb); return err; } EXPORT_SYMBOL(skb_kill_datagram); INDIRECT_CALLABLE_DECLARE(static size_t simple_copy_to_iter(const void *addr, size_t bytes, void *data __always_unused, struct iov_iter *i)); static int __skb_datagram_iter(const struct sk_buff *skb, int offset, struct iov_iter *to, int len, bool fault_short, size_t (*cb)(const void *, size_t, void *, struct iov_iter *), void *data) { int start = skb_headlen(skb); int i, copy = start - offset, start_off = offset, n; struct sk_buff *frag_iter; /* Copy header. */ if (copy > 0) { if (copy > len) copy = len; n = INDIRECT_CALL_1(cb, simple_copy_to_iter, skb->data + offset, copy, data, to); offset += n; if (n != copy) goto short_copy; if ((len -= copy) == 0) return 0; } if (!skb_frags_readable(skb)) goto short_copy; /* Copy paged appendix. Hmm... why does this look so complicated? 
*/ for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { int end; const skb_frag_t *frag = &skb_shinfo(skb)->frags[i]; WARN_ON(start > offset + len); end = start + skb_frag_size(frag); if ((copy = end - offset) > 0) { u32 p_off, p_len, copied; struct page *p; u8 *vaddr; if (copy > len) copy = len; n = 0; skb_frag_foreach_page(frag, skb_frag_off(frag) + offset - start, copy, p, p_off, p_len, copied) { vaddr = kmap_local_page(p); n += INDIRECT_CALL_1(cb, simple_copy_to_iter, vaddr + p_off, p_len, data, to); kunmap_local(vaddr); } offset += n; if (n != copy) goto short_copy; if (!(len -= copy)) return 0; } start = end; } skb_walk_frags(skb, frag_iter) { int end; WARN_ON(start > offset + len); end = start + frag_iter->len; if ((copy = end - offset) > 0) { if (copy > len) copy = len; if (__skb_datagram_iter(frag_iter, offset - start, to, copy, fault_short, cb, data)) goto fault; if ((len -= copy) == 0) return 0; offset += copy; } start = end; } if (!len) return 0; /* This is not really a user copy fault, but rather someone * gave us a bogus length on the skb. We should probably * print a warning here as it may indicate a kernel bug. */ fault: iov_iter_revert(to, offset - start_off); return -EFAULT; short_copy: if (fault_short || iov_iter_count(to)) goto fault; return 0; } #ifdef CONFIG_NET_CRC32C static size_t crc32c_and_copy_to_iter(const void *addr, size_t bytes, void *_crcp, struct iov_iter *i) { u32 *crcp = _crcp; size_t copied; copied = copy_to_iter(addr, bytes, i); *crcp = crc32c(*crcp, addr, copied); return copied; } /** * skb_copy_and_crc32c_datagram_iter - Copy datagram to an iovec iterator * and update a CRC32C value. * @skb: buffer to copy * @offset: offset in the buffer to start copying from * @to: iovec iterator to copy to * @len: amount of data to copy from buffer to iovec * @crcp: pointer to CRC32C value to update * * Return: 0 on success, -EFAULT if there was a fault during copy. */ int skb_copy_and_crc32c_datagram_iter(const struct sk_buff *skb, int offset, struct iov_iter *to, int len, u32 *crcp) { return __skb_datagram_iter(skb, offset, to, len, true, crc32c_and_copy_to_iter, crcp); } EXPORT_SYMBOL(skb_copy_and_crc32c_datagram_iter); #endif /* CONFIG_NET_CRC32C */ static size_t simple_copy_to_iter(const void *addr, size_t bytes, void *data __always_unused, struct iov_iter *i) { return copy_to_iter(addr, bytes, i); } /** * skb_copy_datagram_iter - Copy a datagram to an iovec iterator. * @skb: buffer to copy * @offset: offset in the buffer to start copying from * @to: iovec iterator to copy to * @len: amount of data to copy from buffer to iovec */ int skb_copy_datagram_iter(const struct sk_buff *skb, int offset, struct iov_iter *to, int len) { trace_skb_copy_datagram_iovec(skb, len); return __skb_datagram_iter(skb, offset, to, len, false, simple_copy_to_iter, NULL); } EXPORT_SYMBOL(skb_copy_datagram_iter); /** * skb_copy_datagram_from_iter - Copy a datagram from an iov_iter. * @skb: buffer to copy * @offset: offset in the buffer to start copying to * @from: the copy source * @len: amount of data to copy to buffer from iovec * * Returns 0 or -EFAULT. */ int skb_copy_datagram_from_iter(struct sk_buff *skb, int offset, struct iov_iter *from, int len) { int start = skb_headlen(skb); int i, copy = start - offset; struct sk_buff *frag_iter; /* Copy header. */ if (copy > 0) { if (copy > len) copy = len; if (copy_from_iter(skb->data + offset, copy, from) != copy) goto fault; if ((len -= copy) == 0) return 0; offset += copy; } /* Copy paged appendix. Hmm... 
why does this look so complicated? */ for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { int end; const skb_frag_t *frag = &skb_shinfo(skb)->frags[i]; WARN_ON(start > offset + len); end = start + skb_frag_size(frag); if ((copy = end - offset) > 0) { size_t copied; if (copy > len) copy = len; copied = copy_page_from_iter(skb_frag_page(frag), skb_frag_off(frag) + offset - start, copy, from); if (copied != copy) goto fault; if (!(len -= copy)) return 0; offset += copy; } start = end; } skb_walk_frags(skb, frag_iter) { int end; WARN_ON(start > offset + len); end = start + frag_iter->len; if ((copy = end - offset) > 0) { if (copy > len) copy = len; if (skb_copy_datagram_from_iter(frag_iter, offset - start, from, copy)) goto fault; if ((len -= copy) == 0) return 0; offset += copy; } start = end; } if (!len) return 0; fault: return -EFAULT; } EXPORT_SYMBOL(skb_copy_datagram_from_iter); int zerocopy_fill_skb_from_iter(struct sk_buff *skb, struct iov_iter *from, size_t length) { int frag = skb_shinfo(skb)->nr_frags; if (!skb_frags_readable(skb)) return -EFAULT; while (length && iov_iter_count(from)) { struct page *head, *last_head = NULL; struct page *pages[MAX_SKB_FRAGS]; int refs, order, n = 0; size_t start; ssize_t copied; if (frag == MAX_SKB_FRAGS) return -EMSGSIZE; copied = iov_iter_get_pages2(from, pages, length, MAX_SKB_FRAGS - frag, &start); if (copied < 0) return -EFAULT; length -= copied; skb->data_len += copied; skb->len += copied; skb->truesize += PAGE_ALIGN(copied + start); head = compound_head(pages[n]); order = compound_order(head); for (refs = 0; copied != 0; start = 0) { int size = min_t(int, copied, PAGE_SIZE - start); if (pages[n] - head > (1UL << order) - 1) { head = compound_head(pages[n]); order = compound_order(head); } start += (pages[n] - head) << PAGE_SHIFT; copied -= size; n++; if (frag) { skb_frag_t *last = &skb_shinfo(skb)->frags[frag - 1]; if (head == skb_frag_page(last) && start == skb_frag_off(last) + skb_frag_size(last)) { skb_frag_size_add(last, size); /* We combined this page, we need to release * a reference. Since compound pages refcount * is shared among many pages, batch the refcount * adjustments to limit false sharing. */ last_head = head; refs++; continue; } } if (refs) { page_ref_sub(last_head, refs); refs = 0; } skb_fill_page_desc_noacc(skb, frag++, head, start, size); } if (refs) page_ref_sub(last_head, refs); } return 0; } static int zerocopy_fill_skb_from_devmem(struct sk_buff *skb, struct iov_iter *from, int length, struct net_devmem_dmabuf_binding *binding) { int i = skb_shinfo(skb)->nr_frags; size_t virt_addr, size, off; struct net_iov *niov; /* Devmem filling works by taking an IOVEC from the user where the * iov_addrs are interpreted as an offset in bytes into the dma-buf to * send from. We do not support other iter types. 
*/ if (iov_iter_type(from) != ITER_IOVEC && iov_iter_type(from) != ITER_UBUF) return -EFAULT; while (length && iov_iter_count(from)) { if (i == MAX_SKB_FRAGS) return -EMSGSIZE; virt_addr = (size_t)iter_iov_addr(from); niov = net_devmem_get_niov_at(binding, virt_addr, &off, &size); if (!niov) return -EFAULT; size = min_t(size_t, size, length); size = min_t(size_t, size, iter_iov_len(from)); get_netmem(net_iov_to_netmem(niov)); skb_add_rx_frag_netmem(skb, i, net_iov_to_netmem(niov), off, size, PAGE_SIZE); iov_iter_advance(from, size); length -= size; i++; } return 0; } int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk, struct sk_buff *skb, struct iov_iter *from, size_t length, struct net_devmem_dmabuf_binding *binding) { unsigned long orig_size = skb->truesize; unsigned long truesize; int ret; if (msg && msg->msg_ubuf && msg->sg_from_iter) ret = msg->sg_from_iter(skb, from, length); else if (binding) ret = zerocopy_fill_skb_from_devmem(skb, from, length, binding); else ret = zerocopy_fill_skb_from_iter(skb, from, length); truesize = skb->truesize - orig_size; if (sk && sk->sk_type == SOCK_STREAM) { sk_wmem_queued_add(sk, truesize); if (!skb_zcopy_pure(skb)) sk_mem_charge(sk, truesize); } else { refcount_add(truesize, &skb->sk->sk_wmem_alloc); } return ret; } EXPORT_SYMBOL(__zerocopy_sg_from_iter); /** * zerocopy_sg_from_iter - Build a zerocopy datagram from an iov_iter * @skb: buffer to copy * @from: the source to copy from * * The function will first copy up to headlen, and then pin the userspace * pages and build frags through them. * * Returns 0, -EFAULT or -EMSGSIZE. */ int zerocopy_sg_from_iter(struct sk_buff *skb, struct iov_iter *from) { int copy = min_t(int, skb_headlen(skb), iov_iter_count(from)); /* copy up to skb headlen */ if (skb_copy_datagram_from_iter(skb, 0, from, copy)) return -EFAULT; return __zerocopy_sg_from_iter(NULL, NULL, skb, from, ~0U, NULL); } EXPORT_SYMBOL(zerocopy_sg_from_iter); static __always_inline size_t copy_to_user_iter_csum(void __user *iter_to, size_t progress, size_t len, void *from, void *priv2) { __wsum next, *csum = priv2; next = csum_and_copy_to_user(from + progress, iter_to, len); *csum = csum_block_add(*csum, next, progress); return next ? 0 : len; } static __always_inline size_t memcpy_to_iter_csum(void *iter_to, size_t progress, size_t len, void *from, void *priv2) { __wsum *csum = priv2; __wsum next = csum_partial_copy_nocheck(from + progress, iter_to, len); *csum = csum_block_add(*csum, next, progress); return 0; } struct csum_state { __wsum csum; size_t off; }; static size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *_csstate, struct iov_iter *i) { struct csum_state *csstate = _csstate; __wsum sum; if (WARN_ON_ONCE(i->data_source)) return 0; if (unlikely(iov_iter_is_discard(i))) { // can't use csum_memcpy() for that one - data is not copied csstate->csum = csum_block_add(csstate->csum, csum_partial(addr, bytes, 0), csstate->off); csstate->off += bytes; return bytes; } sum = csum_shift(csstate->csum, csstate->off); bytes = iterate_and_advance2(i, bytes, (void *)addr, &sum, copy_to_user_iter_csum, memcpy_to_iter_csum); csstate->csum = csum_shift(sum, csstate->off); csstate->off += bytes; return bytes; } /** * skb_copy_and_csum_datagram - Copy datagram to an iovec iterator * and update a checksum. 
* @skb: buffer to copy * @offset: offset in the buffer to start copying from * @to: iovec iterator to copy to * @len: amount of data to copy from buffer to iovec * @csump: checksum pointer */ static int skb_copy_and_csum_datagram(const struct sk_buff *skb, int offset, struct iov_iter *to, int len, __wsum *csump) { struct csum_state csdata = { .csum = *csump }; int ret; ret = __skb_datagram_iter(skb, offset, to, len, true, csum_and_copy_to_iter, &csdata); if (ret) return ret; *csump = csdata.csum; return 0; } /** * skb_copy_and_csum_datagram_msg - Copy and checksum skb to user iovec. * @skb: skbuff * @hlen: hardware length * @msg: destination * * Caller _must_ check that skb will fit to this iovec. * * Returns: 0 - success. * -EINVAL - checksum failure. * -EFAULT - fault during copy. */ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb, int hlen, struct msghdr *msg) { __wsum csum; int chunk = skb->len - hlen; if (!chunk) return 0; if (msg_data_left(msg) < chunk) { if (__skb_checksum_complete(skb)) return -EINVAL; if (skb_copy_datagram_msg(skb, hlen, msg, chunk)) goto fault; } else { csum = csum_partial(skb->data, hlen, skb->csum); if (skb_copy_and_csum_datagram(skb, hlen, &msg->msg_iter, chunk, &csum)) goto fault; if (csum_fold(csum)) { iov_iter_revert(&msg->msg_iter, chunk); return -EINVAL; } if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) && !skb->csum_complete_sw) netdev_rx_csum_fault(NULL, skb); } return 0; fault: return -EFAULT; } EXPORT_SYMBOL(skb_copy_and_csum_datagram_msg); /** * datagram_poll - generic datagram poll * @file: file struct * @sock: socket * @wait: poll table * * Datagram poll: Again totally generic. This also handles * sequenced packet sockets providing the socket receive queue * is only ever holding data ready to receive. * * Note: when you *don't* use this routine for this protocol, * and you use a different write policy from sock_writeable() * then please supply your own write_space callback. */ __poll_t datagram_poll(struct file *file, struct socket *sock, poll_table *wait) { struct sock *sk = sock->sk; __poll_t mask; u8 shutdown; sock_poll_wait(file, sock, wait); mask = 0; /* exceptional events? */ if (READ_ONCE(sk->sk_err) || !skb_queue_empty_lockless(&sk->sk_error_queue)) mask |= EPOLLERR | (sock_flag(sk, SOCK_SELECT_ERR_QUEUE) ? EPOLLPRI : 0); shutdown = READ_ONCE(sk->sk_shutdown); if (shutdown & RCV_SHUTDOWN) mask |= EPOLLRDHUP | EPOLLIN | EPOLLRDNORM; if (shutdown == SHUTDOWN_MASK) mask |= EPOLLHUP; /* readable? */ if (!skb_queue_empty_lockless(&sk->sk_receive_queue)) mask |= EPOLLIN | EPOLLRDNORM; /* Connection-based need to check for termination and startup */ if (connection_based(sk)) { int state = READ_ONCE(sk->sk_state); if (state == TCP_CLOSE) mask |= EPOLLHUP; /* connection hasn't started yet? */ if (state == TCP_SYN_SENT) return mask; } /* writable? */ if (sock_writeable(sk)) mask |= EPOLLOUT | EPOLLWRNORM | EPOLLWRBAND; else sk_set_bit(SOCKWQ_ASYNC_NOSPACE, sk); return mask; } EXPORT_SYMBOL(datagram_poll); |
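/*
 * Illustrative usage sketch (not taken from the file above): a protocol's
 * recvmsg() built on the shared datagram helpers. The function name and the
 * simplified address/ancillary handling are assumptions made for the example.
 */
static int example_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
				 size_t len, int flags)
{
	struct sock *sk = sock->sk;
	struct sk_buff *skb;
	size_t copied;
	int err;

	/* Waits (unless MSG_DONTWAIT) for a queued datagram, handling peek races. */
	skb = skb_recv_datagram(sk, flags, &err);
	if (!skb)
		return err;

	copied = skb->len;
	if (copied > len) {
		copied = len;
		msg->msg_flags |= MSG_TRUNC;
	}

	/* Copy the payload into the user iovec. */
	err = skb_copy_datagram_msg(skb, 0, msg, copied);
	if (!err)
		err = copied;

	skb_free_datagram(sk, skb);
	return err;
}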
867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 | /* * Copyright (c) 2016 Intel Corporation * * Permission to use, copy, modify, distribute, and sell this software and its * documentation for any purpose is hereby granted without fee, provided that * the above copyright notice appear in all copies and that both that copyright * notice and this permission notice appear in supporting documentation, and * that the name of the copyright holders not be used in advertising or * publicity pertaining to distribution of the software without specific, * written prior permission. The copyright holders make no representations * about the suitability of this software for any purpose. It is provided "as * is" without express or implied warranty. * * THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, * INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO * EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR * CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, * DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER * TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE * OF THIS SOFTWARE. */ #include <linux/export.h> #include <linux/uaccess.h> #include <drm/drm_crtc.h> #include <drm/drm_drv.h> #include <drm/drm_file.h> #include <drm/drm_framebuffer.h> #include <drm/drm_print.h> #include <drm/drm_property.h> #include "drm_crtc_internal.h" /** * DOC: overview * * Properties as represented by &drm_property are used to extend the modeset * interface exposed to userspace. For the atomic modeset IOCTL properties are * even the only way to transport metadata about the desired new modeset * configuration from userspace to the kernel. Properties have a well-defined * value range, which is enforced by the drm core. See the documentation of the * flags member of &struct drm_property for an overview of the different * property types and ranges. * * Properties don't store the current value directly, but need to be * instantiated by attaching them to a &drm_mode_object with * drm_object_attach_property(). * * Property values are only 64bit. To support bigger piles of data (like gamma * tables, color correction matrices or large structures) a property can instead * point at a &drm_property_blob with that additional data. * * Properties are defined by their symbolic name, userspace must keep a * per-object mapping from those names to the property ID used in the atomic * IOCTL and in the get/set property IOCTL. 
*/ static bool drm_property_flags_valid(u32 flags) { u32 legacy_type = flags & DRM_MODE_PROP_LEGACY_TYPE; u32 ext_type = flags & DRM_MODE_PROP_EXTENDED_TYPE; /* Reject undefined/deprecated flags */ if (flags & ~(DRM_MODE_PROP_LEGACY_TYPE | DRM_MODE_PROP_EXTENDED_TYPE | DRM_MODE_PROP_IMMUTABLE | DRM_MODE_PROP_ATOMIC)) return false; /* We want either a legacy type or an extended type, but not both */ if (!legacy_type == !ext_type) return false; /* Only one legacy type at a time please */ if (legacy_type && !is_power_of_2(legacy_type)) return false; return true; } /** * drm_property_create - create a new property type * @dev: drm device * @flags: flags specifying the property type * @name: name of the property * @num_values: number of pre-defined values * * This creates a new generic drm property which can then be attached to a drm * object with drm_object_attach_property(). The returned property object must * be freed with drm_property_destroy(), which is done automatically when * calling drm_mode_config_cleanup(). * * Returns: * A pointer to the newly created property on success, NULL on failure. */ struct drm_property *drm_property_create(struct drm_device *dev, u32 flags, const char *name, int num_values) { struct drm_property *property = NULL; int ret; if (WARN_ON(!drm_property_flags_valid(flags))) return NULL; if (WARN_ON(strlen(name) >= DRM_PROP_NAME_LEN)) return NULL; property = kzalloc(sizeof(struct drm_property), GFP_KERNEL); if (!property) return NULL; property->dev = dev; if (num_values) { property->values = kcalloc(num_values, sizeof(uint64_t), GFP_KERNEL); if (!property->values) goto fail; } ret = drm_mode_object_add(dev, &property->base, DRM_MODE_OBJECT_PROPERTY); if (ret) goto fail; property->flags = flags; property->num_values = num_values; INIT_LIST_HEAD(&property->enum_list); strscpy_pad(property->name, name, DRM_PROP_NAME_LEN); list_add_tail(&property->head, &dev->mode_config.property_list); return property; fail: kfree(property->values); kfree(property); return NULL; } EXPORT_SYMBOL(drm_property_create); /** * drm_property_create_enum - create a new enumeration property type * @dev: drm device * @flags: flags specifying the property type * @name: name of the property * @props: enumeration lists with property values * @num_values: number of pre-defined values * * This creates a new generic drm property which can then be attached to a drm * object with drm_object_attach_property(). The returned property object must * be freed with drm_property_destroy(), which is done automatically when * calling drm_mode_config_cleanup(). * * Userspace is only allowed to set one of the predefined values for enumeration * properties. * * Returns: * A pointer to the newly created property on success, NULL on failure. 
*/ struct drm_property *drm_property_create_enum(struct drm_device *dev, u32 flags, const char *name, const struct drm_prop_enum_list *props, int num_values) { struct drm_property *property; int i, ret; flags |= DRM_MODE_PROP_ENUM; property = drm_property_create(dev, flags, name, num_values); if (!property) return NULL; for (i = 0; i < num_values; i++) { ret = drm_property_add_enum(property, props[i].type, props[i].name); if (ret) { drm_property_destroy(dev, property); return NULL; } } return property; } EXPORT_SYMBOL(drm_property_create_enum); /** * drm_property_create_bitmask - create a new bitmask property type * @dev: drm device * @flags: flags specifying the property type * @name: name of the property * @props: enumeration lists with property bitflags * @num_props: size of the @props array * @supported_bits: bitmask of all supported enumeration values * * This creates a new bitmask drm property which can then be attached to a drm * object with drm_object_attach_property(). The returned property object must * be freed with drm_property_destroy(), which is done automatically when * calling drm_mode_config_cleanup(). * * Compared to plain enumeration properties userspace is allowed to set any * or'ed together combination of the predefined property bitflag values * * Returns: * A pointer to the newly created property on success, NULL on failure. */ struct drm_property *drm_property_create_bitmask(struct drm_device *dev, u32 flags, const char *name, const struct drm_prop_enum_list *props, int num_props, uint64_t supported_bits) { struct drm_property *property; int i, ret; int num_values = hweight64(supported_bits); flags |= DRM_MODE_PROP_BITMASK; property = drm_property_create(dev, flags, name, num_values); if (!property) return NULL; for (i = 0; i < num_props; i++) { if (!(supported_bits & (1ULL << props[i].type))) continue; ret = drm_property_add_enum(property, props[i].type, props[i].name); if (ret) { drm_property_destroy(dev, property); return NULL; } } return property; } EXPORT_SYMBOL(drm_property_create_bitmask); static struct drm_property *property_create_range(struct drm_device *dev, u32 flags, const char *name, uint64_t min, uint64_t max) { struct drm_property *property; property = drm_property_create(dev, flags, name, 2); if (!property) return NULL; property->values[0] = min; property->values[1] = max; return property; } /** * drm_property_create_range - create a new unsigned ranged property type * @dev: drm device * @flags: flags specifying the property type * @name: name of the property * @min: minimum value of the property * @max: maximum value of the property * * This creates a new generic drm property which can then be attached to a drm * object with drm_object_attach_property(). The returned property object must * be freed with drm_property_destroy(), which is done automatically when * calling drm_mode_config_cleanup(). * * Userspace is allowed to set any unsigned integer value in the (min, max) * range inclusive. * * Returns: * A pointer to the newly created property on success, NULL on failure. 
*/ struct drm_property *drm_property_create_range(struct drm_device *dev, u32 flags, const char *name, uint64_t min, uint64_t max) { return property_create_range(dev, DRM_MODE_PROP_RANGE | flags, name, min, max); } EXPORT_SYMBOL(drm_property_create_range); /** * drm_property_create_signed_range - create a new signed ranged property type * @dev: drm device * @flags: flags specifying the property type * @name: name of the property * @min: minimum value of the property * @max: maximum value of the property * * This creates a new generic drm property which can then be attached to a drm * object with drm_object_attach_property(). The returned property object must * be freed with drm_property_destroy(), which is done automatically when * calling drm_mode_config_cleanup(). * * Userspace is allowed to set any signed integer value in the (min, max) * range inclusive. * * Returns: * A pointer to the newly created property on success, NULL on failure. */ struct drm_property *drm_property_create_signed_range(struct drm_device *dev, u32 flags, const char *name, int64_t min, int64_t max) { return property_create_range(dev, DRM_MODE_PROP_SIGNED_RANGE | flags, name, I642U64(min), I642U64(max)); } EXPORT_SYMBOL(drm_property_create_signed_range); /** * drm_property_create_object - create a new object property type * @dev: drm device * @flags: flags specifying the property type * @name: name of the property * @type: object type from DRM_MODE_OBJECT_* defines * * This creates a new generic drm property which can then be attached to a drm * object with drm_object_attach_property(). The returned property object must * be freed with drm_property_destroy(), which is done automatically when * calling drm_mode_config_cleanup(). * * Userspace is only allowed to set this to any property value of the given * @type. Only useful for atomic properties, which is enforced. * * Returns: * A pointer to the newly created property on success, NULL on failure. */ struct drm_property *drm_property_create_object(struct drm_device *dev, u32 flags, const char *name, uint32_t type) { struct drm_property *property; flags |= DRM_MODE_PROP_OBJECT; if (WARN_ON(!(flags & DRM_MODE_PROP_ATOMIC))) return NULL; property = drm_property_create(dev, flags, name, 1); if (!property) return NULL; property->values[0] = type; return property; } EXPORT_SYMBOL(drm_property_create_object); /** * drm_property_create_bool - create a new boolean property type * @dev: drm device * @flags: flags specifying the property type * @name: name of the property * * This creates a new generic drm property which can then be attached to a drm * object with drm_object_attach_property(). The returned property object must * be freed with drm_property_destroy(), which is done automatically when * calling drm_mode_config_cleanup(). * * This is implemented as a ranged property with only {0, 1} as valid values. * * Returns: * A pointer to the newly created property on success, NULL on failure. */ struct drm_property *drm_property_create_bool(struct drm_device *dev, u32 flags, const char *name) { return drm_property_create_range(dev, flags, name, 0, 1); } EXPORT_SYMBOL(drm_property_create_bool); /** * drm_property_add_enum - add a possible value to an enumeration property * @property: enumeration property to change * @value: value of the new enumeration * @name: symbolic name of the new enumeration * * This functions adds enumerations to a property. 
* * It's use is deprecated, drivers should use one of the more specific helpers * to directly create the property with all enumerations already attached. * * Returns: * Zero on success, error code on failure. */ int drm_property_add_enum(struct drm_property *property, uint64_t value, const char *name) { struct drm_property_enum *prop_enum; int index = 0; if (WARN_ON(strlen(name) >= DRM_PROP_NAME_LEN)) return -EINVAL; if (WARN_ON(!drm_property_type_is(property, DRM_MODE_PROP_ENUM) && !drm_property_type_is(property, DRM_MODE_PROP_BITMASK))) return -EINVAL; /* * Bitmask enum properties have the additional constraint of values * from 0 to 63 */ if (WARN_ON(drm_property_type_is(property, DRM_MODE_PROP_BITMASK) && value > 63)) return -EINVAL; list_for_each_entry(prop_enum, &property->enum_list, head) { if (WARN_ON(prop_enum->value == value)) return -EINVAL; index++; } if (WARN_ON(index >= property->num_values)) return -EINVAL; prop_enum = kzalloc(sizeof(struct drm_property_enum), GFP_KERNEL); if (!prop_enum) return -ENOMEM; strscpy_pad(prop_enum->name, name, DRM_PROP_NAME_LEN); prop_enum->value = value; property->values[index] = value; list_add_tail(&prop_enum->head, &property->enum_list); return 0; } EXPORT_SYMBOL(drm_property_add_enum); /** * drm_property_destroy - destroy a drm property * @dev: drm device * @property: property to destroy * * This function frees a property including any attached resources like * enumeration values. */ void drm_property_destroy(struct drm_device *dev, struct drm_property *property) { struct drm_property_enum *prop_enum, *pt; list_for_each_entry_safe(prop_enum, pt, &property->enum_list, head) { list_del(&prop_enum->head); kfree(prop_enum); } if (property->num_values) kfree(property->values); drm_mode_object_unregister(dev, &property->base); list_del(&property->head); kfree(property); } EXPORT_SYMBOL(drm_property_destroy); int drm_mode_getproperty_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv) { struct drm_mode_get_property *out_resp = data; struct drm_property *property; int enum_count = 0; int value_count = 0; int i, copied; struct drm_property_enum *prop_enum; struct drm_mode_property_enum __user *enum_ptr; uint64_t __user *values_ptr; if (!drm_core_check_feature(dev, DRIVER_MODESET)) return -EOPNOTSUPP; property = drm_property_find(dev, file_priv, out_resp->prop_id); if (!property) return -ENOENT; strscpy_pad(out_resp->name, property->name, DRM_PROP_NAME_LEN); out_resp->flags = property->flags; value_count = property->num_values; values_ptr = u64_to_user_ptr(out_resp->values_ptr); for (i = 0; i < value_count; i++) { if (i < out_resp->count_values && put_user(property->values[i], values_ptr + i)) { return -EFAULT; } } out_resp->count_values = value_count; copied = 0; enum_ptr = u64_to_user_ptr(out_resp->enum_blob_ptr); if (drm_property_type_is(property, DRM_MODE_PROP_ENUM) || drm_property_type_is(property, DRM_MODE_PROP_BITMASK)) { list_for_each_entry(prop_enum, &property->enum_list, head) { enum_count++; if (out_resp->count_enum_blobs < enum_count) continue; if (copy_to_user(&enum_ptr[copied].value, &prop_enum->value, sizeof(uint64_t))) return -EFAULT; if (copy_to_user(&enum_ptr[copied].name, &prop_enum->name, DRM_PROP_NAME_LEN)) return -EFAULT; copied++; } out_resp->count_enum_blobs = enum_count; } /* * NOTE: The idea seems to have been to use this to read all the blob * property values. 
But nothing ever added them to the corresponding * list, userspace always used the special-purpose get_blob ioctl to * read the value for a blob property. It also doesn't make a lot of * sense to return values here when everything else is just metadata for * the property itself. */ if (drm_property_type_is(property, DRM_MODE_PROP_BLOB)) out_resp->count_enum_blobs = 0; return 0; } static void drm_property_free_blob(struct kref *kref) { struct drm_property_blob *blob = container_of(kref, struct drm_property_blob, base.refcount); mutex_lock(&blob->dev->mode_config.blob_lock); list_del(&blob->head_global); mutex_unlock(&blob->dev->mode_config.blob_lock); drm_mode_object_unregister(blob->dev, &blob->base); kvfree(blob); } /** * drm_property_create_blob - Create new blob property * @dev: DRM device to create property for * @length: Length to allocate for blob data * @data: If specified, copies data into blob * * Creates a new blob property for a specified DRM device, optionally * copying data. Note that blob properties are meant to be invariant, hence the * data must be filled out before the blob is used as the value of any property. * * Returns: * New blob property with a single reference on success, or an ERR_PTR * value on failure. */ struct drm_property_blob * drm_property_create_blob(struct drm_device *dev, size_t length, const void *data) { struct drm_property_blob *blob; int ret; if (!length || length > INT_MAX - sizeof(struct drm_property_blob)) return ERR_PTR(-EINVAL); blob = kvzalloc(sizeof(struct drm_property_blob)+length, GFP_KERNEL); if (!blob) return ERR_PTR(-ENOMEM); /* This must be explicitly initialised, so we can safely call list_del * on it in the removal handler, even if it isn't in a file list. */ INIT_LIST_HEAD(&blob->head_file); blob->data = (void *)blob + sizeof(*blob); blob->length = length; blob->dev = dev; if (data) memcpy(blob->data, data, length); ret = __drm_mode_object_add(dev, &blob->base, DRM_MODE_OBJECT_BLOB, true, drm_property_free_blob); if (ret) { kvfree(blob); return ERR_PTR(-EINVAL); } mutex_lock(&dev->mode_config.blob_lock); list_add_tail(&blob->head_global, &dev->mode_config.property_blob_list); mutex_unlock(&dev->mode_config.blob_lock); return blob; } EXPORT_SYMBOL(drm_property_create_blob); /** * drm_property_blob_put - release a blob property reference * @blob: DRM blob property * * Releases a reference to a blob property. May free the object. */ void drm_property_blob_put(struct drm_property_blob *blob) { if (!blob) return; drm_mode_object_put(&blob->base); } EXPORT_SYMBOL(drm_property_blob_put); void drm_property_destroy_user_blobs(struct drm_device *dev, struct drm_file *file_priv) { struct drm_property_blob *blob, *bt; /* * When the file gets released that means no one else can access the * blob list any more, so no need to grab dev->blob_lock. */ list_for_each_entry_safe(blob, bt, &file_priv->blobs, head_file) { list_del_init(&blob->head_file); drm_property_blob_put(blob); } } /** * drm_property_blob_get - acquire blob property reference * @blob: DRM blob property * * Acquires a reference to an existing blob property. Returns @blob, which * allows this to be used as a shorthand in assignments. 
*/ struct drm_property_blob *drm_property_blob_get(struct drm_property_blob *blob) { drm_mode_object_get(&blob->base); return blob; } EXPORT_SYMBOL(drm_property_blob_get); /** * drm_property_lookup_blob - look up a blob property and take a reference * @dev: drm device * @id: id of the blob property * * If successful, this takes an additional reference to the blob property. * callers need to make sure to eventually unreferenced the returned property * again, using drm_property_blob_put(). * * Return: * NULL on failure, pointer to the blob on success. */ struct drm_property_blob *drm_property_lookup_blob(struct drm_device *dev, uint32_t id) { struct drm_mode_object *obj; struct drm_property_blob *blob = NULL; obj = __drm_mode_object_find(dev, NULL, id, DRM_MODE_OBJECT_BLOB); if (obj) blob = obj_to_blob(obj); return blob; } EXPORT_SYMBOL(drm_property_lookup_blob); /** * drm_property_replace_global_blob - replace existing blob property * @dev: drm device * @replace: location of blob property pointer to be replaced * @length: length of data for new blob, or 0 for no data * @data: content for new blob, or NULL for no data * @obj_holds_id: optional object for property holding blob ID * @prop_holds_id: optional property holding blob ID * @return 0 on success or error on failure * * This function will replace a global property in the blob list, optionally * updating a property which holds the ID of that property. * * If length is 0 or data is NULL, no new blob will be created, and the holding * property, if specified, will be set to 0. * * Access to the replace pointer is assumed to be protected by the caller, e.g. * by holding the relevant modesetting object lock for its parent. * * For example, a drm_connector has a 'PATH' property, which contains the ID * of a blob property with the value of the MST path information. Calling this * function with replace pointing to the connector's path_blob_ptr, length and * data set for the new path information, obj_holds_id set to the connector's * base object, and prop_holds_id set to the path property name, will perform * a completely atomic update. The access to path_blob_ptr is protected by the * caller holding a lock on the connector. */ int drm_property_replace_global_blob(struct drm_device *dev, struct drm_property_blob **replace, size_t length, const void *data, struct drm_mode_object *obj_holds_id, struct drm_property *prop_holds_id) { struct drm_property_blob *new_blob = NULL; struct drm_property_blob *old_blob = NULL; int ret; WARN_ON(replace == NULL); old_blob = *replace; if (length && data) { new_blob = drm_property_create_blob(dev, length, data); if (IS_ERR(new_blob)) return PTR_ERR(new_blob); } if (obj_holds_id) { ret = drm_object_property_set_value(obj_holds_id, prop_holds_id, new_blob ? new_blob->base.id : 0); if (ret != 0) goto err_created; } drm_property_blob_put(old_blob); *replace = new_blob; return 0; err_created: drm_property_blob_put(new_blob); return ret; } EXPORT_SYMBOL(drm_property_replace_global_blob); /** * drm_property_replace_blob - replace a blob property * @blob: a pointer to the member blob to be replaced * @new_blob: the new blob to replace with * * Return: true if the blob was in fact replaced. 
*/ bool drm_property_replace_blob(struct drm_property_blob **blob, struct drm_property_blob *new_blob) { struct drm_property_blob *old_blob = *blob; if (old_blob == new_blob) return false; drm_property_blob_put(old_blob); if (new_blob) drm_property_blob_get(new_blob); *blob = new_blob; return true; } EXPORT_SYMBOL(drm_property_replace_blob); /** * drm_property_replace_blob_from_id - replace a blob property taking a reference * @dev: DRM device * @blob: a pointer to the member blob to be replaced * @blob_id: the id of the new blob to replace with * @expected_size: expected size of the blob property * @expected_elem_size: expected size of an element in the blob property * @replaced: if the blob was in fact replaced * * Look up the new blob from id, take its reference, check expected sizes of * the blob and its element and replace the old blob by the new one. Advertise * if the replacement operation was successful. * * Return: true if the blob was in fact replaced. -EINVAL if the new blob was * not found or sizes don't match. */ int drm_property_replace_blob_from_id(struct drm_device *dev, struct drm_property_blob **blob, uint64_t blob_id, ssize_t expected_size, ssize_t expected_elem_size, bool *replaced) { struct drm_property_blob *new_blob = NULL; if (blob_id != 0) { new_blob = drm_property_lookup_blob(dev, blob_id); if (new_blob == NULL) { drm_dbg_atomic(dev, "cannot find blob ID %llu\n", blob_id); return -EINVAL; } if (expected_size > 0 && new_blob->length != expected_size) { drm_dbg_atomic(dev, "[BLOB:%d] length %zu different from expected %zu\n", new_blob->base.id, new_blob->length, expected_size); drm_property_blob_put(new_blob); return -EINVAL; } if (expected_elem_size > 0 && new_blob->length % expected_elem_size != 0) { drm_dbg_atomic(dev, "[BLOB:%d] length %zu not divisible by element size %zu\n", new_blob->base.id, new_blob->length, expected_elem_size); drm_property_blob_put(new_blob); return -EINVAL; } } *replaced |= drm_property_replace_blob(blob, new_blob); drm_property_blob_put(new_blob); return 0; } EXPORT_SYMBOL(drm_property_replace_blob_from_id); int drm_mode_getblob_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv) { struct drm_mode_get_blob *out_resp = data; struct drm_property_blob *blob; int ret = 0; if (!drm_core_check_feature(dev, DRIVER_MODESET)) return -EOPNOTSUPP; blob = drm_property_lookup_blob(dev, out_resp->blob_id); if (!blob) return -ENOENT; if (out_resp->length == blob->length) { if (copy_to_user(u64_to_user_ptr(out_resp->data), blob->data, blob->length)) { ret = -EFAULT; goto unref; } } out_resp->length = blob->length; unref: drm_property_blob_put(blob); return ret; } int drm_mode_createblob_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv) { struct drm_mode_create_blob *out_resp = data; struct drm_property_blob *blob; int ret = 0; if (!drm_core_check_feature(dev, DRIVER_MODESET)) return -EOPNOTSUPP; blob = drm_property_create_blob(dev, out_resp->length, NULL); if (IS_ERR(blob)) return PTR_ERR(blob); if (copy_from_user(blob->data, u64_to_user_ptr(out_resp->data), out_resp->length)) { ret = -EFAULT; goto out_blob; } /* Dropping the lock between create_blob and our access here is safe * as only the same file_priv can remove the blob; at this point, it is * not associated with any file_priv. 
*/ mutex_lock(&dev->mode_config.blob_lock); out_resp->blob_id = blob->base.id; list_add_tail(&blob->head_file, &file_priv->blobs); mutex_unlock(&dev->mode_config.blob_lock); return 0; out_blob: drm_property_blob_put(blob); return ret; } int drm_mode_destroyblob_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv) { struct drm_mode_destroy_blob *out_resp = data; struct drm_property_blob *blob = NULL, *bt; bool found = false; int ret = 0; if (!drm_core_check_feature(dev, DRIVER_MODESET)) return -EOPNOTSUPP; blob = drm_property_lookup_blob(dev, out_resp->blob_id); if (!blob) return -ENOENT; mutex_lock(&dev->mode_config.blob_lock); /* Ensure the property was actually created by this user. */ list_for_each_entry(bt, &file_priv->blobs, head_file) { if (bt == blob) { found = true; break; } } if (!found) { ret = -EPERM; goto err; } /* We must drop head_file here, because we may not be the last * reference on the blob. */ list_del_init(&blob->head_file); mutex_unlock(&dev->mode_config.blob_lock); /* One reference from lookup, and one from the filp. */ drm_property_blob_put(blob); drm_property_blob_put(blob); return 0; err: mutex_unlock(&dev->mode_config.blob_lock); drm_property_blob_put(blob); return ret; } /* Some properties could refer to dynamic refcnt'd objects, or things that * need special locking to handle lifetime issues (ie. to ensure the prop * value doesn't become invalid part way through the property update due to * race). The value returned by reference via 'obj' should be passed back * to drm_property_change_valid_put() after the property is set (and the * object to which the property is attached has a chance to take its own * reference). */ bool drm_property_change_valid_get(struct drm_property *property, uint64_t value, struct drm_mode_object **ref) { int i; if (property->flags & DRM_MODE_PROP_IMMUTABLE) return false; *ref = NULL; if (drm_property_type_is(property, DRM_MODE_PROP_RANGE)) { if (value < property->values[0] || value > property->values[1]) return false; return true; } else if (drm_property_type_is(property, DRM_MODE_PROP_SIGNED_RANGE)) { int64_t svalue = U642I64(value); if (svalue < U642I64(property->values[0]) || svalue > U642I64(property->values[1])) return false; return true; } else if (drm_property_type_is(property, DRM_MODE_PROP_BITMASK)) { uint64_t valid_mask = 0; for (i = 0; i < property->num_values; i++) valid_mask |= (1ULL << property->values[i]); return !(value & ~valid_mask); } else if (drm_property_type_is(property, DRM_MODE_PROP_BLOB)) { struct drm_property_blob *blob; if (value == 0) return true; blob = drm_property_lookup_blob(property->dev, value); if (blob) { *ref = &blob->base; return true; } else { return false; } } else if (drm_property_type_is(property, DRM_MODE_PROP_OBJECT)) { /* a zero value for an object property translates to null: */ if (value == 0) return true; *ref = __drm_mode_object_find(property->dev, NULL, value, property->values[0]); return *ref != NULL; } for (i = 0; i < property->num_values; i++) if (property->values[i] == value) return true; return false; } void drm_property_change_valid_put(struct drm_property *property, struct drm_mode_object *ref) { if (!ref) return; if (drm_property_type_is(property, DRM_MODE_PROP_OBJECT)) { drm_mode_object_put(ref); } else if (drm_property_type_is(property, DRM_MODE_PROP_BLOB)) drm_property_blob_put(obj_to_blob(ref)); } |
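/*
 * Illustrative sketch only (not from the file above): how a driver might
 * create an enum property with the helpers above and attach it to a mode
 * object. The property name, the enum values and the "foo_" identifiers are
 * invented for the example.
 */
static const struct drm_prop_enum_list foo_dither_modes[] = {
	{ 0, "off" },
	{ 1, "on" },
	{ 2, "auto" },
};

static int foo_attach_dither_property(struct drm_connector *connector)
{
	struct drm_property *prop;

	prop = drm_property_create_enum(connector->dev, 0, "dithering mode",
					foo_dither_modes,
					ARRAY_SIZE(foo_dither_modes));
	if (!prop)
		return -ENOMEM;

	/* The current value lives on the attached object, not on the property. */
	drm_object_attach_property(&connector->base, prop, 0);
	return 0;
}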
/* SPDX-License-Identifier: GPL-2.0 */
/* taskstats_kern.h - kernel header for per-task statistics interface
 *
 * Copyright (C) Shailabh Nagar, IBM Corp. 2006
 *           (C) Balbir Singh,   IBM Corp. 2006
 */

#ifndef _LINUX_TASKSTATS_KERN_H
#define _LINUX_TASKSTATS_KERN_H

#include <linux/taskstats.h>
#include <linux/sched/signal.h>
#include <linux/slab.h>

#ifdef CONFIG_TASKSTATS
extern struct kmem_cache *taskstats_cache;
extern struct mutex taskstats_exit_mutex;

static inline void taskstats_tgid_free(struct signal_struct *sig)
{
	if (sig->stats)
		kmem_cache_free(taskstats_cache, sig->stats);
}

extern void taskstats_exit(struct task_struct *, int group_dead);
extern void taskstats_init_early(void);
#else
static inline void taskstats_exit(struct task_struct *tsk, int group_dead)
{}
static inline void taskstats_tgid_free(struct signal_struct *sig)
{}
static inline void taskstats_init_early(void)
{}
#endif /* CONFIG_TASKSTATS */

#endif
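/*
 * Hedged illustration (not part of the header above): because the
 * !CONFIG_TASKSTATS variants are empty static inlines, teardown code can call
 * the hooks unconditionally and the calls compile away when the option is
 * disabled. The caller below is made up for the example.
 */
static inline void example_task_teardown(struct task_struct *tsk,
					 struct signal_struct *sig,
					 int group_dead)
{
	taskstats_exit(tsk, group_dead);	/* no-op without CONFIG_TASKSTATS */
	if (group_dead)
		taskstats_tgid_free(sig);	/* likewise */
}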
| 418 419 427 420 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 | /* SPDX-License-Identifier: GPL-2.0 */ /* * Released under the GPLv2 only. */ #include <linux/pm.h> #include <linux/acpi.h> struct usb_hub_descriptor; struct usb_dev_state; /* Functions local to drivers/usb/core/ */ extern int usb_create_sysfs_dev_files(struct usb_device *dev); extern void usb_remove_sysfs_dev_files(struct usb_device *dev); extern void usb_create_sysfs_intf_files(struct usb_interface *intf); extern void usb_remove_sysfs_intf_files(struct usb_interface *intf); extern int usb_update_wireless_status_attr(struct usb_interface *intf); extern int usb_create_ep_devs(struct device *parent, struct usb_host_endpoint *endpoint, struct usb_device *udev); extern void usb_remove_ep_devs(struct usb_host_endpoint *endpoint); extern void usb_enable_endpoint(struct usb_device *dev, struct usb_host_endpoint *ep, bool reset_toggle); extern void usb_enable_interface(struct usb_device *dev, struct usb_interface *intf, bool reset_toggles); extern void usb_disable_endpoint(struct usb_device *dev, unsigned int epaddr, bool reset_hardware); extern void usb_disable_interface(struct usb_device *dev, struct usb_interface *intf, bool reset_hardware); extern void usb_release_interface_cache(struct kref *ref); extern void usb_disable_device(struct usb_device *dev, int skip_ep0); extern int usb_deauthorize_device(struct usb_device *); extern int usb_authorize_device(struct usb_device *); extern void usb_deauthorize_interface(struct usb_interface *); extern void usb_authorize_interface(struct usb_interface *); extern void usb_detect_quirks(struct usb_device *udev); extern void usb_detect_interface_quirks(struct usb_device *udev); extern void usb_release_quirk_list(void); extern bool usb_endpoint_is_ignored(struct usb_device *udev, struct usb_host_interface *intf, struct usb_endpoint_descriptor *epd); extern int usb_remove_device(struct usb_device *udev); extern struct usb_device_descriptor *usb_get_device_descriptor( struct usb_device *udev); extern int usb_set_isoch_delay(struct usb_device *dev); extern int usb_get_bos_descriptor(struct usb_device *dev); extern void usb_release_bos_descriptor(struct usb_device *dev); extern int usb_set_configuration(struct usb_device *dev, int configuration); extern int usb_choose_configuration(struct usb_device *udev); extern int usb_generic_driver_probe(struct usb_device *udev); extern void usb_generic_driver_disconnect(struct usb_device *udev); extern int usb_generic_driver_suspend(struct usb_device *udev, pm_message_t msg); extern int usb_generic_driver_resume(struct usb_device *udev, pm_message_t msg); static inline unsigned usb_get_max_power(struct usb_device *udev, struct usb_host_config *c) { /* SuperSpeed power is in 8 mA units; others are in 2 mA units */ unsigned mul = (udev->speed >= 
USB_SPEED_SUPER ? 8 : 2); return c->desc.bMaxPower * mul; } extern void usb_kick_hub_wq(struct usb_device *dev); extern int usb_match_one_id_intf(struct usb_device *dev, struct usb_host_interface *intf, const struct usb_device_id *id); extern int usb_match_device(struct usb_device *dev, const struct usb_device_id *id); extern const struct usb_device_id *usb_device_match_id(struct usb_device *udev, const struct usb_device_id *id); extern bool usb_driver_applicable(struct usb_device *udev, const struct usb_device_driver *udrv); extern void usb_forced_unbind_intf(struct usb_interface *intf); extern void usb_unbind_and_rebind_marked_interfaces(struct usb_device *udev); extern void usb_hub_release_all_ports(struct usb_device *hdev, struct usb_dev_state *owner); extern bool usb_device_is_owned(struct usb_device *udev); extern int usb_hub_init(void); extern void usb_hub_cleanup(void); extern int usb_major_init(void); extern void usb_major_cleanup(void); extern int usb_device_supports_lpm(struct usb_device *udev); extern int usb_port_disable(struct usb_device *udev); #ifdef CONFIG_PM extern int usb_suspend(struct device *dev, pm_message_t msg); extern int usb_resume(struct device *dev, pm_message_t msg); extern int usb_resume_complete(struct device *dev); extern int usb_port_suspend(struct usb_device *dev, pm_message_t msg); extern int usb_port_resume(struct usb_device *dev, pm_message_t msg); extern void usb_autosuspend_device(struct usb_device *udev); extern int usb_autoresume_device(struct usb_device *udev); extern int usb_remote_wakeup(struct usb_device *dev); extern int usb_runtime_suspend(struct device *dev); extern int usb_runtime_resume(struct device *dev); extern int usb_runtime_idle(struct device *dev); extern int usb_enable_usb2_hardware_lpm(struct usb_device *udev); extern int usb_disable_usb2_hardware_lpm(struct usb_device *udev); extern void usbfs_notify_suspend(struct usb_device *udev); extern void usbfs_notify_resume(struct usb_device *udev); #else static inline int usb_port_suspend(struct usb_device *udev, pm_message_t msg) { return 0; } static inline int usb_port_resume(struct usb_device *udev, pm_message_t msg) { return 0; } #define usb_autosuspend_device(udev) do {} while (0) static inline int usb_autoresume_device(struct usb_device *udev) { return 0; } static inline int usb_enable_usb2_hardware_lpm(struct usb_device *udev) { return 0; } static inline int usb_disable_usb2_hardware_lpm(struct usb_device *udev) { return 0; } #endif extern const struct class usbmisc_class; extern const struct bus_type usb_bus_type; extern struct mutex usb_port_peer_mutex; extern const struct device_type usb_device_type; extern const struct device_type usb_if_device_type; extern const struct device_type usb_ep_device_type; extern const struct device_type usb_port_device_type; extern struct usb_device_driver usb_generic_driver; static inline int is_usb_device(const struct device *dev) { return dev->type == &usb_device_type; } static inline int is_usb_interface(const struct device *dev) { return dev->type == &usb_if_device_type; } static inline int is_usb_endpoint(const struct device *dev) { return dev->type == &usb_ep_device_type; } static inline int is_usb_port(const struct device *dev) { return dev->type == &usb_port_device_type; } static inline int is_root_hub(struct usb_device *udev) { return (udev->parent == NULL); } extern bool is_usb_device_driver(const struct device_driver *drv); /* for labeling diagnostics */ extern const char *usbcore_name; /* sysfs stuff */ extern const struct 
attribute_group *usb_device_groups[]; extern const struct attribute_group *usb_interface_groups[]; /* usbfs stuff */ extern struct usb_driver usbfs_driver; extern const struct file_operations usbfs_devices_fops; extern const struct file_operations usbdev_file_operations; extern int usb_devio_init(void); extern void usb_devio_cleanup(void); /* * Firmware specific cookie identifying a port's location. '0' == no location * data available */ typedef u32 usb_port_location_t; /* internal notify stuff */ extern void usb_notify_add_device(struct usb_device *udev); extern void usb_notify_remove_device(struct usb_device *udev); extern void usb_notify_add_bus(struct usb_bus *ubus); extern void usb_notify_remove_bus(struct usb_bus *ubus); extern void usb_hub_adjust_deviceremovable(struct usb_device *hdev, struct usb_hub_descriptor *desc); #ifdef CONFIG_ACPI extern int usb_acpi_register(void); extern void usb_acpi_unregister(void); extern acpi_handle usb_get_hub_port_acpi_handle(struct usb_device *hdev, int port1); #else static inline int usb_acpi_register(void) { return 0; }; static inline void usb_acpi_unregister(void) { }; #endif |
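/*
 * Hedged worked example for usb_get_max_power() above: bMaxPower is expressed
 * in 8 mA units at SuperSpeed and above, and in 2 mA units otherwise, so the
 * helper already returns milliamps (e.g. bMaxPower == 50 means 100 mA at high
 * speed but 400 mA at SuperSpeed). The debug helper below is illustrative only.
 */
static inline void example_log_config_power(struct usb_device *udev,
					    struct usb_host_config *c)
{
	dev_dbg(&udev->dev, "config %d draws up to %u mA\n",
		c->desc.bConfigurationValue, usb_get_max_power(udev, c));
}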
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_RECIPROCAL_DIV_H
#define _LINUX_RECIPROCAL_DIV_H

#include <linux/types.h>

/*
 * This algorithm is based on the paper "Division by Invariant
 * Integers Using Multiplication" by Torbjörn Granlund and Peter
 * L. Montgomery.
 *
 * The assembler implementation from Agner Fog, which this code is
 * based on, can be found here:
 * http://www.agner.org/optimize/asmlib.zip
 *
 * This optimization for A/B is helpful if the divisor B is mostly
 * runtime invariant. The reciprocal of B is calculated in the
 * slow-path with reciprocal_value(). The fast-path can then just use
 * a much faster multiplication operation with a variable dividend A
 * to calculate the division A/B.
 */

struct reciprocal_value {
	u32 m;
	u8 sh1, sh2;
};

/* "reciprocal_value" and "reciprocal_divide" together implement the basic
 * version of the algorithm described in Figure 4.1 of the paper.
 */
struct reciprocal_value reciprocal_value(u32 d);

static inline u32 reciprocal_divide(u32 a, struct reciprocal_value R)
{
	u32 t = (u32)(((u64)a * R.m) >> 32);
	return (t + ((a - t) >> R.sh1)) >> R.sh2;
}

struct reciprocal_value_adv {
	u32 m;
	u8 sh, exp;
	bool is_wide_m;
};

/* "reciprocal_value_adv" implements the advanced version of the algorithm
 * described in Figure 4.2 of the paper except when "divisor > (1U << 31)" whose
 * ceil(log2(d)) result will be 32 which then requires u128 divide on host. The
 * exception case could be easily handled before calling "reciprocal_value_adv".
 *
 * The advanced version requires more complex calculation to get the reciprocal
 * multiplier and other control variables, but then could reduce the required
 * emulation operations.
 *
 * It makes no sense to use this advanced version for host divide emulation,
 * those extra complexities for calculating multiplier etc could completely
 * waive our saving on emulation operations.
 *
 * However, it makes sense to use it for JIT divide code generation for which
 * we are willing to trade performance of JITed code with that of host. As shown
 * by the following pseudo code, the required emulation operations could go down
 * from 6 (the basic version) to 3 or 4.
 *
 * To use the result of "reciprocal_value_adv", suppose we want to calculate
 * n/d, the pseudo C code will be:
 *
 *	struct reciprocal_value_adv rvalue;
 *	u8 pre_shift, exp;
 *
 *	// handle exception case.
 *	if (d >= (1U << 31)) {
 *		result = n >= d;
 *		return;
 *	}
 *
 *	rvalue = reciprocal_value_adv(d, 32)
 *	exp = rvalue.exp;
 *	if (rvalue.is_wide_m && !(d & 1)) {
 *		// floor(log2(d & (2^32 -d)))
 *		pre_shift = fls(d & -d) - 1;
 *		rvalue = reciprocal_value_adv(d >> pre_shift, 32 - pre_shift);
 *	} else {
 *		pre_shift = 0;
 *	}
 *
 *	// code generation starts.
 *	if (imm == 1U << exp) {
 *		result = n >> exp;
 *	} else if (rvalue.is_wide_m) {
 *		// pre_shift must be zero when reached here.
 *		t = (n * rvalue.m) >> 32;
 *		result = n - t;
 *		result >>= 1;
 *		result += t;
 *		result >>= rvalue.sh - 1;
 *	} else {
 *		if (pre_shift)
 *			result = n >> pre_shift;
 *		result = ((u64)result * rvalue.m) >> 32;
 *		result >>= rvalue.sh;
 *	}
 */
struct reciprocal_value_adv reciprocal_value_adv(u32 d, u8 prec);

#endif /* _LINUX_RECIPROCAL_DIV_H */
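/*
 * Editor's note: the sketch below is not part of reciprocal_div.h. It shows
 * the intended split between the slow path (reciprocal_value(), done once
 * for a runtime-invariant divisor) and the fast path (reciprocal_divide(),
 * done per dividend). The struct and function names are illustrative only.
 */
#include <linux/reciprocal_div.h>
#include <linux/types.h>

struct example_table {
	u32 nr_buckets;				/* divisor, fixed after setup */
	struct reciprocal_value nr_buckets_r;	/* precomputed reciprocal */
};

static void example_table_setup(struct example_table *t, u32 nr_buckets)
{
	t->nr_buckets = nr_buckets;
	t->nr_buckets_r = reciprocal_value(nr_buckets);	/* slow path, once */
}

static u32 example_table_bucket(const struct example_table *t, u32 hash)
{
	/* fast path: hash % nr_buckets via multiply instead of divide */
	return hash - reciprocal_divide(hash, t->nr_buckets_r) * t->nr_buckets;
}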
// SPDX-License-Identifier: GPL-2.0-only
/*
 * pcrypt - Parallel crypto wrapper.
 *
 * Copyright (C) 2009 secunet Security Networks AG
 * Copyright (C) 2009 Steffen Klassert <steffen.klassert@secunet.com>
 */

#include <crypto/algapi.h>
#include <crypto/internal/aead.h>
#include <linux/atomic.h>
#include <linux/err.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/kobject.h>
#include <linux/cpu.h>
#include <crypto/pcrypt.h>

static struct padata_instance *pencrypt;
static struct padata_instance *pdecrypt;
static struct kset *pcrypt_kset;

struct pcrypt_instance_ctx {
	struct crypto_aead_spawn spawn;
	struct padata_shell *psenc;
	struct padata_shell *psdec;
	atomic_t tfm_count;
};

struct pcrypt_aead_ctx {
	struct crypto_aead *child;
	unsigned int cb_cpu;
};

static inline struct pcrypt_instance_ctx *pcrypt_tfm_ictx(
	struct crypto_aead *tfm)
{
	return aead_instance_ctx(aead_alg_instance(tfm));
}

static int pcrypt_aead_setkey(struct crypto_aead *parent,
			      const u8 *key, unsigned int keylen)
{
	struct pcrypt_aead_ctx *ctx = crypto_aead_ctx(parent);

	return crypto_aead_setkey(ctx->child, key, keylen);
}

static int pcrypt_aead_setauthsize(struct crypto_aead *parent,
				   unsigned int authsize)
{
	struct pcrypt_aead_ctx *ctx = crypto_aead_ctx(parent);

	return crypto_aead_setauthsize(ctx->child, authsize);
}

static void pcrypt_aead_serial(struct padata_priv *padata)
{
	struct pcrypt_request *preq = pcrypt_padata_request(padata);
	struct aead_request *req = pcrypt_request_ctx(preq);

	aead_request_complete(req->base.data, padata->info);
}

static void pcrypt_aead_done(void *data, int err)
{
	struct aead_request *req = data;
	struct pcrypt_request *preq = aead_request_ctx(req);
	struct padata_priv *padata = pcrypt_request_padata(preq);

	padata->info = err;
	padata_do_serial(padata);
}

static void pcrypt_aead_enc(struct padata_priv *padata)
{
	struct pcrypt_request *preq = pcrypt_padata_request(padata);
	struct aead_request *req = pcrypt_request_ctx(preq);
	int ret;

	ret = crypto_aead_encrypt(req);
	if (ret == -EINPROGRESS)
		return;

	padata->info =
ret; padata_do_serial(padata); } static int pcrypt_aead_encrypt(struct aead_request *req) { int err; struct pcrypt_request *preq = aead_request_ctx(req); struct aead_request *creq = pcrypt_request_ctx(preq); struct padata_priv *padata = pcrypt_request_padata(preq); struct crypto_aead *aead = crypto_aead_reqtfm(req); struct pcrypt_aead_ctx *ctx = crypto_aead_ctx(aead); u32 flags = aead_request_flags(req); struct pcrypt_instance_ctx *ictx; ictx = pcrypt_tfm_ictx(aead); memset(padata, 0, sizeof(struct padata_priv)); padata->parallel = pcrypt_aead_enc; padata->serial = pcrypt_aead_serial; aead_request_set_tfm(creq, ctx->child); aead_request_set_callback(creq, flags & ~CRYPTO_TFM_REQ_MAY_SLEEP, pcrypt_aead_done, req); aead_request_set_crypt(creq, req->src, req->dst, req->cryptlen, req->iv); aead_request_set_ad(creq, req->assoclen); err = padata_do_parallel(ictx->psenc, padata, &ctx->cb_cpu); if (!err) return -EINPROGRESS; if (err == -EBUSY) { /* try non-parallel mode */ return crypto_aead_encrypt(creq); } return err; } static void pcrypt_aead_dec(struct padata_priv *padata) { struct pcrypt_request *preq = pcrypt_padata_request(padata); struct aead_request *req = pcrypt_request_ctx(preq); int ret; ret = crypto_aead_decrypt(req); if (ret == -EINPROGRESS) return; padata->info = ret; padata_do_serial(padata); } static int pcrypt_aead_decrypt(struct aead_request *req) { int err; struct pcrypt_request *preq = aead_request_ctx(req); struct aead_request *creq = pcrypt_request_ctx(preq); struct padata_priv *padata = pcrypt_request_padata(preq); struct crypto_aead *aead = crypto_aead_reqtfm(req); struct pcrypt_aead_ctx *ctx = crypto_aead_ctx(aead); u32 flags = aead_request_flags(req); struct pcrypt_instance_ctx *ictx; ictx = pcrypt_tfm_ictx(aead); memset(padata, 0, sizeof(struct padata_priv)); padata->parallel = pcrypt_aead_dec; padata->serial = pcrypt_aead_serial; aead_request_set_tfm(creq, ctx->child); aead_request_set_callback(creq, flags & ~CRYPTO_TFM_REQ_MAY_SLEEP, pcrypt_aead_done, req); aead_request_set_crypt(creq, req->src, req->dst, req->cryptlen, req->iv); aead_request_set_ad(creq, req->assoclen); err = padata_do_parallel(ictx->psdec, padata, &ctx->cb_cpu); if (!err) return -EINPROGRESS; if (err == -EBUSY) { /* try non-parallel mode */ return crypto_aead_decrypt(creq); } return err; } static int pcrypt_aead_init_tfm(struct crypto_aead *tfm) { int cpu, cpu_index; struct aead_instance *inst = aead_alg_instance(tfm); struct pcrypt_instance_ctx *ictx = aead_instance_ctx(inst); struct pcrypt_aead_ctx *ctx = crypto_aead_ctx(tfm); struct crypto_aead *cipher; cpu_index = (unsigned int)atomic_inc_return(&ictx->tfm_count) % cpumask_weight(cpu_online_mask); ctx->cb_cpu = cpumask_first(cpu_online_mask); for (cpu = 0; cpu < cpu_index; cpu++) ctx->cb_cpu = cpumask_next(ctx->cb_cpu, cpu_online_mask); cipher = crypto_spawn_aead(&ictx->spawn); if (IS_ERR(cipher)) return PTR_ERR(cipher); ctx->child = cipher; crypto_aead_set_reqsize(tfm, sizeof(struct pcrypt_request) + sizeof(struct aead_request) + crypto_aead_reqsize(cipher)); return 0; } static void pcrypt_aead_exit_tfm(struct crypto_aead *tfm) { struct pcrypt_aead_ctx *ctx = crypto_aead_ctx(tfm); crypto_free_aead(ctx->child); } static void pcrypt_free(struct aead_instance *inst) { struct pcrypt_instance_ctx *ctx = aead_instance_ctx(inst); crypto_drop_aead(&ctx->spawn); padata_free_shell(ctx->psdec); padata_free_shell(ctx->psenc); kfree(inst); } static int pcrypt_init_instance(struct crypto_instance *inst, struct crypto_alg *alg) { if 
(snprintf(inst->alg.cra_driver_name, CRYPTO_MAX_ALG_NAME, "pcrypt(%s)", alg->cra_driver_name) >= CRYPTO_MAX_ALG_NAME) return -ENAMETOOLONG; memcpy(inst->alg.cra_name, alg->cra_name, CRYPTO_MAX_ALG_NAME); inst->alg.cra_priority = alg->cra_priority + 100; inst->alg.cra_blocksize = alg->cra_blocksize; inst->alg.cra_alignmask = alg->cra_alignmask; return 0; } static int pcrypt_create_aead(struct crypto_template *tmpl, struct rtattr **tb, struct crypto_attr_type *algt) { struct pcrypt_instance_ctx *ctx; struct aead_instance *inst; struct aead_alg *alg; u32 mask = crypto_algt_inherited_mask(algt); int err; inst = kzalloc(sizeof(*inst) + sizeof(*ctx), GFP_KERNEL); if (!inst) return -ENOMEM; err = -ENOMEM; ctx = aead_instance_ctx(inst); ctx->psenc = padata_alloc_shell(pencrypt); if (!ctx->psenc) goto err_free_inst; ctx->psdec = padata_alloc_shell(pdecrypt); if (!ctx->psdec) goto err_free_inst; err = crypto_grab_aead(&ctx->spawn, aead_crypto_instance(inst), crypto_attr_alg_name(tb[1]), 0, mask); if (err) goto err_free_inst; alg = crypto_spawn_aead_alg(&ctx->spawn); err = pcrypt_init_instance(aead_crypto_instance(inst), &alg->base); if (err) goto err_free_inst; inst->alg.base.cra_flags |= CRYPTO_ALG_ASYNC; inst->alg.ivsize = crypto_aead_alg_ivsize(alg); inst->alg.maxauthsize = crypto_aead_alg_maxauthsize(alg); inst->alg.base.cra_ctxsize = sizeof(struct pcrypt_aead_ctx); inst->alg.init = pcrypt_aead_init_tfm; inst->alg.exit = pcrypt_aead_exit_tfm; inst->alg.setkey = pcrypt_aead_setkey; inst->alg.setauthsize = pcrypt_aead_setauthsize; inst->alg.encrypt = pcrypt_aead_encrypt; inst->alg.decrypt = pcrypt_aead_decrypt; inst->free = pcrypt_free; err = aead_register_instance(tmpl, inst); if (err) { err_free_inst: pcrypt_free(inst); } return err; } static int pcrypt_create(struct crypto_template *tmpl, struct rtattr **tb) { struct crypto_attr_type *algt; algt = crypto_get_attr_type(tb); if (IS_ERR(algt)) return PTR_ERR(algt); switch (algt->type & algt->mask & CRYPTO_ALG_TYPE_MASK) { case CRYPTO_ALG_TYPE_AEAD: return pcrypt_create_aead(tmpl, tb, algt); } return -EINVAL; } static int pcrypt_sysfs_add(struct padata_instance *pinst, const char *name) { int ret; pinst->kobj.kset = pcrypt_kset; ret = kobject_add(&pinst->kobj, NULL, "%s", name); if (!ret) kobject_uevent(&pinst->kobj, KOBJ_ADD); return ret; } static int pcrypt_init_padata(struct padata_instance **pinst, const char *name) { int ret = -ENOMEM; *pinst = padata_alloc(name); if (!*pinst) return ret; ret = pcrypt_sysfs_add(*pinst, name); if (ret) padata_free(*pinst); return ret; } static struct crypto_template pcrypt_tmpl = { .name = "pcrypt", .create = pcrypt_create, .module = THIS_MODULE, }; static int __init pcrypt_init(void) { int err = -ENOMEM; pcrypt_kset = kset_create_and_add("pcrypt", NULL, kernel_kobj); if (!pcrypt_kset) goto err; err = pcrypt_init_padata(&pencrypt, "pencrypt"); if (err) goto err_unreg_kset; err = pcrypt_init_padata(&pdecrypt, "pdecrypt"); if (err) goto err_deinit_pencrypt; return crypto_register_template(&pcrypt_tmpl); err_deinit_pencrypt: padata_free(pencrypt); err_unreg_kset: kset_unregister(pcrypt_kset); err: return err; } static void __exit pcrypt_exit(void) { crypto_unregister_template(&pcrypt_tmpl); padata_free(pencrypt); padata_free(pdecrypt); kset_unregister(pcrypt_kset); } module_init(pcrypt_init); module_exit(pcrypt_exit); MODULE_LICENSE("GPL"); MODULE_AUTHOR("Steffen Klassert <steffen.klassert@secunet.com>"); MODULE_DESCRIPTION("Parallel crypto wrapper"); MODULE_ALIAS_CRYPTO("pcrypt"); |
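/*
 * Editor's note: the snippet below is not part of pcrypt.c. It is a minimal
 * sketch of how the template registered above is typically consumed: a user
 * wraps an existing AEAD algorithm name in "pcrypt(...)" and the crypto core
 * then instantiates pcrypt_create_aead() around it, spreading requests over
 * the "pencrypt"/"pdecrypt" padata instances. The wrapped algorithm string
 * and the function name are illustrative.
 */
#include <crypto/aead.h>
#include <linux/err.h>

static struct crypto_aead *example_alloc_parallel_aead(void)
{
	/* caller must check the result with IS_ERR() */
	return crypto_alloc_aead("pcrypt(gcm(aes))", 0, 0);
}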
/* SPDX-License-Identifier: GPL-2.0-only */
/* Driver for Realtek RTS5139 USB card reader
 *
 * Copyright(c) 2009-2013 Realtek Semiconductor Corp. All rights reserved.
* * Author: * Roger Tseng <rogerable@realtek.com> */ #ifndef __RTSX_USB_H #define __RTSX_USB_H #include <linux/usb.h> #define DRV_NAME_RTSX_USB "rtsx_usb" #define DRV_NAME_RTSX_USB_SDMMC "rtsx_usb_sdmmc" #define DRV_NAME_RTSX_USB_MS "rtsx_usb_ms" /* related module names */ #define RTSX_USB_SD_CARD 0 #define RTSX_USB_MS_CARD 1 /* endpoint numbers */ #define EP_BULK_OUT 1 #define EP_BULK_IN 2 #define EP_INTR_IN 3 /* USB vendor requests */ #define RTSX_USB_REQ_REG_OP 0x00 #define RTSX_USB_REQ_POLL 0x02 /* miscellaneous parameters */ #define MIN_DIV_N 60 #define MAX_DIV_N 120 #define MAX_PHASE 15 #define RX_TUNING_CNT 3 #define QFN24 0 #define LQFP48 1 #define CHECK_PKG(ucr, pkg) ((ucr)->package == (pkg)) /* data structures */ struct rtsx_ucr { u16 vendor_id; u16 product_id; int package; u8 ic_version; bool is_rts5179; unsigned int cur_clk; u8 *cmd_buf; unsigned int cmd_idx; u8 *rsp_buf; struct usb_device *pusb_dev; struct usb_interface *pusb_intf; struct usb_sg_request current_sg; struct timer_list sg_timer; struct mutex dev_mutex; }; /* buffer size */ #define IOBUF_SIZE 1024 /* prototypes of exported functions */ extern int rtsx_usb_get_card_status(struct rtsx_ucr *ucr, u16 *status); extern int rtsx_usb_read_register(struct rtsx_ucr *ucr, u16 addr, u8 *data); extern int rtsx_usb_write_register(struct rtsx_ucr *ucr, u16 addr, u8 mask, u8 data); extern int rtsx_usb_ep0_write_register(struct rtsx_ucr *ucr, u16 addr, u8 mask, u8 data); extern int rtsx_usb_ep0_read_register(struct rtsx_ucr *ucr, u16 addr, u8 *data); extern void rtsx_usb_add_cmd(struct rtsx_ucr *ucr, u8 cmd_type, u16 reg_addr, u8 mask, u8 data); extern int rtsx_usb_send_cmd(struct rtsx_ucr *ucr, u8 flag, int timeout); extern int rtsx_usb_get_rsp(struct rtsx_ucr *ucr, int rsp_len, int timeout); extern int rtsx_usb_transfer_data(struct rtsx_ucr *ucr, unsigned int pipe, void *buf, unsigned int len, int use_sg, unsigned int *act_len, int timeout); extern int rtsx_usb_read_ppbuf(struct rtsx_ucr *ucr, u8 *buf, int buf_len); extern int rtsx_usb_write_ppbuf(struct rtsx_ucr *ucr, u8 *buf, int buf_len); extern int rtsx_usb_switch_clock(struct rtsx_ucr *ucr, unsigned int card_clock, u8 ssc_depth, bool initial_mode, bool double_clk, bool vpclk); extern int rtsx_usb_card_exclusive_check(struct rtsx_ucr *ucr, int card); /* card status */ #define SD_CD 0x01 #define MS_CD 0x02 #define XD_CD 0x04 #define CD_MASK (SD_CD | MS_CD | XD_CD) #define SD_WP 0x08 /* reader command field offset & parameters */ #define READ_REG_CMD 0 #define WRITE_REG_CMD 1 #define CHECK_REG_CMD 2 #define PACKET_TYPE 4 #define CNT_H 5 #define CNT_L 6 #define STAGE_FLAG 7 #define CMD_OFFSET 8 #define SEQ_WRITE_DATA_OFFSET 12 #define BATCH_CMD 0 #define SEQ_READ 1 #define SEQ_WRITE 2 #define STAGE_R 0x01 #define STAGE_DI 0x02 #define STAGE_DO 0x04 #define STAGE_MS_STATUS 0x08 #define STAGE_XD_STATUS 0x10 #define MODE_C 0x00 #define MODE_CR (STAGE_R) #define MODE_CDIR (STAGE_R | STAGE_DI) #define MODE_CDOR (STAGE_R | STAGE_DO) #define EP0_OP_SHIFT 14 #define EP0_READ_REG_CMD 2 #define EP0_WRITE_REG_CMD 3 #define rtsx_usb_cmd_hdr_tag(ucr) \ do { \ ucr->cmd_buf[0] = 'R'; \ ucr->cmd_buf[1] = 'T'; \ ucr->cmd_buf[2] = 'C'; \ ucr->cmd_buf[3] = 'R'; \ } while (0) static inline void rtsx_usb_init_cmd(struct rtsx_ucr *ucr) { rtsx_usb_cmd_hdr_tag(ucr); ucr->cmd_idx = 0; ucr->cmd_buf[PACKET_TYPE] = BATCH_CMD; } /* internal register address */ #define FPDCTL 0xFC00 #define SSC_DIV_N_0 0xFC07 #define SSC_CTL1 0xFC09 #define SSC_CTL2 0xFC0A #define CFG_MODE 0xFC0E #define CFG_MODE_1 0xFC0F 
#define RCCTL 0xFC14 #define SOF_WDOG 0xFC28 #define SYS_DUMMY0 0xFC30 #define MS_BLKEND 0xFD30 #define MS_READ_START 0xFD31 #define MS_READ_COUNT 0xFD32 #define MS_WRITE_START 0xFD33 #define MS_WRITE_COUNT 0xFD34 #define MS_COMMAND 0xFD35 #define MS_OLD_BLOCK_0 0xFD36 #define MS_OLD_BLOCK_1 0xFD37 #define MS_NEW_BLOCK_0 0xFD38 #define MS_NEW_BLOCK_1 0xFD39 #define MS_LOG_BLOCK_0 0xFD3A #define MS_LOG_BLOCK_1 0xFD3B #define MS_BUS_WIDTH 0xFD3C #define MS_PAGE_START 0xFD3D #define MS_PAGE_LENGTH 0xFD3E #define MS_CFG 0xFD40 #define MS_TPC 0xFD41 #define MS_TRANS_CFG 0xFD42 #define MS_TRANSFER 0xFD43 #define MS_INT_REG 0xFD44 #define MS_BYTE_CNT 0xFD45 #define MS_SECTOR_CNT_L 0xFD46 #define MS_SECTOR_CNT_H 0xFD47 #define MS_DBUS_H 0xFD48 #define CARD_DMA1_CTL 0xFD5C #define CARD_PULL_CTL1 0xFD60 #define CARD_PULL_CTL2 0xFD61 #define CARD_PULL_CTL3 0xFD62 #define CARD_PULL_CTL4 0xFD63 #define CARD_PULL_CTL5 0xFD64 #define CARD_PULL_CTL6 0xFD65 #define CARD_EXIST 0xFD6F #define CARD_INT_PEND 0xFD71 #define LDO_POWER_CFG 0xFD7B #define SD_CFG1 0xFDA0 #define SD_CFG2 0xFDA1 #define SD_CFG3 0xFDA2 #define SD_STAT1 0xFDA3 #define SD_STAT2 0xFDA4 #define SD_BUS_STAT 0xFDA5 #define SD_PAD_CTL 0xFDA6 #define SD_SAMPLE_POINT_CTL 0xFDA7 #define SD_PUSH_POINT_CTL 0xFDA8 #define SD_CMD0 0xFDA9 #define SD_CMD1 0xFDAA #define SD_CMD2 0xFDAB #define SD_CMD3 0xFDAC #define SD_CMD4 0xFDAD #define SD_CMD5 0xFDAE #define SD_BYTE_CNT_L 0xFDAF #define SD_BYTE_CNT_H 0xFDB0 #define SD_BLOCK_CNT_L 0xFDB1 #define SD_BLOCK_CNT_H 0xFDB2 #define SD_TRANSFER 0xFDB3 #define SD_CMD_STATE 0xFDB5 #define SD_DATA_STATE 0xFDB6 #define SD_VPCLK0_CTL 0xFC2A #define SD_VPCLK1_CTL 0xFC2B #define SD_DCMPS0_CTL 0xFC2C #define SD_DCMPS1_CTL 0xFC2D #define CARD_DMA1_CTL 0xFD5C #define HW_VERSION 0xFC01 #define SSC_CLK_FPGA_SEL 0xFC02 #define CLK_DIV 0xFC03 #define SFSM_ED 0xFC04 #define CD_DEGLITCH_WIDTH 0xFC20 #define CD_DEGLITCH_EN 0xFC21 #define AUTO_DELINK_EN 0xFC23 #define FPGA_PULL_CTL 0xFC1D #define CARD_CLK_SOURCE 0xFC2E #define CARD_SHARE_MODE 0xFD51 #define CARD_DRIVE_SEL 0xFD52 #define CARD_STOP 0xFD53 #define CARD_OE 0xFD54 #define CARD_AUTO_BLINK 0xFD55 #define CARD_GPIO 0xFD56 #define SD30_DRIVE_SEL 0xFD57 #define CARD_DATA_SOURCE 0xFD5D #define CARD_SELECT 0xFD5E #define CARD_CLK_EN 0xFD79 #define CARD_PWR_CTL 0xFD7A #define OCPCTL 0xFD80 #define OCPPARA1 0xFD81 #define OCPPARA2 0xFD82 #define OCPSTAT 0xFD83 #define HS_USB_STAT 0xFE01 #define HS_VCONTROL 0xFE26 #define HS_VSTAIN 0xFE27 #define HS_VLOADM 0xFE28 #define HS_VSTAOUT 0xFE29 #define MC_IRQ 0xFF00 #define MC_IRQEN 0xFF01 #define MC_FIFO_CTL 0xFF02 #define MC_FIFO_BC0 0xFF03 #define MC_FIFO_BC1 0xFF04 #define MC_FIFO_STAT 0xFF05 #define MC_FIFO_MODE 0xFF06 #define MC_FIFO_RD_PTR0 0xFF07 #define MC_FIFO_RD_PTR1 0xFF08 #define MC_DMA_CTL 0xFF10 #define MC_DMA_TC0 0xFF11 #define MC_DMA_TC1 0xFF12 #define MC_DMA_TC2 0xFF13 #define MC_DMA_TC3 0xFF14 #define MC_DMA_RST 0xFF15 #define RBUF_SIZE_MASK 0xFBFF #define RBUF_BASE 0xF000 #define PPBUF_BASE1 0xF800 #define PPBUF_BASE2 0xFA00 /* internal register value macros */ #define POWER_OFF 0x03 #define PARTIAL_POWER_ON 0x02 #define POWER_ON 0x00 #define POWER_MASK 0x03 #define LDO3318_PWR_MASK 0x0C #define LDO_ON 0x00 #define LDO_SUSPEND 0x08 #define LDO_OFF 0x0C #define DV3318_AUTO_PWR_OFF 0x10 #define FORCE_LDO_POWERB 0x60 /* LDO_POWER_CFG */ #define TUNE_SD18_MASK 0x1C #define TUNE_SD18_1V7 0x00 #define TUNE_SD18_1V8 (0x01 << 2) #define TUNE_SD18_1V9 (0x02 << 2) #define TUNE_SD18_2V0 (0x03 << 2) #define TUNE_SD18_2V7 
(0x04 << 2) #define TUNE_SD18_2V8 (0x05 << 2) #define TUNE_SD18_2V9 (0x06 << 2) #define TUNE_SD18_3V3 (0x07 << 2) /* CLK_DIV */ #define CLK_CHANGE 0x80 #define CLK_DIV_1 0x00 #define CLK_DIV_2 0x01 #define CLK_DIV_4 0x02 #define CLK_DIV_8 0x03 #define SSC_POWER_MASK 0x01 #define SSC_POWER_DOWN 0x01 #define SSC_POWER_ON 0x00 #define FPGA_VER 0x80 #define HW_VER_MASK 0x0F #define EXTEND_DMA1_ASYNC_SIGNAL 0x02 /* CFG_MODE*/ #define XTAL_FREE 0x80 #define CLK_MODE_MASK 0x03 #define CLK_MODE_12M_XTAL 0x00 #define CLK_MODE_NON_XTAL 0x01 #define CLK_MODE_24M_OSC 0x02 #define CLK_MODE_48M_OSC 0x03 /* CFG_MODE_1*/ #define RTS5179 0x02 #define NYET_EN 0x01 #define NYET_MSAK 0x01 #define SD30_DRIVE_MASK 0x07 #define SD20_DRIVE_MASK 0x03 #define DISABLE_SD_CD 0x08 #define DISABLE_MS_CD 0x10 #define DISABLE_XD_CD 0x20 #define SD_CD_DEGLITCH_EN 0x01 #define MS_CD_DEGLITCH_EN 0x02 #define XD_CD_DEGLITCH_EN 0x04 #define CARD_SHARE_LQFP48 0x04 #define CARD_SHARE_QFN24 0x00 #define CARD_SHARE_LQFP_SEL 0x04 #define CARD_SHARE_XD 0x00 #define CARD_SHARE_SD 0x01 #define CARD_SHARE_MS 0x02 #define CARD_SHARE_MASK 0x03 /* SD30_DRIVE_SEL */ #define DRIVER_TYPE_A 0x05 #define DRIVER_TYPE_B 0x03 #define DRIVER_TYPE_C 0x02 #define DRIVER_TYPE_D 0x01 /* SD_BUS_STAT */ #define SD_CLK_TOGGLE_EN 0x80 #define SD_CLK_FORCE_STOP 0x40 #define SD_DAT3_STATUS 0x10 #define SD_DAT2_STATUS 0x08 #define SD_DAT1_STATUS 0x04 #define SD_DAT0_STATUS 0x02 #define SD_CMD_STATUS 0x01 /* SD_PAD_CTL */ #define SD_IO_USING_1V8 0x80 #define SD_IO_USING_3V3 0x7F #define TYPE_A_DRIVING 0x00 #define TYPE_B_DRIVING 0x01 #define TYPE_C_DRIVING 0x02 #define TYPE_D_DRIVING 0x03 /* CARD_CLK_EN */ #define SD_CLK_EN 0x04 #define MS_CLK_EN 0x08 /* CARD_SELECT */ #define SD_MOD_SEL 2 #define MS_MOD_SEL 3 /* CARD_SHARE_MODE */ #define CARD_SHARE_LQFP48 0x04 #define CARD_SHARE_QFN24 0x00 #define CARD_SHARE_LQFP_SEL 0x04 #define CARD_SHARE_XD 0x00 #define CARD_SHARE_SD 0x01 #define CARD_SHARE_MS 0x02 #define CARD_SHARE_MASK 0x03 /* SSC_CTL1 */ #define SSC_RSTB 0x80 #define SSC_8X_EN 0x40 #define SSC_FIX_FRAC 0x20 #define SSC_SEL_1M 0x00 #define SSC_SEL_2M 0x08 #define SSC_SEL_4M 0x10 #define SSC_SEL_8M 0x18 /* SSC_CTL2 */ #define SSC_DEPTH_MASK 0x03 #define SSC_DEPTH_DISALBE 0x00 #define SSC_DEPTH_2M 0x01 #define SSC_DEPTH_1M 0x02 #define SSC_DEPTH_512K 0x03 /* SD_VPCLK0_CTL */ #define PHASE_CHANGE 0x80 #define PHASE_NOT_RESET 0x40 /* SD_TRANSFER */ #define SD_TRANSFER_START 0x80 #define SD_TRANSFER_END 0x40 #define SD_STAT_IDLE 0x20 #define SD_TRANSFER_ERR 0x10 #define SD_TM_NORMAL_WRITE 0x00 #define SD_TM_AUTO_WRITE_3 0x01 #define SD_TM_AUTO_WRITE_4 0x02 #define SD_TM_AUTO_READ_3 0x05 #define SD_TM_AUTO_READ_4 0x06 #define SD_TM_CMD_RSP 0x08 #define SD_TM_AUTO_WRITE_1 0x09 #define SD_TM_AUTO_WRITE_2 0x0A #define SD_TM_NORMAL_READ 0x0C #define SD_TM_AUTO_READ_1 0x0D #define SD_TM_AUTO_READ_2 0x0E #define SD_TM_AUTO_TUNING 0x0F /* SD_CFG1 */ #define SD_CLK_DIVIDE_0 0x00 #define SD_CLK_DIVIDE_256 0xC0 #define SD_CLK_DIVIDE_128 0x80 #define SD_CLK_DIVIDE_MASK 0xC0 #define SD_BUS_WIDTH_1BIT 0x00 #define SD_BUS_WIDTH_4BIT 0x01 #define SD_BUS_WIDTH_8BIT 0x02 #define SD_ASYNC_FIFO_RST 0x10 #define SD_20_MODE 0x00 #define SD_DDR_MODE 0x04 #define SD_30_MODE 0x08 /* SD_CFG2 */ #define SD_CALCULATE_CRC7 0x00 #define SD_NO_CALCULATE_CRC7 0x80 #define SD_CHECK_CRC16 0x00 #define SD_NO_CHECK_CRC16 0x40 #define SD_WAIT_CRC_TO_EN 0x20 #define SD_WAIT_BUSY_END 0x08 #define SD_NO_WAIT_BUSY_END 0x00 #define SD_CHECK_CRC7 0x00 #define SD_NO_CHECK_CRC7 0x04 #define 
SD_RSP_LEN_0 0x00 #define SD_RSP_LEN_6 0x01 #define SD_RSP_LEN_17 0x02 #define SD_RSP_TYPE_R0 0x04 #define SD_RSP_TYPE_R1 0x01 #define SD_RSP_TYPE_R1b 0x09 #define SD_RSP_TYPE_R2 0x02 #define SD_RSP_TYPE_R3 0x05 #define SD_RSP_TYPE_R4 0x05 #define SD_RSP_TYPE_R5 0x01 #define SD_RSP_TYPE_R6 0x01 #define SD_RSP_TYPE_R7 0x01 /* SD_STAT1 */ #define SD_CRC7_ERR 0x80 #define SD_CRC16_ERR 0x40 #define SD_CRC_WRITE_ERR 0x20 #define SD_CRC_WRITE_ERR_MASK 0x1C #define GET_CRC_TIME_OUT 0x02 #define SD_TUNING_COMPARE_ERR 0x01 /* SD_DATA_STATE */ #define SD_DATA_IDLE 0x80 /* CARD_DATA_SOURCE */ #define PINGPONG_BUFFER 0x01 #define RING_BUFFER 0x00 /* CARD_OE */ #define SD_OUTPUT_EN 0x04 #define MS_OUTPUT_EN 0x08 /* CARD_STOP */ #define SD_STOP 0x04 #define MS_STOP 0x08 #define SD_CLR_ERR 0x40 #define MS_CLR_ERR 0x80 /* CARD_CLK_SOURCE */ #define CRC_FIX_CLK (0x00 << 0) #define CRC_VAR_CLK0 (0x01 << 0) #define CRC_VAR_CLK1 (0x02 << 0) #define SD30_FIX_CLK (0x00 << 2) #define SD30_VAR_CLK0 (0x01 << 2) #define SD30_VAR_CLK1 (0x02 << 2) #define SAMPLE_FIX_CLK (0x00 << 4) #define SAMPLE_VAR_CLK0 (0x01 << 4) #define SAMPLE_VAR_CLK1 (0x02 << 4) /* SD_SAMPLE_POINT_CTL */ #define DDR_FIX_RX_DAT 0x00 #define DDR_VAR_RX_DAT 0x80 #define DDR_FIX_RX_DAT_EDGE 0x00 #define DDR_FIX_RX_DAT_14_DELAY 0x40 #define DDR_FIX_RX_CMD 0x00 #define DDR_VAR_RX_CMD 0x20 #define DDR_FIX_RX_CMD_POS_EDGE 0x00 #define DDR_FIX_RX_CMD_14_DELAY 0x10 #define SD20_RX_POS_EDGE 0x00 #define SD20_RX_14_DELAY 0x08 #define SD20_RX_SEL_MASK 0x08 /* SD_PUSH_POINT_CTL */ #define DDR_FIX_TX_CMD_DAT 0x00 #define DDR_VAR_TX_CMD_DAT 0x80 #define DDR_FIX_TX_DAT_14_TSU 0x00 #define DDR_FIX_TX_DAT_12_TSU 0x40 #define DDR_FIX_TX_CMD_NEG_EDGE 0x00 #define DDR_FIX_TX_CMD_14_AHEAD 0x20 #define SD20_TX_NEG_EDGE 0x00 #define SD20_TX_14_AHEAD 0x10 #define SD20_TX_SEL_MASK 0x10 #define DDR_VAR_SDCLK_POL_SWAP 0x01 /* MS_CFG */ #define SAMPLE_TIME_RISING 0x00 #define SAMPLE_TIME_FALLING 0x80 #define PUSH_TIME_DEFAULT 0x00 #define PUSH_TIME_ODD 0x40 #define NO_EXTEND_TOGGLE 0x00 #define EXTEND_TOGGLE_CHK 0x20 #define MS_BUS_WIDTH_1 0x00 #define MS_BUS_WIDTH_4 0x10 #define MS_BUS_WIDTH_8 0x18 #define MS_2K_SECTOR_MODE 0x04 #define MS_512_SECTOR_MODE 0x00 #define MS_TOGGLE_TIMEOUT_EN 0x00 #define MS_TOGGLE_TIMEOUT_DISEN 0x01 #define MS_NO_CHECK_INT 0x02 /* MS_TRANS_CFG */ #define WAIT_INT 0x80 #define NO_WAIT_INT 0x00 #define NO_AUTO_READ_INT_REG 0x00 #define AUTO_READ_INT_REG 0x40 #define MS_CRC16_ERR 0x20 #define MS_RDY_TIMEOUT 0x10 #define MS_INT_CMDNK 0x08 #define MS_INT_BREQ 0x04 #define MS_INT_ERR 0x02 #define MS_INT_CED 0x01 /* MS_TRANSFER */ #define MS_TRANSFER_START 0x80 #define MS_TRANSFER_END 0x40 #define MS_TRANSFER_ERR 0x20 #define MS_BS_STATE 0x10 #define MS_TM_READ_BYTES 0x00 #define MS_TM_NORMAL_READ 0x01 #define MS_TM_WRITE_BYTES 0x04 #define MS_TM_NORMAL_WRITE 0x05 #define MS_TM_AUTO_READ 0x08 #define MS_TM_AUTO_WRITE 0x0C #define MS_TM_SET_CMD 0x06 #define MS_TM_COPY_PAGE 0x07 #define MS_TM_MULTI_READ 0x02 #define MS_TM_MULTI_WRITE 0x03 /* MC_FIFO_CTL */ #define FIFO_FLUSH 0x01 /* MC_DMA_RST */ #define DMA_RESET 0x01 /* MC_DMA_CTL */ #define DMA_TC_EQ_0 0x80 #define DMA_DIR_TO_CARD 0x00 #define DMA_DIR_FROM_CARD 0x02 #define DMA_EN 0x01 #define DMA_128 (0 << 2) #define DMA_256 (1 << 2) #define DMA_512 (2 << 2) #define DMA_1024 (3 << 2) #define DMA_PACK_SIZE_MASK 0x0C /* CARD_INT_PEND */ #define XD_INT 0x10 #define MS_INT 0x08 #define SD_INT 0x04 /* LED operations*/ static inline int rtsx_usb_turn_on_led(struct rtsx_ucr *ucr) { return 
rtsx_usb_ep0_write_register(ucr, CARD_GPIO, 0x03, 0x02);
}

static inline int rtsx_usb_turn_off_led(struct rtsx_ucr *ucr)
{
	return rtsx_usb_ep0_write_register(ucr, CARD_GPIO, 0x03, 0x03);
}

/* HW error clearing */
static inline void rtsx_usb_clear_fsm_err(struct rtsx_ucr *ucr)
{
	rtsx_usb_ep0_write_register(ucr, SFSM_ED, 0xf8, 0xf8);
}

static inline void rtsx_usb_clear_dma_err(struct rtsx_ucr *ucr)
{
	rtsx_usb_ep0_write_register(ucr, MC_FIFO_CTL, FIFO_FLUSH, FIFO_FLUSH);
	rtsx_usb_ep0_write_register(ucr, MC_DMA_RST, DMA_RESET, DMA_RESET);
}

#endif /* __RTSX_USB_H */
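/*
 * Editor's note: the sketch below is not part of rtsx_usb.h. It illustrates
 * the batched register protocol declared above: queue READ_REG_CMD /
 * WRITE_REG_CMD entries with rtsx_usb_add_cmd(), fire the batch with
 * rtsx_usb_send_cmd(), then fetch any read-back bytes with
 * rtsx_usb_get_rsp(). The register/mask choices, the MODE_CR flag and the
 * 100 ms timeouts are illustrative; the function name is hypothetical and
 * locking of ucr->dev_mutex is omitted.
 */
static int example_stop_sd_and_check(struct rtsx_ucr *ucr)
{
	int err;

	rtsx_usb_init_cmd(ucr);
	rtsx_usb_add_cmd(ucr, WRITE_REG_CMD, CARD_STOP,
			 SD_STOP | SD_CLR_ERR, SD_STOP | SD_CLR_ERR);
	rtsx_usb_add_cmd(ucr, READ_REG_CMD, CARD_EXIST, 0, 0);

	err = rtsx_usb_send_cmd(ucr, MODE_CR, 100);
	if (err)
		return err;

	/* one response byte expected: the CARD_EXIST read queued above */
	return rtsx_usb_get_rsp(ucr, 1, 100);
}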
/* SPDX-License-Identifier: GPL-2.0-only */
/*
 * Copyright (C) 2005,2006,2007,2008 IBM Corporation
 *
 * Authors:
 * Reiner Sailer <sailer@watson.ibm.com>
 * Mimi Zohar <zohar@us.ibm.com>
 *
 * File: ima.h
 *	internal Integrity Measurement Architecture (IMA) definitions
 */

#ifndef __LINUX_IMA_H
#define __LINUX_IMA_H

#include <linux/types.h>
#include <linux/crypto.h>
#include <linux/fs.h>
#include <linux/security.h>
#include <linux/hash.h>
#include <linux/tpm.h>
#include <linux/audit.h>
#include <crypto/hash_info.h>

#include "../integrity.h"

enum ima_show_type { IMA_SHOW_BINARY, IMA_SHOW_BINARY_NO_FIELD_LEN,
		     IMA_SHOW_BINARY_OLD_STRING_FMT, IMA_SHOW_ASCII };
enum tpm_pcrs { TPM_PCR0 = 0, TPM_PCR8 = 8, TPM_PCR10 = 10 };

/* digest size for IMA, fits SHA1 or MD5 */
#define IMA_DIGEST_SIZE		SHA1_DIGEST_SIZE
#define IMA_EVENT_NAME_LEN_MAX	255

#define IMA_HASH_BITS 10
#define IMA_MEASURE_HTABLE_SIZE (1 << IMA_HASH_BITS)

#define IMA_TEMPLATE_FIELD_ID_MAX_LEN	16
#define IMA_TEMPLATE_NUM_FIELDS_MAX	15

#define IMA_TEMPLATE_IMA_NAME "ima"
#define IMA_TEMPLATE_IMA_FMT "d|n"

#define NR_BANKS(chip) ((chip != NULL) ?
chip->nr_allocated_banks : 0) /* current content of the policy */ extern int ima_policy_flag; /* bitset of digests algorithms allowed in the setxattr hook */ extern atomic_t ima_setxattr_allowed_hash_algorithms; /* IMA hash algorithm description */ struct ima_algo_desc { struct crypto_shash *tfm; enum hash_algo algo; }; /* set during initialization */ extern int ima_hash_algo __ro_after_init; extern int ima_sha1_idx __ro_after_init; extern int ima_hash_algo_idx __ro_after_init; extern int ima_extra_slots __ro_after_init; extern struct ima_algo_desc *ima_algo_array __ro_after_init; extern int ima_appraise; extern struct tpm_chip *ima_tpm_chip; extern const char boot_aggregate_name[]; /* IMA event related data */ struct ima_event_data { struct ima_iint_cache *iint; struct file *file; const unsigned char *filename; struct evm_ima_xattr_data *xattr_value; int xattr_len; const struct modsig *modsig; const char *violation; const void *buf; int buf_len; }; /* IMA template field data definition */ struct ima_field_data { u8 *data; u32 len; }; /* IMA template field definition */ struct ima_template_field { const char field_id[IMA_TEMPLATE_FIELD_ID_MAX_LEN]; int (*field_init)(struct ima_event_data *event_data, struct ima_field_data *field_data); void (*field_show)(struct seq_file *m, enum ima_show_type show, struct ima_field_data *field_data); }; /* IMA template descriptor definition */ struct ima_template_desc { struct list_head list; char *name; char *fmt; int num_fields; const struct ima_template_field **fields; }; struct ima_template_entry { int pcr; struct tpm_digest *digests; struct ima_template_desc *template_desc; /* template descriptor */ u32 template_data_len; struct ima_field_data template_data[]; /* template related data */ }; struct ima_queue_entry { struct hlist_node hnext; /* place in hash collision list */ struct list_head later; /* place in ima_measurements list */ struct ima_template_entry *entry; }; extern struct list_head ima_measurements; /* list of all measurements */ /* Some details preceding the binary serialized measurement list */ struct ima_kexec_hdr { u16 version; u16 _reserved0; u32 _reserved1; u64 buffer_size; u64 count; }; /* IMA iint action cache flags */ #define IMA_MEASURE 0x00000001 #define IMA_MEASURED 0x00000002 #define IMA_APPRAISE 0x00000004 #define IMA_APPRAISED 0x00000008 /*#define IMA_COLLECT 0x00000010 do not use this flag */ #define IMA_COLLECTED 0x00000020 #define IMA_AUDIT 0x00000040 #define IMA_AUDITED 0x00000080 #define IMA_HASH 0x00000100 #define IMA_HASHED 0x00000200 /* IMA iint policy rule cache flags */ #define IMA_NONACTION_FLAGS 0xff000000 #define IMA_DIGSIG_REQUIRED 0x01000000 #define IMA_PERMIT_DIRECTIO 0x02000000 #define IMA_NEW_FILE 0x04000000 #define IMA_FAIL_UNVERIFIABLE_SIGS 0x10000000 #define IMA_MODSIG_ALLOWED 0x20000000 #define IMA_CHECK_BLACKLIST 0x40000000 #define IMA_VERITY_REQUIRED 0x80000000 /* Exclude non-action flags which are not rule-specific. 
*/ #define IMA_NONACTION_RULE_FLAGS (IMA_NONACTION_FLAGS & ~IMA_NEW_FILE) #define IMA_DO_MASK (IMA_MEASURE | IMA_APPRAISE | IMA_AUDIT | \ IMA_HASH | IMA_APPRAISE_SUBMASK) #define IMA_DONE_MASK (IMA_MEASURED | IMA_APPRAISED | IMA_AUDITED | \ IMA_HASHED | IMA_COLLECTED | \ IMA_APPRAISED_SUBMASK) /* IMA iint subaction appraise cache flags */ #define IMA_FILE_APPRAISE 0x00001000 #define IMA_FILE_APPRAISED 0x00002000 #define IMA_MMAP_APPRAISE 0x00004000 #define IMA_MMAP_APPRAISED 0x00008000 #define IMA_BPRM_APPRAISE 0x00010000 #define IMA_BPRM_APPRAISED 0x00020000 #define IMA_READ_APPRAISE 0x00040000 #define IMA_READ_APPRAISED 0x00080000 #define IMA_CREDS_APPRAISE 0x00100000 #define IMA_CREDS_APPRAISED 0x00200000 #define IMA_APPRAISE_SUBMASK (IMA_FILE_APPRAISE | IMA_MMAP_APPRAISE | \ IMA_BPRM_APPRAISE | IMA_READ_APPRAISE | \ IMA_CREDS_APPRAISE) #define IMA_APPRAISED_SUBMASK (IMA_FILE_APPRAISED | IMA_MMAP_APPRAISED | \ IMA_BPRM_APPRAISED | IMA_READ_APPRAISED | \ IMA_CREDS_APPRAISED) /* IMA iint cache atomic_flags */ #define IMA_CHANGE_XATTR 0 #define IMA_UPDATE_XATTR 1 #define IMA_CHANGE_ATTR 2 #define IMA_DIGSIG 3 #define IMA_MAY_EMIT_TOMTOU 4 #define IMA_EMITTED_OPENWRITERS 5 /* IMA integrity metadata associated with an inode */ struct ima_iint_cache { struct mutex mutex; /* protects: version, flags, digest */ struct integrity_inode_attributes real_inode; unsigned long flags; unsigned long measured_pcrs; unsigned long atomic_flags; enum integrity_status ima_file_status:4; enum integrity_status ima_mmap_status:4; enum integrity_status ima_bprm_status:4; enum integrity_status ima_read_status:4; enum integrity_status ima_creds_status:4; struct ima_digest_data *ima_hash; }; extern struct lsm_blob_sizes ima_blob_sizes; static inline struct ima_iint_cache * ima_inode_get_iint(const struct inode *inode) { struct ima_iint_cache **iint_sec; if (unlikely(!inode->i_security)) return NULL; iint_sec = inode->i_security + ima_blob_sizes.lbs_inode; return *iint_sec; } static inline void ima_inode_set_iint(const struct inode *inode, struct ima_iint_cache *iint) { struct ima_iint_cache **iint_sec; if (unlikely(!inode->i_security)) return; iint_sec = inode->i_security + ima_blob_sizes.lbs_inode; *iint_sec = iint; } struct ima_iint_cache *ima_iint_find(struct inode *inode); struct ima_iint_cache *ima_inode_get(struct inode *inode); void ima_inode_free_rcu(void *inode_security); void __init ima_iintcache_init(void); extern const int read_idmap[]; #ifdef CONFIG_HAVE_IMA_KEXEC void ima_load_kexec_buffer(void); #else static inline void ima_load_kexec_buffer(void) {} #endif /* CONFIG_HAVE_IMA_KEXEC */ #ifdef CONFIG_IMA_MEASURE_ASYMMETRIC_KEYS void ima_post_key_create_or_update(struct key *keyring, struct key *key, const void *payload, size_t plen, unsigned long flags, bool create); #endif #ifdef CONFIG_IMA_KEXEC void ima_measure_kexec_event(const char *event_name); #else static inline void ima_measure_kexec_event(const char *event_name) {} #endif /* * The default binary_runtime_measurements list format is defined as the * platform native format. The canonical format is defined as little-endian. 
*/ extern bool ima_canonical_fmt; /* Internal IMA function definitions */ int ima_init(void); int ima_fs_init(void); int ima_add_template_entry(struct ima_template_entry *entry, int violation, const char *op, struct inode *inode, const unsigned char *filename); int ima_calc_file_hash(struct file *file, struct ima_digest_data *hash); int ima_calc_buffer_hash(const void *buf, loff_t len, struct ima_digest_data *hash); int ima_calc_field_array_hash(struct ima_field_data *field_data, struct ima_template_entry *entry); int ima_calc_boot_aggregate(struct ima_digest_data *hash); void ima_add_violation(struct file *file, const unsigned char *filename, struct ima_iint_cache *iint, const char *op, const char *cause); int ima_init_crypto(void); void ima_putc(struct seq_file *m, void *data, int datalen); void ima_print_digest(struct seq_file *m, u8 *digest, u32 size); int template_desc_init_fields(const char *template_fmt, const struct ima_template_field ***fields, int *num_fields); struct ima_template_desc *ima_template_desc_current(void); struct ima_template_desc *ima_template_desc_buf(void); struct ima_template_desc *lookup_template_desc(const char *name); bool ima_template_has_modsig(const struct ima_template_desc *ima_template); int ima_restore_measurement_entry(struct ima_template_entry *entry); int ima_restore_measurement_list(loff_t bufsize, void *buf); int ima_measurements_show(struct seq_file *m, void *v); unsigned long ima_get_binary_runtime_size(void); int ima_init_template(void); void ima_init_template_list(void); int __init ima_init_digests(void); void __init ima_init_reboot_notifier(void); int ima_lsm_policy_change(struct notifier_block *nb, unsigned long event, void *lsm_data); /* * used to protect h_table and sha_table */ extern spinlock_t ima_queue_lock; struct ima_h_table { atomic_long_t len; /* number of stored measurements in the list */ atomic_long_t violations; struct hlist_head queue[IMA_MEASURE_HTABLE_SIZE]; }; extern struct ima_h_table ima_htable; static inline unsigned int ima_hash_key(u8 *digest) { /* there is no point in taking a hash of part of a digest */ return (digest[0] | digest[1] << 8) % IMA_MEASURE_HTABLE_SIZE; } #define __ima_hooks(hook) \ hook(NONE, none) \ hook(FILE_CHECK, file) \ hook(MMAP_CHECK, mmap) \ hook(MMAP_CHECK_REQPROT, mmap_reqprot) \ hook(BPRM_CHECK, bprm) \ hook(CREDS_CHECK, creds) \ hook(POST_SETATTR, post_setattr) \ hook(MODULE_CHECK, module) \ hook(FIRMWARE_CHECK, firmware) \ hook(KEXEC_KERNEL_CHECK, kexec_kernel) \ hook(KEXEC_INITRAMFS_CHECK, kexec_initramfs) \ hook(POLICY_CHECK, policy) \ hook(KEXEC_CMDLINE, kexec_cmdline) \ hook(KEY_CHECK, key) \ hook(CRITICAL_DATA, critical_data) \ hook(SETXATTR_CHECK, setxattr_check) \ hook(MAX_CHECK, none) #define __ima_hook_enumify(ENUM, str) ENUM, #define __ima_stringify(arg) (#arg) #define __ima_hook_measuring_stringify(ENUM, str) \ (__ima_stringify(measuring_ ##str)), enum ima_hooks { __ima_hooks(__ima_hook_enumify) }; static const char * const ima_hooks_measure_str[] = { __ima_hooks(__ima_hook_measuring_stringify) }; static inline const char *func_measure_str(enum ima_hooks func) { if (func >= MAX_CHECK) return ima_hooks_measure_str[NONE]; return ima_hooks_measure_str[func]; } extern const char *const func_tokens[]; struct modsig; #ifdef CONFIG_IMA_QUEUE_EARLY_BOOT_KEYS /* * To track keys that need to be measured. 
*/ struct ima_key_entry { struct list_head list; void *payload; size_t payload_len; char *keyring_name; }; void ima_init_key_queue(void); bool ima_should_queue_key(void); bool ima_queue_key(struct key *keyring, const void *payload, size_t payload_len); void ima_process_queued_keys(void); #else static inline void ima_init_key_queue(void) {} static inline bool ima_should_queue_key(void) { return false; } static inline bool ima_queue_key(struct key *keyring, const void *payload, size_t payload_len) { return false; } static inline void ima_process_queued_keys(void) {} #endif /* CONFIG_IMA_QUEUE_EARLY_BOOT_KEYS */ /* LIM API function definitions */ int ima_get_action(struct mnt_idmap *idmap, struct inode *inode, const struct cred *cred, struct lsm_prop *prop, int mask, enum ima_hooks func, int *pcr, struct ima_template_desc **template_desc, const char *func_data, unsigned int *allowed_algos); int ima_must_measure(struct inode *inode, int mask, enum ima_hooks func); int ima_collect_measurement(struct ima_iint_cache *iint, struct file *file, void *buf, loff_t size, enum hash_algo algo, struct modsig *modsig); void ima_store_measurement(struct ima_iint_cache *iint, struct file *file, const unsigned char *filename, struct evm_ima_xattr_data *xattr_value, int xattr_len, const struct modsig *modsig, int pcr, struct ima_template_desc *template_desc); int process_buffer_measurement(struct mnt_idmap *idmap, struct inode *inode, const void *buf, int size, const char *eventname, enum ima_hooks func, int pcr, const char *func_data, bool buf_hash, u8 *digest, size_t digest_len); void ima_audit_measurement(struct ima_iint_cache *iint, const unsigned char *filename); int ima_alloc_init_template(struct ima_event_data *event_data, struct ima_template_entry **entry, struct ima_template_desc *template_desc); int ima_store_template(struct ima_template_entry *entry, int violation, struct inode *inode, const unsigned char *filename, int pcr); void ima_free_template_entry(struct ima_template_entry *entry); const char *ima_d_path(const struct path *path, char **pathbuf, char *filename); /* IMA policy related functions */ int ima_match_policy(struct mnt_idmap *idmap, struct inode *inode, const struct cred *cred, struct lsm_prop *prop, enum ima_hooks func, int mask, int flags, int *pcr, struct ima_template_desc **template_desc, const char *func_data, unsigned int *allowed_algos); void ima_init_policy(void); void ima_update_policy(void); void ima_update_policy_flags(void); ssize_t ima_parse_add_rule(char *); void ima_delete_rules(void); int ima_check_policy(void); void *ima_policy_start(struct seq_file *m, loff_t *pos); void *ima_policy_next(struct seq_file *m, void *v, loff_t *pos); void ima_policy_stop(struct seq_file *m, void *v); int ima_policy_show(struct seq_file *m, void *v); /* Appraise integrity measurements */ #define IMA_APPRAISE_ENFORCE 0x01 #define IMA_APPRAISE_FIX 0x02 #define IMA_APPRAISE_LOG 0x04 #define IMA_APPRAISE_MODULES 0x08 #define IMA_APPRAISE_FIRMWARE 0x10 #define IMA_APPRAISE_POLICY 0x20 #define IMA_APPRAISE_KEXEC 0x40 #ifdef CONFIG_IMA_APPRAISE int ima_check_blacklist(struct ima_iint_cache *iint, const struct modsig *modsig, int pcr); int ima_appraise_measurement(enum ima_hooks func, struct ima_iint_cache *iint, struct file *file, const unsigned char *filename, struct evm_ima_xattr_data *xattr_value, int xattr_len, const struct modsig *modsig); int ima_must_appraise(struct mnt_idmap *idmap, struct inode *inode, int mask, enum ima_hooks func); void ima_update_xattr(struct ima_iint_cache *iint, 
struct file *file); enum integrity_status ima_get_cache_status(struct ima_iint_cache *iint, enum ima_hooks func); enum hash_algo ima_get_hash_algo(const struct evm_ima_xattr_data *xattr_value, int xattr_len); int ima_read_xattr(struct dentry *dentry, struct evm_ima_xattr_data **xattr_value, int xattr_len); void __init init_ima_appraise_lsm(const struct lsm_id *lsmid); #else static inline int ima_check_blacklist(struct ima_iint_cache *iint, const struct modsig *modsig, int pcr) { return 0; } static inline int ima_appraise_measurement(enum ima_hooks func, struct ima_iint_cache *iint, struct file *file, const unsigned char *filename, struct evm_ima_xattr_data *xattr_value, int xattr_len, const struct modsig *modsig) { return INTEGRITY_UNKNOWN; } static inline int ima_must_appraise(struct mnt_idmap *idmap, struct inode *inode, int mask, enum ima_hooks func) { return 0; } static inline void ima_update_xattr(struct ima_iint_cache *iint, struct file *file) { } static inline enum integrity_status ima_get_cache_status(struct ima_iint_cache *iint, enum ima_hooks func) { return INTEGRITY_UNKNOWN; } static inline enum hash_algo ima_get_hash_algo(struct evm_ima_xattr_data *xattr_value, int xattr_len) { return ima_hash_algo; } static inline int ima_read_xattr(struct dentry *dentry, struct evm_ima_xattr_data **xattr_value, int xattr_len) { return 0; } static inline void __init init_ima_appraise_lsm(const struct lsm_id *lsmid) { } #endif /* CONFIG_IMA_APPRAISE */ #ifdef CONFIG_IMA_APPRAISE_MODSIG int ima_read_modsig(enum ima_hooks func, const void *buf, loff_t buf_len, struct modsig **modsig); void ima_collect_modsig(struct modsig *modsig, const void *buf, loff_t size); int ima_get_modsig_digest(const struct modsig *modsig, enum hash_algo *algo, const u8 **digest, u32 *digest_size); int ima_get_raw_modsig(const struct modsig *modsig, const void **data, u32 *data_len); void ima_free_modsig(struct modsig *modsig); #else static inline int ima_read_modsig(enum ima_hooks func, const void *buf, loff_t buf_len, struct modsig **modsig) { return -EOPNOTSUPP; } static inline void ima_collect_modsig(struct modsig *modsig, const void *buf, loff_t size) { } static inline int ima_get_modsig_digest(const struct modsig *modsig, enum hash_algo *algo, const u8 **digest, u32 *digest_size) { return -EOPNOTSUPP; } static inline int ima_get_raw_modsig(const struct modsig *modsig, const void **data, u32 *data_len) { return -EOPNOTSUPP; } static inline void ima_free_modsig(struct modsig *modsig) { } #endif /* CONFIG_IMA_APPRAISE_MODSIG */ /* LSM based policy rules require audit */ #ifdef CONFIG_IMA_LSM_RULES #define ima_filter_rule_init security_audit_rule_init #define ima_filter_rule_free security_audit_rule_free #define ima_filter_rule_match security_audit_rule_match #else static inline int ima_filter_rule_init(u32 field, u32 op, char *rulestr, void **lsmrule, gfp_t gfp) { return -EINVAL; } static inline void ima_filter_rule_free(void *lsmrule) { } static inline int ima_filter_rule_match(struct lsm_prop *prop, u32 field, u32 op, void *lsmrule) { return -EINVAL; } #endif /* CONFIG_IMA_LSM_RULES */ #ifdef CONFIG_IMA_READ_POLICY #define POLICY_FILE_FLAGS (S_IWUSR | S_IRUSR) #else #define POLICY_FILE_FLAGS S_IWUSR #endif /* CONFIG_IMA_READ_POLICY */ #endif /* __LINUX_IMA_H */ |
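/*
 * Editor's note: the sketch below is not part of ima.h. It only illustrates
 * how ima_hash_key() and ima_htable declared above fit together: a template
 * digest selects one of the IMA_MEASURE_HTABLE_SIZE collision lists, which
 * is then walked to look for a matching measurement. The field accesses
 * mirror the in-kernel lookup only approximately, the function name is
 * hypothetical, and the RCU/locking details of the real code are simplified.
 */
#include <linux/rculist.h>
#include <linux/string.h>
#include "ima.h"

static struct ima_queue_entry *example_htable_lookup(u8 *digest_value,
						     int digest_len, int pcr)
{
	struct ima_queue_entry *qe;
	unsigned int key = ima_hash_key(digest_value);

	rcu_read_lock();
	hlist_for_each_entry_rcu(qe, &ima_htable.queue[key], hnext) {
		if (qe->entry->pcr == pcr &&
		    !memcmp(qe->entry->digests[ima_hash_algo_idx].digest,
			    digest_value, digest_len)) {
			rcu_read_unlock();
			return qe;
		}
	}
	rcu_read_unlock();

	return NULL;
}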
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _DEVICE_DEVRES_H_
#define _DEVICE_DEVRES_H_

#include <linux/err.h>
#include <linux/gfp_types.h>
#include <linux/numa.h>
#include <linux/overflow.h>
#include <linux/stdarg.h>
#include <linux/types.h>

#include <asm/bug.h>

struct device;
struct device_node;
struct resource;

/* device resource management */
typedef void (*dr_release_t)(struct device *dev, void *res);
typedef int (*dr_match_t)(struct device *dev, void *res, void *match_data);

void * __malloc
__devres_alloc_node(dr_release_t release, size_t size, gfp_t gfp, int nid, const char *name);
#define devres_alloc(release, size, gfp) \
	__devres_alloc_node(release, size, gfp, NUMA_NO_NODE, #release)
#define devres_alloc_node(release, size, gfp, nid) \
	__devres_alloc_node(release, size, gfp, nid, #release)

void devres_for_each_res(struct device *dev, dr_release_t release,
			 dr_match_t match, void *match_data,
			 void (*fn)(struct device *, void *, void *),
			 void *data);
void devres_free(void *res);
void devres_add(struct device *dev, void *res);
void *devres_find(struct device *dev, dr_release_t release, dr_match_t match, void *match_data);
void *devres_get(struct device *dev, void *new_res, dr_match_t match, void *match_data);
void *devres_remove(struct device *dev, dr_release_t release, dr_match_t match, void *match_data);
int devres_destroy(struct device *dev, dr_release_t release, dr_match_t match, void *match_data);
int devres_release(struct device *dev, dr_release_t release, dr_match_t match, void *match_data);

/* devres group */
void * __must_check devres_open_group(struct device *dev, void *id, gfp_t gfp);
void devres_close_group(struct device *dev, void *id);
void devres_remove_group(struct device *dev, void *id);
int devres_release_group(struct device *dev, void *id);

/* managed devm_k.alloc/kfree for device drivers */
void * __alloc_size(2) devm_kmalloc(struct device *dev, size_t size, gfp_t gfp);
void * __must_check __realloc_size(3)
devm_krealloc(struct device *dev, void *ptr, size_t size, gfp_t gfp);

static inline void *devm_kzalloc(struct device *dev, size_t size, gfp_t gfp)
{
	return devm_kmalloc(dev, size, gfp | __GFP_ZERO);
}

static inline void *devm_kmalloc_array(struct device *dev, size_t n, size_t size, gfp_t flags)
{
	size_t bytes;

	if (unlikely(check_mul_overflow(n, size, &bytes)))
		return NULL;

	return devm_kmalloc(dev, bytes, flags);
}

static inline void *devm_kcalloc(struct device *dev, size_t n, size_t size, gfp_t flags)
{
	return devm_kmalloc_array(dev, n, size, flags | __GFP_ZERO);
}

static inline __realloc_size(3, 4) void * __must_check
devm_krealloc_array(struct device *dev, void *p, size_t new_n, size_t new_size, gfp_t flags)
{
	size_t bytes;

	if (unlikely(check_mul_overflow(new_n, new_size, &bytes)))
		return NULL;

	return devm_krealloc(dev, p, bytes, flags);
}

void devm_kfree(struct device *dev, const void *p);

void * __realloc_size(3)
devm_kmemdup(struct device *dev, const void *src, size_t len, gfp_t
gfp); static inline void *devm_kmemdup_array(struct device *dev, const void *src, size_t n, size_t size, gfp_t flags) { return devm_kmemdup(dev, src, size_mul(size, n), flags); } char * __malloc devm_kstrdup(struct device *dev, const char *s, gfp_t gfp); const char *devm_kstrdup_const(struct device *dev, const char *s, gfp_t gfp); char * __printf(3, 0) __malloc devm_kvasprintf(struct device *dev, gfp_t gfp, const char *fmt, va_list ap); char * __printf(3, 4) __malloc devm_kasprintf(struct device *dev, gfp_t gfp, const char *fmt, ...); unsigned long devm_get_free_pages(struct device *dev, gfp_t gfp_mask, unsigned int order); void devm_free_pages(struct device *dev, unsigned long addr); #ifdef CONFIG_HAS_IOMEM void __iomem *devm_ioremap_resource(struct device *dev, const struct resource *res); void __iomem *devm_ioremap_resource_wc(struct device *dev, const struct resource *res); void __iomem *devm_of_iomap(struct device *dev, struct device_node *node, int index, resource_size_t *size); #else static inline void __iomem *devm_ioremap_resource(struct device *dev, const struct resource *res) { return IOMEM_ERR_PTR(-EINVAL); } static inline void __iomem *devm_ioremap_resource_wc(struct device *dev, const struct resource *res) { return IOMEM_ERR_PTR(-EINVAL); } static inline void __iomem *devm_of_iomap(struct device *dev, struct device_node *node, int index, resource_size_t *size) { return IOMEM_ERR_PTR(-EINVAL); } #endif /* allows to add/remove a custom action to devres stack */ int devm_remove_action_nowarn(struct device *dev, void (*action)(void *), void *data); /** * devm_remove_action() - removes previously added custom action * @dev: Device that owns the action * @action: Function implementing the action * @data: Pointer to data passed to @action implementation * * Removes instance of @action previously added by devm_add_action(). * Both action and data should match one of the existing entries. */ static inline void devm_remove_action(struct device *dev, void (*action)(void *), void *data) { WARN_ON(devm_remove_action_nowarn(dev, action, data)); } void devm_release_action(struct device *dev, void (*action)(void *), void *data); int __devm_add_action(struct device *dev, void (*action)(void *), void *data, const char *name); #define devm_add_action(dev, action, data) \ __devm_add_action(dev, action, data, #action) static inline int __devm_add_action_or_reset(struct device *dev, void (*action)(void *), void *data, const char *name) { int ret; ret = __devm_add_action(dev, action, data, name); if (ret) action(data); return ret; } #define devm_add_action_or_reset(dev, action, data) \ __devm_add_action_or_reset(dev, action, data, #action) bool devm_is_action_added(struct device *dev, void (*action)(void *), void *data); #endif /* _DEVICE_DEVRES_H_ */ |
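/*
 * Editor's note: the sketch below is not part of devres.h. It shows a
 * typical probe-style consumer of the managed helpers declared above:
 * every allocation, mapping and undo action is owned by the device, so no
 * explicit unwinding is needed on the error paths or at unbind time. The
 * driver structure, function names and the "disable" action are
 * illustrative only.
 */
#include <linux/device.h>
#include <linux/err.h>
#include <linux/io.h>

struct example_priv {
	void __iomem *regs;
	char *label;
};

static void example_hw_disable(void *data)
{
	struct example_priv *priv = data;

	writel(0, priv->regs);		/* illustrative "switch it off" */
}

static int example_probe(struct device *dev, const struct resource *res)
{
	struct example_priv *priv;
	int ret;

	priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
	if (!priv)
		return -ENOMEM;

	priv->regs = devm_ioremap_resource(dev, res);
	if (IS_ERR(priv->regs))
		return PTR_ERR(priv->regs);

	priv->label = devm_kasprintf(dev, GFP_KERNEL, "example-%s",
				     dev_name(dev));
	if (!priv->label)
		return -ENOMEM;

	/* queued on the devres stack; runs automatically on driver unbind */
	ret = devm_add_action_or_reset(dev, example_hw_disable, priv);
	if (ret)
		return ret;

	return 0;
}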
| 1 1 1 3 2 1 2 1 1 3 4 4 4 4 3 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 | // SPDX-License-Identifier: GPL-2.0-or-later /* * Virtual NCI device simulation driver * * Copyright (C) 2020 Samsung Electronics * Bongsu Jeon <bongsu.jeon@samsung.com> */ #include <linux/kernel.h> #include <linux/module.h> #include <linux/miscdevice.h> #include <linux/mutex.h> #include <linux/wait.h> #include <net/nfc/nci_core.h> #define IOCTL_GET_NCIDEV_IDX 0 #define VIRTUAL_NFC_PROTOCOLS (NFC_PROTO_JEWEL_MASK | \ NFC_PROTO_MIFARE_MASK | \ NFC_PROTO_FELICA_MASK | \ NFC_PROTO_ISO14443_MASK | \ NFC_PROTO_ISO14443_B_MASK | \ NFC_PROTO_ISO15693_MASK) struct virtual_nci_dev { struct nci_dev *ndev; struct mutex mtx; struct sk_buff *send_buff; struct wait_queue_head wq; bool running; }; static int virtual_nci_open(struct nci_dev *ndev) { struct virtual_nci_dev *vdev = nci_get_drvdata(ndev); vdev->running = true; return 0; } static int virtual_nci_close(struct nci_dev *ndev) { struct virtual_nci_dev *vdev = nci_get_drvdata(ndev); mutex_lock(&vdev->mtx); kfree_skb(vdev->send_buff); vdev->send_buff = NULL; vdev->running = false; mutex_unlock(&vdev->mtx); return 0; } static int virtual_nci_send(struct nci_dev *ndev, struct sk_buff *skb) { struct virtual_nci_dev *vdev = nci_get_drvdata(ndev); mutex_lock(&vdev->mtx); if (vdev->send_buff || !vdev->running) { mutex_unlock(&vdev->mtx); kfree_skb(skb); return -1; } vdev->send_buff = skb_copy(skb, GFP_KERNEL); if (!vdev->send_buff) { mutex_unlock(&vdev->mtx); kfree_skb(skb); return -1; } mutex_unlock(&vdev->mtx); wake_up_interruptible(&vdev->wq); consume_skb(skb); return 0; } static const struct nci_ops virtual_nci_ops = { .open = virtual_nci_open, .close = virtual_nci_close, .send = virtual_nci_send }; static ssize_t virtual_ncidev_read(struct file *file, char __user *buf, size_t count, loff_t *ppos) { struct virtual_nci_dev *vdev = file->private_data; size_t actual_len; mutex_lock(&vdev->mtx); while (!vdev->send_buff) { mutex_unlock(&vdev->mtx); if (wait_event_interruptible(vdev->wq, vdev->send_buff)) return -EFAULT; mutex_lock(&vdev->mtx); } actual_len = min_t(size_t, count, vdev->send_buff->len); if (copy_to_user(buf, vdev->send_buff->data, actual_len)) { mutex_unlock(&vdev->mtx); return -EFAULT; } skb_pull(vdev->send_buff, actual_len); if (vdev->send_buff->len == 0) { consume_skb(vdev->send_buff); vdev->send_buff = NULL; } mutex_unlock(&vdev->mtx); return actual_len; } static ssize_t virtual_ncidev_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos) { struct virtual_nci_dev *vdev = file->private_data; struct sk_buff *skb; skb = alloc_skb(count, GFP_KERNEL); if (!skb) return -ENOMEM; if (copy_from_user(skb_put(skb, count), buf, count)) { kfree_skb(skb); return -EFAULT; } if (strnlen(skb->data, count) != count) { kfree_skb(skb); return -EINVAL; } 
nci_recv_frame(vdev->ndev, skb); return count; } static int virtual_ncidev_open(struct inode *inode, struct file *file) { int ret = 0; struct virtual_nci_dev *vdev; vdev = kzalloc(sizeof(*vdev), GFP_KERNEL); if (!vdev) return -ENOMEM; vdev->ndev = nci_allocate_device(&virtual_nci_ops, VIRTUAL_NFC_PROTOCOLS, 0, 0); if (!vdev->ndev) { kfree(vdev); return -ENOMEM; } mutex_init(&vdev->mtx); init_waitqueue_head(&vdev->wq); file->private_data = vdev; nci_set_drvdata(vdev->ndev, vdev); ret = nci_register_device(vdev->ndev); if (ret < 0) { nci_free_device(vdev->ndev); mutex_destroy(&vdev->mtx); kfree(vdev); return ret; } return 0; } static int virtual_ncidev_close(struct inode *inode, struct file *file) { struct virtual_nci_dev *vdev = file->private_data; nci_unregister_device(vdev->ndev); nci_free_device(vdev->ndev); mutex_destroy(&vdev->mtx); kfree(vdev); return 0; } static long virtual_ncidev_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { struct virtual_nci_dev *vdev = file->private_data; const struct nfc_dev *nfc_dev = vdev->ndev->nfc_dev; void __user *p = (void __user *)arg; if (cmd != IOCTL_GET_NCIDEV_IDX) return -ENOTTY; if (copy_to_user(p, &nfc_dev->idx, sizeof(nfc_dev->idx))) return -EFAULT; return 0; } static const struct file_operations virtual_ncidev_fops = { .owner = THIS_MODULE, .read = virtual_ncidev_read, .write = virtual_ncidev_write, .open = virtual_ncidev_open, .release = virtual_ncidev_close, .unlocked_ioctl = virtual_ncidev_ioctl }; static struct miscdevice miscdev = { .minor = MISC_DYNAMIC_MINOR, .name = "virtual_nci", .fops = &virtual_ncidev_fops, .mode = 0600, }; module_misc_device(miscdev); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("Virtual NCI device simulation driver"); MODULE_AUTHOR("Bongsu Jeon <bongsu.jeon@samsung.com>"); |
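Because this driver is only useful when driven from user space, a short sketch of the expected flow may help: an emulator opens the misc device, queries the NCI device index with IOCTL_GET_NCIDEV_IDX, then read()s the raw NCI frames that the kernel NCI core transmits and write()s back crafted responses, which the driver feeds into nci_recv_frame(). The program below is an assumption-laden illustration (device node path and error handling kept minimal), not part of the driver or its selftests.

/* Minimal user-space sketch for exercising /dev/virtual_nci (illustrative). */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define IOCTL_GET_NCIDEV_IDX	0	/* matches the driver above */

int main(void)
{
	unsigned char frame[260];	/* NCI control frames are small */
	int idx = 0;
	ssize_t n;
	int fd;

	fd = open("/dev/virtual_nci", O_RDWR);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Which nfcX interface did the driver register? */
	if (ioctl(fd, IOCTL_GET_NCIDEV_IDX, &idx) < 0) {
		perror("ioctl");
		close(fd);
		return 1;
	}
	printf("virtual NCI device is nfc%d\n", idx);

	/*
	 * Blocks until the kernel NCI core transmits a command, which
	 * typically happens once an NFC tool brings the nfcX device up.
	 */
	n = read(fd, frame, sizeof(frame));
	if (n > 0) {
		printf("received %zd-byte NCI frame, first byte 0x%02x\n",
		       n, frame[0]);
		/*
		 * A real emulator would now write() a well-formed NCI
		 * response; the driver passes whatever is written straight
		 * to nci_recv_frame().
		 */
	}

	close(fd);
	return 0;
}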
| 724 909 200 724 907 910 1 903 907 910 906 1 910 1 910 904 907 199 910 724 723 720 723 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 | // SPDX-License-Identifier: GPL-2.0-only /* * Copyright (C) 2020 SiFive */ #include <linux/spinlock.h> #include <linux/mm.h> #include <linux/memory.h> #include <linux/string.h> #include <linux/uaccess.h> #include <linux/stop_machine.h> #include <asm/kprobes.h> #include <asm/cacheflush.h> #include <asm/fixmap.h> #include <asm/ftrace.h> #include <asm/text-patching.h> #include <asm/sections.h> struct patch_insn { void *addr; u32 *insns; size_t len; atomic_t cpu_count; }; int riscv_patch_in_stop_machine = false; #ifdef CONFIG_MMU static inline bool is_kernel_exittext(uintptr_t addr) { return system_state < SYSTEM_RUNNING && addr >= (uintptr_t)__exittext_begin && addr < (uintptr_t)__exittext_end; } /* * The fix_to_virt(, idx) needs a const value (not a dynamic variable of * reg-a0) or BUILD_BUG_ON failed with "idx >= __end_of_fixed_addresses". * So use '__always_inline' and 'const unsigned int fixmap' here. */ static __always_inline void *patch_map(void *addr, const unsigned int fixmap) { uintptr_t uintaddr = (uintptr_t) addr; struct page *page; if (core_kernel_text(uintaddr) || is_kernel_exittext(uintaddr)) page = phys_to_page(__pa_symbol(addr)); else if (IS_ENABLED(CONFIG_STRICT_MODULE_RWX)) page = vmalloc_to_page(addr); else return addr; BUG_ON(!page); return (void *)set_fixmap_offset(fixmap, page_to_phys(page) + offset_in_page(addr)); } static void patch_unmap(int fixmap) { clear_fixmap(fixmap); } NOKPROBE_SYMBOL(patch_unmap); static int __patch_insn_set(void *addr, u8 c, size_t len) { bool across_pages = (offset_in_page(addr) + len) > PAGE_SIZE; void *waddr = addr; /* * Only two pages can be mapped at a time for writing. */ if (len + offset_in_page(addr) > 2 * PAGE_SIZE) return -EINVAL; /* * Before reaching here, it was expected to lock the text_mutex * already, so we don't need to give another lock here and could * ensure that it was safe between each cores. */ lockdep_assert_held(&text_mutex); preempt_disable(); if (across_pages) patch_map(addr + PAGE_SIZE, FIX_TEXT_POKE1); waddr = patch_map(addr, FIX_TEXT_POKE0); memset(waddr, c, len); /* * We could have just patched a function that is about to be * called so make sure we don't execute partially patched * instructions by flushing the icache as soon as possible. 
*/ local_flush_icache_range((unsigned long)waddr, (unsigned long)waddr + len); patch_unmap(FIX_TEXT_POKE0); if (across_pages) patch_unmap(FIX_TEXT_POKE1); preempt_enable(); return 0; } NOKPROBE_SYMBOL(__patch_insn_set); static int __patch_insn_write(void *addr, const void *insn, size_t len) { bool across_pages = (offset_in_page(addr) + len) > PAGE_SIZE; void *waddr = addr; int ret; /* * Only two pages can be mapped at a time for writing. */ if (len + offset_in_page(addr) > 2 * PAGE_SIZE) return -EINVAL; /* * Before reaching here, it was expected to lock the text_mutex * already, so we don't need to give another lock here and could * ensure that it was safe between each cores. * * We're currently using stop_machine() for ftrace & kprobes, and while * that ensures text_mutex is held before installing the mappings it * does not ensure text_mutex is held by the calling thread. That's * safe but triggers a lockdep failure, so just elide it for that * specific case. */ if (!riscv_patch_in_stop_machine) lockdep_assert_held(&text_mutex); preempt_disable(); if (across_pages) patch_map(addr + PAGE_SIZE, FIX_TEXT_POKE1); waddr = patch_map(addr, FIX_TEXT_POKE0); ret = copy_to_kernel_nofault(waddr, insn, len); /* * We could have just patched a function that is about to be * called so make sure we don't execute partially patched * instructions by flushing the icache as soon as possible. */ local_flush_icache_range((unsigned long)waddr, (unsigned long)waddr + len); patch_unmap(FIX_TEXT_POKE0); if (across_pages) patch_unmap(FIX_TEXT_POKE1); preempt_enable(); return ret; } NOKPROBE_SYMBOL(__patch_insn_write); #else static int __patch_insn_set(void *addr, u8 c, size_t len) { memset(addr, c, len); return 0; } NOKPROBE_SYMBOL(__patch_insn_set); static int __patch_insn_write(void *addr, const void *insn, size_t len) { return copy_to_kernel_nofault(addr, insn, len); } NOKPROBE_SYMBOL(__patch_insn_write); #endif /* CONFIG_MMU */ static int patch_insn_set(void *addr, u8 c, size_t len) { size_t size; int ret; /* * __patch_insn_set() can only work on 2 pages at a time so call it in a * loop with len <= 2 * PAGE_SIZE. */ while (len) { size = min(len, PAGE_SIZE * 2 - offset_in_page(addr)); ret = __patch_insn_set(addr, c, size); if (ret) return ret; addr += size; len -= size; } return 0; } NOKPROBE_SYMBOL(patch_insn_set); int patch_text_set_nosync(void *addr, u8 c, size_t len) { int ret; ret = patch_insn_set(addr, c, len); if (!ret) flush_icache_range((uintptr_t)addr, (uintptr_t)addr + len); return ret; } NOKPROBE_SYMBOL(patch_text_set_nosync); int patch_insn_write(void *addr, const void *insn, size_t len) { size_t size; int ret; /* * Copy the instructions to the destination address, two pages at a time * because __patch_insn_write() can only handle len <= 2 * PAGE_SIZE. 
*/ while (len) { size = min(len, PAGE_SIZE * 2 - offset_in_page(addr)); ret = __patch_insn_write(addr, insn, size); if (ret) return ret; addr += size; insn += size; len -= size; } return 0; } NOKPROBE_SYMBOL(patch_insn_write); int patch_text_nosync(void *addr, const void *insns, size_t len) { int ret; ret = patch_insn_write(addr, insns, len); if (!ret) flush_icache_range((uintptr_t)addr, (uintptr_t)addr + len); return ret; } NOKPROBE_SYMBOL(patch_text_nosync); static int patch_text_cb(void *data) { struct patch_insn *patch = data; int ret = 0; if (atomic_inc_return(&patch->cpu_count) == num_online_cpus()) { ret = patch_insn_write(patch->addr, patch->insns, patch->len); /* * Make sure the patching store is effective *before* we * increment the counter which releases all waiting CPUs * by using the release variant of atomic increment. The * release pairs with the call to local_flush_icache_all() * on the waiting CPU. */ atomic_inc_return_release(&patch->cpu_count); } else { while (atomic_read(&patch->cpu_count) <= num_online_cpus()) cpu_relax(); local_flush_icache_all(); } return ret; } NOKPROBE_SYMBOL(patch_text_cb); int patch_text(void *addr, u32 *insns, size_t len) { int ret; struct patch_insn patch = { .addr = addr, .insns = insns, .len = len, .cpu_count = ATOMIC_INIT(0), }; /* * kprobes takes text_mutex, before calling patch_text(), but as we call * calls stop_machine(), the lockdep assertion in patch_insn_write() * gets confused by the context in which the lock is taken. * Instead, ensure the lock is held before calling stop_machine(), and * set riscv_patch_in_stop_machine to skip the check in * patch_insn_write(). */ lockdep_assert_held(&text_mutex); riscv_patch_in_stop_machine = true; ret = stop_machine_cpuslocked(patch_text_cb, &patch, cpu_online_mask); riscv_patch_in_stop_machine = false; return ret; } NOKPROBE_SYMBOL(patch_text); |
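For context, here is a hedged sketch of how built-in code might drive this API: the caller is expected to hold text_mutex (that is what the lockdep assertion in __patch_insn_write() checks outside the stop_machine() path), hand patch_text_nosync() the new instruction bytes, and let it take care of the fixmap mapping and icache maintenance. The helper name and the choice of writing a single NOP are illustrative assumptions, not code from this file.

/* Illustrative sketch only: patching one 32-bit instruction with a NOP. */
#include <linux/memory.h>		/* text_mutex */
#include <linux/mutex.h>
#include <linux/types.h>
#include <asm/text-patching.h>

#define RISCV_INSN_NOP32	0x00000013U	/* addi x0, x0, 0 */

static int example_write_nop(void *addr)
{
	u32 insn = RISCV_INSN_NOP32;
	int ret;

	/*
	 * patch_text_nosync() does not serialize other harts via
	 * stop_machine(), so the caller must hold text_mutex and should
	 * only rewrite code whose concurrent execution is harmless
	 * (ftrace/jump_label style call sites, for example).
	 */
	mutex_lock(&text_mutex);
	ret = patch_text_nosync(addr, &insn, sizeof(insn));
	mutex_unlock(&text_mutex);

	return ret;
}

patch_text(), by contrast, routes the write through stop_machine() for callers such as kprobes that may overwrite instructions other harts could be executing; as the comment above notes, it still expects text_mutex to be held by the caller.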
// SPDX-License-Identifier: GPL-2.0-or-later /* * zsmalloc memory allocator * *
Copyright (C) 2011 Nitin Gupta * Copyright (C) 2012, 2013 Minchan Kim * * This code is released using a dual license strategy: BSD/GPL * You can choose the license that better fits your requirements. * * Released under the terms of 3-clause BSD License * Released under the terms of GNU General Public License Version 2.0 */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt /* * lock ordering: * page_lock * pool->lock * class->lock * zspage->lock */ #include <linux/module.h> #include <linux/kernel.h> #include <linux/sched.h> #include <linux/errno.h> #include <linux/highmem.h> #include <linux/string.h> #include <linux/slab.h> #include <linux/spinlock.h> #include <linux/sprintf.h> #include <linux/shrinker.h> #include <linux/types.h> #include <linux/debugfs.h> #include <linux/zsmalloc.h> #include <linux/zpool.h> #include <linux/fs.h> #include <linux/workqueue.h> #include "zpdesc.h" #define ZSPAGE_MAGIC 0x58 /* * This must be power of 2 and greater than or equal to sizeof(link_free). * These two conditions ensure that any 'struct link_free' itself doesn't * span more than 1 page which avoids complex case of mapping 2 pages simply * to restore link_free pointer values. */ #define ZS_ALIGN 8 #define ZS_HANDLE_SIZE (sizeof(unsigned long)) /* * Object location (<PFN>, <obj_idx>) is encoded as * a single (unsigned long) handle value. * * Note that object index <obj_idx> starts from 0. * * This is made more complicated by various memory models and PAE. */ #ifndef MAX_POSSIBLE_PHYSMEM_BITS #ifdef MAX_PHYSMEM_BITS #define MAX_POSSIBLE_PHYSMEM_BITS MAX_PHYSMEM_BITS #else /* * If this definition of MAX_PHYSMEM_BITS is used, OBJ_INDEX_BITS will just * be PAGE_SHIFT */ #define MAX_POSSIBLE_PHYSMEM_BITS BITS_PER_LONG #endif #endif #define _PFN_BITS (MAX_POSSIBLE_PHYSMEM_BITS - PAGE_SHIFT) /* * Head in allocated object should have OBJ_ALLOCATED_TAG * to identify the object was allocated or not. * It's okay to add the status bit in the least bit because * header keeps handle which is 4byte-aligned address so we * have room for two bit at least. */ #define OBJ_ALLOCATED_TAG 1 #define OBJ_TAG_BITS 1 #define OBJ_TAG_MASK OBJ_ALLOCATED_TAG #define OBJ_INDEX_BITS (BITS_PER_LONG - _PFN_BITS) #define OBJ_INDEX_MASK ((_AC(1, UL) << OBJ_INDEX_BITS) - 1) #define HUGE_BITS 1 #define FULLNESS_BITS 4 #define CLASS_BITS 8 #define MAGIC_VAL_BITS 8 #define ZS_MAX_PAGES_PER_ZSPAGE (_AC(CONFIG_ZSMALLOC_CHAIN_SIZE, UL)) /* ZS_MIN_ALLOC_SIZE must be multiple of ZS_ALIGN */ #define ZS_MIN_ALLOC_SIZE \ MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS)) /* each chunk includes extra space to keep handle */ #define ZS_MAX_ALLOC_SIZE PAGE_SIZE /* * On systems with 4K page size, this gives 255 size classes! There is a * trader-off here: * - Large number of size classes is potentially wasteful as free page are * spread across these classes * - Small number of size classes causes large internal fragmentation * - Probably its better to use specific size classes (empirically * determined). NOTE: all those class sizes must be set as multiple of * ZS_ALIGN to make sure link_free itself never has to span 2 pages. * * ZS_MIN_ALLOC_SIZE and ZS_SIZE_CLASS_DELTA must be multiple of ZS_ALIGN * (reason above) */ #define ZS_SIZE_CLASS_DELTA (PAGE_SIZE >> CLASS_BITS) #define ZS_SIZE_CLASSES (DIV_ROUND_UP(ZS_MAX_ALLOC_SIZE - ZS_MIN_ALLOC_SIZE, \ ZS_SIZE_CLASS_DELTA) + 1) /* * Pages are distinguished by the ratio of used memory (that is the ratio * of ->inuse objects to all objects that page can store). 
For example, * INUSE_RATIO_10 means that the ratio of used objects is > 0% and <= 10%. * * The number of fullness groups is not random. It allows us to keep * difference between the least busy page in the group (minimum permitted * number of ->inuse objects) and the most busy page (maximum permitted * number of ->inuse objects) at a reasonable value. */ enum fullness_group { ZS_INUSE_RATIO_0, ZS_INUSE_RATIO_10, /* NOTE: 8 more fullness groups here */ ZS_INUSE_RATIO_99 = 10, ZS_INUSE_RATIO_100, NR_FULLNESS_GROUPS, }; enum class_stat_type { /* NOTE: stats for 12 fullness groups here: from inuse 0 to 100 */ ZS_OBJS_ALLOCATED = NR_FULLNESS_GROUPS, ZS_OBJS_INUSE, NR_CLASS_STAT_TYPES, }; struct zs_size_stat { unsigned long objs[NR_CLASS_STAT_TYPES]; }; #ifdef CONFIG_ZSMALLOC_STAT static struct dentry *zs_stat_root; #endif static size_t huge_class_size; struct size_class { spinlock_t lock; struct list_head fullness_list[NR_FULLNESS_GROUPS]; /* * Size of objects stored in this class. Must be multiple * of ZS_ALIGN. */ int size; int objs_per_zspage; /* Number of PAGE_SIZE sized pages to combine to form a 'zspage' */ int pages_per_zspage; unsigned int index; struct zs_size_stat stats; }; /* * Placed within free objects to form a singly linked list. * For every zspage, zspage->freeobj gives head of this list. * * This must be power of 2 and less than or equal to ZS_ALIGN */ struct link_free { union { /* * Free object index; * It's valid for non-allocated object */ unsigned long next; /* * Handle of allocated object. */ unsigned long handle; }; }; struct zs_pool { const char *name; struct size_class *size_class[ZS_SIZE_CLASSES]; struct kmem_cache *handle_cachep; struct kmem_cache *zspage_cachep; atomic_long_t pages_allocated; struct zs_pool_stats stats; /* Compact classes */ struct shrinker *shrinker; #ifdef CONFIG_ZSMALLOC_STAT struct dentry *stat_dentry; #endif #ifdef CONFIG_COMPACTION struct work_struct free_work; #endif /* protect zspage migration/compaction */ rwlock_t lock; atomic_t compaction_in_progress; }; static inline void zpdesc_set_first(struct zpdesc *zpdesc) { SetPagePrivate(zpdesc_page(zpdesc)); } static inline void zpdesc_inc_zone_page_state(struct zpdesc *zpdesc) { inc_zone_page_state(zpdesc_page(zpdesc), NR_ZSPAGES); } static inline void zpdesc_dec_zone_page_state(struct zpdesc *zpdesc) { dec_zone_page_state(zpdesc_page(zpdesc), NR_ZSPAGES); } static inline struct zpdesc *alloc_zpdesc(gfp_t gfp, const int nid) { struct page *page = alloc_pages_node(nid, gfp, 0); return page_zpdesc(page); } static inline void free_zpdesc(struct zpdesc *zpdesc) { struct page *page = zpdesc_page(zpdesc); __free_page(page); } #define ZS_PAGE_UNLOCKED 0 #define ZS_PAGE_WRLOCKED -1 struct zspage_lock { spinlock_t lock; int cnt; struct lockdep_map dep_map; }; struct zspage { struct { unsigned int huge:HUGE_BITS; unsigned int fullness:FULLNESS_BITS; unsigned int class:CLASS_BITS + 1; unsigned int magic:MAGIC_VAL_BITS; }; unsigned int inuse; unsigned int freeobj; struct zpdesc *first_zpdesc; struct list_head list; /* fullness list */ struct zs_pool *pool; struct zspage_lock zsl; }; static void zspage_lock_init(struct zspage *zspage) { static struct lock_class_key __key; struct zspage_lock *zsl = &zspage->zsl; lockdep_init_map(&zsl->dep_map, "zspage->lock", &__key, 0); spin_lock_init(&zsl->lock); zsl->cnt = ZS_PAGE_UNLOCKED; } /* * The zspage lock can be held from atomic contexts, but it needs to remain * preemptible when held for reading because it remains held outside of those * atomic contexts, otherwise 
we unnecessarily lose preemptibility. * * To achieve this, the following rules are enforced on readers and writers: * * - Writers are blocked by both writers and readers, while readers are only * blocked by writers (i.e. normal rwlock semantics). * * - Writers are always atomic (to allow readers to spin waiting for them). * * - Writers always use trylock (as the lock may be held be sleeping readers). * * - Readers may spin on the lock (as they can only wait for atomic writers). * * - Readers may sleep while holding the lock (as writes only use trylock). */ static void zspage_read_lock(struct zspage *zspage) { struct zspage_lock *zsl = &zspage->zsl; rwsem_acquire_read(&zsl->dep_map, 0, 0, _RET_IP_); spin_lock(&zsl->lock); zsl->cnt++; spin_unlock(&zsl->lock); lock_acquired(&zsl->dep_map, _RET_IP_); } static void zspage_read_unlock(struct zspage *zspage) { struct zspage_lock *zsl = &zspage->zsl; rwsem_release(&zsl->dep_map, _RET_IP_); spin_lock(&zsl->lock); zsl->cnt--; spin_unlock(&zsl->lock); } static __must_check bool zspage_write_trylock(struct zspage *zspage) { struct zspage_lock *zsl = &zspage->zsl; spin_lock(&zsl->lock); if (zsl->cnt == ZS_PAGE_UNLOCKED) { zsl->cnt = ZS_PAGE_WRLOCKED; rwsem_acquire(&zsl->dep_map, 0, 1, _RET_IP_); lock_acquired(&zsl->dep_map, _RET_IP_); return true; } spin_unlock(&zsl->lock); return false; } static void zspage_write_unlock(struct zspage *zspage) { struct zspage_lock *zsl = &zspage->zsl; rwsem_release(&zsl->dep_map, _RET_IP_); zsl->cnt = ZS_PAGE_UNLOCKED; spin_unlock(&zsl->lock); } /* huge object: pages_per_zspage == 1 && maxobj_per_zspage == 1 */ static void SetZsHugePage(struct zspage *zspage) { zspage->huge = 1; } static bool ZsHugePage(struct zspage *zspage) { return zspage->huge; } #ifdef CONFIG_COMPACTION static void kick_deferred_free(struct zs_pool *pool); static void init_deferred_free(struct zs_pool *pool); static void SetZsPageMovable(struct zs_pool *pool, struct zspage *zspage); #else static void kick_deferred_free(struct zs_pool *pool) {} static void init_deferred_free(struct zs_pool *pool) {} static void SetZsPageMovable(struct zs_pool *pool, struct zspage *zspage) {} #endif static int create_cache(struct zs_pool *pool) { char *name; name = kasprintf(GFP_KERNEL, "zs_handle-%s", pool->name); if (!name) return -ENOMEM; pool->handle_cachep = kmem_cache_create(name, ZS_HANDLE_SIZE, 0, 0, NULL); kfree(name); if (!pool->handle_cachep) return -EINVAL; name = kasprintf(GFP_KERNEL, "zspage-%s", pool->name); if (!name) return -ENOMEM; pool->zspage_cachep = kmem_cache_create(name, sizeof(struct zspage), 0, 0, NULL); kfree(name); if (!pool->zspage_cachep) { kmem_cache_destroy(pool->handle_cachep); pool->handle_cachep = NULL; return -EINVAL; } return 0; } static void destroy_cache(struct zs_pool *pool) { kmem_cache_destroy(pool->handle_cachep); kmem_cache_destroy(pool->zspage_cachep); } static unsigned long cache_alloc_handle(struct zs_pool *pool, gfp_t gfp) { return (unsigned long)kmem_cache_alloc(pool->handle_cachep, gfp & ~(__GFP_HIGHMEM|__GFP_MOVABLE)); } static void cache_free_handle(struct zs_pool *pool, unsigned long handle) { kmem_cache_free(pool->handle_cachep, (void *)handle); } static struct zspage *cache_alloc_zspage(struct zs_pool *pool, gfp_t flags) { return kmem_cache_zalloc(pool->zspage_cachep, flags & ~(__GFP_HIGHMEM|__GFP_MOVABLE)); } static void cache_free_zspage(struct zs_pool *pool, struct zspage *zspage) { kmem_cache_free(pool->zspage_cachep, zspage); } /* class->lock(which owns the handle) synchronizes races */ static void 
record_obj(unsigned long handle, unsigned long obj) { *(unsigned long *)handle = obj; } /* zpool driver */ #ifdef CONFIG_ZPOOL static void *zs_zpool_create(const char *name, gfp_t gfp) { /* * Ignore global gfp flags: zs_malloc() may be invoked from * different contexts and its caller must provide a valid * gfp mask. */ return zs_create_pool(name); } static void zs_zpool_destroy(void *pool) { zs_destroy_pool(pool); } static int zs_zpool_malloc(void *pool, size_t size, gfp_t gfp, unsigned long *handle, const int nid) { *handle = zs_malloc(pool, size, gfp, nid); if (IS_ERR_VALUE(*handle)) return PTR_ERR((void *)*handle); return 0; } static void zs_zpool_free(void *pool, unsigned long handle) { zs_free(pool, handle); } static void *zs_zpool_obj_read_begin(void *pool, unsigned long handle, void *local_copy) { return zs_obj_read_begin(pool, handle, local_copy); } static void zs_zpool_obj_read_end(void *pool, unsigned long handle, void *handle_mem) { zs_obj_read_end(pool, handle, handle_mem); } static void zs_zpool_obj_write(void *pool, unsigned long handle, void *handle_mem, size_t mem_len) { zs_obj_write(pool, handle, handle_mem, mem_len); } static u64 zs_zpool_total_pages(void *pool) { return zs_get_total_pages(pool); } static struct zpool_driver zs_zpool_driver = { .type = "zsmalloc", .owner = THIS_MODULE, .create = zs_zpool_create, .destroy = zs_zpool_destroy, .malloc = zs_zpool_malloc, .free = zs_zpool_free, .obj_read_begin = zs_zpool_obj_read_begin, .obj_read_end = zs_zpool_obj_read_end, .obj_write = zs_zpool_obj_write, .total_pages = zs_zpool_total_pages, }; MODULE_ALIAS("zpool-zsmalloc"); #endif /* CONFIG_ZPOOL */ static inline bool __maybe_unused is_first_zpdesc(struct zpdesc *zpdesc) { return PagePrivate(zpdesc_page(zpdesc)); } /* Protected by class->lock */ static inline int get_zspage_inuse(struct zspage *zspage) { return zspage->inuse; } static inline void mod_zspage_inuse(struct zspage *zspage, int val) { zspage->inuse += val; } static struct zpdesc *get_first_zpdesc(struct zspage *zspage) { struct zpdesc *first_zpdesc = zspage->first_zpdesc; VM_BUG_ON_PAGE(!is_first_zpdesc(first_zpdesc), zpdesc_page(first_zpdesc)); return first_zpdesc; } #define FIRST_OBJ_PAGE_TYPE_MASK 0xffffff static inline unsigned int get_first_obj_offset(struct zpdesc *zpdesc) { VM_WARN_ON_ONCE(!PageZsmalloc(zpdesc_page(zpdesc))); return zpdesc->first_obj_offset & FIRST_OBJ_PAGE_TYPE_MASK; } static inline void set_first_obj_offset(struct zpdesc *zpdesc, unsigned int offset) { /* With 24 bits available, we can support offsets into 16 MiB pages. */ BUILD_BUG_ON(PAGE_SIZE > SZ_16M); VM_WARN_ON_ONCE(!PageZsmalloc(zpdesc_page(zpdesc))); VM_WARN_ON_ONCE(offset & ~FIRST_OBJ_PAGE_TYPE_MASK); zpdesc->first_obj_offset &= ~FIRST_OBJ_PAGE_TYPE_MASK; zpdesc->first_obj_offset |= offset & FIRST_OBJ_PAGE_TYPE_MASK; } static inline unsigned int get_freeobj(struct zspage *zspage) { return zspage->freeobj; } static inline void set_freeobj(struct zspage *zspage, unsigned int obj) { zspage->freeobj = obj; } static struct size_class *zspage_class(struct zs_pool *pool, struct zspage *zspage) { return pool->size_class[zspage->class]; } /* * zsmalloc divides the pool into various size classes where each * class maintains a list of zspages where each zspage is divided * into equal sized chunks. Each allocation falls into one of these * classes depending on its size. This function returns index of the * size class which has chunk size big enough to hold the given size. 
*/ static int get_size_class_index(int size) { int idx = 0; if (likely(size > ZS_MIN_ALLOC_SIZE)) idx = DIV_ROUND_UP(size - ZS_MIN_ALLOC_SIZE, ZS_SIZE_CLASS_DELTA); return min_t(int, ZS_SIZE_CLASSES - 1, idx); } static inline void class_stat_add(struct size_class *class, int type, unsigned long cnt) { class->stats.objs[type] += cnt; } static inline void class_stat_sub(struct size_class *class, int type, unsigned long cnt) { class->stats.objs[type] -= cnt; } static inline unsigned long class_stat_read(struct size_class *class, int type) { return class->stats.objs[type]; } #ifdef CONFIG_ZSMALLOC_STAT static void __init zs_stat_init(void) { if (!debugfs_initialized()) { pr_warn("debugfs not available, stat dir not created\n"); return; } zs_stat_root = debugfs_create_dir("zsmalloc", NULL); } static void __exit zs_stat_exit(void) { debugfs_remove_recursive(zs_stat_root); } static unsigned long zs_can_compact(struct size_class *class); static int zs_stats_size_show(struct seq_file *s, void *v) { int i, fg; struct zs_pool *pool = s->private; struct size_class *class; int objs_per_zspage; unsigned long obj_allocated, obj_used, pages_used, freeable; unsigned long total_objs = 0, total_used_objs = 0, total_pages = 0; unsigned long total_freeable = 0; unsigned long inuse_totals[NR_FULLNESS_GROUPS] = {0, }; seq_printf(s, " %5s %5s %9s %9s %9s %9s %9s %9s %9s %9s %9s %9s %9s %13s %10s %10s %16s %8s\n", "class", "size", "10%", "20%", "30%", "40%", "50%", "60%", "70%", "80%", "90%", "99%", "100%", "obj_allocated", "obj_used", "pages_used", "pages_per_zspage", "freeable"); for (i = 0; i < ZS_SIZE_CLASSES; i++) { class = pool->size_class[i]; if (class->index != i) continue; spin_lock(&class->lock); seq_printf(s, " %5u %5u ", i, class->size); for (fg = ZS_INUSE_RATIO_10; fg < NR_FULLNESS_GROUPS; fg++) { inuse_totals[fg] += class_stat_read(class, fg); seq_printf(s, "%9lu ", class_stat_read(class, fg)); } obj_allocated = class_stat_read(class, ZS_OBJS_ALLOCATED); obj_used = class_stat_read(class, ZS_OBJS_INUSE); freeable = zs_can_compact(class); spin_unlock(&class->lock); objs_per_zspage = class->objs_per_zspage; pages_used = obj_allocated / objs_per_zspage * class->pages_per_zspage; seq_printf(s, "%13lu %10lu %10lu %16d %8lu\n", obj_allocated, obj_used, pages_used, class->pages_per_zspage, freeable); total_objs += obj_allocated; total_used_objs += obj_used; total_pages += pages_used; total_freeable += freeable; } seq_puts(s, "\n"); seq_printf(s, " %5s %5s ", "Total", ""); for (fg = ZS_INUSE_RATIO_10; fg < NR_FULLNESS_GROUPS; fg++) seq_printf(s, "%9lu ", inuse_totals[fg]); seq_printf(s, "%13lu %10lu %10lu %16s %8lu\n", total_objs, total_used_objs, total_pages, "", total_freeable); return 0; } DEFINE_SHOW_ATTRIBUTE(zs_stats_size); static void zs_pool_stat_create(struct zs_pool *pool, const char *name) { if (!zs_stat_root) { pr_warn("no root stat dir, not creating <%s> stat dir\n", name); return; } pool->stat_dentry = debugfs_create_dir(name, zs_stat_root); debugfs_create_file("classes", S_IFREG | 0444, pool->stat_dentry, pool, &zs_stats_size_fops); } static void zs_pool_stat_destroy(struct zs_pool *pool) { debugfs_remove_recursive(pool->stat_dentry); } #else /* CONFIG_ZSMALLOC_STAT */ static void __init zs_stat_init(void) { } static void __exit zs_stat_exit(void) { } static inline void zs_pool_stat_create(struct zs_pool *pool, const char *name) { } static inline void zs_pool_stat_destroy(struct zs_pool *pool) { } #endif /* * For each size class, zspages are divided into different groups * depending on their 
usage ratio. This function returns fullness * status of the given page. */ static int get_fullness_group(struct size_class *class, struct zspage *zspage) { int inuse, objs_per_zspage, ratio; inuse = get_zspage_inuse(zspage); objs_per_zspage = class->objs_per_zspage; if (inuse == 0) return ZS_INUSE_RATIO_0; if (inuse == objs_per_zspage) return ZS_INUSE_RATIO_100; ratio = 100 * inuse / objs_per_zspage; /* * Take integer division into consideration: a page with one inuse * object out of 127 possible, will end up having 0 usage ratio, * which is wrong as it belongs in ZS_INUSE_RATIO_10 fullness group. */ return ratio / 10 + 1; } /* * Each size class maintains various freelists and zspages are assigned * to one of these freelists based on the number of live objects they * have. This functions inserts the given zspage into the freelist * identified by <class, fullness_group>. */ static void insert_zspage(struct size_class *class, struct zspage *zspage, int fullness) { class_stat_add(class, fullness, 1); list_add(&zspage->list, &class->fullness_list[fullness]); zspage->fullness = fullness; } /* * This function removes the given zspage from the freelist identified * by <class, fullness_group>. */ static void remove_zspage(struct size_class *class, struct zspage *zspage) { int fullness = zspage->fullness; VM_BUG_ON(list_empty(&class->fullness_list[fullness])); list_del_init(&zspage->list); class_stat_sub(class, fullness, 1); } /* * Each size class maintains zspages in different fullness groups depending * on the number of live objects they contain. When allocating or freeing * objects, the fullness status of the page can change, for instance, from * INUSE_RATIO_80 to INUSE_RATIO_70 when freeing an object. This function * checks if such a status change has occurred for the given page and * accordingly moves the page from the list of the old fullness group to that * of the new fullness group. 
*/ static int fix_fullness_group(struct size_class *class, struct zspage *zspage) { int newfg; newfg = get_fullness_group(class, zspage); if (newfg == zspage->fullness) goto out; remove_zspage(class, zspage); insert_zspage(class, zspage, newfg); out: return newfg; } static struct zspage *get_zspage(struct zpdesc *zpdesc) { struct zspage *zspage = zpdesc->zspage; BUG_ON(zspage->magic != ZSPAGE_MAGIC); return zspage; } static struct zpdesc *get_next_zpdesc(struct zpdesc *zpdesc) { struct zspage *zspage = get_zspage(zpdesc); if (unlikely(ZsHugePage(zspage))) return NULL; return zpdesc->next; } /** * obj_to_location - get (<zpdesc>, <obj_idx>) from encoded object value * @obj: the encoded object value * @zpdesc: zpdesc object resides in zspage * @obj_idx: object index */ static void obj_to_location(unsigned long obj, struct zpdesc **zpdesc, unsigned int *obj_idx) { *zpdesc = pfn_zpdesc(obj >> OBJ_INDEX_BITS); *obj_idx = (obj & OBJ_INDEX_MASK); } static void obj_to_zpdesc(unsigned long obj, struct zpdesc **zpdesc) { *zpdesc = pfn_zpdesc(obj >> OBJ_INDEX_BITS); } /** * location_to_obj - get obj value encoded from (<zpdesc>, <obj_idx>) * @zpdesc: zpdesc object resides in zspage * @obj_idx: object index */ static unsigned long location_to_obj(struct zpdesc *zpdesc, unsigned int obj_idx) { unsigned long obj; obj = zpdesc_pfn(zpdesc) << OBJ_INDEX_BITS; obj |= obj_idx & OBJ_INDEX_MASK; return obj; } static unsigned long handle_to_obj(unsigned long handle) { return *(unsigned long *)handle; } static inline bool obj_allocated(struct zpdesc *zpdesc, void *obj, unsigned long *phandle) { unsigned long handle; struct zspage *zspage = get_zspage(zpdesc); if (unlikely(ZsHugePage(zspage))) { VM_BUG_ON_PAGE(!is_first_zpdesc(zpdesc), zpdesc_page(zpdesc)); handle = zpdesc->handle; } else handle = *(unsigned long *)obj; if (!(handle & OBJ_ALLOCATED_TAG)) return false; /* Clear all tags before returning the handle */ *phandle = handle & ~OBJ_TAG_MASK; return true; } static void reset_zpdesc(struct zpdesc *zpdesc) { struct page *page = zpdesc_page(zpdesc); __ClearPageMovable(page); ClearPagePrivate(page); zpdesc->zspage = NULL; zpdesc->next = NULL; __ClearPageZsmalloc(page); } static int trylock_zspage(struct zspage *zspage) { struct zpdesc *cursor, *fail; for (cursor = get_first_zpdesc(zspage); cursor != NULL; cursor = get_next_zpdesc(cursor)) { if (!zpdesc_trylock(cursor)) { fail = cursor; goto unlock; } } return 1; unlock: for (cursor = get_first_zpdesc(zspage); cursor != fail; cursor = get_next_zpdesc(cursor)) zpdesc_unlock(cursor); return 0; } static void __free_zspage(struct zs_pool *pool, struct size_class *class, struct zspage *zspage) { struct zpdesc *zpdesc, *next; assert_spin_locked(&class->lock); VM_BUG_ON(get_zspage_inuse(zspage)); VM_BUG_ON(zspage->fullness != ZS_INUSE_RATIO_0); next = zpdesc = get_first_zpdesc(zspage); do { VM_BUG_ON_PAGE(!zpdesc_is_locked(zpdesc), zpdesc_page(zpdesc)); next = get_next_zpdesc(zpdesc); reset_zpdesc(zpdesc); zpdesc_unlock(zpdesc); zpdesc_dec_zone_page_state(zpdesc); zpdesc_put(zpdesc); zpdesc = next; } while (zpdesc != NULL); cache_free_zspage(pool, zspage); class_stat_sub(class, ZS_OBJS_ALLOCATED, class->objs_per_zspage); atomic_long_sub(class->pages_per_zspage, &pool->pages_allocated); } static void free_zspage(struct zs_pool *pool, struct size_class *class, struct zspage *zspage) { VM_BUG_ON(get_zspage_inuse(zspage)); VM_BUG_ON(list_empty(&zspage->list)); /* * Since zs_free couldn't be sleepable, this function cannot call * lock_page. 
The page locks trylock_zspage got will be released * by __free_zspage. */ if (!trylock_zspage(zspage)) { kick_deferred_free(pool); return; } remove_zspage(class, zspage); __free_zspage(pool, class, zspage); } /* Initialize a newly allocated zspage */ static void init_zspage(struct size_class *class, struct zspage *zspage) { unsigned int freeobj = 1; unsigned long off = 0; struct zpdesc *zpdesc = get_first_zpdesc(zspage); while (zpdesc) { struct zpdesc *next_zpdesc; struct link_free *link; void *vaddr; set_first_obj_offset(zpdesc, off); vaddr = kmap_local_zpdesc(zpdesc); link = (struct link_free *)vaddr + off / sizeof(*link); while ((off += class->size) < PAGE_SIZE) { link->next = freeobj++ << OBJ_TAG_BITS; link += class->size / sizeof(*link); } /* * We now come to the last (full or partial) object on this * page, which must point to the first object on the next * page (if present) */ next_zpdesc = get_next_zpdesc(zpdesc); if (next_zpdesc) { link->next = freeobj++ << OBJ_TAG_BITS; } else { /* * Reset OBJ_TAG_BITS bit to last link to tell * whether it's allocated object or not. */ link->next = -1UL << OBJ_TAG_BITS; } kunmap_local(vaddr); zpdesc = next_zpdesc; off %= PAGE_SIZE; } set_freeobj(zspage, 0); } static void create_page_chain(struct size_class *class, struct zspage *zspage, struct zpdesc *zpdescs[]) { int i; struct zpdesc *zpdesc; struct zpdesc *prev_zpdesc = NULL; int nr_zpdescs = class->pages_per_zspage; /* * Allocate individual pages and link them together as: * 1. all pages are linked together using zpdesc->next * 2. each sub-page point to zspage using zpdesc->zspage * * we set PG_private to identify the first zpdesc (i.e. no other zpdesc * has this flag set). */ for (i = 0; i < nr_zpdescs; i++) { zpdesc = zpdescs[i]; zpdesc->zspage = zspage; zpdesc->next = NULL; if (i == 0) { zspage->first_zpdesc = zpdesc; zpdesc_set_first(zpdesc); if (unlikely(class->objs_per_zspage == 1 && class->pages_per_zspage == 1)) SetZsHugePage(zspage); } else { prev_zpdesc->next = zpdesc; } prev_zpdesc = zpdesc; } } /* * Allocate a zspage for the given size class */ static struct zspage *alloc_zspage(struct zs_pool *pool, struct size_class *class, gfp_t gfp, const int nid) { int i; struct zpdesc *zpdescs[ZS_MAX_PAGES_PER_ZSPAGE]; struct zspage *zspage = cache_alloc_zspage(pool, gfp); if (!zspage) return NULL; zspage->magic = ZSPAGE_MAGIC; zspage->pool = pool; zspage->class = class->index; zspage_lock_init(zspage); for (i = 0; i < class->pages_per_zspage; i++) { struct zpdesc *zpdesc; zpdesc = alloc_zpdesc(gfp, nid); if (!zpdesc) { while (--i >= 0) { zpdesc_dec_zone_page_state(zpdescs[i]); __zpdesc_clear_zsmalloc(zpdescs[i]); free_zpdesc(zpdescs[i]); } cache_free_zspage(pool, zspage); return NULL; } __zpdesc_set_zsmalloc(zpdesc); zpdesc_inc_zone_page_state(zpdesc); zpdescs[i] = zpdesc; } create_page_chain(class, zspage, zpdescs); init_zspage(class, zspage); return zspage; } static struct zspage *find_get_zspage(struct size_class *class) { int i; struct zspage *zspage; for (i = ZS_INUSE_RATIO_99; i >= ZS_INUSE_RATIO_0; i--) { zspage = list_first_entry_or_null(&class->fullness_list[i], struct zspage, list); if (zspage) break; } return zspage; } static bool can_merge(struct size_class *prev, int pages_per_zspage, int objs_per_zspage) { if (prev->pages_per_zspage == pages_per_zspage && prev->objs_per_zspage == objs_per_zspage) return true; return false; } static bool zspage_full(struct size_class *class, struct zspage *zspage) { return get_zspage_inuse(zspage) == class->objs_per_zspage; } static bool 
zspage_empty(struct zspage *zspage) { return get_zspage_inuse(zspage) == 0; } /** * zs_lookup_class_index() - Returns index of the zsmalloc &size_class * that hold objects of the provided size. * @pool: zsmalloc pool to use * @size: object size * * Context: Any context. * * Return: the index of the zsmalloc &size_class that hold objects of the * provided size. */ unsigned int zs_lookup_class_index(struct zs_pool *pool, unsigned int size) { struct size_class *class; class = pool->size_class[get_size_class_index(size)]; return class->index; } EXPORT_SYMBOL_GPL(zs_lookup_class_index); unsigned long zs_get_total_pages(struct zs_pool *pool) { return atomic_long_read(&pool->pages_allocated); } EXPORT_SYMBOL_GPL(zs_get_total_pages); void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle, void *local_copy) { struct zspage *zspage; struct zpdesc *zpdesc; unsigned long obj, off; unsigned int obj_idx; struct size_class *class; void *addr; /* Guarantee we can get zspage from handle safely */ read_lock(&pool->lock); obj = handle_to_obj(handle); obj_to_location(obj, &zpdesc, &obj_idx); zspage = get_zspage(zpdesc); /* Make sure migration doesn't move any pages in this zspage */ zspage_read_lock(zspage); read_unlock(&pool->lock); class = zspage_class(pool, zspage); off = offset_in_page(class->size * obj_idx); if (off + class->size <= PAGE_SIZE) { /* this object is contained entirely within a page */ addr = kmap_local_zpdesc(zpdesc); addr += off; } else { size_t sizes[2]; /* this object spans two pages */ sizes[0] = PAGE_SIZE - off; sizes[1] = class->size - sizes[0]; addr = local_copy; memcpy_from_page(addr, zpdesc_page(zpdesc), off, sizes[0]); zpdesc = get_next_zpdesc(zpdesc); memcpy_from_page(addr + sizes[0], zpdesc_page(zpdesc), 0, sizes[1]); } if (!ZsHugePage(zspage)) addr += ZS_HANDLE_SIZE; return addr; } EXPORT_SYMBOL_GPL(zs_obj_read_begin); void zs_obj_read_end(struct zs_pool *pool, unsigned long handle, void *handle_mem) { struct zspage *zspage; struct zpdesc *zpdesc; unsigned long obj, off; unsigned int obj_idx; struct size_class *class; obj = handle_to_obj(handle); obj_to_location(obj, &zpdesc, &obj_idx); zspage = get_zspage(zpdesc); class = zspage_class(pool, zspage); off = offset_in_page(class->size * obj_idx); if (off + class->size <= PAGE_SIZE) { if (!ZsHugePage(zspage)) off += ZS_HANDLE_SIZE; handle_mem -= off; kunmap_local(handle_mem); } zspage_read_unlock(zspage); } EXPORT_SYMBOL_GPL(zs_obj_read_end); void zs_obj_write(struct zs_pool *pool, unsigned long handle, void *handle_mem, size_t mem_len) { struct zspage *zspage; struct zpdesc *zpdesc; unsigned long obj, off; unsigned int obj_idx; struct size_class *class; /* Guarantee we can get zspage from handle safely */ read_lock(&pool->lock); obj = handle_to_obj(handle); obj_to_location(obj, &zpdesc, &obj_idx); zspage = get_zspage(zpdesc); /* Make sure migration doesn't move any pages in this zspage */ zspage_read_lock(zspage); read_unlock(&pool->lock); class = zspage_class(pool, zspage); off = offset_in_page(class->size * obj_idx); if (!ZsHugePage(zspage)) off += ZS_HANDLE_SIZE; if (off + mem_len <= PAGE_SIZE) { /* this object is contained entirely within a page */ void *dst = kmap_local_zpdesc(zpdesc); memcpy(dst + off, handle_mem, mem_len); kunmap_local(dst); } else { /* this object spans two pages */ size_t sizes[2]; sizes[0] = PAGE_SIZE - off; sizes[1] = mem_len - sizes[0]; memcpy_to_page(zpdesc_page(zpdesc), off, handle_mem, sizes[0]); zpdesc = get_next_zpdesc(zpdesc); memcpy_to_page(zpdesc_page(zpdesc), 0, handle_mem + 
sizes[0], sizes[1]); } zspage_read_unlock(zspage); } EXPORT_SYMBOL_GPL(zs_obj_write); /** * zs_huge_class_size() - Returns the size (in bytes) of the first huge * zsmalloc &size_class. * @pool: zsmalloc pool to use * * The function returns the size of the first huge class - any object of equal * or bigger size will be stored in zspage consisting of a single physical * page. * * Context: Any context. * * Return: the size (in bytes) of the first huge zsmalloc &size_class. */ size_t zs_huge_class_size(struct zs_pool *pool) { return huge_class_size; } EXPORT_SYMBOL_GPL(zs_huge_class_size); static unsigned long obj_malloc(struct zs_pool *pool, struct zspage *zspage, unsigned long handle) { int i, nr_zpdesc, offset; unsigned long obj; struct link_free *link; struct size_class *class; struct zpdesc *m_zpdesc; unsigned long m_offset; void *vaddr; class = pool->size_class[zspage->class]; obj = get_freeobj(zspage); offset = obj * class->size; nr_zpdesc = offset >> PAGE_SHIFT; m_offset = offset_in_page(offset); m_zpdesc = get_first_zpdesc(zspage); for (i = 0; i < nr_zpdesc; i++) m_zpdesc = get_next_zpdesc(m_zpdesc); vaddr = kmap_local_zpdesc(m_zpdesc); link = (struct link_free *)vaddr + m_offset / sizeof(*link); set_freeobj(zspage, link->next >> OBJ_TAG_BITS); if (likely(!ZsHugePage(zspage))) /* record handle in the header of allocated chunk */ link->handle = handle | OBJ_ALLOCATED_TAG; else zspage->first_zpdesc->handle = handle | OBJ_ALLOCATED_TAG; kunmap_local(vaddr); mod_zspage_inuse(zspage, 1); obj = location_to_obj(m_zpdesc, obj); record_obj(handle, obj); return obj; } /** * zs_malloc - Allocate block of given size from pool. * @pool: pool to allocate from * @size: size of block to allocate * @gfp: gfp flags when allocating object * @nid: The preferred node id to allocate new zspage (if needed) * * On success, handle to the allocated object is returned, * otherwise an ERR_PTR(). * Allocation requests with size > ZS_MAX_ALLOC_SIZE will fail. 
*/ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp, const int nid) { unsigned long handle; struct size_class *class; int newfg; struct zspage *zspage; if (unlikely(!size)) return (unsigned long)ERR_PTR(-EINVAL); if (unlikely(size > ZS_MAX_ALLOC_SIZE)) return (unsigned long)ERR_PTR(-ENOSPC); handle = cache_alloc_handle(pool, gfp); if (!handle) return (unsigned long)ERR_PTR(-ENOMEM); /* extra space in chunk to keep the handle */ size += ZS_HANDLE_SIZE; class = pool->size_class[get_size_class_index(size)]; /* class->lock effectively protects the zpage migration */ spin_lock(&class->lock); zspage = find_get_zspage(class); if (likely(zspage)) { obj_malloc(pool, zspage, handle); /* Now move the zspage to another fullness group, if required */ fix_fullness_group(class, zspage); class_stat_add(class, ZS_OBJS_INUSE, 1); goto out; } spin_unlock(&class->lock); zspage = alloc_zspage(pool, class, gfp, nid); if (!zspage) { cache_free_handle(pool, handle); return (unsigned long)ERR_PTR(-ENOMEM); } spin_lock(&class->lock); obj_malloc(pool, zspage, handle); newfg = get_fullness_group(class, zspage); insert_zspage(class, zspage, newfg); atomic_long_add(class->pages_per_zspage, &pool->pages_allocated); class_stat_add(class, ZS_OBJS_ALLOCATED, class->objs_per_zspage); class_stat_add(class, ZS_OBJS_INUSE, 1); /* We completely set up zspage so mark them as movable */ SetZsPageMovable(pool, zspage); out: spin_unlock(&class->lock); return handle; } EXPORT_SYMBOL_GPL(zs_malloc); static void obj_free(int class_size, unsigned long obj) { struct link_free *link; struct zspage *zspage; struct zpdesc *f_zpdesc; unsigned long f_offset; unsigned int f_objidx; void *vaddr; obj_to_location(obj, &f_zpdesc, &f_objidx); f_offset = offset_in_page(class_size * f_objidx); zspage = get_zspage(f_zpdesc); vaddr = kmap_local_zpdesc(f_zpdesc); link = (struct link_free *)(vaddr + f_offset); /* Insert this object in containing zspage's freelist */ if (likely(!ZsHugePage(zspage))) link->next = get_freeobj(zspage) << OBJ_TAG_BITS; else f_zpdesc->handle = 0; set_freeobj(zspage, f_objidx); kunmap_local(vaddr); mod_zspage_inuse(zspage, -1); } void zs_free(struct zs_pool *pool, unsigned long handle) { struct zspage *zspage; struct zpdesc *f_zpdesc; unsigned long obj; struct size_class *class; int fullness; if (IS_ERR_OR_NULL((void *)handle)) return; /* * The pool->lock protects the race with zpage's migration * so it's safe to get the page from handle. 
*/ read_lock(&pool->lock); obj = handle_to_obj(handle); obj_to_zpdesc(obj, &f_zpdesc); zspage = get_zspage(f_zpdesc); class = zspage_class(pool, zspage); spin_lock(&class->lock); read_unlock(&pool->lock); class_stat_sub(class, ZS_OBJS_INUSE, 1); obj_free(class->size, obj); fullness = fix_fullness_group(class, zspage); if (fullness == ZS_INUSE_RATIO_0) free_zspage(pool, class, zspage); spin_unlock(&class->lock); cache_free_handle(pool, handle); } EXPORT_SYMBOL_GPL(zs_free); static void zs_object_copy(struct size_class *class, unsigned long dst, unsigned long src) { struct zpdesc *s_zpdesc, *d_zpdesc; unsigned int s_objidx, d_objidx; unsigned long s_off, d_off; void *s_addr, *d_addr; int s_size, d_size, size; int written = 0; s_size = d_size = class->size; obj_to_location(src, &s_zpdesc, &s_objidx); obj_to_location(dst, &d_zpdesc, &d_objidx); s_off = offset_in_page(class->size * s_objidx); d_off = offset_in_page(class->size * d_objidx); if (s_off + class->size > PAGE_SIZE) s_size = PAGE_SIZE - s_off; if (d_off + class->size > PAGE_SIZE) d_size = PAGE_SIZE - d_off; s_addr = kmap_local_zpdesc(s_zpdesc); d_addr = kmap_local_zpdesc(d_zpdesc); while (1) { size = min(s_size, d_size); memcpy(d_addr + d_off, s_addr + s_off, size); written += size; if (written == class->size) break; s_off += size; s_size -= size; d_off += size; d_size -= size; /* * Calling kunmap_local(d_addr) is necessary. kunmap_local() * calls must occurs in reverse order of calls to kmap_local_page(). * So, to call kunmap_local(s_addr) we should first call * kunmap_local(d_addr). For more details see * Documentation/mm/highmem.rst. */ if (s_off >= PAGE_SIZE) { kunmap_local(d_addr); kunmap_local(s_addr); s_zpdesc = get_next_zpdesc(s_zpdesc); s_addr = kmap_local_zpdesc(s_zpdesc); d_addr = kmap_local_zpdesc(d_zpdesc); s_size = class->size - written; s_off = 0; } if (d_off >= PAGE_SIZE) { kunmap_local(d_addr); d_zpdesc = get_next_zpdesc(d_zpdesc); d_addr = kmap_local_zpdesc(d_zpdesc); d_size = class->size - written; d_off = 0; } } kunmap_local(d_addr); kunmap_local(s_addr); } /* * Find alloced object in zspage from index object and * return handle. 
*/ static unsigned long find_alloced_obj(struct size_class *class, struct zpdesc *zpdesc, int *obj_idx) { unsigned int offset; int index = *obj_idx; unsigned long handle = 0; void *addr = kmap_local_zpdesc(zpdesc); offset = get_first_obj_offset(zpdesc); offset += class->size * index; while (offset < PAGE_SIZE) { if (obj_allocated(zpdesc, addr + offset, &handle)) break; offset += class->size; index++; } kunmap_local(addr); *obj_idx = index; return handle; } static void migrate_zspage(struct zs_pool *pool, struct zspage *src_zspage, struct zspage *dst_zspage) { unsigned long used_obj, free_obj; unsigned long handle; int obj_idx = 0; struct zpdesc *s_zpdesc = get_first_zpdesc(src_zspage); struct size_class *class = pool->size_class[src_zspage->class]; while (1) { handle = find_alloced_obj(class, s_zpdesc, &obj_idx); if (!handle) { s_zpdesc = get_next_zpdesc(s_zpdesc); if (!s_zpdesc) break; obj_idx = 0; continue; } used_obj = handle_to_obj(handle); free_obj = obj_malloc(pool, dst_zspage, handle); zs_object_copy(class, free_obj, used_obj); obj_idx++; obj_free(class->size, used_obj); /* Stop if there is no more space */ if (zspage_full(class, dst_zspage)) break; /* Stop if there are no more objects to migrate */ if (zspage_empty(src_zspage)) break; } } static struct zspage *isolate_src_zspage(struct size_class *class) { struct zspage *zspage; int fg; for (fg = ZS_INUSE_RATIO_10; fg <= ZS_INUSE_RATIO_99; fg++) { zspage = list_first_entry_or_null(&class->fullness_list[fg], struct zspage, list); if (zspage) { remove_zspage(class, zspage); return zspage; } } return zspage; } static struct zspage *isolate_dst_zspage(struct size_class *class) { struct zspage *zspage; int fg; for (fg = ZS_INUSE_RATIO_99; fg >= ZS_INUSE_RATIO_10; fg--) { zspage = list_first_entry_or_null(&class->fullness_list[fg], struct zspage, list); if (zspage) { remove_zspage(class, zspage); return zspage; } } return zspage; } /* * putback_zspage - add @zspage into right class's fullness list * @class: destination class * @zspage: target page * * Return @zspage's fullness status */ static int putback_zspage(struct size_class *class, struct zspage *zspage) { int fullness; fullness = get_fullness_group(class, zspage); insert_zspage(class, zspage, fullness); return fullness; } #ifdef CONFIG_COMPACTION /* * To prevent zspage destroy during migration, zspage freeing should * hold locks of all pages in the zspage. */ static void lock_zspage(struct zspage *zspage) { struct zpdesc *curr_zpdesc, *zpdesc; /* * Pages we haven't locked yet can be migrated off the list while we're * trying to lock them, so we need to be careful and only attempt to * lock each page under zspage_read_lock(). Otherwise, the page we lock * may no longer belong to the zspage. This means that we may wait for * the wrong page to unlock, so we must take a reference to the page * prior to waiting for it to unlock outside zspage_read_lock(). 
*/ while (1) { zspage_read_lock(zspage); zpdesc = get_first_zpdesc(zspage); if (zpdesc_trylock(zpdesc)) break; zpdesc_get(zpdesc); zspage_read_unlock(zspage); zpdesc_wait_locked(zpdesc); zpdesc_put(zpdesc); } curr_zpdesc = zpdesc; while ((zpdesc = get_next_zpdesc(curr_zpdesc))) { if (zpdesc_trylock(zpdesc)) { curr_zpdesc = zpdesc; } else { zpdesc_get(zpdesc); zspage_read_unlock(zspage); zpdesc_wait_locked(zpdesc); zpdesc_put(zpdesc); zspage_read_lock(zspage); } } zspage_read_unlock(zspage); } #endif /* CONFIG_COMPACTION */ #ifdef CONFIG_COMPACTION static const struct movable_operations zsmalloc_mops; static void replace_sub_page(struct size_class *class, struct zspage *zspage, struct zpdesc *newzpdesc, struct zpdesc *oldzpdesc) { struct zpdesc *zpdesc; struct zpdesc *zpdescs[ZS_MAX_PAGES_PER_ZSPAGE] = {NULL, }; unsigned int first_obj_offset; int idx = 0; zpdesc = get_first_zpdesc(zspage); do { if (zpdesc == oldzpdesc) zpdescs[idx] = newzpdesc; else zpdescs[idx] = zpdesc; idx++; } while ((zpdesc = get_next_zpdesc(zpdesc)) != NULL); create_page_chain(class, zspage, zpdescs); first_obj_offset = get_first_obj_offset(oldzpdesc); set_first_obj_offset(newzpdesc, first_obj_offset); if (unlikely(ZsHugePage(zspage))) newzpdesc->handle = oldzpdesc->handle; __zpdesc_set_movable(newzpdesc, &zsmalloc_mops); } static bool zs_page_isolate(struct page *page, isolate_mode_t mode) { /* * Page is locked so zspage couldn't be destroyed. For detail, look at * lock_zspage in free_zspage. */ VM_BUG_ON_PAGE(PageIsolated(page), page); return true; } static int zs_page_migrate(struct page *newpage, struct page *page, enum migrate_mode mode) { struct zs_pool *pool; struct size_class *class; struct zspage *zspage; struct zpdesc *dummy; struct zpdesc *newzpdesc = page_zpdesc(newpage); struct zpdesc *zpdesc = page_zpdesc(page); void *s_addr, *d_addr, *addr; unsigned int offset; unsigned long handle; unsigned long old_obj, new_obj; unsigned int obj_idx; VM_BUG_ON_PAGE(!zpdesc_is_isolated(zpdesc), zpdesc_page(zpdesc)); /* The page is locked, so this pointer must remain valid */ zspage = get_zspage(zpdesc); pool = zspage->pool; /* * The pool migrate_lock protects the race between zpage migration * and zs_free. */ write_lock(&pool->lock); class = zspage_class(pool, zspage); /* * the class lock protects zpage alloc/free in the zspage. */ spin_lock(&class->lock); /* the zspage write_lock protects zpage access via zs_obj_read/write() */ if (!zspage_write_trylock(zspage)) { spin_unlock(&class->lock); write_unlock(&pool->lock); return -EINVAL; } /* We're committed, tell the world that this is a Zsmalloc page. */ __zpdesc_set_zsmalloc(newzpdesc); offset = get_first_obj_offset(zpdesc); s_addr = kmap_local_zpdesc(zpdesc); /* * Here, any user cannot access all objects in the zspage so let's move. */ d_addr = kmap_local_zpdesc(newzpdesc); copy_page(d_addr, s_addr); kunmap_local(d_addr); for (addr = s_addr + offset; addr < s_addr + PAGE_SIZE; addr += class->size) { if (obj_allocated(zpdesc, addr, &handle)) { old_obj = handle_to_obj(handle); obj_to_location(old_obj, &dummy, &obj_idx); new_obj = (unsigned long)location_to_obj(newzpdesc, obj_idx); record_obj(handle, new_obj); } } kunmap_local(s_addr); replace_sub_page(class, zspage, newzpdesc, zpdesc); /* * Since we complete the data copy and set up new zspage structure, * it's okay to release migration_lock. 
*/ write_unlock(&pool->lock); spin_unlock(&class->lock); zspage_write_unlock(zspage); zpdesc_get(newzpdesc); if (zpdesc_zone(newzpdesc) != zpdesc_zone(zpdesc)) { zpdesc_dec_zone_page_state(zpdesc); zpdesc_inc_zone_page_state(newzpdesc); } reset_zpdesc(zpdesc); zpdesc_put(zpdesc); return MIGRATEPAGE_SUCCESS; } static void zs_page_putback(struct page *page) { VM_BUG_ON_PAGE(!PageIsolated(page), page); } static const struct movable_operations zsmalloc_mops = { .isolate_page = zs_page_isolate, .migrate_page = zs_page_migrate, .putback_page = zs_page_putback, }; /* * Caller should hold page_lock of all pages in the zspage * In here, we cannot use zspage meta data. */ static void async_free_zspage(struct work_struct *work) { int i; struct size_class *class; struct zspage *zspage, *tmp; LIST_HEAD(free_pages); struct zs_pool *pool = container_of(work, struct zs_pool, free_work); for (i = 0; i < ZS_SIZE_CLASSES; i++) { class = pool->size_class[i]; if (class->index != i) continue; spin_lock(&class->lock); list_splice_init(&class->fullness_list[ZS_INUSE_RATIO_0], &free_pages); spin_unlock(&class->lock); } list_for_each_entry_safe(zspage, tmp, &free_pages, list) { list_del(&zspage->list); lock_zspage(zspage); class = zspage_class(pool, zspage); spin_lock(&class->lock); class_stat_sub(class, ZS_INUSE_RATIO_0, 1); __free_zspage(pool, class, zspage); spin_unlock(&class->lock); } }; static void kick_deferred_free(struct zs_pool *pool) { schedule_work(&pool->free_work); } static void zs_flush_migration(struct zs_pool *pool) { flush_work(&pool->free_work); } static void init_deferred_free(struct zs_pool *pool) { INIT_WORK(&pool->free_work, async_free_zspage); } static void SetZsPageMovable(struct zs_pool *pool, struct zspage *zspage) { struct zpdesc *zpdesc = get_first_zpdesc(zspage); do { WARN_ON(!zpdesc_trylock(zpdesc)); __zpdesc_set_movable(zpdesc, &zsmalloc_mops); zpdesc_unlock(zpdesc); } while ((zpdesc = get_next_zpdesc(zpdesc)) != NULL); } #else static inline void zs_flush_migration(struct zs_pool *pool) { } #endif /* * * Based on the number of unused allocated objects calculate * and return the number of pages that we can free. 
*/ static unsigned long zs_can_compact(struct size_class *class) { unsigned long obj_wasted; unsigned long obj_allocated = class_stat_read(class, ZS_OBJS_ALLOCATED); unsigned long obj_used = class_stat_read(class, ZS_OBJS_INUSE); if (obj_allocated <= obj_used) return 0; obj_wasted = obj_allocated - obj_used; obj_wasted /= class->objs_per_zspage; return obj_wasted * class->pages_per_zspage; } static unsigned long __zs_compact(struct zs_pool *pool, struct size_class *class) { struct zspage *src_zspage = NULL; struct zspage *dst_zspage = NULL; unsigned long pages_freed = 0; /* * protect the race between zpage migration and zs_free * as well as zpage allocation/free */ write_lock(&pool->lock); spin_lock(&class->lock); while (zs_can_compact(class)) { int fg; if (!dst_zspage) { dst_zspage = isolate_dst_zspage(class); if (!dst_zspage) break; } src_zspage = isolate_src_zspage(class); if (!src_zspage) break; if (!zspage_write_trylock(src_zspage)) break; migrate_zspage(pool, src_zspage, dst_zspage); zspage_write_unlock(src_zspage); fg = putback_zspage(class, src_zspage); if (fg == ZS_INUSE_RATIO_0) { free_zspage(pool, class, src_zspage); pages_freed += class->pages_per_zspage; } src_zspage = NULL; if (get_fullness_group(class, dst_zspage) == ZS_INUSE_RATIO_100 || rwlock_is_contended(&pool->lock)) { putback_zspage(class, dst_zspage); dst_zspage = NULL; spin_unlock(&class->lock); write_unlock(&pool->lock); cond_resched(); write_lock(&pool->lock); spin_lock(&class->lock); } } if (src_zspage) putback_zspage(class, src_zspage); if (dst_zspage) putback_zspage(class, dst_zspage); spin_unlock(&class->lock); write_unlock(&pool->lock); return pages_freed; } unsigned long zs_compact(struct zs_pool *pool) { int i; struct size_class *class; unsigned long pages_freed = 0; /* * Pool compaction is performed under pool->lock so it is basically * single-threaded. Having more than one thread in __zs_compact() * will increase pool->lock contention, which will impact other * zsmalloc operations that need pool->lock. */ if (atomic_xchg(&pool->compaction_in_progress, 1)) return 0; for (i = ZS_SIZE_CLASSES - 1; i >= 0; i--) { class = pool->size_class[i]; if (class->index != i) continue; pages_freed += __zs_compact(pool, class); } atomic_long_add(pages_freed, &pool->stats.pages_compacted); atomic_set(&pool->compaction_in_progress, 0); return pages_freed; } EXPORT_SYMBOL_GPL(zs_compact); void zs_pool_stats(struct zs_pool *pool, struct zs_pool_stats *stats) { memcpy(stats, &pool->stats, sizeof(struct zs_pool_stats)); } EXPORT_SYMBOL_GPL(zs_pool_stats); static unsigned long zs_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc) { unsigned long pages_freed; struct zs_pool *pool = shrinker->private_data; /* * Compact classes and calculate compaction delta. * Can run concurrently with a manually triggered * (by user) compaction. */ pages_freed = zs_compact(pool); return pages_freed ? 
pages_freed : SHRINK_STOP; } static unsigned long zs_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc) { int i; struct size_class *class; unsigned long pages_to_free = 0; struct zs_pool *pool = shrinker->private_data; for (i = ZS_SIZE_CLASSES - 1; i >= 0; i--) { class = pool->size_class[i]; if (class->index != i) continue; pages_to_free += zs_can_compact(class); } return pages_to_free; } static void zs_unregister_shrinker(struct zs_pool *pool) { shrinker_free(pool->shrinker); } static int zs_register_shrinker(struct zs_pool *pool) { pool->shrinker = shrinker_alloc(0, "mm-zspool:%s", pool->name); if (!pool->shrinker) return -ENOMEM; pool->shrinker->scan_objects = zs_shrinker_scan; pool->shrinker->count_objects = zs_shrinker_count; pool->shrinker->batch = 0; pool->shrinker->private_data = pool; shrinker_register(pool->shrinker); return 0; } static int calculate_zspage_chain_size(int class_size) { int i, min_waste = INT_MAX; int chain_size = 1; if (is_power_of_2(class_size)) return chain_size; for (i = 1; i <= ZS_MAX_PAGES_PER_ZSPAGE; i++) { int waste; waste = (i * PAGE_SIZE) % class_size; if (waste < min_waste) { min_waste = waste; chain_size = i; } } return chain_size; } /** * zs_create_pool - Creates an allocation pool to work from. * @name: pool name to be created * * This function must be called before anything when using * the zsmalloc allocator. * * On success, a pointer to the newly created pool is returned, * otherwise NULL. */ struct zs_pool *zs_create_pool(const char *name) { int i; struct zs_pool *pool; struct size_class *prev_class = NULL; pool = kzalloc(sizeof(*pool), GFP_KERNEL); if (!pool) return NULL; init_deferred_free(pool); rwlock_init(&pool->lock); atomic_set(&pool->compaction_in_progress, 0); pool->name = kstrdup(name, GFP_KERNEL); if (!pool->name) goto err; if (create_cache(pool)) goto err; /* * Iterate reversely, because, size of size_class that we want to use * for merging should be larger or equal to current size. */ for (i = ZS_SIZE_CLASSES - 1; i >= 0; i--) { int size; int pages_per_zspage; int objs_per_zspage; struct size_class *class; int fullness; size = ZS_MIN_ALLOC_SIZE + i * ZS_SIZE_CLASS_DELTA; if (size > ZS_MAX_ALLOC_SIZE) size = ZS_MAX_ALLOC_SIZE; pages_per_zspage = calculate_zspage_chain_size(size); objs_per_zspage = pages_per_zspage * PAGE_SIZE / size; /* * We iterate from biggest down to smallest classes, * so huge_class_size holds the size of the first huge * class. Any object bigger than or equal to that will * endup in the huge class. */ if (pages_per_zspage != 1 && objs_per_zspage != 1 && !huge_class_size) { huge_class_size = size; /* * The object uses ZS_HANDLE_SIZE bytes to store the * handle. We need to subtract it, because zs_malloc() * unconditionally adds handle size before it performs * size class search - so object may be smaller than * huge class size, yet it still can end up in the huge * class because it grows by ZS_HANDLE_SIZE extra bytes * right before class lookup. */ huge_class_size -= (ZS_HANDLE_SIZE - 1); } /* * size_class is used for normal zsmalloc operation such * as alloc/free for that size. Although it is natural that we * have one size_class for each size, there is a chance that we * can get more memory utilization if we use one size_class for * many different sizes whose size_class have same * characteristics. So, we makes size_class point to * previous size_class if possible. 
*/ if (prev_class) { if (can_merge(prev_class, pages_per_zspage, objs_per_zspage)) { pool->size_class[i] = prev_class; continue; } } class = kzalloc(sizeof(struct size_class), GFP_KERNEL); if (!class) goto err; class->size = size; class->index = i; class->pages_per_zspage = pages_per_zspage; class->objs_per_zspage = objs_per_zspage; spin_lock_init(&class->lock); pool->size_class[i] = class; fullness = ZS_INUSE_RATIO_0; while (fullness < NR_FULLNESS_GROUPS) { INIT_LIST_HEAD(&class->fullness_list[fullness]); fullness++; } prev_class = class; } /* debug only, don't abort if it fails */ zs_pool_stat_create(pool, name); /* * Not critical since shrinker is only used to trigger internal * defragmentation of the pool which is pretty optional thing. If * registration fails we still can use the pool normally and user can * trigger compaction manually. Thus, ignore return code. */ zs_register_shrinker(pool); return pool; err: zs_destroy_pool(pool); return NULL; } EXPORT_SYMBOL_GPL(zs_create_pool); void zs_destroy_pool(struct zs_pool *pool) { int i; zs_unregister_shrinker(pool); zs_flush_migration(pool); zs_pool_stat_destroy(pool); for (i = 0; i < ZS_SIZE_CLASSES; i++) { int fg; struct size_class *class = pool->size_class[i]; if (!class) continue; if (class->index != i) continue; for (fg = ZS_INUSE_RATIO_0; fg < NR_FULLNESS_GROUPS; fg++) { if (list_empty(&class->fullness_list[fg])) continue; pr_err("Class-%d fullness group %d is not empty\n", class->size, fg); } kfree(class); } destroy_cache(pool); kfree(pool->name); kfree(pool); } EXPORT_SYMBOL_GPL(zs_destroy_pool); static int __init zs_init(void) { #ifdef CONFIG_ZPOOL zpool_register_driver(&zs_zpool_driver); #endif zs_stat_init(); return 0; } static void __exit zs_exit(void) { #ifdef CONFIG_ZPOOL zpool_unregister_driver(&zs_zpool_driver); #endif zs_stat_exit(); } module_init(zs_init); module_exit(zs_exit); MODULE_LICENSE("Dual BSD/GPL"); MODULE_AUTHOR("Nitin Gupta <ngupta@vflare.org>"); MODULE_DESCRIPTION("zsmalloc memory allocator"); |
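/*
 * Illustrative sketch, not part of mm/zsmalloc.c above: a minimal
 * allocate/free round trip against the exported pool API, roughly what a
 * compressed-memory backend (zram/zswap style) would do. The demo_* name is
 * made up; zs_create_pool(), zs_malloc(), zs_free() and zs_destroy_pool()
 * and their signatures are the ones exported above. Passing NUMA_NO_NODE as
 * the preferred node is an assumption that the caller has no node preference.
 */
#include <linux/err.h>
#include <linux/gfp.h>
#include <linux/numa.h>
#include <linux/zsmalloc.h>

static int demo_zsmalloc_roundtrip(size_t len)
{
	struct zs_pool *pool;
	unsigned long handle;

	pool = zs_create_pool("demo-pool");
	if (!pool)
		return -ENOMEM;

	/*
	 * zs_malloc() adds ZS_HANDLE_SIZE internally and fails for sizes
	 * above ZS_MAX_ALLOC_SIZE; failures come back as an ERR_PTR() value
	 * cast to unsigned long, per the function comment above.
	 */
	handle = zs_malloc(pool, len, GFP_KERNEL, NUMA_NO_NODE);
	if (IS_ERR_VALUE(handle)) {
		zs_destroy_pool(pool);
		return PTR_ERR((void *)handle);
	}

	/*
	 * Object data would be written/read through zs_obj_write() and the
	 * zs_obj_read_*() helpers; those calls are omitted here since their
	 * full signatures are not shown above.
	 */

	zs_free(pool, handle);
	zs_destroy_pool(pool);
	return 0;
}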
| 2 12 1 1 1 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 | // SPDX-License-Identifier: GPL-2.0-only /* dummy.c: a dummy net driver The purpose of this driver is to provide a device to point a route through, but not to actually transmit packets. Why? If you have a machine whose only connection is an occasional PPP/SLIP/PLIP link, you can only connect to your own hostname when the link is up. Otherwise you have to use localhost. This isn't very consistent. One solution is to set up a dummy link using PPP/SLIP/PLIP, but this seems (to me) too much overhead for too little gain. This driver provides a small alternative. Thus you can do [when not running slip] ifconfig dummy slip.addr.ess.here up [to go to slip] ifconfig dummy down dip whatever This was written by looking at Donald Becker's skeleton driver and the loopback driver. I then threw away anything that didn't apply! Thanks to Alan Cox for the key clue on what to do with misguided packets. Nick Holloway, 27th May 1994 [I tweaked this explanation a little but that's all] Alan Cox, 30th May 1994 */ #include <linux/module.h> #include <linux/kernel.h> #include <linux/netdevice.h> #include <linux/etherdevice.h> #include <linux/ethtool.h> #include <linux/init.h> #include <linux/moduleparam.h> #include <linux/rtnetlink.h> #include <linux/net_tstamp.h> #include <net/netdev_lock.h> #include <net/rtnetlink.h> #include <linux/u64_stats_sync.h> #define DRV_NAME "dummy" static int numdummies = 1; /* fake multicast ability */ static void set_multicast_list(struct net_device *dev) { } static void dummy_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) { dev_lstats_read(dev, &stats->tx_packets, &stats->tx_bytes); } static netdev_tx_t dummy_xmit(struct sk_buff *skb, struct net_device *dev) { dev_lstats_add(dev, skb->len); skb_tx_timestamp(skb); dev_kfree_skb(skb); return NETDEV_TX_OK; } static int dummy_dev_init(struct net_device *dev) { dev->pcpu_stat_type = NETDEV_PCPU_STAT_LSTATS; netdev_lockdep_set_classes(dev); return 0; } static int dummy_change_carrier(struct net_device *dev, bool new_carrier) { if (new_carrier) netif_carrier_on(dev); else netif_carrier_off(dev); return 0; } static const struct net_device_ops dummy_netdev_ops = { .ndo_init = dummy_dev_init, .ndo_start_xmit = dummy_xmit, .ndo_validate_addr = eth_validate_addr, .ndo_set_rx_mode = set_multicast_list, .ndo_set_mac_address = eth_mac_addr, .ndo_get_stats64 = dummy_get_stats64, .ndo_change_carrier = dummy_change_carrier, }; static const struct ethtool_ops dummy_ethtool_ops = { .get_ts_info = ethtool_op_get_ts_info, }; static void dummy_setup(struct net_device *dev) { ether_setup(dev); /* Initialize the device structure. */ dev->netdev_ops = &dummy_netdev_ops; dev->ethtool_ops = &dummy_ethtool_ops; dev->needs_free_netdev = true; dev->request_ops_lock = true; /* Fill in device structure with ethernet-generic values. 
*/ dev->flags |= IFF_NOARP; dev->flags &= ~IFF_MULTICAST; dev->priv_flags |= IFF_LIVE_ADDR_CHANGE | IFF_NO_QUEUE; dev->lltx = true; dev->features |= NETIF_F_SG | NETIF_F_FRAGLIST; dev->features |= NETIF_F_GSO_SOFTWARE; dev->features |= NETIF_F_HW_CSUM | NETIF_F_HIGHDMA; dev->features |= NETIF_F_GSO_ENCAP_ALL; dev->hw_features |= dev->features; dev->hw_enc_features |= dev->features; eth_hw_addr_random(dev); dev->min_mtu = 0; dev->max_mtu = 0; } static int dummy_validate(struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { if (tb[IFLA_ADDRESS]) { if (nla_len(tb[IFLA_ADDRESS]) != ETH_ALEN) return -EINVAL; if (!is_valid_ether_addr(nla_data(tb[IFLA_ADDRESS]))) return -EADDRNOTAVAIL; } return 0; } static struct rtnl_link_ops dummy_link_ops __read_mostly = { .kind = DRV_NAME, .setup = dummy_setup, .validate = dummy_validate, }; /* Number of dummy devices to be set up by this module. */ module_param(numdummies, int, 0); MODULE_PARM_DESC(numdummies, "Number of dummy pseudo devices"); static int __init dummy_init_one(void) { struct net_device *dev_dummy; int err; dev_dummy = alloc_netdev(0, "dummy%d", NET_NAME_ENUM, dummy_setup); if (!dev_dummy) return -ENOMEM; dev_dummy->rtnl_link_ops = &dummy_link_ops; err = register_netdevice(dev_dummy); if (err < 0) goto err; return 0; err: free_netdev(dev_dummy); return err; } static int __init dummy_init_module(void) { int i, err = 0; err = rtnl_link_register(&dummy_link_ops); if (err < 0) return err; rtnl_net_lock(&init_net); for (i = 0; i < numdummies && !err; i++) { err = dummy_init_one(); cond_resched(); } rtnl_net_unlock(&init_net); if (err < 0) rtnl_link_unregister(&dummy_link_ops); return err; } static void __exit dummy_cleanup_module(void) { rtnl_link_unregister(&dummy_link_ops); } module_init(dummy_init_module); module_exit(dummy_cleanup_module); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("Dummy netdevice driver which discards all packets sent to it"); MODULE_ALIAS_RTNL_LINK(DRV_NAME); |
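/*
 * Illustrative sketch, not part of drivers/net/dummy.c above: the per-CPU
 * lstats accounting pattern the driver relies on, shown in isolation for a
 * hypothetical packet-discarding device. demo_* names are made up;
 * dev_lstats_add(), dev_lstats_read() and NETDEV_PCPU_STAT_LSTATS are the
 * helpers used above (the lstats type must be selected in ndo_init, as
 * dummy_dev_init() does).
 */
#include <linux/netdevice.h>
#include <linux/skbuff.h>

static netdev_tx_t demo_xmit(struct sk_buff *skb, struct net_device *dev)
{
	dev_lstats_add(dev, skb->len);	/* count one packet + its bytes on this CPU */
	dev_kfree_skb(skb);		/* consume (discard) the packet */
	return NETDEV_TX_OK;
}

static void demo_get_stats64(struct net_device *dev,
			     struct rtnl_link_stats64 *stats)
{
	/* Fold the per-CPU counters into the 64-bit stats reply. */
	dev_lstats_read(dev, &stats->tx_packets, &stats->tx_bytes);
}

/*
 * Usage note: instances are normally created from user space, either by
 * loading the module ("modprobe dummy numdummies=2") or per device through
 * the rtnl_link_ops kind registered above ("ip link add dummy0 type dummy").
 */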
| 2 2 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 | // SPDX-License-Identifier: GPL-2.0 #include <linux/kernel.h> #include <linux/types.h> #include <linux/init.h> #include <linux/memblock.h> #include <linux/seq_file.h> static bool early_memtest_done; static phys_addr_t early_memtest_bad_size; static u64 patterns[] __initdata = { /* The first entry has to be 0 to leave memtest with zeroed memory */ 0, 0xffffffffffffffffULL, 0x5555555555555555ULL, 0xaaaaaaaaaaaaaaaaULL, 0x1111111111111111ULL, 0x2222222222222222ULL, 0x4444444444444444ULL, 0x8888888888888888ULL, 0x3333333333333333ULL, 0x6666666666666666ULL, 0x9999999999999999ULL, 0xccccccccccccccccULL, 0x7777777777777777ULL, 0xbbbbbbbbbbbbbbbbULL, 0xddddddddddddddddULL, 0xeeeeeeeeeeeeeeeeULL, 0x7a6c7258554e494cULL, /* yeah ;-) */ }; static void __init reserve_bad_mem(u64 pattern, phys_addr_t start_bad, phys_addr_t end_bad) { pr_info(" %016llx bad mem addr %pa - %pa reserved\n", cpu_to_be64(pattern), &start_bad, &end_bad); memblock_reserve(start_bad, end_bad - start_bad); early_memtest_bad_size += (end_bad - start_bad); } static void __init memtest(u64 pattern, phys_addr_t start_phys, phys_addr_t size) { u64 *p, *start, *end; phys_addr_t start_bad, last_bad; phys_addr_t start_phys_aligned; const size_t incr = sizeof(pattern); start_phys_aligned = ALIGN(start_phys, incr); start = __va(start_phys_aligned); end = start + (size - (start_phys_aligned - start_phys)) / incr; start_bad = 0; last_bad = 0; for (p = start; p < end; p++) WRITE_ONCE(*p, pattern); for (p = start; p < end; p++, start_phys_aligned += incr) { if (READ_ONCE(*p) == pattern) continue; if (start_phys_aligned == last_bad + incr) { last_bad += incr; continue; } if (start_bad) reserve_bad_mem(pattern, start_bad, last_bad + incr); start_bad = last_bad = start_phys_aligned; } if (start_bad) reserve_bad_mem(pattern, start_bad, last_bad + incr); early_memtest_done = true; } static void __init do_one_pass(u64 pattern, phys_addr_t start, phys_addr_t end) { u64 i; phys_addr_t this_start, this_end; for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE, &this_start, &this_end, NULL) { this_start = clamp(this_start, start, end); this_end = clamp(this_end, start, end); if (this_start < this_end) { pr_info(" %pa - %pa pattern %016llx\n", &this_start, &this_end, cpu_to_be64(pattern)); memtest(pattern, this_start, this_end - this_start); } } } /* default is disabled */ static unsigned int memtest_pattern __initdata; static int __init parse_memtest(char *arg) { int ret = 0; if (arg) ret = kstrtouint(arg, 0, &memtest_pattern); else memtest_pattern = ARRAY_SIZE(patterns); return ret; } early_param("memtest", parse_memtest); void __init early_memtest(phys_addr_t start, phys_addr_t end) { unsigned int i; unsigned int idx = 0; if (!memtest_pattern) return; pr_info("early_memtest: # of tests: %u\n", memtest_pattern); for (i = memtest_pattern-1; i < UINT_MAX; --i) { idx = i % ARRAY_SIZE(patterns); do_one_pass(patterns[idx], start, end); } } void memtest_report_meminfo(struct seq_file *m) { unsigned long early_memtest_bad_size_kb; if (!IS_ENABLED(CONFIG_PROC_FS)) return; if (!early_memtest_done) return; 
early_memtest_bad_size_kb = early_memtest_bad_size >> 10; if (early_memtest_bad_size && !early_memtest_bad_size_kb) early_memtest_bad_size_kb = 1; /* When 0 is reported, it means there actually was a successful test */ seq_printf(m, "EarlyMemtestBad: %5lu kB\n", early_memtest_bad_size_kb); } |
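/*
 * Illustrative user-space model, not kernel code: the scan-and-merge logic
 * that memtest() above applies per pattern - write the pattern to every
 * word, read it back, and coalesce consecutive failing words into ranges
 * that are reported (and, in the kernel, reserved via memblock) once each.
 * demo_memtest_pass() and its printf reporting are made up. The number of
 * passes in the kernel is chosen with the "memtest=N" boot parameter
 * (a bare "memtest" runs every pattern in the table above).
 */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

static void demo_memtest_pass(uint64_t *mem, size_t words, uint64_t pattern)
{
	size_t start_bad = 0, last_bad = 0;
	int have_bad = 0;
	size_t i;

	for (i = 0; i < words; i++)
		mem[i] = pattern;

	for (i = 0; i < words; i++) {
		if (mem[i] == pattern)
			continue;
		if (have_bad && i == last_bad + 1) {
			last_bad = i;		/* extend the current bad range */
			continue;
		}
		if (have_bad)			/* flush the previous range */
			printf("bad range: words %zu-%zu\n", start_bad, last_bad);
		start_bad = last_bad = i;	/* start a new range */
		have_bad = 1;
	}
	if (have_bad)
		printf("bad range: words %zu-%zu\n", start_bad, last_bad);
}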
| 74 15 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 | /* SPDX-License-Identifier: GPL-2.0 */ #ifndef IOCONTEXT_H #define IOCONTEXT_H #include <linux/radix-tree.h> #include <linux/rcupdate.h> #include <linux/workqueue.h> enum { ICQ_EXITED = 1 << 2, ICQ_DESTROYED = 1 << 3, }; /* * An io_cq (icq) is association between an io_context (ioc) and a * request_queue (q). This is used by elevators which need to track * information per ioc - q pair. * * Elevator can request use of icq by setting elevator_type->icq_size and * ->icq_align. Both size and align must be larger than that of struct * io_cq and elevator can use the tail area for private information. The * recommended way to do this is defining a struct which contains io_cq as * the first member followed by private members and using its size and * align. For example, * * struct snail_io_cq { * struct io_cq icq; * int poke_snail; * int feed_snail; * }; * * struct elevator_type snail_elv_type { * .ops = { ... }, * .icq_size = sizeof(struct snail_io_cq), * .icq_align = __alignof__(struct snail_io_cq), * ... * }; * * If icq_size is set, block core will manage icq's. All requests will * have its ->elv.icq field set before elevator_ops->elevator_set_req_fn() * is called and be holding a reference to the associated io_context. * * Whenever a new icq is created, elevator_ops->elevator_init_icq_fn() is * called and, on destruction, ->elevator_exit_icq_fn(). Both functions * are called with both the associated io_context and queue locks held. * * Elevator is allowed to lookup icq using ioc_lookup_icq() while holding * queue lock but the returned icq is valid only until the queue lock is * released. Elevators can not and should not try to create or destroy * icq's. * * As icq's are linked from both ioc and q, the locking rules are a bit * complex. * * - ioc lock nests inside q lock. * * - ioc->icq_list and icq->ioc_node are protected by ioc lock. * q->icq_list and icq->q_node by q lock. * * - ioc->icq_tree and ioc->icq_hint are protected by ioc lock, while icq * itself is protected by q lock. However, both the indexes and icq * itself are also RCU managed and lookup can be performed holding only * the q lock. * * - icq's are not reference counted. They are destroyed when either the * ioc or q goes away. Each request with icq set holds an extra * reference to ioc to ensure it stays until the request is completed. * * - Linking and unlinking icq's are performed while holding both ioc and q * locks. Due to the lock ordering, q exit is simple but ioc exit * requires reverse-order double lock dance. */ struct io_cq { struct request_queue *q; struct io_context *ioc; /* * q_node and ioc_node link io_cq through icq_list of q and ioc * respectively. Both fields are unused once ioc_exit_icq() is * called and shared with __rcu_icq_cache and __rcu_head which are * used for RCU free of io_cq. */ union { struct list_head q_node; struct kmem_cache *__rcu_icq_cache; }; union { struct hlist_node ioc_node; struct rcu_head __rcu_head; }; unsigned int flags; }; /* * I/O subsystem state of the associated processes. It is refcounted * and kmalloc'ed. 
These could be shared between processes. */ struct io_context { atomic_long_t refcount; atomic_t active_ref; unsigned short ioprio; #ifdef CONFIG_BLK_ICQ /* all the fields below are protected by this lock */ spinlock_t lock; struct radix_tree_root icq_tree; struct io_cq __rcu *icq_hint; struct hlist_head icq_list; struct work_struct release_work; #endif /* CONFIG_BLK_ICQ */ }; struct task_struct; #ifdef CONFIG_BLOCK void put_io_context(struct io_context *ioc); void exit_io_context(struct task_struct *task); int __copy_io(unsigned long clone_flags, struct task_struct *tsk); static inline int copy_io(unsigned long clone_flags, struct task_struct *tsk) { if (!current->io_context) return 0; return __copy_io(clone_flags, tsk); } #else struct io_context; static inline void put_io_context(struct io_context *ioc) { } static inline void exit_io_context(struct task_struct *task) { } static inline int copy_io(unsigned long clone_flags, struct task_struct *tsk) { return 0; } #endif /* CONFIG_BLOCK */ #endif /* IOCONTEXT_H */ |
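/*
 * Illustrative sketch, not part of iocontext.h above: the other half of the
 * snail_io_cq convention described in the header comment - getting from the
 * block core's struct io_cq pointer back to the elevator-private wrapper.
 * The struct extends the example given above; icq_to_snail() is a made-up
 * helper name. Because io_cq must be the first member, container_of() is a
 * constant-offset cast.
 */
#include <linux/container_of.h>
#include <linux/iocontext.h>

struct snail_io_cq {
	struct io_cq icq;	/* must stay first, per the icq_size/icq_align rules */
	int poke_snail;
	int feed_snail;
};

static inline struct snail_io_cq *icq_to_snail(struct io_cq *icq)
{
	return container_of(icq, struct snail_io_cq, icq);
}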
| 157 39 39 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 | /* SPDX-License-Identifier: GPL-2.0-or-later */ /* * INET An implementation of the TCP/IP protocol suite for the LINUX * operating system. INET is implemented using the BSD Socket * interface as the means of communication with the user level. * * Checksumming functions for IPv6 * * Authors: Jorge Cwik, <jorge@laser.satlink.net> * Arnt Gulbrandsen, <agulbra@nvg.unit.no> * Borrows very liberally from tcp.c and ip.c, see those * files for more names. */ /* * Fixes: * * Ralf Baechle : generic ipv6 checksum * <ralf@waldorf-gmbh.de> */ #ifndef _CHECKSUM_IPV6_H #define _CHECKSUM_IPV6_H #include <asm/types.h> #include <asm/byteorder.h> #include <net/ip.h> #include <asm/checksum.h> #include <linux/in6.h> #include <linux/tcp.h> #include <linux/ipv6.h> #ifndef _HAVE_ARCH_IPV6_CSUM __sum16 csum_ipv6_magic(const struct in6_addr *saddr, const struct in6_addr *daddr, __u32 len, __u8 proto, __wsum csum); #endif static inline __wsum ip6_compute_pseudo(struct sk_buff *skb, int proto) { return ~csum_unfold(csum_ipv6_magic(&ipv6_hdr(skb)->saddr, &ipv6_hdr(skb)->daddr, skb->len, proto, 0)); } static __inline__ __sum16 tcp_v6_check(int len, const struct in6_addr *saddr, const struct in6_addr *daddr, __wsum base) { return csum_ipv6_magic(saddr, daddr, len, IPPROTO_TCP, base); } static inline void __tcp_v6_send_check(struct sk_buff *skb, const struct in6_addr *saddr, const struct in6_addr *daddr) { struct tcphdr *th = tcp_hdr(skb); th->check = ~tcp_v6_check(skb->len, saddr, daddr, 0); skb->csum_start = skb_transport_header(skb) - skb->head; skb->csum_offset = offsetof(struct tcphdr, check); } static inline void tcp_v6_gso_csum_prep(struct sk_buff *skb) { struct ipv6hdr *ipv6h = ipv6_hdr(skb); struct tcphdr *th = tcp_hdr(skb); ipv6h->payload_len = 0; th->check = ~tcp_v6_check(0, &ipv6h->saddr, &ipv6h->daddr, 0); } static inline __sum16 udp_v6_check(int len, const struct in6_addr *saddr, const struct in6_addr *daddr, __wsum base) { return csum_ipv6_magic(saddr, daddr, len, IPPROTO_UDP, base); } void udp6_set_csum(bool nocheck, struct sk_buff *skb, const struct in6_addr *saddr, const struct in6_addr *daddr, int len); int udp6_csum_init(struct sk_buff *skb, struct udphdr *uh, int proto); #endif |
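/*
 * Illustrative sketch, not part of ip6_checksum.h above: filling a UDPv6
 * checksum entirely in software with the helpers declared there. The demo_*
 * name is made up; the sequence (zero the field, fold the payload sum into
 * the pseudo-header via udp_v6_check(), map a zero result to CSUM_MANGLED_0
 * because 0 means "no checksum" for UDP) roughly mirrors the software path
 * of udp6_set_csum(). @len is the UDP length, header included.
 */
#include <linux/skbuff.h>
#include <linux/udp.h>
#include <net/ip6_checksum.h>

static void demo_udp6_fill_csum(struct udphdr *uh, int len,
				const struct in6_addr *saddr,
				const struct in6_addr *daddr)
{
	uh->check = 0;
	uh->check = udp_v6_check(len, saddr, daddr,
				 csum_partial(uh, len, 0));
	if (uh->check == 0)
		uh->check = CSUM_MANGLED_0;
}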
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 | /* SPDX-License-Identifier: GPL-2.0 */ /* * Kernel interface for the RISCV arch_random_* functions * * Copyright (c) 2023 Rivos Inc. * */ #ifndef ASM_RISCV_ARCHRANDOM_H #define ASM_RISCV_ARCHRANDOM_H #include <asm/csr.h> #include <asm/processor.h> #define SEED_RETRY_LOOPS 100 static inline bool __must_check csr_seed_long(unsigned long *v) { unsigned int retry = SEED_RETRY_LOOPS, valid_seeds = 0; const int needed_seeds = sizeof(long) / sizeof(u16); u16 *entropy = (u16 *)v; do { /* * The SEED CSR must be accessed with a read-write instruction. */ unsigned long csr_seed = csr_swap(CSR_SEED, 0); unsigned long opst = csr_seed & SEED_OPST_MASK; switch (opst) { case SEED_OPST_ES16: entropy[valid_seeds++] = csr_seed & SEED_ENTROPY_MASK; if (valid_seeds == needed_seeds) return true; break; case SEED_OPST_DEAD: pr_err_once("archrandom: Unrecoverable error\n"); return false; case SEED_OPST_BIST: case SEED_OPST_WAIT: default: cpu_relax(); continue; } } while (--retry); return false; } static inline size_t __must_check arch_get_random_longs(unsigned long *v, size_t max_longs) { return 0; } static inline size_t __must_check arch_get_random_seed_longs(unsigned long *v, size_t max_longs) { if (!max_longs) return 0; /* * If Zkr is supported and csr_seed_long succeeds, we return one long * worth of entropy. */ if (riscv_has_extension_likely(RISCV_ISA_EXT_ZKR) && csr_seed_long(v)) return 1; return 0; } #endif /* ASM_RISCV_ARCHRANDOM_H */ |
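/*
 * Illustrative sketch, not part of archrandom.h above: how a caller asks the
 * architecture for seed material. On RV64 one unsigned long needs four
 * successful ES16 reads of the SEED CSR (16 bits of entropy each), which is
 * what csr_seed_long() accumulates before arch_get_random_seed_longs()
 * reports one long as filled. demo_collect_seed() is a made-up name.
 */
#include <linux/types.h>
#include <asm/archrandom.h>

static bool demo_collect_seed(unsigned long *out)
{
	/* True only if Zkr is present and the entropy source delivered in time. */
	return arch_get_random_seed_longs(out, 1) == 1;
}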
| 1 1 1 11 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 | #ifndef IOU_REQ_REF_H #define IOU_REQ_REF_H #include <linux/atomic.h> #include <linux/io_uring_types.h> /* * Shamelessly stolen from the mm implementation of page reference checking, * see commit f958d7b528b1 for details. */ #define req_ref_zero_or_close_to_overflow(req) \ ((unsigned int) atomic_read(&(req->refs)) + 127u <= 127u) static inline bool req_ref_inc_not_zero(struct io_kiocb *req) { WARN_ON_ONCE(!(req->flags & REQ_F_REFCOUNT)); return atomic_inc_not_zero(&req->refs); } static inline bool req_ref_put_and_test_atomic(struct io_kiocb *req) { WARN_ON_ONCE(!(data_race(req->flags) & REQ_F_REFCOUNT)); WARN_ON_ONCE(req_ref_zero_or_close_to_overflow(req)); return atomic_dec_and_test(&req->refs); } static inline bool req_ref_put_and_test(struct io_kiocb *req) { if (likely(!(req->flags & REQ_F_REFCOUNT))) return true; WARN_ON_ONCE(req_ref_zero_or_close_to_overflow(req)); return atomic_dec_and_test(&req->refs); } static inline void req_ref_get(struct io_kiocb *req) { WARN_ON_ONCE(!(req->flags & REQ_F_REFCOUNT)); WARN_ON_ONCE(req_ref_zero_or_close_to_overflow(req)); atomic_inc(&req->refs); } static inline void req_ref_put(struct io_kiocb *req) { WARN_ON_ONCE(!(req->flags & REQ_F_REFCOUNT)); WARN_ON_ONCE(req_ref_zero_or_close_to_overflow(req)); atomic_dec(&req->refs); } static inline void __io_req_set_refcount(struct io_kiocb *req, int nr) { if (!(req->flags & REQ_F_REFCOUNT)) { req->flags |= REQ_F_REFCOUNT; atomic_set(&req->refs, nr); } } static inline void io_req_set_refcount(struct io_kiocb *req) { __io_req_set_refcount(req, 1); } #endif |
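/*
 * Illustrative sketch, not part of io_uring/refs.h above: the intended
 * calling pattern for the helpers defined there. Refcounting is opt-in per
 * request: io_req_set_refcount() arms it with one reference, each additional
 * user takes its own with req_ref_get(), and whichever req_ref_put_and_test()
 * call returns true owns completion. demo_share_request() is a made-up name
 * and assumes struct io_kiocb from io_uring_types.h is in scope.
 */
static void demo_share_request(struct io_kiocb *req)
{
	io_req_set_refcount(req);	/* refs = 1 (submitter), REQ_F_REFCOUNT set */
	req_ref_get(req);		/* extra reference handed to an async worker */

	/*
	 * Submitter and worker each drop their reference like this; only the
	 * side that sees "true" may complete and free the request.
	 */
	if (req_ref_put_and_test(req))
		; /* last reference gone: complete the request here */
}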
// SPDX-License-Identifier: GPL-2.0-only /* * drivers/acpi/device_pm.c - ACPI device power management routines. * * Copyright (C) 2012, Intel Corp. * Author: Rafael J.
Wysocki <rafael.j.wysocki@intel.com> * * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ #define pr_fmt(fmt) "PM: " fmt #include <linux/acpi.h> #include <linux/export.h> #include <linux/mutex.h> #include <linux/pm_qos.h> #include <linux/pm_domain.h> #include <linux/pm_runtime.h> #include <linux/suspend.h> #include "fan.h" #include "internal.h" /** * acpi_power_state_string - String representation of ACPI device power state. * @state: ACPI device power state to return the string representation of. */ const char *acpi_power_state_string(int state) { switch (state) { case ACPI_STATE_D0: return "D0"; case ACPI_STATE_D1: return "D1"; case ACPI_STATE_D2: return "D2"; case ACPI_STATE_D3_HOT: return "D3hot"; case ACPI_STATE_D3_COLD: return "D3cold"; default: return "(unknown)"; } } static int acpi_dev_pm_explicit_get(struct acpi_device *device, int *state) { unsigned long long psc; acpi_status status; status = acpi_evaluate_integer(device->handle, "_PSC", NULL, &psc); if (ACPI_FAILURE(status)) return -ENODEV; *state = psc; return 0; } /** * acpi_device_get_power - Get power state of an ACPI device. * @device: Device to get the power state of. * @state: Place to store the power state of the device. * * This function does not update the device's power.state field, but it may * update its parent's power.state field (when the parent's power state is * unknown and the device's power state turns out to be D0). * * Also, it does not update power resource reference counters to ensure that * the power state returned by it will be persistent and it may return a power * state shallower than previously set by acpi_device_set_power() for @device * (if that power state depends on any power resources). */ int acpi_device_get_power(struct acpi_device *device, int *state) { int result = ACPI_STATE_UNKNOWN; struct acpi_device *parent; int error; if (!device || !state) return -EINVAL; parent = acpi_dev_parent(device); if (!device->flags.power_manageable) { /* TBD: Non-recursive algorithm for walking up hierarchy. */ *state = parent ? parent->power.state : ACPI_STATE_D0; goto out; } /* * Get the device's power state from power resources settings and _PSC, * if available. */ if (device->power.flags.power_resources) { error = acpi_power_get_inferred_state(device, &result); if (error) return error; } if (device->power.flags.explicit_get) { int psc; error = acpi_dev_pm_explicit_get(device, &psc); if (error) return error; /* * The power resources settings may indicate a power state * shallower than the actual power state of the device, because * the same power resources may be referenced by other devices. * * For systems predating ACPI 4.0 we assume that D3hot is the * deepest state that can be supported. */ if (psc > result && psc < ACPI_STATE_D3_COLD) result = psc; else if (result == ACPI_STATE_UNKNOWN) result = psc > ACPI_STATE_D2 ? ACPI_STATE_D3_HOT : psc; } /* * If we were unsure about the device parent's power state up to this * point, the fact that the device is in D0 implies that the parent has * to be in D0 too, except if ignore_parent is set. 
*/ if (!device->power.flags.ignore_parent && parent && parent->power.state == ACPI_STATE_UNKNOWN && result == ACPI_STATE_D0) parent->power.state = ACPI_STATE_D0; *state = result; out: acpi_handle_debug(device->handle, "Power state: %s\n", acpi_power_state_string(*state)); return 0; } static int acpi_dev_pm_explicit_set(struct acpi_device *adev, int state) { if (adev->power.states[state].flags.explicit_set) { char method[5] = { '_', 'P', 'S', '0' + state, '\0' }; acpi_status status; status = acpi_evaluate_object(adev->handle, method, NULL, NULL); if (ACPI_FAILURE(status)) return -ENODEV; } return 0; } /** * acpi_device_set_power - Set power state of an ACPI device. * @device: Device to set the power state of. * @state: New power state to set. * * Callers must ensure that the device is power manageable before using this * function. */ int acpi_device_set_power(struct acpi_device *device, int state) { int target_state = state; int result = 0; if (!device || !device->flags.power_manageable || (state < ACPI_STATE_D0) || (state > ACPI_STATE_D3_COLD)) return -EINVAL; acpi_handle_debug(device->handle, "Power state change: %s -> %s\n", acpi_power_state_string(device->power.state), acpi_power_state_string(state)); /* Make sure this is a valid target state */ /* There is a special case for D0 addressed below. */ if (state > ACPI_STATE_D0 && state == device->power.state) goto no_change; if (state == ACPI_STATE_D3_COLD) { /* * For transitions to D3cold we need to execute _PS3 and then * possibly drop references to the power resources in use. */ state = ACPI_STATE_D3_HOT; /* If D3cold is not supported, use D3hot as the target state. */ if (!device->power.states[ACPI_STATE_D3_COLD].flags.valid) target_state = state; } else if (!device->power.states[state].flags.valid) { acpi_handle_debug(device->handle, "Power state %s not supported\n", acpi_power_state_string(state)); return -ENODEV; } if (!device->power.flags.ignore_parent) { struct acpi_device *parent; parent = acpi_dev_parent(device); if (parent && state < parent->power.state) { acpi_handle_debug(device->handle, "Cannot transition to %s for parent in %s\n", acpi_power_state_string(state), acpi_power_state_string(parent->power.state)); return -ENODEV; } } /* * Transition Power * ---------------- * In accordance with ACPI 6, _PSx is executed before manipulating power * resources, unless the target state is D0, in which case _PS0 is * supposed to be executed after turning the power resources on. */ if (state > ACPI_STATE_D0) { /* * According to ACPI 6, devices cannot go from lower-power * (deeper) states to higher-power (shallower) states. */ if (state < device->power.state) { acpi_handle_debug(device->handle, "Cannot transition from %s to %s\n", acpi_power_state_string(device->power.state), acpi_power_state_string(state)); return -ENODEV; } /* * If the device goes from D3hot to D3cold, _PS3 has been * evaluated for it already, so skip it in that case. */ if (device->power.state < ACPI_STATE_D3_HOT) { result = acpi_dev_pm_explicit_set(device, state); if (result) goto end; } if (device->power.flags.power_resources) result = acpi_power_transition(device, target_state); } else { int cur_state = device->power.state; if (device->power.flags.power_resources) { result = acpi_power_transition(device, ACPI_STATE_D0); if (result) goto end; } if (cur_state == ACPI_STATE_D0) { int psc; /* Nothing to do here if _PSC is not present. 
*/ if (!device->power.flags.explicit_get) goto no_change; /* * The power state of the device was set to D0 last * time, but that might have happened before a * system-wide transition involving the platform * firmware, so it may be necessary to evaluate _PS0 * for the device here. However, use extra care here * and evaluate _PSC to check the device's current power * state, and only invoke _PS0 if the evaluation of _PSC * is successful and it returns a power state different * from D0. */ result = acpi_dev_pm_explicit_get(device, &psc); if (result || psc == ACPI_STATE_D0) goto no_change; } result = acpi_dev_pm_explicit_set(device, ACPI_STATE_D0); } end: if (result) { acpi_handle_debug(device->handle, "Failed to change power state to %s\n", acpi_power_state_string(target_state)); } else { device->power.state = target_state; acpi_handle_debug(device->handle, "Power state changed to %s\n", acpi_power_state_string(target_state)); } return result; no_change: acpi_handle_debug(device->handle, "Already in %s\n", acpi_power_state_string(state)); return 0; } EXPORT_SYMBOL(acpi_device_set_power); int acpi_bus_set_power(acpi_handle handle, int state) { struct acpi_device *device = acpi_fetch_acpi_dev(handle); if (device) return acpi_device_set_power(device, state); return -ENODEV; } EXPORT_SYMBOL(acpi_bus_set_power); int acpi_bus_init_power(struct acpi_device *device) { int state; int result; if (!device) return -EINVAL; device->power.state = ACPI_STATE_UNKNOWN; if (!acpi_device_is_present(device)) { device->flags.initialized = false; return -ENXIO; } result = acpi_device_get_power(device, &state); if (result) return result; if (state < ACPI_STATE_D3_COLD && device->power.flags.power_resources) { /* Reference count the power resources. */ result = acpi_power_on_resources(device, state); if (result) return result; if (state == ACPI_STATE_D0) { /* * If _PSC is not present and the state inferred from * power resources appears to be D0, it still may be * necessary to execute _PS0 at this point, because * another device using the same power resources may * have been put into D0 previously and that's why we * see D0 here. */ result = acpi_dev_pm_explicit_set(device, state); if (result) return result; } } else if (state == ACPI_STATE_UNKNOWN) { /* * No power resources and missing _PSC? Cross fingers and make * it D0 in hope that this is what the BIOS put the device into. * [We tried to force D0 here by executing _PS0, but that broke * Toshiba P870-303 in a nasty way.] */ state = ACPI_STATE_D0; } device->power.state = state; return 0; } /** * acpi_device_fix_up_power - Force device with missing _PSC into D0. * @device: Device object whose power state is to be fixed up. * * Devices without power resources and _PSC, but having _PS0 and _PS3 defined, * are assumed to be put into D0 by the BIOS. However, in some cases that may * not be the case and this function should be used then. */ int acpi_device_fix_up_power(struct acpi_device *device) { int ret = 0; if (!device->power.flags.power_resources && !device->power.flags.explicit_get && device->power.state == ACPI_STATE_D0) ret = acpi_dev_pm_explicit_set(device, ACPI_STATE_D0); return ret; } EXPORT_SYMBOL_GPL(acpi_device_fix_up_power); static int fix_up_power_if_applicable(struct acpi_device *adev, void *not_used) { if (adev->status.present && adev->status.enabled) acpi_device_fix_up_power(adev); return 0; } /** * acpi_device_fix_up_power_extended - Force device and its children into D0. * @adev: Parent device object whose power state is to be fixed up. 
* * Call acpi_device_fix_up_power() for @adev and its children so long as they * are reported as present and enabled. */ void acpi_device_fix_up_power_extended(struct acpi_device *adev) { acpi_device_fix_up_power(adev); acpi_dev_for_each_child(adev, fix_up_power_if_applicable, NULL); } EXPORT_SYMBOL_GPL(acpi_device_fix_up_power_extended); /** * acpi_device_fix_up_power_children - Force a device's children into D0. * @adev: Parent device object whose children's power state is to be fixed up. * * Call acpi_device_fix_up_power() for @adev's children so long as they * are reported as present and enabled. */ void acpi_device_fix_up_power_children(struct acpi_device *adev) { acpi_dev_for_each_child(adev, fix_up_power_if_applicable, NULL); } EXPORT_SYMBOL_GPL(acpi_device_fix_up_power_children); int acpi_device_update_power(struct acpi_device *device, int *state_p) { int state; int result; if (device->power.state == ACPI_STATE_UNKNOWN) { result = acpi_bus_init_power(device); if (!result && state_p) *state_p = device->power.state; return result; } result = acpi_device_get_power(device, &state); if (result) return result; if (state == ACPI_STATE_UNKNOWN) { state = ACPI_STATE_D0; result = acpi_device_set_power(device, state); if (result) return result; } else { if (device->power.flags.power_resources) { /* * We don't need to really switch the state, bu we need * to update the power resources' reference counters. */ result = acpi_power_transition(device, state); if (result) return result; } device->power.state = state; } if (state_p) *state_p = state; return 0; } EXPORT_SYMBOL_GPL(acpi_device_update_power); int acpi_bus_update_power(acpi_handle handle, int *state_p) { struct acpi_device *device = acpi_fetch_acpi_dev(handle); if (device) return acpi_device_update_power(device, state_p); return -ENODEV; } EXPORT_SYMBOL_GPL(acpi_bus_update_power); bool acpi_bus_power_manageable(acpi_handle handle) { struct acpi_device *device = acpi_fetch_acpi_dev(handle); return device && device->flags.power_manageable; } EXPORT_SYMBOL(acpi_bus_power_manageable); static int acpi_power_up_if_adr_present(struct acpi_device *adev, void *not_used) { if (!(adev->flags.power_manageable && adev->pnp.type.bus_address)) return 0; acpi_handle_debug(adev->handle, "Power state: %s\n", acpi_power_state_string(adev->power.state)); if (adev->power.state == ACPI_STATE_D3_COLD) return acpi_device_set_power(adev, ACPI_STATE_D0); return 0; } /** * acpi_dev_power_up_children_with_adr - Power up childres with valid _ADR * @adev: Parent ACPI device object. * * Change the power states of the direct children of @adev that are in D3cold * and hold valid _ADR objects to D0 in order to allow bus (e.g. PCI) * enumeration code to access them. */ void acpi_dev_power_up_children_with_adr(struct acpi_device *adev) { acpi_dev_for_each_child(adev, acpi_power_up_if_adr_present, NULL); } /** * acpi_dev_power_state_for_wake - Deepest power state for wakeup signaling * @adev: ACPI companion of the target device. * * Evaluate _S0W for @adev and return the value produced by it or return * ACPI_STATE_UNKNOWN on errors (including _S0W not present). 
*/ u8 acpi_dev_power_state_for_wake(struct acpi_device *adev) { unsigned long long state; acpi_status status; status = acpi_evaluate_integer(adev->handle, "_S0W", NULL, &state); if (ACPI_FAILURE(status)) return ACPI_STATE_UNKNOWN; return state; } #ifdef CONFIG_PM static DEFINE_MUTEX(acpi_pm_notifier_lock); static DEFINE_MUTEX(acpi_pm_notifier_install_lock); void acpi_pm_wakeup_event(struct device *dev) { pm_wakeup_dev_event(dev, 0, acpi_s2idle_wakeup()); } EXPORT_SYMBOL_GPL(acpi_pm_wakeup_event); static void acpi_pm_notify_handler(acpi_handle handle, u32 val, void *not_used) { struct acpi_device *adev; if (val != ACPI_NOTIFY_DEVICE_WAKE) return; acpi_handle_debug(handle, "Wake notify\n"); adev = acpi_get_acpi_dev(handle); if (!adev) return; mutex_lock(&acpi_pm_notifier_lock); if (adev->wakeup.flags.notifier_present) { pm_wakeup_ws_event(adev->wakeup.ws, 0, acpi_s2idle_wakeup()); if (adev->wakeup.context.func) { acpi_handle_debug(handle, "Running %pS for %s\n", adev->wakeup.context.func, dev_name(adev->wakeup.context.dev)); adev->wakeup.context.func(&adev->wakeup.context); } } mutex_unlock(&acpi_pm_notifier_lock); acpi_put_acpi_dev(adev); } /** * acpi_add_pm_notifier - Register PM notify handler for given ACPI device. * @adev: ACPI device to add the notify handler for. * @dev: Device to generate a wakeup event for while handling the notification. * @func: Work function to execute when handling the notification. * * NOTE: @adev need not be a run-wake or wakeup device to be a valid source of * PM wakeup events. For example, wakeup events may be generated for bridges * if one of the devices below the bridge is signaling wakeup, even if the * bridge itself doesn't have a wakeup GPE associated with it. */ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev, void (*func)(struct acpi_device_wakeup_context *context)) { acpi_status status = AE_ALREADY_EXISTS; if (!dev && !func) return AE_BAD_PARAMETER; mutex_lock(&acpi_pm_notifier_install_lock); if (adev->wakeup.flags.notifier_present) goto out; status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY, acpi_pm_notify_handler, NULL); if (ACPI_FAILURE(status)) goto out; mutex_lock(&acpi_pm_notifier_lock); adev->wakeup.ws = wakeup_source_register(&adev->dev, dev_name(&adev->dev)); adev->wakeup.context.dev = dev; adev->wakeup.context.func = func; adev->wakeup.flags.notifier_present = true; mutex_unlock(&acpi_pm_notifier_lock); out: mutex_unlock(&acpi_pm_notifier_install_lock); return status; } /** * acpi_remove_pm_notifier - Unregister PM notifier from given ACPI device. * @adev: ACPI device to remove the notifier from. 
*/ acpi_status acpi_remove_pm_notifier(struct acpi_device *adev) { acpi_status status = AE_BAD_PARAMETER; mutex_lock(&acpi_pm_notifier_install_lock); if (!adev->wakeup.flags.notifier_present) goto out; status = acpi_remove_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY, acpi_pm_notify_handler); if (ACPI_FAILURE(status)) goto out; mutex_lock(&acpi_pm_notifier_lock); adev->wakeup.context.func = NULL; adev->wakeup.context.dev = NULL; wakeup_source_unregister(adev->wakeup.ws); adev->wakeup.flags.notifier_present = false; mutex_unlock(&acpi_pm_notifier_lock); out: mutex_unlock(&acpi_pm_notifier_install_lock); return status; } bool acpi_bus_can_wakeup(acpi_handle handle) { struct acpi_device *device = acpi_fetch_acpi_dev(handle); return device && device->wakeup.flags.valid; } EXPORT_SYMBOL(acpi_bus_can_wakeup); bool acpi_pm_device_can_wakeup(struct device *dev) { struct acpi_device *adev = ACPI_COMPANION(dev); return adev ? acpi_device_can_wakeup(adev) : false; } /** * acpi_dev_pm_get_state - Get preferred power state of ACPI device. * @dev: Device whose preferred target power state to return. * @adev: ACPI device node corresponding to @dev. * @target_state: System state to match the resultant device state. * @d_min_p: Location to store the highest power state available to the device. * @d_max_p: Location to store the lowest power state available to the device. * * Find the lowest power (highest number) and highest power (lowest number) ACPI * device power states that the device can be in while the system is in the * state represented by @target_state. Store the integer numbers representing * those stats in the memory locations pointed to by @d_max_p and @d_min_p, * respectively. * * Callers must ensure that @dev and @adev are valid pointers and that @adev * actually corresponds to @dev before using this function. * * Returns 0 on success or -ENODATA when one of the ACPI methods fails or * returns a value that doesn't make sense. The memory locations pointed to by * @d_max_p and @d_min_p are only modified on success. */ static int acpi_dev_pm_get_state(struct device *dev, struct acpi_device *adev, u32 target_state, int *d_min_p, int *d_max_p) { char method[] = { '_', 'S', '0' + target_state, 'D', '\0' }; acpi_handle handle = adev->handle; unsigned long long ret; int d_min, d_max; bool wakeup = false; bool has_sxd = false; acpi_status status; /* * If the system state is S0, the lowest power state the device can be * in is D3cold, unless the device has _S0W and is supposed to signal * wakeup, in which case the return value of _S0W has to be used as the * lowest power state available to the device. */ d_min = ACPI_STATE_D0; d_max = ACPI_STATE_D3_COLD; /* * If present, _SxD methods return the minimum D-state (highest power * state) we can use for the corresponding S-states. Otherwise, the * minimum D-state is D0 (ACPI 3.x). */ if (target_state > ACPI_STATE_S0) { /* * We rely on acpi_evaluate_integer() not clobbering the integer * provided if AE_NOT_FOUND is returned. */ ret = d_min; status = acpi_evaluate_integer(handle, method, NULL, &ret); if ((ACPI_FAILURE(status) && status != AE_NOT_FOUND) || ret > ACPI_STATE_D3_COLD) return -ENODATA; /* * We need to handle legacy systems where D3hot and D3cold are * the same and 3 is returned in both cases, so fall back to * D3cold if D3hot is not a valid state. 
*/ if (!adev->power.states[ret].flags.valid) { if (ret == ACPI_STATE_D3_HOT) ret = ACPI_STATE_D3_COLD; else return -ENODATA; } if (status == AE_OK) has_sxd = true; d_min = ret; wakeup = device_may_wakeup(dev) && adev->wakeup.flags.valid && adev->wakeup.sleep_state >= target_state; } else if (device_may_wakeup(dev) && dev->power.wakeirq) { /* * The ACPI subsystem doesn't manage the wake bit for IRQs * defined with ExclusiveAndWake and SharedAndWake. Instead we * expect them to be managed via the PM subsystem. Drivers * should call dev_pm_set_wake_irq to register an IRQ as a wake * source. * * If a device has a wake IRQ attached we need to check the * _S0W method to get the correct wake D-state. Otherwise we * end up putting the device into D3Cold which will more than * likely disable wake functionality. */ wakeup = true; } else { /* ACPI GPE is specified in _PRW. */ wakeup = adev->wakeup.flags.valid; } /* * If _PRW says we can wake up the system from the target sleep state, * the D-state returned by _SxD is sufficient for that (we assume a * wakeup-aware driver if wake is set). Still, if _SxW exists * (ACPI 3.x), it should return the maximum (lowest power) D-state that * can wake the system. _S0W may be valid, too. */ if (wakeup) { method[3] = 'W'; status = acpi_evaluate_integer(handle, method, NULL, &ret); if (status == AE_NOT_FOUND) { /* No _SxW. In this case, the ACPI spec says that we * must not go into any power state deeper than the * value returned from _SxD. */ if (has_sxd && target_state > ACPI_STATE_S0) d_max = d_min; } else if (ACPI_SUCCESS(status) && ret <= ACPI_STATE_D3_COLD) { /* Fall back to D3cold if ret is not a valid state. */ if (!adev->power.states[ret].flags.valid) ret = ACPI_STATE_D3_COLD; d_max = ret > d_min ? ret : d_min; } else { return -ENODATA; } } if (d_min_p) *d_min_p = d_min; if (d_max_p) *d_max_p = d_max; return 0; } /** * acpi_pm_device_sleep_state - Get preferred power state of ACPI device. * @dev: Device whose preferred target power state to return. * @d_min_p: Location to store the upper limit of the allowed states range. * @d_max_in: Deepest low-power state to take into consideration. * Return value: Preferred power state of the device on success, -ENODEV * if there's no 'struct acpi_device' for @dev, -EINVAL if @d_max_in is * incorrect, or -ENODATA on ACPI method failure. * * The caller must ensure that @dev is valid before using this function. */ int acpi_pm_device_sleep_state(struct device *dev, int *d_min_p, int d_max_in) { struct acpi_device *adev; int ret, d_min, d_max; if (d_max_in < ACPI_STATE_D0 || d_max_in > ACPI_STATE_D3_COLD) return -EINVAL; if (d_max_in > ACPI_STATE_D2) { enum pm_qos_flags_status stat; stat = dev_pm_qos_flags(dev, PM_QOS_FLAG_NO_POWER_OFF); if (stat == PM_QOS_FLAGS_ALL) d_max_in = ACPI_STATE_D2; } adev = ACPI_COMPANION(dev); if (!adev) { dev_dbg(dev, "ACPI companion missing in %s!\n", __func__); return -ENODEV; } ret = acpi_dev_pm_get_state(dev, adev, acpi_target_system_state(), &d_min, &d_max); if (ret) return ret; if (d_max_in < d_min) return -EINVAL; if (d_max > d_max_in) { for (d_max = d_max_in; d_max > d_min; d_max--) { if (adev->power.states[d_max].flags.valid) break; } } if (d_min_p) *d_min_p = d_min; return d_max; } EXPORT_SYMBOL(acpi_pm_device_sleep_state); /** * acpi_pm_notify_work_func - ACPI devices wakeup notification work function. * @context: Device wakeup context. 
*/ static void acpi_pm_notify_work_func(struct acpi_device_wakeup_context *context) { struct device *dev = context->dev; if (dev) { pm_wakeup_event(dev, 0); pm_request_resume(dev); } } static DEFINE_MUTEX(acpi_wakeup_lock); static int __acpi_device_wakeup_enable(struct acpi_device *adev, u32 target_state) { struct acpi_device_wakeup *wakeup = &adev->wakeup; acpi_status status; int error = 0; mutex_lock(&acpi_wakeup_lock); /* * If the device wakeup power is already enabled, disable it and enable * it again in case it depends on the configuration of subordinate * devices and the conditions have changed since it was enabled last * time. */ if (wakeup->enable_count > 0) acpi_disable_wakeup_device_power(adev); error = acpi_enable_wakeup_device_power(adev, target_state); if (error) { if (wakeup->enable_count > 0) { acpi_disable_gpe(wakeup->gpe_device, wakeup->gpe_number); wakeup->enable_count = 0; } goto out; } if (wakeup->enable_count > 0) goto inc; status = acpi_enable_gpe(wakeup->gpe_device, wakeup->gpe_number); if (ACPI_FAILURE(status)) { acpi_disable_wakeup_device_power(adev); error = -EIO; goto out; } acpi_handle_debug(adev->handle, "GPE%2X enabled for wakeup\n", (unsigned int)wakeup->gpe_number); inc: if (wakeup->enable_count < INT_MAX) wakeup->enable_count++; else acpi_handle_info(adev->handle, "Wakeup enable count out of bounds!\n"); out: mutex_unlock(&acpi_wakeup_lock); return error; } /** * acpi_device_wakeup_enable - Enable wakeup functionality for device. * @adev: ACPI device to enable wakeup functionality for. * @target_state: State the system is transitioning into. * * Enable the GPE associated with @adev so that it can generate wakeup signals * for the device in response to external (remote) events and enable wakeup * power for it. * * Callers must ensure that @adev is a valid ACPI device node before executing * this function. */ static int acpi_device_wakeup_enable(struct acpi_device *adev, u32 target_state) { return __acpi_device_wakeup_enable(adev, target_state); } /** * acpi_device_wakeup_disable - Disable wakeup functionality for device. * @adev: ACPI device to disable wakeup functionality for. * * Disable the GPE associated with @adev and disable wakeup power for it. * * Callers must ensure that @adev is a valid ACPI device node before executing * this function. */ static void acpi_device_wakeup_disable(struct acpi_device *adev) { struct acpi_device_wakeup *wakeup = &adev->wakeup; mutex_lock(&acpi_wakeup_lock); if (!wakeup->enable_count) goto out; acpi_disable_gpe(wakeup->gpe_device, wakeup->gpe_number); acpi_disable_wakeup_device_power(adev); wakeup->enable_count--; out: mutex_unlock(&acpi_wakeup_lock); } /** * acpi_pm_set_device_wakeup - Enable/disable remote wakeup for given device. * @dev: Device to enable/disable to generate wakeup events. * @enable: Whether to enable or disable the wakeup functionality. */ int acpi_pm_set_device_wakeup(struct device *dev, bool enable) { struct acpi_device *adev; int error; adev = ACPI_COMPANION(dev); if (!adev) { dev_dbg(dev, "ACPI companion missing in %s!\n", __func__); return -ENODEV; } if (!acpi_device_can_wakeup(adev)) return -EINVAL; if (!enable) { acpi_device_wakeup_disable(adev); dev_dbg(dev, "Wakeup disabled by ACPI\n"); return 0; } error = __acpi_device_wakeup_enable(adev, acpi_target_system_state()); if (!error) dev_dbg(dev, "Wakeup enabled by ACPI\n"); return error; } EXPORT_SYMBOL_GPL(acpi_pm_set_device_wakeup); /** * acpi_dev_pm_low_power - Put ACPI device into a low-power state. 
* @dev: Device to put into a low-power state. * @adev: ACPI device node corresponding to @dev. * @system_state: System state to choose the device state for. */ static int acpi_dev_pm_low_power(struct device *dev, struct acpi_device *adev, u32 system_state) { int ret, state; if (!acpi_device_power_manageable(adev)) return 0; ret = acpi_dev_pm_get_state(dev, adev, system_state, NULL, &state); return ret ? ret : acpi_device_set_power(adev, state); } /** * acpi_dev_pm_full_power - Put ACPI device into the full-power state. * @adev: ACPI device node to put into the full-power state. */ static int acpi_dev_pm_full_power(struct acpi_device *adev) { return acpi_device_power_manageable(adev) ? acpi_device_set_power(adev, ACPI_STATE_D0) : 0; } /** * acpi_dev_suspend - Put device into a low-power state using ACPI. * @dev: Device to put into a low-power state. * @wakeup: Whether or not to enable wakeup for the device. * * Put the given device into a low-power state using the standard ACPI * mechanism. Set up remote wakeup if desired, choose the state to put the * device into (this checks if remote wakeup is expected to work too), and set * the power state of the device. */ int acpi_dev_suspend(struct device *dev, bool wakeup) { struct acpi_device *adev = ACPI_COMPANION(dev); u32 target_state = acpi_target_system_state(); int error; if (!adev) return 0; if (wakeup && acpi_device_can_wakeup(adev)) { error = acpi_device_wakeup_enable(adev, target_state); if (error) return -EAGAIN; } else { wakeup = false; } error = acpi_dev_pm_low_power(dev, adev, target_state); if (error && wakeup) acpi_device_wakeup_disable(adev); return error; } EXPORT_SYMBOL_GPL(acpi_dev_suspend); /** * acpi_dev_resume - Put device into the full-power state using ACPI. * @dev: Device to put into the full-power state. * * Put the given device into the full-power state using the standard ACPI * mechanism. Set the power state of the device to ACPI D0 and disable wakeup. */ int acpi_dev_resume(struct device *dev) { struct acpi_device *adev = ACPI_COMPANION(dev); int error; if (!adev) return 0; error = acpi_dev_pm_full_power(adev); acpi_device_wakeup_disable(adev); return error; } EXPORT_SYMBOL_GPL(acpi_dev_resume); /** * acpi_subsys_runtime_suspend - Suspend device using ACPI. * @dev: Device to suspend. * * Carry out the generic runtime suspend procedure for @dev and use ACPI to put * it into a runtime low-power state. */ int acpi_subsys_runtime_suspend(struct device *dev) { int ret = pm_generic_runtime_suspend(dev); return ret ? ret : acpi_dev_suspend(dev, true); } EXPORT_SYMBOL_GPL(acpi_subsys_runtime_suspend); /** * acpi_subsys_runtime_resume - Resume device using ACPI. * @dev: Device to Resume. * * Use ACPI to put the given device into the full-power state and carry out the * generic runtime resume procedure for it. */ int acpi_subsys_runtime_resume(struct device *dev) { int ret = acpi_dev_resume(dev); return ret ? 
ret : pm_generic_runtime_resume(dev); } EXPORT_SYMBOL_GPL(acpi_subsys_runtime_resume); #ifdef CONFIG_PM_SLEEP static bool acpi_dev_needs_resume(struct device *dev, struct acpi_device *adev) { u32 sys_target = acpi_target_system_state(); int ret, state; if (!pm_runtime_suspended(dev) || !adev || (adev->wakeup.flags.valid && device_may_wakeup(dev) != !!adev->wakeup.prepare_count)) return true; if (sys_target == ACPI_STATE_S0) return false; if (adev->power.flags.dsw_present) return true; ret = acpi_dev_pm_get_state(dev, adev, sys_target, NULL, &state); if (ret) return true; return state != adev->power.state; } /** * acpi_subsys_prepare - Prepare device for system transition to a sleep state. * @dev: Device to prepare. */ int acpi_subsys_prepare(struct device *dev) { struct acpi_device *adev = ACPI_COMPANION(dev); if (dev->driver && dev->driver->pm && dev->driver->pm->prepare) { int ret = dev->driver->pm->prepare(dev); if (ret < 0) return ret; if (!ret && dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_PREPARE)) return 0; } return !acpi_dev_needs_resume(dev, adev); } EXPORT_SYMBOL_GPL(acpi_subsys_prepare); /** * acpi_subsys_complete - Finalize device's resume during system resume. * @dev: Device to handle. */ void acpi_subsys_complete(struct device *dev) { pm_generic_complete(dev); /* * If the device had been runtime-suspended before the system went into * the sleep state it is going out of and it has never been resumed till * now, resume it in case the firmware powered it up. */ if (pm_runtime_suspended(dev) && pm_resume_via_firmware()) pm_request_resume(dev); } EXPORT_SYMBOL_GPL(acpi_subsys_complete); /** * acpi_subsys_suspend - Run the device driver's suspend callback. * @dev: Device to handle. * * Follow PCI and resume devices from runtime suspend before running their * system suspend callbacks, unless the driver can cope with runtime-suspended * devices during system suspend and there are no ACPI-specific reasons for * resuming them. */ int acpi_subsys_suspend(struct device *dev) { if (!dev_pm_smart_suspend(dev) || acpi_dev_needs_resume(dev, ACPI_COMPANION(dev))) pm_runtime_resume(dev); return pm_generic_suspend(dev); } EXPORT_SYMBOL_GPL(acpi_subsys_suspend); /** * acpi_subsys_suspend_late - Suspend device using ACPI. * @dev: Device to suspend. * * Carry out the generic late suspend procedure for @dev and use ACPI to put * it into a low-power state during system transition into a sleep state. */ int acpi_subsys_suspend_late(struct device *dev) { int ret; if (dev_pm_skip_suspend(dev)) return 0; ret = pm_generic_suspend_late(dev); return ret ? ret : acpi_dev_suspend(dev, device_may_wakeup(dev)); } EXPORT_SYMBOL_GPL(acpi_subsys_suspend_late); /** * acpi_subsys_suspend_noirq - Run the device driver's "noirq" suspend callback. * @dev: Device to suspend. */ int acpi_subsys_suspend_noirq(struct device *dev) { int ret; if (dev_pm_skip_suspend(dev)) return 0; ret = pm_generic_suspend_noirq(dev); if (ret) return ret; /* * If the target system sleep state is suspend-to-idle, it is sufficient * to check whether or not the device's wakeup settings are good for * runtime PM. Otherwise, the pm_resume_via_firmware() check will cause * acpi_subsys_complete() to take care of fixing up the device's state * anyway, if need be. */ if (device_can_wakeup(dev) && !device_may_wakeup(dev)) dev->power.may_skip_resume = false; return 0; } EXPORT_SYMBOL_GPL(acpi_subsys_suspend_noirq); /** * acpi_subsys_resume_noirq - Run the device driver's "noirq" resume callback. * @dev: Device to handle. 
*/ static int acpi_subsys_resume_noirq(struct device *dev) { if (dev_pm_skip_resume(dev)) return 0; return pm_generic_resume_noirq(dev); } /** * acpi_subsys_resume_early - Resume device using ACPI. * @dev: Device to Resume. * * Use ACPI to put the given device into the full-power state and carry out the * generic early resume procedure for it during system transition into the * working state, but only do that if device either defines early resume * handler, or does not define power operations at all. Otherwise powering up * of the device is postponed to the normal resume phase. */ static int acpi_subsys_resume_early(struct device *dev) { const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; int ret; if (dev_pm_skip_resume(dev)) return 0; if (pm && !pm->resume_early) { dev_dbg(dev, "postponing D0 transition to normal resume stage\n"); return 0; } ret = acpi_dev_resume(dev); return ret ? ret : pm_generic_resume_early(dev); } /** * acpi_subsys_resume - Resume device using ACPI. * @dev: Device to Resume. * * Use ACPI to put the given device into the full-power state if it has not been * powered up during early resume phase, and carry out the generic resume * procedure for it during system transition into the working state. */ static int acpi_subsys_resume(struct device *dev) { const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; int ret = 0; if (!dev_pm_skip_resume(dev) && pm && !pm->resume_early) { dev_dbg(dev, "executing postponed D0 transition\n"); ret = acpi_dev_resume(dev); } return ret ? ret : pm_generic_resume(dev); } /** * acpi_subsys_freeze - Run the device driver's freeze callback. * @dev: Device to handle. */ int acpi_subsys_freeze(struct device *dev) { /* * Resume all runtime-suspended devices before creating a snapshot * image of system memory, because the restore kernel generally cannot * be expected to always handle them consistently and they need to be * put into the runtime-active metastate during system resume anyway, * so it is better to ensure that the state saved in the image will be * always consistent with that. */ pm_runtime_resume(dev); return pm_generic_freeze(dev); } EXPORT_SYMBOL_GPL(acpi_subsys_freeze); /** * acpi_subsys_restore_early - Restore device using ACPI. * @dev: Device to restore. */ int acpi_subsys_restore_early(struct device *dev) { int ret = acpi_dev_resume(dev); return ret ? ret : pm_generic_restore_early(dev); } EXPORT_SYMBOL_GPL(acpi_subsys_restore_early); /** * acpi_subsys_poweroff - Run the device driver's poweroff callback. * @dev: Device to handle. * * Follow PCI and resume devices from runtime suspend before running their * system poweroff callbacks, unless the driver can cope with runtime-suspended * devices during system suspend and there are no ACPI-specific reasons for * resuming them. */ int acpi_subsys_poweroff(struct device *dev) { if (!dev_pm_smart_suspend(dev) || acpi_dev_needs_resume(dev, ACPI_COMPANION(dev))) pm_runtime_resume(dev); return pm_generic_poweroff(dev); } EXPORT_SYMBOL_GPL(acpi_subsys_poweroff); /** * acpi_subsys_poweroff_late - Run the device driver's poweroff callback. * @dev: Device to handle. * * Carry out the generic late poweroff procedure for @dev and use ACPI to put * it into a low-power state during system transition into a sleep state. 
*/ static int acpi_subsys_poweroff_late(struct device *dev) { int ret; if (dev_pm_skip_suspend(dev)) return 0; ret = pm_generic_poweroff_late(dev); if (ret) return ret; return acpi_dev_suspend(dev, device_may_wakeup(dev)); } /** * acpi_subsys_poweroff_noirq - Run the driver's "noirq" poweroff callback. * @dev: Device to suspend. */ static int acpi_subsys_poweroff_noirq(struct device *dev) { if (dev_pm_skip_suspend(dev)) return 0; return pm_generic_poweroff_noirq(dev); } #endif /* CONFIG_PM_SLEEP */ static struct dev_pm_domain acpi_general_pm_domain = { .ops = { .runtime_suspend = acpi_subsys_runtime_suspend, .runtime_resume = acpi_subsys_runtime_resume, #ifdef CONFIG_PM_SLEEP .prepare = acpi_subsys_prepare, .complete = acpi_subsys_complete, .suspend = acpi_subsys_suspend, .resume = acpi_subsys_resume, .suspend_late = acpi_subsys_suspend_late, .suspend_noirq = acpi_subsys_suspend_noirq, .resume_noirq = acpi_subsys_resume_noirq, .resume_early = acpi_subsys_resume_early, .freeze = acpi_subsys_freeze, .poweroff = acpi_subsys_poweroff, .poweroff_late = acpi_subsys_poweroff_late, .poweroff_noirq = acpi_subsys_poweroff_noirq, .restore_early = acpi_subsys_restore_early, #endif }, }; /** * acpi_dev_pm_detach - Remove ACPI power management from the device. * @dev: Device to take care of. * @power_off: Whether or not to try to remove power from the device. * * Remove the device from the general ACPI PM domain and remove its wakeup * notifier. If @power_off is set, additionally remove power from the device if * possible. * * Callers must ensure proper synchronization of this function with power * management callbacks. */ static void acpi_dev_pm_detach(struct device *dev, bool power_off) { struct acpi_device *adev = ACPI_COMPANION(dev); if (adev && dev->pm_domain == &acpi_general_pm_domain) { dev_pm_domain_set(dev, NULL); acpi_remove_pm_notifier(adev); if (power_off) { /* * If the device's PM QoS resume latency limit or flags * have been exposed to user space, they have to be * hidden at this point, so that they don't affect the * choice of the low-power state to put the device into. */ dev_pm_qos_hide_latency_limit(dev); dev_pm_qos_hide_flags(dev); acpi_device_wakeup_disable(adev); acpi_dev_pm_low_power(dev, adev, ACPI_STATE_S0); } } } /** * acpi_dev_pm_attach - Prepare device for ACPI power management. * @dev: Device to prepare. * @power_on: Whether or not to power on the device. * * If @dev has a valid ACPI handle that has a valid struct acpi_device object * attached to it, install a wakeup notification handler for the device and * add it to the general ACPI PM domain. If @power_on is set, the device will * be put into the ACPI D0 state before the function returns. * * This assumes that the @dev's bus type uses generic power management callbacks * (or doesn't use any power management callbacks at all). * * Callers must ensure proper synchronization of this function with power * management callbacks. */ int acpi_dev_pm_attach(struct device *dev, bool power_on) { /* * Skip devices whose ACPI companions match the device IDs below, * because they require special power management handling incompatible * with the generic ACPI PM domain. */ static const struct acpi_device_id special_pm_ids[] = { ACPI_FAN_DEVICE_IDS, {} }; struct acpi_device *adev = ACPI_COMPANION(dev); if (!adev || !acpi_match_device_ids(adev, special_pm_ids)) return 0; /* * Only attach the power domain to the first device if the * companion is shared by multiple. This is to prevent doing power * management twice. 
	 */
	if (!acpi_device_is_first_physical_node(adev, dev))
		return 0;

	acpi_add_pm_notifier(adev, dev, acpi_pm_notify_work_func);
	dev_pm_domain_set(dev, &acpi_general_pm_domain);
	if (power_on) {
		acpi_dev_pm_full_power(adev);
		acpi_device_wakeup_disable(adev);
	}

	dev->pm_domain->detach = acpi_dev_pm_detach;
	return 1;
}
EXPORT_SYMBOL_GPL(acpi_dev_pm_attach);

/**
 * acpi_storage_d3 - Check if D3 should be used in the suspend path
 * @dev: Device to check
 *
 * Return %true if the platform firmware wants @dev to be programmed
 * into D3hot or D3cold (if supported) in the suspend path, or %false
 * when there is no specific preference. On some platforms, if this
 * hint is ignored, @dev may remain unresponsive after suspending the
 * platform as a whole.
 *
 * Although the property has "storage" in its name, it is actually applied
 * to the PCIe slot, so when a non-storage device is plugged into that slot,
 * the same platform restrictions will likely apply.
 */
bool acpi_storage_d3(struct device *dev)
{
	struct acpi_device *adev = ACPI_COMPANION(dev);
	u8 val;

	if (force_storage_d3())
		return true;

	if (!adev)
		return false;
	if (fwnode_property_read_u8(acpi_fwnode_handle(adev), "StorageD3Enable",
				    &val))
		return false;
	return val == 1;
}
EXPORT_SYMBOL_GPL(acpi_storage_d3);

/**
 * acpi_dev_state_d0 - Tell if the device is in D0 power state
 * @dev: Physical device the ACPI power state of which to check
 *
 * On a system without ACPI, return true. On a system with ACPI, return true if
 * the current ACPI power state of the device is D0, or false otherwise.
 *
 * Note that the power state of a device is not well-defined after it has been
 * passed to acpi_device_set_power() and before that function returns, so it is
 * not valid to ask for the ACPI power state of the device in that time frame.
 *
 * This function is intended to be used in a driver's probe or remove
 * function. See Documentation/firmware-guide/acpi/non-d0-probe.rst for
 * more information.
 */
bool acpi_dev_state_d0(struct device *dev)
{
	struct acpi_device *adev = ACPI_COMPANION(dev);

	if (!adev)
		return true;

	return adev->power.state == ACPI_STATE_D0;
}
EXPORT_SYMBOL_GPL(acpi_dev_state_d0);
#endif /* CONFIG_PM */
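/*
 * Editor's illustrative sketch (not part of the file above): a hypothetical
 * platform driver relying on the helpers exported by this file.  In practice
 * the bus code normally attaches the general ACPI PM domain on the driver's
 * behalf via dev_pm_domain_attach(); the direct acpi_dev_pm_attach() call
 * below only shows where the attachment and the D0 check fit in.  All names
 * prefixed with "foo_" are made up.
 */
#include <linux/acpi.h>
#include <linux/platform_device.h>
#include <linux/pm_runtime.h>

static int foo_probe(struct platform_device *pdev)
{
	struct device *dev = &pdev->dev;

	/* Returns 1 when the ACPI PM domain was attached, 0 otherwise. */
	if (acpi_dev_pm_attach(dev, true) <= 0)
		dev_dbg(dev, "no ACPI PM domain attached\n");

	/*
	 * Probing may legitimately happen while the device is not in D0
	 * (see Documentation/firmware-guide/acpi/non-d0-probe.rst).
	 */
	if (!acpi_dev_state_d0(dev))
		dev_info(dev, "probing while not in D0\n");

	/* Let the ACPI PM domain callbacks drive runtime PM from here on. */
	pm_runtime_set_active(dev);
	pm_runtime_enable(dev);

	return 0;
}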
/*
 * DRBG based on NIST SP800-90A
 *
 * Copyright Stephan Mueller <smueller@chronox.de>, 2014
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, and the entire permission notice in its entirety,
 *    including the disclaimer of warranties.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. The name of the author may not be used to endorse or promote
 *    products derived from this software without specific prior
 *    written permission.
 *
 * ALTERNATIVELY, this product may be distributed under the terms of
 * the GNU General Public License, in which case the provisions of the GPL are
 * required INSTEAD OF the above restrictions.  (This clause is
 * necessary due to a potential bad interaction between the GPL and
 * the restrictions contained in a BSD-style copyright.)
 *
 * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED
 * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, ALL OF
 * WHICH ARE HEREBY DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE
 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT
 * OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
 * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
 * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
 * USE OF THIS SOFTWARE, EVEN IF NOT ADVISED OF THE POSSIBILITY OF SUCH
 * DAMAGE.
 */

#ifndef _DRBG_H
#define _DRBG_H

#include <linux/random.h>
#include <linux/scatterlist.h>
#include <crypto/hash.h>
#include <crypto/skcipher.h>
#include <linux/module.h>
#include <linux/crypto.h>
#include <linux/slab.h>
#include <crypto/internal/rng.h>
#include <crypto/rng.h>
#include <linux/fips.h>
#include <linux/mutex.h>
#include <linux/list.h>
#include <linux/workqueue.h>

/*
 * Concatenation Helper and string operation helper
 *
 * SP800-90A requires the concatenation of different data. To avoid copying
 * buffers around or allocate additional memory, the following data structure
 * is used to point to the original memory with its size.
In addition, it * is used to build a linked list. The linked list defines the concatenation * of individual buffers. The order of memory block referenced in that * linked list determines the order of concatenation. */ struct drbg_string { const unsigned char *buf; size_t len; struct list_head list; }; static inline void drbg_string_fill(struct drbg_string *string, const unsigned char *buf, size_t len) { string->buf = buf; string->len = len; INIT_LIST_HEAD(&string->list); } struct drbg_state; typedef uint32_t drbg_flag_t; struct drbg_core { drbg_flag_t flags; /* flags for the cipher */ __u8 statelen; /* maximum state length */ __u8 blocklen_bytes; /* block size of output in bytes */ char cra_name[CRYPTO_MAX_ALG_NAME]; /* mapping to kernel crypto API */ /* kernel crypto API backend cipher name */ char backend_cra_name[CRYPTO_MAX_ALG_NAME]; }; struct drbg_state_ops { int (*update)(struct drbg_state *drbg, struct list_head *seed, int reseed); int (*generate)(struct drbg_state *drbg, unsigned char *buf, unsigned int buflen, struct list_head *addtl); int (*crypto_init)(struct drbg_state *drbg); int (*crypto_fini)(struct drbg_state *drbg); }; struct drbg_test_data { struct drbg_string *testentropy; /* TEST PARAMETER: test entropy */ }; enum drbg_seed_state { DRBG_SEED_STATE_UNSEEDED, DRBG_SEED_STATE_PARTIAL, /* Seeded with !rng_is_initialized() */ DRBG_SEED_STATE_FULL, }; struct drbg_state { struct mutex drbg_mutex; /* lock around DRBG */ unsigned char *V; /* internal state 10.1.1.1 1a) */ unsigned char *Vbuf; /* hash: static value 10.1.1.1 1b) hmac / ctr: key */ unsigned char *C; unsigned char *Cbuf; /* Number of RNG requests since last reseed -- 10.1.1.1 1c) */ size_t reseed_ctr; size_t reseed_threshold; /* some memory the DRBG can use for its operation */ unsigned char *scratchpad; unsigned char *scratchpadbuf; void *priv_data; /* Cipher handle */ struct crypto_skcipher *ctr_handle; /* CTR mode cipher handle */ struct skcipher_request *ctr_req; /* CTR mode request handle */ __u8 *outscratchpadbuf; /* CTR mode output scratchpad */ __u8 *outscratchpad; /* CTR mode aligned outbuf */ struct crypto_wait ctr_wait; /* CTR mode async wait obj */ struct scatterlist sg_in, sg_out; /* CTR mode SGLs */ enum drbg_seed_state seeded; /* DRBG fully seeded? */ unsigned long last_seed_time; bool pr; /* Prediction resistance enabled? */ bool fips_primed; /* Continuous test primed? */ unsigned char *prev; /* FIPS 140-2 continuous test value */ struct crypto_rng *jent; const struct drbg_state_ops *d_ops; const struct drbg_core *core; struct drbg_string test_data; }; static inline __u8 drbg_statelen(struct drbg_state *drbg) { if (drbg && drbg->core) return drbg->core->statelen; return 0; } static inline __u8 drbg_blocklen(struct drbg_state *drbg) { if (drbg && drbg->core) return drbg->core->blocklen_bytes; return 0; } static inline __u8 drbg_keylen(struct drbg_state *drbg) { if (drbg && drbg->core) return (drbg->core->statelen - drbg->core->blocklen_bytes); return 0; } static inline size_t drbg_max_request_bytes(struct drbg_state *drbg) { /* SP800-90A requires the limit 2**19 bits, but we return bytes */ return (1 << 16); } static inline size_t drbg_max_addtl(struct drbg_state *drbg) { /* SP800-90A requires 2**35 bytes additional info str / pers str */ #if (__BITS_PER_LONG == 32) /* * SP800-90A allows smaller maximum numbers to be returned -- we * return SIZE_MAX - 1 to allow the verification of the enforcement * of this value in drbg_healthcheck_sanity. 
*/ return (SIZE_MAX - 1); #else return (1UL<<35); #endif } static inline size_t drbg_max_requests(struct drbg_state *drbg) { /* SP800-90A requires 2**48 maximum requests before reseeding */ return (1<<20); } /* * This is a wrapper to the kernel crypto API function of * crypto_rng_generate() to allow the caller to provide additional data. * * @drng DRBG handle -- see crypto_rng_get_bytes * @outbuf output buffer -- see crypto_rng_get_bytes * @outlen length of output buffer -- see crypto_rng_get_bytes * @addtl_input additional information string input buffer * @addtllen length of additional information string buffer * * return * see crypto_rng_get_bytes */ static inline int crypto_drbg_get_bytes_addtl(struct crypto_rng *drng, unsigned char *outbuf, unsigned int outlen, struct drbg_string *addtl) { return crypto_rng_generate(drng, addtl->buf, addtl->len, outbuf, outlen); } /* * TEST code * * This is a wrapper to the kernel crypto API function of * crypto_rng_generate() to allow the caller to provide additional data and * allow furnishing of test_data * * @drng DRBG handle -- see crypto_rng_get_bytes * @outbuf output buffer -- see crypto_rng_get_bytes * @outlen length of output buffer -- see crypto_rng_get_bytes * @addtl_input additional information string input buffer * @addtllen length of additional information string buffer * @test_data filled test data * * return * see crypto_rng_get_bytes */ static inline int crypto_drbg_get_bytes_addtl_test(struct crypto_rng *drng, unsigned char *outbuf, unsigned int outlen, struct drbg_string *addtl, struct drbg_test_data *test_data) { crypto_rng_set_entropy(drng, test_data->testentropy->buf, test_data->testentropy->len); return crypto_rng_generate(drng, addtl->buf, addtl->len, outbuf, outlen); } /* * TEST code * * This is a wrapper to the kernel crypto API function of * crypto_rng_reset() to allow the caller to provide test_data * * @drng DRBG handle -- see crypto_rng_reset * @pers personalization string input buffer * @perslen length of additional information string buffer * @test_data filled test data * * return * see crypto_rng_reset */ static inline int crypto_drbg_reset_test(struct crypto_rng *drng, struct drbg_string *pers, struct drbg_test_data *test_data) { crypto_rng_set_entropy(drng, test_data->testentropy->buf, test_data->testentropy->len); return crypto_rng_reset(drng, pers->buf, pers->len); } /* DRBG type flags */ #define DRBG_CTR ((drbg_flag_t)1<<0) #define DRBG_HMAC ((drbg_flag_t)1<<1) #define DRBG_HASH ((drbg_flag_t)1<<2) #define DRBG_TYPE_MASK (DRBG_CTR | DRBG_HMAC | DRBG_HASH) /* DRBG strength flags */ #define DRBG_STRENGTH128 ((drbg_flag_t)1<<3) #define DRBG_STRENGTH192 ((drbg_flag_t)1<<4) #define DRBG_STRENGTH256 ((drbg_flag_t)1<<5) #define DRBG_STRENGTH_MASK (DRBG_STRENGTH128 | DRBG_STRENGTH192 | \ DRBG_STRENGTH256) enum drbg_prefixes { DRBG_PREFIX0 = 0x00, DRBG_PREFIX1, DRBG_PREFIX2, DRBG_PREFIX3 }; #endif /* _DRBG_H */ |
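/*
 * Editor's illustrative sketch (not part of the header above): obtaining
 * random bytes from a DRBG instance while supplying an additional input
 * string through the crypto_drbg_get_bytes_addtl() wrapper.  The DRBG
 * algorithm name and the buffer contents are examples only.
 */
#include <crypto/drbg.h>
#include <crypto/rng.h>
#include <linux/err.h>

static int drbg_addtl_example(void)
{
	static const u8 addtl_data[] = { 0xde, 0xad, 0xbe, 0xef };
	struct drbg_string addtl;
	struct crypto_rng *rng;
	u8 out[32];
	int ret;

	rng = crypto_alloc_rng("drbg_nopr_hmac_sha256", 0, 0);
	if (IS_ERR(rng))
		return PTR_ERR(rng);

	/* Instantiate without a personalization string. */
	ret = crypto_rng_reset(rng, NULL, 0);
	if (ret)
		goto out;

	/* Wrap the additional input so the DRBG can concatenate it. */
	drbg_string_fill(&addtl, addtl_data, sizeof(addtl_data));
	ret = crypto_drbg_get_bytes_addtl(rng, out, sizeof(out), &addtl);
out:
	crypto_free_rng(rng);
	return ret;
}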
// SPDX-License-Identifier: GPL-2.0-only
/* Flow Queue PIE discipline
 *
 * Copyright (C) 2019 Mohit P. Tahiliani <tahiliani@nitk.edu.in>
 * Copyright (C) 2019 Sachin D. Patil <sdp.sachin@gmail.com>
 * Copyright (C) 2019 V. Saicharan <vsaicharan1998@gmail.com>
 * Copyright (C) 2019 Mohit Bhasi <mohitbhasi1998@gmail.com>
 * Copyright (C) 2019 Leslie Monis <lesliemonis@gmail.com>
 * Copyright (C) 2019 Gautam Ramakrishnan <gautamramk@gmail.com>
 */

#include <linux/jhash.h>
#include <linux/module.h>
#include <linux/sizes.h>
#include <linux/vmalloc.h>
#include <net/pkt_cls.h>
#include <net/pie.h>

/* Flow Queue PIE
 *
 * Principles:
 * - Packets are classified on flows.
 * - This is a Stochastic model (as we use a hash, several flows might
 *   be hashed to the same slot)
 * - Each flow has a PIE managed queue.
 * - Flows are linked onto two (Round Robin) lists,
 *   so that new flows have priority on old ones.
 * - For a given flow, packets are not reordered.
 * - Drops during enqueue only.
 * - ECN capability is off by default.
* - ECN threshold (if ECN is enabled) is at 10% by default. * - Uses timestamps to calculate queue delay by default. */ /** * struct fq_pie_flow - contains data for each flow * @vars: pie vars associated with the flow * @deficit: number of remaining byte credits * @backlog: size of data in the flow * @qlen: number of packets in the flow * @flowchain: flowchain for the flow * @head: first packet in the flow * @tail: last packet in the flow */ struct fq_pie_flow { struct pie_vars vars; s32 deficit; u32 backlog; u32 qlen; struct list_head flowchain; struct sk_buff *head; struct sk_buff *tail; }; struct fq_pie_sched_data { struct tcf_proto __rcu *filter_list; /* optional external classifier */ struct tcf_block *block; struct fq_pie_flow *flows; struct Qdisc *sch; struct list_head old_flows; struct list_head new_flows; struct pie_params p_params; u32 ecn_prob; u32 flows_cnt; u32 flows_cursor; u32 quantum; u32 memory_limit; u32 new_flow_count; u32 memory_usage; u32 overmemory; struct pie_stats stats; struct timer_list adapt_timer; }; static unsigned int fq_pie_hash(const struct fq_pie_sched_data *q, struct sk_buff *skb) { return reciprocal_scale(skb_get_hash(skb), q->flows_cnt); } static unsigned int fq_pie_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr) { struct fq_pie_sched_data *q = qdisc_priv(sch); struct tcf_proto *filter; struct tcf_result res; int result; if (TC_H_MAJ(skb->priority) == sch->handle && TC_H_MIN(skb->priority) > 0 && TC_H_MIN(skb->priority) <= q->flows_cnt) return TC_H_MIN(skb->priority); filter = rcu_dereference_bh(q->filter_list); if (!filter) return fq_pie_hash(q, skb) + 1; *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; result = tcf_classify(skb, NULL, filter, &res, false); if (result >= 0) { #ifdef CONFIG_NET_CLS_ACT switch (result) { case TC_ACT_STOLEN: case TC_ACT_QUEUED: case TC_ACT_TRAP: *qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN; fallthrough; case TC_ACT_SHOT: return 0; } #endif if (TC_H_MIN(res.classid) <= q->flows_cnt) return TC_H_MIN(res.classid); } return 0; } /* add skb to flow queue (tail add) */ static inline void flow_queue_add(struct fq_pie_flow *flow, struct sk_buff *skb) { if (!flow->head) flow->head = skb; else flow->tail->next = skb; flow->tail = skb; skb->next = NULL; } static int fq_pie_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free) { enum skb_drop_reason reason = SKB_DROP_REASON_QDISC_OVERLIMIT; struct fq_pie_sched_data *q = qdisc_priv(sch); struct fq_pie_flow *sel_flow; int ret; u8 memory_limited = false; u8 enqueue = false; u32 pkt_len; u32 idx; /* Classifies packet into corresponding flow */ idx = fq_pie_classify(skb, sch, &ret); if (idx == 0) { if (ret & __NET_XMIT_BYPASS) qdisc_qstats_drop(sch); __qdisc_drop(skb, to_free); return ret; } idx--; sel_flow = &q->flows[idx]; /* Checks whether adding a new packet would exceed memory limit */ get_pie_cb(skb)->mem_usage = skb->truesize; memory_limited = q->memory_usage > q->memory_limit + skb->truesize; /* Checks if the qdisc is full */ if (unlikely(qdisc_qlen(sch) >= sch->limit)) { q->stats.overlimit++; goto out; } else if (unlikely(memory_limited)) { q->overmemory++; } reason = SKB_DROP_REASON_QDISC_CONGESTED; if (!pie_drop_early(sch, &q->p_params, &sel_flow->vars, sel_flow->backlog, skb->len)) { enqueue = true; } else if (q->p_params.ecn && sel_flow->vars.prob <= (MAX_PROB / 100) * q->ecn_prob && INET_ECN_set_ce(skb)) { /* If packet is ecn capable, mark it if drop probability * is lower than the parameter ecn_prob, else drop it. 
*/ q->stats.ecn_mark++; enqueue = true; } if (enqueue) { /* Set enqueue time only when dq_rate_estimator is disabled. */ if (!q->p_params.dq_rate_estimator) pie_set_enqueue_time(skb); pkt_len = qdisc_pkt_len(skb); q->stats.packets_in++; q->memory_usage += skb->truesize; sch->qstats.backlog += pkt_len; sch->q.qlen++; flow_queue_add(sel_flow, skb); if (list_empty(&sel_flow->flowchain)) { list_add_tail(&sel_flow->flowchain, &q->new_flows); q->new_flow_count++; sel_flow->deficit = q->quantum; sel_flow->qlen = 0; sel_flow->backlog = 0; } sel_flow->qlen++; sel_flow->backlog += pkt_len; return NET_XMIT_SUCCESS; } out: q->stats.dropped++; sel_flow->vars.accu_prob = 0; qdisc_drop_reason(skb, sch, to_free, reason); return NET_XMIT_CN; } static const struct netlink_range_validation fq_pie_q_range = { .min = 1, .max = 1 << 20, }; static const struct nla_policy fq_pie_policy[TCA_FQ_PIE_MAX + 1] = { [TCA_FQ_PIE_LIMIT] = {.type = NLA_U32}, [TCA_FQ_PIE_FLOWS] = {.type = NLA_U32}, [TCA_FQ_PIE_TARGET] = {.type = NLA_U32}, [TCA_FQ_PIE_TUPDATE] = {.type = NLA_U32}, [TCA_FQ_PIE_ALPHA] = {.type = NLA_U32}, [TCA_FQ_PIE_BETA] = {.type = NLA_U32}, [TCA_FQ_PIE_QUANTUM] = NLA_POLICY_FULL_RANGE(NLA_U32, &fq_pie_q_range), [TCA_FQ_PIE_MEMORY_LIMIT] = {.type = NLA_U32}, [TCA_FQ_PIE_ECN_PROB] = {.type = NLA_U32}, [TCA_FQ_PIE_ECN] = {.type = NLA_U32}, [TCA_FQ_PIE_BYTEMODE] = {.type = NLA_U32}, [TCA_FQ_PIE_DQ_RATE_ESTIMATOR] = {.type = NLA_U32}, }; static inline struct sk_buff *dequeue_head(struct fq_pie_flow *flow) { struct sk_buff *skb = flow->head; flow->head = skb->next; skb->next = NULL; return skb; } static struct sk_buff *fq_pie_qdisc_dequeue(struct Qdisc *sch) { struct fq_pie_sched_data *q = qdisc_priv(sch); struct sk_buff *skb = NULL; struct fq_pie_flow *flow; struct list_head *head; u32 pkt_len; begin: head = &q->new_flows; if (list_empty(head)) { head = &q->old_flows; if (list_empty(head)) return NULL; } flow = list_first_entry(head, struct fq_pie_flow, flowchain); /* Flow has exhausted all its credits */ if (flow->deficit <= 0) { flow->deficit += q->quantum; list_move_tail(&flow->flowchain, &q->old_flows); goto begin; } if (flow->head) { skb = dequeue_head(flow); pkt_len = qdisc_pkt_len(skb); sch->qstats.backlog -= pkt_len; sch->q.qlen--; qdisc_bstats_update(sch, skb); } if (!skb) { /* force a pass through old_flows to prevent starvation */ if (head == &q->new_flows && !list_empty(&q->old_flows)) list_move_tail(&flow->flowchain, &q->old_flows); else list_del_init(&flow->flowchain); goto begin; } flow->qlen--; flow->deficit -= pkt_len; flow->backlog -= pkt_len; q->memory_usage -= get_pie_cb(skb)->mem_usage; pie_process_dequeue(skb, &q->p_params, &flow->vars, flow->backlog); return skb; } static int fq_pie_change(struct Qdisc *sch, struct nlattr *opt, struct netlink_ext_ack *extack) { struct fq_pie_sched_data *q = qdisc_priv(sch); struct nlattr *tb[TCA_FQ_PIE_MAX + 1]; unsigned int len_dropped = 0; unsigned int num_dropped = 0; int err; err = nla_parse_nested(tb, TCA_FQ_PIE_MAX, opt, fq_pie_policy, extack); if (err < 0) return err; sch_tree_lock(sch); if (tb[TCA_FQ_PIE_LIMIT]) { u32 limit = nla_get_u32(tb[TCA_FQ_PIE_LIMIT]); WRITE_ONCE(q->p_params.limit, limit); WRITE_ONCE(sch->limit, limit); } if (tb[TCA_FQ_PIE_FLOWS]) { if (q->flows) { NL_SET_ERR_MSG_MOD(extack, "Number of flows cannot be changed"); goto flow_error; } q->flows_cnt = nla_get_u32(tb[TCA_FQ_PIE_FLOWS]); if (!q->flows_cnt || q->flows_cnt > 65536) { NL_SET_ERR_MSG_MOD(extack, "Number of flows must range in [1..65536]"); goto flow_error; } } /* 
convert from microseconds to pschedtime */ if (tb[TCA_FQ_PIE_TARGET]) { /* target is in us */ u32 target = nla_get_u32(tb[TCA_FQ_PIE_TARGET]); /* convert to pschedtime */ WRITE_ONCE(q->p_params.target, PSCHED_NS2TICKS((u64)target * NSEC_PER_USEC)); } /* tupdate is in jiffies */ if (tb[TCA_FQ_PIE_TUPDATE]) WRITE_ONCE(q->p_params.tupdate, usecs_to_jiffies(nla_get_u32(tb[TCA_FQ_PIE_TUPDATE]))); if (tb[TCA_FQ_PIE_ALPHA]) WRITE_ONCE(q->p_params.alpha, nla_get_u32(tb[TCA_FQ_PIE_ALPHA])); if (tb[TCA_FQ_PIE_BETA]) WRITE_ONCE(q->p_params.beta, nla_get_u32(tb[TCA_FQ_PIE_BETA])); if (tb[TCA_FQ_PIE_QUANTUM]) WRITE_ONCE(q->quantum, nla_get_u32(tb[TCA_FQ_PIE_QUANTUM])); if (tb[TCA_FQ_PIE_MEMORY_LIMIT]) WRITE_ONCE(q->memory_limit, nla_get_u32(tb[TCA_FQ_PIE_MEMORY_LIMIT])); if (tb[TCA_FQ_PIE_ECN_PROB]) WRITE_ONCE(q->ecn_prob, nla_get_u32(tb[TCA_FQ_PIE_ECN_PROB])); if (tb[TCA_FQ_PIE_ECN]) WRITE_ONCE(q->p_params.ecn, nla_get_u32(tb[TCA_FQ_PIE_ECN])); if (tb[TCA_FQ_PIE_BYTEMODE]) WRITE_ONCE(q->p_params.bytemode, nla_get_u32(tb[TCA_FQ_PIE_BYTEMODE])); if (tb[TCA_FQ_PIE_DQ_RATE_ESTIMATOR]) WRITE_ONCE(q->p_params.dq_rate_estimator, nla_get_u32(tb[TCA_FQ_PIE_DQ_RATE_ESTIMATOR])); /* Drop excess packets if new limit is lower */ while (sch->q.qlen > sch->limit) { struct sk_buff *skb = qdisc_dequeue_internal(sch, false); len_dropped += qdisc_pkt_len(skb); num_dropped += 1; rtnl_kfree_skbs(skb, skb); } qdisc_tree_reduce_backlog(sch, num_dropped, len_dropped); sch_tree_unlock(sch); return 0; flow_error: sch_tree_unlock(sch); return -EINVAL; } static void fq_pie_timer(struct timer_list *t) { struct fq_pie_sched_data *q = timer_container_of(q, t, adapt_timer); unsigned long next, tupdate; struct Qdisc *sch = q->sch; spinlock_t *root_lock; /* to lock qdisc for probability calculations */ int max_cnt, i; rcu_read_lock(); root_lock = qdisc_lock(qdisc_root_sleeping(sch)); spin_lock(root_lock); /* Limit this expensive loop to 2048 flows per round. 
*/ max_cnt = min_t(int, q->flows_cnt - q->flows_cursor, 2048); for (i = 0; i < max_cnt; i++) { pie_calculate_probability(&q->p_params, &q->flows[q->flows_cursor].vars, q->flows[q->flows_cursor].backlog); q->flows_cursor++; } tupdate = q->p_params.tupdate; next = 0; if (q->flows_cursor >= q->flows_cnt) { q->flows_cursor = 0; next = tupdate; } if (tupdate) mod_timer(&q->adapt_timer, jiffies + next); spin_unlock(root_lock); rcu_read_unlock(); } static int fq_pie_init(struct Qdisc *sch, struct nlattr *opt, struct netlink_ext_ack *extack) { struct fq_pie_sched_data *q = qdisc_priv(sch); int err; u32 idx; pie_params_init(&q->p_params); sch->limit = 10 * 1024; q->p_params.limit = sch->limit; q->quantum = psched_mtu(qdisc_dev(sch)); q->sch = sch; q->ecn_prob = 10; q->flows_cnt = 1024; q->memory_limit = SZ_32M; INIT_LIST_HEAD(&q->new_flows); INIT_LIST_HEAD(&q->old_flows); timer_setup(&q->adapt_timer, fq_pie_timer, 0); if (opt) { err = fq_pie_change(sch, opt, extack); if (err) return err; } err = tcf_block_get(&q->block, &q->filter_list, sch, extack); if (err) goto init_failure; q->flows = kvcalloc(q->flows_cnt, sizeof(struct fq_pie_flow), GFP_KERNEL); if (!q->flows) { err = -ENOMEM; goto init_failure; } for (idx = 0; idx < q->flows_cnt; idx++) { struct fq_pie_flow *flow = q->flows + idx; INIT_LIST_HEAD(&flow->flowchain); pie_vars_init(&flow->vars); } mod_timer(&q->adapt_timer, jiffies + HZ / 2); return 0; init_failure: q->flows_cnt = 0; return err; } static int fq_pie_dump(struct Qdisc *sch, struct sk_buff *skb) { struct fq_pie_sched_data *q = qdisc_priv(sch); struct nlattr *opts; opts = nla_nest_start(skb, TCA_OPTIONS); if (!opts) return -EMSGSIZE; /* convert target from pschedtime to us */ if (nla_put_u32(skb, TCA_FQ_PIE_LIMIT, READ_ONCE(sch->limit)) || nla_put_u32(skb, TCA_FQ_PIE_FLOWS, READ_ONCE(q->flows_cnt)) || nla_put_u32(skb, TCA_FQ_PIE_TARGET, ((u32)PSCHED_TICKS2NS(READ_ONCE(q->p_params.target))) / NSEC_PER_USEC) || nla_put_u32(skb, TCA_FQ_PIE_TUPDATE, jiffies_to_usecs(READ_ONCE(q->p_params.tupdate))) || nla_put_u32(skb, TCA_FQ_PIE_ALPHA, READ_ONCE(q->p_params.alpha)) || nla_put_u32(skb, TCA_FQ_PIE_BETA, READ_ONCE(q->p_params.beta)) || nla_put_u32(skb, TCA_FQ_PIE_QUANTUM, READ_ONCE(q->quantum)) || nla_put_u32(skb, TCA_FQ_PIE_MEMORY_LIMIT, READ_ONCE(q->memory_limit)) || nla_put_u32(skb, TCA_FQ_PIE_ECN_PROB, READ_ONCE(q->ecn_prob)) || nla_put_u32(skb, TCA_FQ_PIE_ECN, READ_ONCE(q->p_params.ecn)) || nla_put_u32(skb, TCA_FQ_PIE_BYTEMODE, READ_ONCE(q->p_params.bytemode)) || nla_put_u32(skb, TCA_FQ_PIE_DQ_RATE_ESTIMATOR, READ_ONCE(q->p_params.dq_rate_estimator))) goto nla_put_failure; return nla_nest_end(skb, opts); nla_put_failure: nla_nest_cancel(skb, opts); return -EMSGSIZE; } static int fq_pie_dump_stats(struct Qdisc *sch, struct gnet_dump *d) { struct fq_pie_sched_data *q = qdisc_priv(sch); struct tc_fq_pie_xstats st = { .packets_in = q->stats.packets_in, .overlimit = q->stats.overlimit, .overmemory = q->overmemory, .dropped = q->stats.dropped, .ecn_mark = q->stats.ecn_mark, .new_flow_count = q->new_flow_count, .memory_usage = q->memory_usage, }; struct list_head *pos; sch_tree_lock(sch); list_for_each(pos, &q->new_flows) st.new_flows_len++; list_for_each(pos, &q->old_flows) st.old_flows_len++; sch_tree_unlock(sch); return gnet_stats_copy_app(d, &st, sizeof(st)); } static void fq_pie_reset(struct Qdisc *sch) { struct fq_pie_sched_data *q = qdisc_priv(sch); u32 idx; INIT_LIST_HEAD(&q->new_flows); INIT_LIST_HEAD(&q->old_flows); for (idx = 0; idx < q->flows_cnt; idx++) { struct fq_pie_flow 
*flow = q->flows + idx; /* Removes all packets from flow */ rtnl_kfree_skbs(flow->head, flow->tail); flow->head = NULL; INIT_LIST_HEAD(&flow->flowchain); pie_vars_init(&flow->vars); } } static void fq_pie_destroy(struct Qdisc *sch) { struct fq_pie_sched_data *q = qdisc_priv(sch); tcf_block_put(q->block); q->p_params.tupdate = 0; timer_delete_sync(&q->adapt_timer); kvfree(q->flows); } static struct Qdisc_ops fq_pie_qdisc_ops __read_mostly = { .id = "fq_pie", .priv_size = sizeof(struct fq_pie_sched_data), .enqueue = fq_pie_qdisc_enqueue, .dequeue = fq_pie_qdisc_dequeue, .peek = qdisc_peek_dequeued, .init = fq_pie_init, .destroy = fq_pie_destroy, .reset = fq_pie_reset, .change = fq_pie_change, .dump = fq_pie_dump, .dump_stats = fq_pie_dump_stats, .owner = THIS_MODULE, }; MODULE_ALIAS_NET_SCH("fq_pie"); static int __init fq_pie_module_init(void) { return register_qdisc(&fq_pie_qdisc_ops); } static void __exit fq_pie_module_exit(void) { unregister_qdisc(&fq_pie_qdisc_ops); } module_init(fq_pie_module_init); module_exit(fq_pie_module_exit); MODULE_DESCRIPTION("Flow Queue Proportional Integral controller Enhanced (FQ-PIE)"); MODULE_AUTHOR("Mohit P. Tahiliani"); MODULE_LICENSE("GPL"); |
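/*
 * Editor's illustrative sketch (not part of the qdisc above): the stochastic
 * flow mapping performed by fq_pie_hash().  The kernel's reciprocal_scale()
 * maps a 32-bit hash uniformly onto [0, flows_cnt) without a division; the
 * constants below are examples only, and the sketch is written as a small
 * user-space program so it can be compiled and run on its own.
 */
#include <stdint.h>
#include <stdio.h>

/* Same arithmetic as the kernel's reciprocal_scale(). */
static uint32_t reciprocal_scale(uint32_t val, uint32_t ep_ro)
{
	return (uint32_t)(((uint64_t)val * ep_ro) >> 32);
}

int main(void)
{
	uint32_t flows_cnt = 1024;	/* default number of flow queues */
	uint32_t hash = 0x9e3779b9;	/* stands in for skb_get_hash() */

	/*
	 * fq_pie_classify() adds 1 to the hash result because index 0 means
	 * "not classified"; the enqueue path subtracts it again.
	 */
	printf("flow index: %u\n", reciprocal_scale(hash, flows_cnt) + 1);
	return 0;
}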
/* SPDX-License-Identifier: GPL-2.0 */
#undef TRACE_SYSTEM
#define TRACE_SYSTEM fib

#if !defined(_TRACE_FIB_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_FIB_H

#include <linux/skbuff.h>
#include <linux/netdevice.h>
#include <net/ip_fib.h>
#include <linux/tracepoint.h>

TRACE_EVENT(fib_table_lookup,

	TP_PROTO(u32 tb_id, const struct flowi4 *flp,
		 const struct fib_nh_common *nhc, int err),

	TP_ARGS(tb_id, flp, nhc, err),

	TP_STRUCT__entry(
		__field(	u32,	tb_id		)
		__field(	int,	err		)
		__field(	int,	oif		)
		__field(	int,	iif		)
		__field(	u8,	proto		)
		__field(	__u8,	tos		)
		__field(	__u8,	scope		)
		__field(	__u8,	flags		)
		__array(	__u8,	src,	4	)
		__array(	__u8,	dst,	4	)
		__array(	__u8,	gw4,	4	)
		__array(	__u8,	gw6,	16	)
		__field(	u16,	sport		)
		__field(	u16,	dport		)
		__array(char,  name,   IFNAMSIZ )
	),

	TP_fast_assign(
		struct net_device *dev;
		struct in6_addr *in6;
		__be32 *p32;

		__entry->tb_id = tb_id;
		__entry->err = err;
		__entry->oif = flp->flowi4_oif;
		__entry->iif = flp->flowi4_iif;
		__entry->tos = flp->flowi4_tos;
		__entry->scope = flp->flowi4_scope;
		__entry->flags = flp->flowi4_flags;

		p32 = (__be32 *) __entry->src;
		*p32 = flp->saddr;

		p32 = (__be32 *) __entry->dst;
		*p32 = flp->daddr;

		__entry->proto = flp->flowi4_proto;
		if (__entry->proto == IPPROTO_TCP ||
		    __entry->proto == IPPROTO_UDP) {
			__entry->sport = ntohs(flp->fl4_sport);
			__entry->dport = ntohs(flp->fl4_dport);
		} else {
			__entry->sport = 0;
			__entry->dport = 0;
		}

		dev = nhc ? nhc->nhc_dev : NULL;
		strscpy(__entry->name, dev ? dev->name : "-", IFNAMSIZ);

		if (nhc) {
			if (nhc->nhc_gw_family == AF_INET) {
				p32 = (__be32 *) __entry->gw4;
				*p32 = nhc->nhc_gw.ipv4;

				in6 = (struct in6_addr *)__entry->gw6;
				*in6 = in6addr_any;
			} else if (nhc->nhc_gw_family == AF_INET6) {
				p32 = (__be32 *) __entry->gw4;
				*p32 = 0;

				in6 = (struct in6_addr *)__entry->gw6;
				*in6 = nhc->nhc_gw.ipv6;
			}
		} else {
			p32 = (__be32 *) __entry->gw4;
			*p32 = 0;

			in6 = (struct in6_addr *)__entry->gw6;
			*in6 = in6addr_any;
		}
	),

	TP_printk("table %u oif %d iif %d proto %u %pI4/%u -> %pI4/%u tos %d scope %d flags %x ==> dev %s gw %pI4/%pI6c err %d",
		  __entry->tb_id, __entry->oif, __entry->iif, __entry->proto,
		  __entry->src, __entry->sport, __entry->dst, __entry->dport,
		  __entry->tos, __entry->scope, __entry->flags,
		  __entry->name, __entry->gw4, __entry->gw6, __entry->err)
);
#endif /* _TRACE_FIB_H */

/* This part must be outside protection */
#include <trace/define_trace.h>
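/*
 * Editor's illustrative sketch (not part of the header above): firing the
 * tracepoint declared by TRACE_EVENT(fib_table_lookup, ...) from C code.
 * The function below is a simplified stand-in for the real call sites in
 * net/ipv4/fib_trie.c; only one .c file in the tree may define
 * CREATE_TRACE_POINTS before including the header.
 */
#define CREATE_TRACE_POINTS
#include <trace/events/fib.h>

static void fib_lookup_trace_example(u32 tb_id, const struct flowi4 *flp,
				     const struct fib_nh_common *nhc, int err)
{
	/* Compiles down to a static-branch no-op unless the event is enabled. */
	trace_fib_table_lookup(tb_id, flp, nhc, err);
}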
/*
 * mm/rmap.c - physical to virtual reverse mappings
 *
 * Copyright 2001, Rik van Riel <riel@conectiva.com.br>
 * Released under the General Public License (GPL).
 *
 * Simple, low overhead reverse mapping scheme.
 * Please try to keep this thing as modular as possible.
 *
 * Provides methods for unmapping each kind of mapped page:
 * the anon methods track anonymous pages, and
 * the file methods track pages belonging to an inode.
 *
 * Original design by Rik van Riel <riel@conectiva.com.br> 2001
 * File methods by Dave McCracken <dmccr@us.ibm.com> 2003, 2004
 * Anonymous methods by Andrea Arcangeli <andrea@suse.de> 2004
 * Contributions by Hugh Dickins 2003, 2004
 */

/*
 * Lock ordering in mm:
 *
 * inode->i_rwsem	(while writing or truncating, not reading or faulting)
 *   mm->mmap_lock
 *     mapping->invalidate_lock (in filemap_fault)
 *       folio_lock
 *         hugetlbfs_i_mmap_rwsem_key (in huge_pmd_share, see hugetlbfs below)
 *           vma_start_write
 *             mapping->i_mmap_rwsem
 *               anon_vma->rwsem
 *                 mm->page_table_lock or pte_lock
 *                   swap_lock (in swap_duplicate, swap_info_get)
 *                     mmlist_lock (in mmput, drain_mmlist and others)
 *                     mapping->private_lock (in block_dirty_folio)
 *                       i_pages lock (widely used)
 *                         lruvec->lru_lock (in folio_lruvec_lock_irq)
 *                     inode->i_lock (in set_page_dirty's __mark_inode_dirty)
 *                     bdi.wb->list_lock (in set_page_dirty's __mark_inode_dirty)
 *                       sb_lock (within inode_lock in fs/fs-writeback.c)
 *                       i_pages lock (widely used, in set_page_dirty,
 *                                 in arch-dependent flush_dcache_mmap_lock,
 *                                 within bdi.wb->list_lock in __sync_single_inode)
 *
 * anon_vma->rwsem,mapping->i_mmap_rwsem (memory_failure, collect_procs_anon)
 *   ->tasklist_lock
 *     pte map lock
 *
 * hugetlbfs PageHuge() take locks in this order:
 *   hugetlb_fault_mutex (hugetlbfs specific page fault mutex)
 *     vma_lock (hugetlb specific lock for pmd_sharing)
 *       mapping->i_mmap_rwsem (also used for hugetlb pmd sharing)
 *         folio_lock
 */

#include <linux/mm.h>
#include <linux/sched/mm.h>
#include <linux/sched/task.h>
#include <linux/pagemap.h>
#include <linux/swap.h>
#include <linux/swapops.h>
#include <linux/slab.h>
#include <linux/init.h>
#include <linux/ksm.h>
#include <linux/rmap.h>
#include <linux/rcupdate.h>
#include <linux/export.h>
#include <linux/memcontrol.h>
#include <linux/mmu_notifier.h>
#include <linux/migrate.h>
#include <linux/hugetlb.h>
#include <linux/huge_mm.h>
#include <linux/backing-dev.h>
#include <linux/page_idle.h>
#include <linux/memremap.h>
#include <linux/userfaultfd_k.h>
#include <linux/mm_inline.h>
#include <linux/oom.h>

#include <asm/tlbflush.h>

#define CREATE_TRACE_POINTS
#include <trace/events/tlb.h>
#include <trace/events/migrate.h>

#include "internal.h"

static struct kmem_cache *anon_vma_cachep;
static struct kmem_cache *anon_vma_chain_cachep;

static inline
struct anon_vma *anon_vma_alloc(void) { struct anon_vma *anon_vma; anon_vma = kmem_cache_alloc(anon_vma_cachep, GFP_KERNEL); if (anon_vma) { atomic_set(&anon_vma->refcount, 1); anon_vma->num_children = 0; anon_vma->num_active_vmas = 0; anon_vma->parent = anon_vma; /* * Initialise the anon_vma root to point to itself. If called * from fork, the root will be reset to the parents anon_vma. */ anon_vma->root = anon_vma; } return anon_vma; } static inline void anon_vma_free(struct anon_vma *anon_vma) { VM_BUG_ON(atomic_read(&anon_vma->refcount)); /* * Synchronize against folio_lock_anon_vma_read() such that * we can safely hold the lock without the anon_vma getting * freed. * * Relies on the full mb implied by the atomic_dec_and_test() from * put_anon_vma() against the acquire barrier implied by * down_read_trylock() from folio_lock_anon_vma_read(). This orders: * * folio_lock_anon_vma_read() VS put_anon_vma() * down_read_trylock() atomic_dec_and_test() * LOCK MB * atomic_read() rwsem_is_locked() * * LOCK should suffice since the actual taking of the lock must * happen _before_ what follows. */ might_sleep(); if (rwsem_is_locked(&anon_vma->root->rwsem)) { anon_vma_lock_write(anon_vma); anon_vma_unlock_write(anon_vma); } kmem_cache_free(anon_vma_cachep, anon_vma); } static inline struct anon_vma_chain *anon_vma_chain_alloc(gfp_t gfp) { return kmem_cache_alloc(anon_vma_chain_cachep, gfp); } static void anon_vma_chain_free(struct anon_vma_chain *anon_vma_chain) { kmem_cache_free(anon_vma_chain_cachep, anon_vma_chain); } static void anon_vma_chain_link(struct vm_area_struct *vma, struct anon_vma_chain *avc, struct anon_vma *anon_vma) { avc->vma = vma; avc->anon_vma = anon_vma; list_add(&avc->same_vma, &vma->anon_vma_chain); anon_vma_interval_tree_insert(avc, &anon_vma->rb_root); } /** * __anon_vma_prepare - attach an anon_vma to a memory region * @vma: the memory region in question * * This makes sure the memory mapping described by 'vma' has * an 'anon_vma' attached to it, so that we can associate the * anonymous pages mapped into it with that anon_vma. * * The common case will be that we already have one, which * is handled inline by anon_vma_prepare(). But if * not we either need to find an adjacent mapping that we * can re-use the anon_vma from (very common when the only * reason for splitting a vma has been mprotect()), or we * allocate a new one. * * Anon-vma allocations are very subtle, because we may have * optimistically looked up an anon_vma in folio_lock_anon_vma_read() * and that may actually touch the rwsem even in the newly * allocated vma (it depends on RCU to make sure that the * anon_vma isn't actually destroyed). * * As a result, we need to do proper anon_vma locking even * for the new allocation. At the same time, we do not want * to do any locking for the common case of already having * an anon_vma. 
*/ int __anon_vma_prepare(struct vm_area_struct *vma) { struct mm_struct *mm = vma->vm_mm; struct anon_vma *anon_vma, *allocated; struct anon_vma_chain *avc; mmap_assert_locked(mm); might_sleep(); avc = anon_vma_chain_alloc(GFP_KERNEL); if (!avc) goto out_enomem; anon_vma = find_mergeable_anon_vma(vma); allocated = NULL; if (!anon_vma) { anon_vma = anon_vma_alloc(); if (unlikely(!anon_vma)) goto out_enomem_free_avc; anon_vma->num_children++; /* self-parent link for new root */ allocated = anon_vma; } anon_vma_lock_write(anon_vma); /* page_table_lock to protect against threads */ spin_lock(&mm->page_table_lock); if (likely(!vma->anon_vma)) { vma->anon_vma = anon_vma; anon_vma_chain_link(vma, avc, anon_vma); anon_vma->num_active_vmas++; allocated = NULL; avc = NULL; } spin_unlock(&mm->page_table_lock); anon_vma_unlock_write(anon_vma); if (unlikely(allocated)) put_anon_vma(allocated); if (unlikely(avc)) anon_vma_chain_free(avc); return 0; out_enomem_free_avc: anon_vma_chain_free(avc); out_enomem: return -ENOMEM; } /* * This is a useful helper function for locking the anon_vma root as * we traverse the vma->anon_vma_chain, looping over anon_vma's that * have the same vma. * * Such anon_vma's should have the same root, so you'd expect to see * just a single mutex_lock for the whole traversal. */ static inline struct anon_vma *lock_anon_vma_root(struct anon_vma *root, struct anon_vma *anon_vma) { struct anon_vma *new_root = anon_vma->root; if (new_root != root) { if (WARN_ON_ONCE(root)) up_write(&root->rwsem); root = new_root; down_write(&root->rwsem); } return root; } static inline void unlock_anon_vma_root(struct anon_vma *root) { if (root) up_write(&root->rwsem); } /* * Attach the anon_vmas from src to dst. * Returns 0 on success, -ENOMEM on failure. * * anon_vma_clone() is called by vma_expand(), vma_merge(), __split_vma(), * copy_vma() and anon_vma_fork(). The first four want an exact copy of src, * while the last one, anon_vma_fork(), may try to reuse an existing anon_vma to * prevent endless growth of anon_vma. Since dst->anon_vma is set to NULL before * call, we can identify this case by checking (!dst->anon_vma && * src->anon_vma). * * If (!dst->anon_vma && src->anon_vma) is true, this function tries to find * and reuse existing anon_vma which has no vmas and only one child anon_vma. * This prevents degradation of anon_vma hierarchy to endless linear chain in * case of constantly forking task. On the other hand, an anon_vma with more * than one child isn't reused even if there was no alive vma, thus rmap * walker has a good chance of avoiding scanning the whole hierarchy when it * searches where page is mapped. */ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src) { struct anon_vma_chain *avc, *pavc; struct anon_vma *root = NULL; list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) { struct anon_vma *anon_vma; avc = anon_vma_chain_alloc(GFP_NOWAIT | __GFP_NOWARN); if (unlikely(!avc)) { unlock_anon_vma_root(root); root = NULL; avc = anon_vma_chain_alloc(GFP_KERNEL); if (!avc) goto enomem_failure; } anon_vma = pavc->anon_vma; root = lock_anon_vma_root(root, anon_vma); anon_vma_chain_link(dst, avc, anon_vma); /* * Reuse existing anon_vma if it has no vma and only one * anon_vma child. * * Root anon_vma is never reused: * it has self-parent reference and at least one child. 
*/ if (!dst->anon_vma && src->anon_vma && anon_vma->num_children < 2 && anon_vma->num_active_vmas == 0) dst->anon_vma = anon_vma; } if (dst->anon_vma) dst->anon_vma->num_active_vmas++; unlock_anon_vma_root(root); return 0; enomem_failure: /* * dst->anon_vma is dropped here otherwise its num_active_vmas can * be incorrectly decremented in unlink_anon_vmas(). * We can safely do this because callers of anon_vma_clone() don't care * about dst->anon_vma if anon_vma_clone() failed. */ dst->anon_vma = NULL; unlink_anon_vmas(dst); return -ENOMEM; } /* * Attach vma to its own anon_vma, as well as to the anon_vmas that * the corresponding VMA in the parent process is attached to. * Returns 0 on success, non-zero on failure. */ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma) { struct anon_vma_chain *avc; struct anon_vma *anon_vma; int error; /* Don't bother if the parent process has no anon_vma here. */ if (!pvma->anon_vma) return 0; /* Drop inherited anon_vma, we'll reuse existing or allocate new. */ vma->anon_vma = NULL; /* * First, attach the new VMA to the parent VMA's anon_vmas, * so rmap can find non-COWed pages in child processes. */ error = anon_vma_clone(vma, pvma); if (error) return error; /* An existing anon_vma has been reused, all done then. */ if (vma->anon_vma) return 0; /* Then add our own anon_vma. */ anon_vma = anon_vma_alloc(); if (!anon_vma) goto out_error; anon_vma->num_active_vmas++; avc = anon_vma_chain_alloc(GFP_KERNEL); if (!avc) goto out_error_free_anon_vma; /* * The root anon_vma's rwsem is the lock actually used when we * lock any of the anon_vmas in this anon_vma tree. */ anon_vma->root = pvma->anon_vma->root; anon_vma->parent = pvma->anon_vma; /* * With refcounts, an anon_vma can stay around longer than the * process it belongs to. The root anon_vma needs to be pinned until * this anon_vma is freed, because the lock lives in the root. */ get_anon_vma(anon_vma->root); /* Mark this anon_vma as the one where our new (COWed) pages go. */ vma->anon_vma = anon_vma; anon_vma_lock_write(anon_vma); anon_vma_chain_link(vma, avc, anon_vma); anon_vma->parent->num_children++; anon_vma_unlock_write(anon_vma); return 0; out_error_free_anon_vma: put_anon_vma(anon_vma); out_error: unlink_anon_vmas(vma); return -ENOMEM; } void unlink_anon_vmas(struct vm_area_struct *vma) { struct anon_vma_chain *avc, *next; struct anon_vma *root = NULL; /* * Unlink each anon_vma chained to the VMA. This list is ordered * from newest to oldest, ensuring the root anon_vma gets freed last. */ list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) { struct anon_vma *anon_vma = avc->anon_vma; root = lock_anon_vma_root(root, anon_vma); anon_vma_interval_tree_remove(avc, &anon_vma->rb_root); /* * Leave empty anon_vmas on the list - we'll need * to free them outside the lock. */ if (RB_EMPTY_ROOT(&anon_vma->rb_root.rb_root)) { anon_vma->parent->num_children--; continue; } list_del(&avc->same_vma); anon_vma_chain_free(avc); } if (vma->anon_vma) { vma->anon_vma->num_active_vmas--; /* * vma would still be needed after unlink, and anon_vma will be prepared * when handle fault. */ vma->anon_vma = NULL; } unlock_anon_vma_root(root); /* * Iterate the list once more, it now only contains empty and unlinked * anon_vmas, destroy them. Could not do before due to __put_anon_vma() * needing to write-acquire the anon_vma->root->rwsem. 
*/ list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) { struct anon_vma *anon_vma = avc->anon_vma; VM_WARN_ON(anon_vma->num_children); VM_WARN_ON(anon_vma->num_active_vmas); put_anon_vma(anon_vma); list_del(&avc->same_vma); anon_vma_chain_free(avc); } } static void anon_vma_ctor(void *data) { struct anon_vma *anon_vma = data; init_rwsem(&anon_vma->rwsem); atomic_set(&anon_vma->refcount, 0); anon_vma->rb_root = RB_ROOT_CACHED; } void __init anon_vma_init(void) { anon_vma_cachep = kmem_cache_create("anon_vma", sizeof(struct anon_vma), 0, SLAB_TYPESAFE_BY_RCU|SLAB_PANIC|SLAB_ACCOUNT, anon_vma_ctor); anon_vma_chain_cachep = KMEM_CACHE(anon_vma_chain, SLAB_PANIC|SLAB_ACCOUNT); } /* * Getting a lock on a stable anon_vma from a page off the LRU is tricky! * * Since there is no serialization what so ever against folio_remove_rmap_*() * the best this function can do is return a refcount increased anon_vma * that might have been relevant to this page. * * The page might have been remapped to a different anon_vma or the anon_vma * returned may already be freed (and even reused). * * In case it was remapped to a different anon_vma, the new anon_vma will be a * child of the old anon_vma, and the anon_vma lifetime rules will therefore * ensure that any anon_vma obtained from the page will still be valid for as * long as we observe page_mapped() [ hence all those page_mapped() tests ]. * * All users of this function must be very careful when walking the anon_vma * chain and verify that the page in question is indeed mapped in it * [ something equivalent to page_mapped_in_vma() ]. * * Since anon_vma's slab is SLAB_TYPESAFE_BY_RCU and we know from * folio_remove_rmap_*() that the anon_vma pointer from page->mapping is valid * if there is a mapcount, we can dereference the anon_vma after observing * those. * * NOTE: the caller should normally hold folio lock when calling this. If * not, the caller needs to double check the anon_vma didn't change after * taking the anon_vma lock for either read or write (UFFDIO_MOVE can modify it * concurrently without folio lock protection). See folio_lock_anon_vma_read() * which has already covered that, and comment above remap_pages(). */ struct anon_vma *folio_get_anon_vma(const struct folio *folio) { struct anon_vma *anon_vma = NULL; unsigned long anon_mapping; rcu_read_lock(); anon_mapping = (unsigned long)READ_ONCE(folio->mapping); if ((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON) goto out; if (!folio_mapped(folio)) goto out; anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON); if (!atomic_inc_not_zero(&anon_vma->refcount)) { anon_vma = NULL; goto out; } /* * If this folio is still mapped, then its anon_vma cannot have been * freed. But if it has been unmapped, we have no security against the * anon_vma structure being freed and reused (for another anon_vma: * SLAB_TYPESAFE_BY_RCU guarantees that - so the atomic_inc_not_zero() * above cannot corrupt). */ if (!folio_mapped(folio)) { rcu_read_unlock(); put_anon_vma(anon_vma); return NULL; } out: rcu_read_unlock(); return anon_vma; } /* * Similar to folio_get_anon_vma() except it locks the anon_vma. * * Its a little more complex as it tries to keep the fast path to a single * atomic op -- the trylock. If we fail the trylock, we fall back to getting a * reference like with folio_get_anon_vma() and then block on the mutex * on !rwc->try_lock case. 
*/ struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio, struct rmap_walk_control *rwc) { struct anon_vma *anon_vma = NULL; struct anon_vma *root_anon_vma; unsigned long anon_mapping; retry: rcu_read_lock(); anon_mapping = (unsigned long)READ_ONCE(folio->mapping); if ((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON) goto out; if (!folio_mapped(folio)) goto out; anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON); root_anon_vma = READ_ONCE(anon_vma->root); if (down_read_trylock(&root_anon_vma->rwsem)) { /* * folio_move_anon_rmap() might have changed the anon_vma as we * might not hold the folio lock here. */ if (unlikely((unsigned long)READ_ONCE(folio->mapping) != anon_mapping)) { up_read(&root_anon_vma->rwsem); rcu_read_unlock(); goto retry; } /* * If the folio is still mapped, then this anon_vma is still * its anon_vma, and holding the mutex ensures that it will * not go away, see anon_vma_free(). */ if (!folio_mapped(folio)) { up_read(&root_anon_vma->rwsem); anon_vma = NULL; } goto out; } if (rwc && rwc->try_lock) { anon_vma = NULL; rwc->contended = true; goto out; } /* trylock failed, we got to sleep */ if (!atomic_inc_not_zero(&anon_vma->refcount)) { anon_vma = NULL; goto out; } if (!folio_mapped(folio)) { rcu_read_unlock(); put_anon_vma(anon_vma); return NULL; } /* we pinned the anon_vma, it's safe to sleep */ rcu_read_unlock(); anon_vma_lock_read(anon_vma); /* * folio_move_anon_rmap() might have changed the anon_vma as we might * not hold the folio lock here. */ if (unlikely((unsigned long)READ_ONCE(folio->mapping) != anon_mapping)) { anon_vma_unlock_read(anon_vma); put_anon_vma(anon_vma); anon_vma = NULL; goto retry; } if (atomic_dec_and_test(&anon_vma->refcount)) { /* * Oops, we held the last refcount, release the lock * and bail -- can't simply use put_anon_vma() because * we'll deadlock on the anon_vma_lock_write() recursion. */ anon_vma_unlock_read(anon_vma); __put_anon_vma(anon_vma); anon_vma = NULL; } return anon_vma; out: rcu_read_unlock(); return anon_vma; } #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH /* * Flush TLB entries for recently unmapped pages from remote CPUs. It is * important if a PTE was dirty when it was unmapped that it's flushed * before any IO is initiated on the page to prevent lost writes. Similarly, * it must be flushed before freeing to prevent data leakage. */ void try_to_unmap_flush(void) { struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc; if (!tlb_ubc->flush_required) return; arch_tlbbatch_flush(&tlb_ubc->arch); tlb_ubc->flush_required = false; tlb_ubc->writable = false; } /* Flush iff there are potentially writable TLB entries that can race with IO */ void try_to_unmap_flush_dirty(void) { struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc; if (tlb_ubc->writable) try_to_unmap_flush(); } /* * Bits 0-14 of mm->tlb_flush_batched record pending generations. * Bits 16-30 of mm->tlb_flush_batched record flushed generations.
*/ #define TLB_FLUSH_BATCH_FLUSHED_SHIFT 16 #define TLB_FLUSH_BATCH_PENDING_MASK \ ((1 << (TLB_FLUSH_BATCH_FLUSHED_SHIFT - 1)) - 1) #define TLB_FLUSH_BATCH_PENDING_LARGE \ (TLB_FLUSH_BATCH_PENDING_MASK / 2) static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval, unsigned long start, unsigned long end) { struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc; int batch; bool writable = pte_dirty(pteval); if (!pte_accessible(mm, pteval)) return; arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, start, end); tlb_ubc->flush_required = true; /* * Ensure compiler does not re-order the setting of tlb_flush_batched * before the PTE is cleared. */ barrier(); batch = atomic_read(&mm->tlb_flush_batched); retry: if ((batch & TLB_FLUSH_BATCH_PENDING_MASK) > TLB_FLUSH_BATCH_PENDING_LARGE) { /* * Prevent `pending' from catching up with `flushed' because of * overflow. Reset `pending' and `flushed' to be 1 and 0 if * `pending' becomes large. */ if (!atomic_try_cmpxchg(&mm->tlb_flush_batched, &batch, 1)) goto retry; } else { atomic_inc(&mm->tlb_flush_batched); } /* * If the PTE was dirty then it's best to assume it's writable. The * caller must use try_to_unmap_flush_dirty() or try_to_unmap_flush() * before the page is queued for IO. */ if (writable) tlb_ubc->writable = true; } /* * Returns true if the TLB flush should be deferred to the end of a batch of * unmap operations to reduce IPIs. */ static bool should_defer_flush(struct mm_struct *mm, enum ttu_flags flags) { if (!(flags & TTU_BATCH_FLUSH)) return false; return arch_tlbbatch_should_defer(mm); } /* * Reclaim unmaps pages under the PTL but does not flush the TLB prior to * releasing the PTL if TLB flushes are batched. It's possible for a parallel * operation such as mprotect or munmap to race between reclaim unmapping * the page and flushing the page. If this race occurs, it potentially allows * access to data via a stale TLB entry. Tracking all mm's that have TLB * batching in flight would be expensive during reclaim so instead track * whether TLB batching occurred in the past and if so then do a flush here * if required. This will cost one additional flush per reclaim cycle paid * by the first operation at risk such as mprotect and munmap. * * This must be called under the PTL so that an access to tlb_flush_batched * that is potentially a "reclaim vs mprotect/munmap/etc" race will synchronise * via the PTL. */ void flush_tlb_batched_pending(struct mm_struct *mm) { int batch = atomic_read(&mm->tlb_flush_batched); int pending = batch & TLB_FLUSH_BATCH_PENDING_MASK; int flushed = batch >> TLB_FLUSH_BATCH_FLUSHED_SHIFT; if (pending != flushed) { arch_flush_tlb_batched_pending(mm); /* * If the new TLB flushing is pending during flushing, leave * mm->tlb_flush_batched as is, to avoid losing flushing. */ atomic_cmpxchg(&mm->tlb_flush_batched, batch, pending | (pending << TLB_FLUSH_BATCH_FLUSHED_SHIFT)); } } #else static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval, unsigned long start, unsigned long end) { } static bool should_defer_flush(struct mm_struct *mm, enum ttu_flags flags) { return false; } #endif /* CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH */ /** * page_address_in_vma - The virtual address of a page in this VMA. * @folio: The folio containing the page. * @page: The page within the folio. * @vma: The VMA we need to know the address in. * * Calculates the user virtual address of this page in the specified VMA. * It is the caller's responsibility to check the page is actually * within the VMA.
There may not currently be a PTE pointing at this * page, but if a page fault occurs at this address, this is the page * which will be accessed. * * Context: Caller should hold a reference to the folio. Caller should * hold a lock (eg the i_mmap_lock or the mmap_lock) which keeps the * VMA from being altered. * * Return: The virtual address corresponding to this page in the VMA. */ unsigned long page_address_in_vma(const struct folio *folio, const struct page *page, const struct vm_area_struct *vma) { if (folio_test_anon(folio)) { struct anon_vma *anon_vma = folio_anon_vma(folio); /* * Note: swapoff's unuse_vma() is more efficient with this * check, and needs it to match anon_vma when KSM is active. */ if (!vma->anon_vma || !anon_vma || vma->anon_vma->root != anon_vma->root) return -EFAULT; } else if (!vma->vm_file) { return -EFAULT; } else if (vma->vm_file->f_mapping != folio->mapping) { return -EFAULT; } /* KSM folios don't reach here because of the !anon_vma check */ return vma_address(vma, page_pgoff(folio, page), 1); } /* * Returns the actual pmd_t* where we expect 'address' to be mapped from, or * NULL if it doesn't exist. No guarantees / checks on what the pmd_t* * represents. */ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address) { pgd_t *pgd; p4d_t *p4d; pud_t *pud; pmd_t *pmd = NULL; pgd = pgd_offset(mm, address); if (!pgd_present(*pgd)) goto out; p4d = p4d_offset(pgd, address); if (!p4d_present(*p4d)) goto out; pud = pud_offset(p4d, address); if (!pud_present(*pud)) goto out; pmd = pmd_offset(pud, address); out: return pmd; } struct folio_referenced_arg { int mapcount; int referenced; unsigned long vm_flags; struct mem_cgroup *memcg; }; /* * arg: folio_referenced_arg will be passed */ static bool folio_referenced_one(struct folio *folio, struct vm_area_struct *vma, unsigned long address, void *arg) { struct folio_referenced_arg *pra = arg; DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0); int referenced = 0; unsigned long start = address, ptes = 0; while (page_vma_mapped_walk(&pvmw)) { address = pvmw.address; if (vma->vm_flags & VM_LOCKED) { if (!folio_test_large(folio) || !pvmw.pte) { /* Restore the mlock which got missed */ mlock_vma_folio(folio, vma); page_vma_mapped_walk_done(&pvmw); pra->vm_flags |= VM_LOCKED; return false; /* To break the loop */ } /* * For large folio fully mapped to VMA, will * be handled after the pvmw loop. * * For large folio cross VMA boundaries, it's * expected to be picked by page reclaim. But * should skip reference of pages which are in * the range of VM_LOCKED vma. As page reclaim * should just count the reference of pages out * the range of VM_LOCKED vma. */ ptes++; pra->mapcount--; continue; } /* * Skip the non-shared swapbacked folio mapped solely by * the exiting or OOM-reaped process. This avoids redundant * swap-out followed by an immediate unmap. */ if ((!atomic_read(&vma->vm_mm->mm_users) || check_stable_address_space(vma->vm_mm)) && folio_test_anon(folio) && folio_test_swapbacked(folio) && !folio_maybe_mapped_shared(folio)) { pra->referenced = -1; page_vma_mapped_walk_done(&pvmw); return false; } if (lru_gen_enabled() && pvmw.pte) { if (lru_gen_look_around(&pvmw)) referenced++; } else if (pvmw.pte) { if (ptep_clear_flush_young_notify(vma, address, pvmw.pte)) referenced++; } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) { if (pmdp_clear_flush_young_notify(vma, address, pvmw.pmd)) referenced++; } else { /* unexpected pmd-mapped folio? 
*/ WARN_ON_ONCE(1); } pra->mapcount--; } if ((vma->vm_flags & VM_LOCKED) && folio_test_large(folio) && folio_within_vma(folio, vma)) { unsigned long s_align, e_align; s_align = ALIGN_DOWN(start, PMD_SIZE); e_align = ALIGN_DOWN(start + folio_size(folio) - 1, PMD_SIZE); /* folio doesn't cross page table boundary and fully mapped */ if ((s_align == e_align) && (ptes == folio_nr_pages(folio))) { /* Restore the mlock which got missed */ mlock_vma_folio(folio, vma); pra->vm_flags |= VM_LOCKED; return false; /* To break the loop */ } } if (referenced) folio_clear_idle(folio); if (folio_test_clear_young(folio)) referenced++; if (referenced) { pra->referenced++; pra->vm_flags |= vma->vm_flags & ~VM_LOCKED; } if (!pra->mapcount) return false; /* To break the loop */ return true; } static bool invalid_folio_referenced_vma(struct vm_area_struct *vma, void *arg) { struct folio_referenced_arg *pra = arg; struct mem_cgroup *memcg = pra->memcg; /* * Ignore references from this mapping if it has no recency. If the * folio has been used in another mapping, we will catch it; if this * other mapping is already gone, the unmap path will have set the * referenced flag or activated the folio in zap_pte_range(). */ if (!vma_has_recency(vma)) return true; /* * If we are reclaiming on behalf of a cgroup, skip counting on behalf * of references from different cgroups. */ if (memcg && !mm_match_cgroup(vma->vm_mm, memcg)) return true; return false; } /** * folio_referenced() - Test if the folio was referenced. * @folio: The folio to test. * @is_locked: Caller holds lock on the folio. * @memcg: target memory cgroup * @vm_flags: A combination of all the vma->vm_flags which referenced the folio. * * Quick test_and_clear_referenced for all mappings of a folio, * * Return: The number of mappings which referenced the folio. Return -1 if * the function bailed out due to rmap lock contention. */ int folio_referenced(struct folio *folio, int is_locked, struct mem_cgroup *memcg, unsigned long *vm_flags) { bool we_locked = false; struct folio_referenced_arg pra = { .mapcount = folio_mapcount(folio), .memcg = memcg, }; struct rmap_walk_control rwc = { .rmap_one = folio_referenced_one, .arg = (void *)&pra, .anon_lock = folio_lock_anon_vma_read, .try_lock = true, .invalid_vma = invalid_folio_referenced_vma, }; *vm_flags = 0; if (!pra.mapcount) return 0; if (!folio_raw_mapping(folio)) return 0; if (!is_locked && (!folio_test_anon(folio) || folio_test_ksm(folio))) { we_locked = folio_trylock(folio); if (!we_locked) return 1; } rmap_walk(folio, &rwc); *vm_flags = pra.vm_flags; if (we_locked) folio_unlock(folio); return rwc.contended ? -1 : pra.referenced; } static int page_vma_mkclean_one(struct page_vma_mapped_walk *pvmw) { int cleaned = 0; struct vm_area_struct *vma = pvmw->vma; struct mmu_notifier_range range; unsigned long address = pvmw->address; /* * We have to assume the worse case ie pmd for invalidation. Note that * the folio can not be freed from this function. */ mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_PAGE, 0, vma->vm_mm, address, vma_address_end(pvmw)); mmu_notifier_invalidate_range_start(&range); while (page_vma_mapped_walk(pvmw)) { int ret = 0; address = pvmw->address; if (pvmw->pte) { pte_t *pte = pvmw->pte; pte_t entry = ptep_get(pte); /* * PFN swap PTEs, such as device-exclusive ones, that * actually map pages are clean and not writable from a * CPU perspective. The MMU notifier takes care of any * device aspects. 
*/ if (!pte_present(entry)) continue; if (!pte_dirty(entry) && !pte_write(entry)) continue; flush_cache_page(vma, address, pte_pfn(entry)); entry = ptep_clear_flush(vma, address, pte); entry = pte_wrprotect(entry); entry = pte_mkclean(entry); set_pte_at(vma->vm_mm, address, pte, entry); ret = 1; } else { #ifdef CONFIG_TRANSPARENT_HUGEPAGE pmd_t *pmd = pvmw->pmd; pmd_t entry; if (!pmd_dirty(*pmd) && !pmd_write(*pmd)) continue; flush_cache_range(vma, address, address + HPAGE_PMD_SIZE); entry = pmdp_invalidate(vma, address, pmd); entry = pmd_wrprotect(entry); entry = pmd_mkclean(entry); set_pmd_at(vma->vm_mm, address, pmd, entry); ret = 1; #else /* unexpected pmd-mapped folio? */ WARN_ON_ONCE(1); #endif } if (ret) cleaned++; } mmu_notifier_invalidate_range_end(&range); return cleaned; } static bool page_mkclean_one(struct folio *folio, struct vm_area_struct *vma, unsigned long address, void *arg) { DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, PVMW_SYNC); int *cleaned = arg; *cleaned += page_vma_mkclean_one(&pvmw); return true; } static bool invalid_mkclean_vma(struct vm_area_struct *vma, void *arg) { if (vma->vm_flags & VM_SHARED) return false; return true; } int folio_mkclean(struct folio *folio) { int cleaned = 0; struct address_space *mapping; struct rmap_walk_control rwc = { .arg = (void *)&cleaned, .rmap_one = page_mkclean_one, .invalid_vma = invalid_mkclean_vma, }; BUG_ON(!folio_test_locked(folio)); if (!folio_mapped(folio)) return 0; mapping = folio_mapping(folio); if (!mapping) return 0; rmap_walk(folio, &rwc); return cleaned; } EXPORT_SYMBOL_GPL(folio_mkclean); struct wrprotect_file_state { int cleaned; pgoff_t pgoff; unsigned long pfn; unsigned long nr_pages; }; static bool mapping_wrprotect_range_one(struct folio *folio, struct vm_area_struct *vma, unsigned long address, void *arg) { struct wrprotect_file_state *state = (struct wrprotect_file_state *)arg; struct page_vma_mapped_walk pvmw = { .pfn = state->pfn, .nr_pages = state->nr_pages, .pgoff = state->pgoff, .vma = vma, .address = address, .flags = PVMW_SYNC, }; state->cleaned += page_vma_mkclean_one(&pvmw); return true; } static void __rmap_walk_file(struct folio *folio, struct address_space *mapping, pgoff_t pgoff_start, unsigned long nr_pages, struct rmap_walk_control *rwc, bool locked); /** * mapping_wrprotect_range() - Write-protect all mappings in a specified range. * * @mapping: The mapping whose reverse mapping should be traversed. * @pgoff: The page offset at which @pfn is mapped within @mapping. * @pfn: The PFN of the page mapped in @mapping at @pgoff. * @nr_pages: The number of physically contiguous base pages spanned. * * Traverses the reverse mapping, finding all VMAs which contain a shared * mapping of the pages in the specified range in @mapping, and write-protects * them (that is, updates the page tables to mark the mappings read-only such * that a write protection fault arises when the mappings are written to). * * The @pfn value need not refer to a folio, but rather can reference a kernel * allocation which is mapped into userland. We therefore do not require that * the page maps to a folio with a valid mapping or index field, rather the * caller specifies these in @mapping and @pgoff. * * Return: the number of write-protected PTEs, or an error. 
*/ int mapping_wrprotect_range(struct address_space *mapping, pgoff_t pgoff, unsigned long pfn, unsigned long nr_pages) { struct wrprotect_file_state state = { .cleaned = 0, .pgoff = pgoff, .pfn = pfn, .nr_pages = nr_pages, }; struct rmap_walk_control rwc = { .arg = (void *)&state, .rmap_one = mapping_wrprotect_range_one, .invalid_vma = invalid_mkclean_vma, }; if (!mapping) return 0; __rmap_walk_file(/* folio = */NULL, mapping, pgoff, nr_pages, &rwc, /* locked = */false); return state.cleaned; } EXPORT_SYMBOL_GPL(mapping_wrprotect_range); /** * pfn_mkclean_range - Cleans the PTEs (including PMDs) mapped with range of * [@pfn, @pfn + @nr_pages) at the specific offset (@pgoff) * within the @vma of shared mappings. And since clean PTEs * should also be readonly, write protects them too. * @pfn: start pfn. * @nr_pages: number of physically contiguous pages srarting with @pfn. * @pgoff: page offset that the @pfn mapped with. * @vma: vma that @pfn mapped within. * * Returns the number of cleaned PTEs (including PMDs). */ int pfn_mkclean_range(unsigned long pfn, unsigned long nr_pages, pgoff_t pgoff, struct vm_area_struct *vma) { struct page_vma_mapped_walk pvmw = { .pfn = pfn, .nr_pages = nr_pages, .pgoff = pgoff, .vma = vma, .flags = PVMW_SYNC, }; if (invalid_mkclean_vma(vma, NULL)) return 0; pvmw.address = vma_address(vma, pgoff, nr_pages); VM_BUG_ON_VMA(pvmw.address == -EFAULT, vma); return page_vma_mkclean_one(&pvmw); } static __always_inline unsigned int __folio_add_rmap(struct folio *folio, struct page *page, int nr_pages, struct vm_area_struct *vma, enum rmap_level level, int *nr_pmdmapped) { atomic_t *mapped = &folio->_nr_pages_mapped; const int orig_nr_pages = nr_pages; int first = 0, nr = 0; __folio_rmap_sanity_checks(folio, page, nr_pages, level); switch (level) { case RMAP_LEVEL_PTE: if (!folio_test_large(folio)) { nr = atomic_inc_and_test(&folio->_mapcount); break; } if (IS_ENABLED(CONFIG_NO_PAGE_MAPCOUNT)) { nr = folio_add_return_large_mapcount(folio, orig_nr_pages, vma); if (nr == orig_nr_pages) /* Was completely unmapped. */ nr = folio_large_nr_pages(folio); else nr = 0; break; } do { first += atomic_inc_and_test(&page->_mapcount); } while (page++, --nr_pages > 0); if (first && atomic_add_return_relaxed(first, mapped) < ENTIRELY_MAPPED) nr = first; folio_add_large_mapcount(folio, orig_nr_pages, vma); break; case RMAP_LEVEL_PMD: case RMAP_LEVEL_PUD: first = atomic_inc_and_test(&folio->_entire_mapcount); if (IS_ENABLED(CONFIG_NO_PAGE_MAPCOUNT)) { if (level == RMAP_LEVEL_PMD && first) *nr_pmdmapped = folio_large_nr_pages(folio); nr = folio_inc_return_large_mapcount(folio, vma); if (nr == 1) /* Was completely unmapped. */ nr = folio_large_nr_pages(folio); else nr = 0; break; } if (first) { nr = atomic_add_return_relaxed(ENTIRELY_MAPPED, mapped); if (likely(nr < ENTIRELY_MAPPED + ENTIRELY_MAPPED)) { nr_pages = folio_large_nr_pages(folio); /* * We only track PMD mappings of PMD-sized * folios separately. */ if (level == RMAP_LEVEL_PMD) *nr_pmdmapped = nr_pages; nr = nr_pages - (nr & FOLIO_PAGES_MAPPED); /* Raced ahead of a remove and another add? 
*/ if (unlikely(nr < 0)) nr = 0; } else { /* Raced ahead of a remove of ENTIRELY_MAPPED */ nr = 0; } } folio_inc_large_mapcount(folio, vma); break; } return nr; } /** * folio_move_anon_rmap - move a folio to our anon_vma * @folio: The folio to move to our anon_vma * @vma: The vma the folio belongs to * * When a folio belongs exclusively to one process after a COW event, * that folio can be moved into the anon_vma that belongs to just that * process, so the rmap code will not search the parent or sibling processes. */ void folio_move_anon_rmap(struct folio *folio, struct vm_area_struct *vma) { void *anon_vma = vma->anon_vma; VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); VM_BUG_ON_VMA(!anon_vma, vma); anon_vma += PAGE_MAPPING_ANON; /* * Ensure that anon_vma and the PAGE_MAPPING_ANON bit are written * simultaneously, so a concurrent reader (eg folio_referenced()'s * folio_test_anon()) will not see one without the other. */ WRITE_ONCE(folio->mapping, anon_vma); } /** * __folio_set_anon - set up a new anonymous rmap for a folio * @folio: The folio to set up the new anonymous rmap for. * @vma: VM area to add the folio to. * @address: User virtual address of the mapping * @exclusive: Whether the folio is exclusive to the process. */ static void __folio_set_anon(struct folio *folio, struct vm_area_struct *vma, unsigned long address, bool exclusive) { struct anon_vma *anon_vma = vma->anon_vma; BUG_ON(!anon_vma); /* * If the folio isn't exclusive to this vma, we must use the _oldest_ * possible anon_vma for the folio mapping! */ if (!exclusive) anon_vma = anon_vma->root; /* * page_idle does a lockless/optimistic rmap scan on folio->mapping. * Make sure the compiler doesn't split the stores of anon_vma and * the PAGE_MAPPING_ANON type identifier, otherwise the rmap code * could mistake the mapping for a struct address_space and crash. */ anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON; WRITE_ONCE(folio->mapping, (struct address_space *) anon_vma); folio->index = linear_page_index(vma, address); } /** * __page_check_anon_rmap - sanity check anonymous rmap addition * @folio: The folio containing @page. * @page: the page to check the mapping of * @vma: the vm area in which the mapping is added * @address: the user virtual address mapped */ static void __page_check_anon_rmap(const struct folio *folio, const struct page *page, struct vm_area_struct *vma, unsigned long address) { /* * The page's anon-rmap details (mapping and index) are guaranteed to * be set up correctly at this point. * * We have exclusion against folio_add_anon_rmap_*() because the caller * always holds the page locked. * * We have exclusion against folio_add_new_anon_rmap because those pages * are initially only visible via the pagetables, and the pte is locked * over the call to folio_add_new_anon_rmap. */ VM_BUG_ON_FOLIO(folio_anon_vma(folio)->root != vma->anon_vma->root, folio); VM_BUG_ON_PAGE(page_pgoff(folio, page) != linear_page_index(vma, address), page); } static void __folio_mod_stat(struct folio *folio, int nr, int nr_pmdmapped) { int idx; if (nr) { idx = folio_test_anon(folio) ? NR_ANON_MAPPED : NR_FILE_MAPPED; __lruvec_stat_mod_folio(folio, idx, nr); } if (nr_pmdmapped) { if (folio_test_anon(folio)) { idx = NR_ANON_THPS; __lruvec_stat_mod_folio(folio, idx, nr_pmdmapped); } else { /* NR_*_PMDMAPPED are not maintained per-memcg */ idx = folio_test_swapbacked(folio) ? 
NR_SHMEM_PMDMAPPED : NR_FILE_PMDMAPPED; __mod_node_page_state(folio_pgdat(folio), idx, nr_pmdmapped); } } } static __always_inline void __folio_add_anon_rmap(struct folio *folio, struct page *page, int nr_pages, struct vm_area_struct *vma, unsigned long address, rmap_t flags, enum rmap_level level) { int i, nr, nr_pmdmapped = 0; VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio); nr = __folio_add_rmap(folio, page, nr_pages, vma, level, &nr_pmdmapped); if (likely(!folio_test_ksm(folio))) __page_check_anon_rmap(folio, page, vma, address); __folio_mod_stat(folio, nr, nr_pmdmapped); if (flags & RMAP_EXCLUSIVE) { switch (level) { case RMAP_LEVEL_PTE: for (i = 0; i < nr_pages; i++) SetPageAnonExclusive(page + i); break; case RMAP_LEVEL_PMD: SetPageAnonExclusive(page); break; case RMAP_LEVEL_PUD: /* * Keep the compiler happy, we don't support anonymous * PUD mappings. */ WARN_ON_ONCE(1); break; } } VM_WARN_ON_FOLIO(!folio_test_large(folio) && PageAnonExclusive(page) && atomic_read(&folio->_mapcount) > 0, folio); for (i = 0; i < nr_pages; i++) { struct page *cur_page = page + i; VM_WARN_ON_FOLIO(folio_test_large(folio) && folio_entire_mapcount(folio) > 1 && PageAnonExclusive(cur_page), folio); if (IS_ENABLED(CONFIG_NO_PAGE_MAPCOUNT)) continue; /* * While PTE-mapping a THP we have a PMD and a PTE * mapping. */ VM_WARN_ON_FOLIO(atomic_read(&cur_page->_mapcount) > 0 && PageAnonExclusive(cur_page), folio); } /* * For large folio, only mlock it if it's fully mapped to VMA. It's * not easy to check whether the large folio is fully mapped to VMA * here. Only mlock normal 4K folio and leave page reclaim to handle * large folio. */ if (!folio_test_large(folio)) mlock_vma_folio(folio, vma); } /** * folio_add_anon_rmap_ptes - add PTE mappings to a page range of an anon folio * @folio: The folio to add the mappings to * @page: The first page to add * @nr_pages: The number of pages which will be mapped * @vma: The vm area in which the mappings are added * @address: The user virtual address of the first page to map * @flags: The rmap flags * * The page range of folio is defined by [first_page, first_page + nr_pages) * * The caller needs to hold the page table lock, and the page must be locked in * the anon_vma case: to serialize mapping,index checking after setting, * and to ensure that an anon folio is not being upgraded racily to a KSM folio * (but KSM folios are never downgraded). */ void folio_add_anon_rmap_ptes(struct folio *folio, struct page *page, int nr_pages, struct vm_area_struct *vma, unsigned long address, rmap_t flags) { __folio_add_anon_rmap(folio, page, nr_pages, vma, address, flags, RMAP_LEVEL_PTE); } /** * folio_add_anon_rmap_pmd - add a PMD mapping to a page range of an anon folio * @folio: The folio to add the mapping to * @page: The first page to add * @vma: The vm area in which the mapping is added * @address: The user virtual address of the first page to map * @flags: The rmap flags * * The page range of folio is defined by [first_page, first_page + HPAGE_PMD_NR) * * The caller needs to hold the page table lock, and the page must be locked in * the anon_vma case: to serialize mapping,index checking after setting. */ void folio_add_anon_rmap_pmd(struct folio *folio, struct page *page, struct vm_area_struct *vma, unsigned long address, rmap_t flags) { #ifdef CONFIG_TRANSPARENT_HUGEPAGE __folio_add_anon_rmap(folio, page, HPAGE_PMD_NR, vma, address, flags, RMAP_LEVEL_PMD); #else WARN_ON_ONCE(true); #endif } /** * folio_add_new_anon_rmap - Add mapping to a new anonymous folio. 
* @folio: The folio to add the mapping to. * @vma: the vm area in which the mapping is added * @address: the user virtual address mapped * @flags: The rmap flags * * Like folio_add_anon_rmap_*() but must only be called on *new* folios. * This means the inc-and-test can be bypassed. * The folio doesn't necessarily need to be locked while it's exclusive * unless two threads map it concurrently. However, the folio must be * locked if it's shared. * * If the folio is pmd-mappable, it is accounted as a THP. */ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma, unsigned long address, rmap_t flags) { const bool exclusive = flags & RMAP_EXCLUSIVE; int nr = 1, nr_pmdmapped = 0; VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio); VM_WARN_ON_FOLIO(!exclusive && !folio_test_locked(folio), folio); /* * VM_DROPPABLE mappings don't swap; instead they're just dropped when * under memory pressure. */ if (!folio_test_swapbacked(folio) && !(vma->vm_flags & VM_DROPPABLE)) __folio_set_swapbacked(folio); __folio_set_anon(folio, vma, address, exclusive); if (likely(!folio_test_large(folio))) { /* increment count (starts at -1) */ atomic_set(&folio->_mapcount, 0); if (exclusive) SetPageAnonExclusive(&folio->page); } else if (!folio_test_pmd_mappable(folio)) { int i; nr = folio_large_nr_pages(folio); for (i = 0; i < nr; i++) { struct page *page = folio_page(folio, i); if (IS_ENABLED(CONFIG_PAGE_MAPCOUNT)) /* increment count (starts at -1) */ atomic_set(&page->_mapcount, 0); if (exclusive) SetPageAnonExclusive(page); } folio_set_large_mapcount(folio, nr, vma); if (IS_ENABLED(CONFIG_PAGE_MAPCOUNT)) atomic_set(&folio->_nr_pages_mapped, nr); } else { nr = folio_large_nr_pages(folio); /* increment count (starts at -1) */ atomic_set(&folio->_entire_mapcount, 0); folio_set_large_mapcount(folio, 1, vma); if (IS_ENABLED(CONFIG_PAGE_MAPCOUNT)) atomic_set(&folio->_nr_pages_mapped, ENTIRELY_MAPPED); if (exclusive) SetPageAnonExclusive(&folio->page); nr_pmdmapped = nr; } VM_WARN_ON_ONCE(address < vma->vm_start || address + (nr << PAGE_SHIFT) > vma->vm_end); __folio_mod_stat(folio, nr, nr_pmdmapped); mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON, 1); } static __always_inline void __folio_add_file_rmap(struct folio *folio, struct page *page, int nr_pages, struct vm_area_struct *vma, enum rmap_level level) { int nr, nr_pmdmapped = 0; VM_WARN_ON_FOLIO(folio_test_anon(folio), folio); nr = __folio_add_rmap(folio, page, nr_pages, vma, level, &nr_pmdmapped); __folio_mod_stat(folio, nr, nr_pmdmapped); /* See comments in folio_add_anon_rmap_*() */ if (!folio_test_large(folio)) mlock_vma_folio(folio, vma); } /** * folio_add_file_rmap_ptes - add PTE mappings to a page range of a folio * @folio: The folio to add the mappings to * @page: The first page to add * @nr_pages: The number of pages that will be mapped using PTEs * @vma: The vm area in which the mappings are added * * The page range of the folio is defined by [page, page + nr_pages) * * The caller needs to hold the page table lock. */ void folio_add_file_rmap_ptes(struct folio *folio, struct page *page, int nr_pages, struct vm_area_struct *vma) { __folio_add_file_rmap(folio, page, nr_pages, vma, RMAP_LEVEL_PTE); } /** * folio_add_file_rmap_pmd - add a PMD mapping to a page range of a folio * @folio: The folio to add the mapping to * @page: The first page to add * @vma: The vm area in which the mapping is added * * The page range of the folio is defined by [page, page + HPAGE_PMD_NR) * * The caller needs to hold the page table lock. 
*/ void folio_add_file_rmap_pmd(struct folio *folio, struct page *page, struct vm_area_struct *vma) { #ifdef CONFIG_TRANSPARENT_HUGEPAGE __folio_add_file_rmap(folio, page, HPAGE_PMD_NR, vma, RMAP_LEVEL_PMD); #else WARN_ON_ONCE(true); #endif } /** * folio_add_file_rmap_pud - add a PUD mapping to a page range of a folio * @folio: The folio to add the mapping to * @page: The first page to add * @vma: The vm area in which the mapping is added * * The page range of the folio is defined by [page, page + HPAGE_PUD_NR) * * The caller needs to hold the page table lock. */ void folio_add_file_rmap_pud(struct folio *folio, struct page *page, struct vm_area_struct *vma) { #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && \ defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) __folio_add_file_rmap(folio, page, HPAGE_PUD_NR, vma, RMAP_LEVEL_PUD); #else WARN_ON_ONCE(true); #endif } static __always_inline void __folio_remove_rmap(struct folio *folio, struct page *page, int nr_pages, struct vm_area_struct *vma, enum rmap_level level) { atomic_t *mapped = &folio->_nr_pages_mapped; int last = 0, nr = 0, nr_pmdmapped = 0; bool partially_mapped = false; __folio_rmap_sanity_checks(folio, page, nr_pages, level); switch (level) { case RMAP_LEVEL_PTE: if (!folio_test_large(folio)) { nr = atomic_add_negative(-1, &folio->_mapcount); break; } if (IS_ENABLED(CONFIG_NO_PAGE_MAPCOUNT)) { nr = folio_sub_return_large_mapcount(folio, nr_pages, vma); if (!nr) { /* Now completely unmapped. */ nr = folio_nr_pages(folio); } else { partially_mapped = nr < folio_large_nr_pages(folio) && !folio_entire_mapcount(folio); nr = 0; } break; } folio_sub_large_mapcount(folio, nr_pages, vma); do { last += atomic_add_negative(-1, &page->_mapcount); } while (page++, --nr_pages > 0); if (last && atomic_sub_return_relaxed(last, mapped) < ENTIRELY_MAPPED) nr = last; partially_mapped = nr && atomic_read(mapped); break; case RMAP_LEVEL_PMD: case RMAP_LEVEL_PUD: if (IS_ENABLED(CONFIG_NO_PAGE_MAPCOUNT)) { last = atomic_add_negative(-1, &folio->_entire_mapcount); if (level == RMAP_LEVEL_PMD && last) nr_pmdmapped = folio_large_nr_pages(folio); nr = folio_dec_return_large_mapcount(folio, vma); if (!nr) { /* Now completely unmapped. */ nr = folio_large_nr_pages(folio); } else { partially_mapped = last && nr < folio_large_nr_pages(folio); nr = 0; } break; } folio_dec_large_mapcount(folio, vma); last = atomic_add_negative(-1, &folio->_entire_mapcount); if (last) { nr = atomic_sub_return_relaxed(ENTIRELY_MAPPED, mapped); if (likely(nr < ENTIRELY_MAPPED)) { nr_pages = folio_large_nr_pages(folio); if (level == RMAP_LEVEL_PMD) nr_pmdmapped = nr_pages; nr = nr_pages - (nr & FOLIO_PAGES_MAPPED); /* Raced ahead of another remove and an add? */ if (unlikely(nr < 0)) nr = 0; } else { /* An add of ENTIRELY_MAPPED raced ahead */ nr = 0; } } partially_mapped = nr && nr < nr_pmdmapped; break; } /* * Queue anon large folio for deferred split if at least one page of * the folio is unmapped and at least one page is still mapped. * * Check partially_mapped first to ensure it is a large folio. */ if (partially_mapped && folio_test_anon(folio) && !folio_test_partially_mapped(folio)) deferred_split_folio(folio, true); __folio_mod_stat(folio, -nr, -nr_pmdmapped); /* * It would be tidy to reset folio_test_anon mapping when fully * unmapped, but that might overwrite a racing folio_add_anon_rmap_*() * which increments mapcount after us but sets mapping before us: * so leave the reset to free_pages_prepare, and remember that * it's only reliable while mapped. 
*/ munlock_vma_folio(folio, vma); } /** * folio_remove_rmap_ptes - remove PTE mappings from a page range of a folio * @folio: The folio to remove the mappings from * @page: The first page to remove * @nr_pages: The number of pages that will be removed from the mapping * @vma: The vm area from which the mappings are removed * * The page range of the folio is defined by [page, page + nr_pages) * * The caller needs to hold the page table lock. */ void folio_remove_rmap_ptes(struct folio *folio, struct page *page, int nr_pages, struct vm_area_struct *vma) { __folio_remove_rmap(folio, page, nr_pages, vma, RMAP_LEVEL_PTE); } /** * folio_remove_rmap_pmd - remove a PMD mapping from a page range of a folio * @folio: The folio to remove the mapping from * @page: The first page to remove * @vma: The vm area from which the mapping is removed * * The page range of the folio is defined by [page, page + HPAGE_PMD_NR) * * The caller needs to hold the page table lock. */ void folio_remove_rmap_pmd(struct folio *folio, struct page *page, struct vm_area_struct *vma) { #ifdef CONFIG_TRANSPARENT_HUGEPAGE __folio_remove_rmap(folio, page, HPAGE_PMD_NR, vma, RMAP_LEVEL_PMD); #else WARN_ON_ONCE(true); #endif } /** * folio_remove_rmap_pud - remove a PUD mapping from a page range of a folio * @folio: The folio to remove the mapping from * @page: The first page to remove * @vma: The vm area from which the mapping is removed * * The page range of the folio is defined by [page, page + HPAGE_PUD_NR) * * The caller needs to hold the page table lock. */ void folio_remove_rmap_pud(struct folio *folio, struct page *page, struct vm_area_struct *vma) { #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && \ defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) __folio_remove_rmap(folio, page, HPAGE_PUD_NR, vma, RMAP_LEVEL_PUD); #else WARN_ON_ONCE(true); #endif } /* We support batch unmapping of PTEs for lazyfree large folios */ static inline bool can_batch_unmap_folio_ptes(unsigned long addr, struct folio *folio, pte_t *ptep) { const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY; int max_nr = folio_nr_pages(folio); pte_t pte = ptep_get(ptep); if (!folio_test_anon(folio) || folio_test_swapbacked(folio)) return false; if (pte_unused(pte)) return false; if (pte_pfn(pte) != folio_pfn(folio)) return false; return folio_pte_batch(folio, addr, ptep, pte, max_nr, fpb_flags, NULL, NULL, NULL) == max_nr; } /* * @arg: enum ttu_flags will be passed to this argument */ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, unsigned long address, void *arg) { struct mm_struct *mm = vma->vm_mm; DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0); bool anon_exclusive, ret = true; pte_t pteval; struct page *subpage; struct mmu_notifier_range range; enum ttu_flags flags = (enum ttu_flags)(long)arg; unsigned long nr_pages = 1, end_addr; unsigned long pfn; unsigned long hsz = 0; /* * When racing against e.g. zap_pte_range() on another cpu, * in between its ptep_get_and_clear_full() and folio_remove_rmap_*(), * try_to_unmap() may return before page_mapped() has become false, * if page table locking is skipped: use TTU_SYNC to wait for that. */ if (flags & TTU_SYNC) pvmw.flags = PVMW_SYNC; /* * For THP, we have to assume the worse case ie pmd for invalidation. * For hugetlb, it could be much worse if we need to do pud * invalidation in the case of pmd sharing. * * Note that the folio can not be freed in this function as call of * try_to_unmap() must hold a reference on the folio. 
*/ range.end = vma_address_end(&pvmw); mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm, address, range.end); if (folio_test_hugetlb(folio)) { /* * If sharing is possible, start and end will be adjusted * accordingly. */ adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end); /* We need the huge page size for set_huge_pte_at() */ hsz = huge_page_size(hstate_vma(vma)); } mmu_notifier_invalidate_range_start(&range); while (page_vma_mapped_walk(&pvmw)) { /* * If the folio is in an mlock()d vma, we must not swap it out. */ if (!(flags & TTU_IGNORE_MLOCK) && (vma->vm_flags & VM_LOCKED)) { /* Restore the mlock which got missed */ if (!folio_test_large(folio)) mlock_vma_folio(folio, vma); goto walk_abort; } if (!pvmw.pte) { if (folio_test_anon(folio) && !folio_test_swapbacked(folio)) { if (unmap_huge_pmd_locked(vma, pvmw.address, pvmw.pmd, folio)) goto walk_done; /* * unmap_huge_pmd_locked has either already marked * the folio as swap-backed or decided to retain it * due to GUP or speculative references. */ goto walk_abort; } if (flags & TTU_SPLIT_HUGE_PMD) { /* * We temporarily have to drop the PTL and * restart so we can process the PTE-mapped THP. */ split_huge_pmd_locked(vma, pvmw.address, pvmw.pmd, false); flags &= ~TTU_SPLIT_HUGE_PMD; page_vma_mapped_walk_restart(&pvmw); continue; } } /* Unexpected PMD-mapped THP? */ VM_BUG_ON_FOLIO(!pvmw.pte, folio); /* * Handle PFN swap PTEs, such as device-exclusive ones, that * actually map pages. */ pteval = ptep_get(pvmw.pte); if (likely(pte_present(pteval))) { pfn = pte_pfn(pteval); } else { pfn = swp_offset_pfn(pte_to_swp_entry(pteval)); VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio); } subpage = folio_page(folio, pfn - folio_pfn(folio)); address = pvmw.address; anon_exclusive = folio_test_anon(folio) && PageAnonExclusive(subpage); if (folio_test_hugetlb(folio)) { bool anon = folio_test_anon(folio); /* * The try_to_unmap() is only passed a hugetlb page * in the case where the hugetlb page is poisoned. */ VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage); /* * huge_pmd_unshare may unmap an entire PMD page. * There is no way of knowing exactly which PMDs may * be cached for this mm, so we must flush them all. * start/end were already adjusted above to cover this * range. */ flush_cache_range(vma, range.start, range.end); /* * To call huge_pmd_unshare, i_mmap_rwsem must be * held in write mode. Caller needs to explicitly * do this outside rmap routines. * * We also must hold hugetlb vma_lock in write mode. * Lock order dictates acquiring vma_lock BEFORE * i_mmap_rwsem. We can only try lock here and fail * if unsuccessful. */ if (!anon) { VM_BUG_ON(!(flags & TTU_RMAP_LOCKED)); if (!hugetlb_vma_trylock_write(vma)) goto walk_abort; if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) { hugetlb_vma_unlock_write(vma); flush_tlb_range(vma, range.start, range.end); /* * The ref count of the PMD page was * dropped which is part of the way map * counting is done for shared PMDs. * Return 'true' here. When there is * no other sharing, huge_pmd_unshare * returns false and we will unmap the * actual page and drop map count * to zero. 
*/ goto walk_done; } hugetlb_vma_unlock_write(vma); } pteval = huge_ptep_clear_flush(vma, address, pvmw.pte); if (pte_dirty(pteval)) folio_mark_dirty(folio); } else if (likely(pte_present(pteval))) { if (folio_test_large(folio) && !(flags & TTU_HWPOISON) && can_batch_unmap_folio_ptes(address, folio, pvmw.pte)) nr_pages = folio_nr_pages(folio); end_addr = address + nr_pages * PAGE_SIZE; flush_cache_range(vma, address, end_addr); /* Nuke the page table entry. */ pteval = get_and_clear_full_ptes(mm, address, pvmw.pte, nr_pages, 0); /* * We clear the PTE but do not flush so potentially * a remote CPU could still be writing to the folio. * If the entry was previously clean then the * architecture must guarantee that a clear->dirty * transition on a cached TLB entry is written through * and traps if the PTE is unmapped. */ if (should_defer_flush(mm, flags)) set_tlb_ubc_flush_pending(mm, pteval, address, end_addr); else flush_tlb_range(vma, address, end_addr); if (pte_dirty(pteval)) folio_mark_dirty(folio); } else { pte_clear(mm, address, pvmw.pte); } /* * Now the pte is cleared. If this pte was uffd-wp armed, * we may want to replace a none pte with a marker pte if * it's file-backed, so we don't lose the tracking info. */ pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval); /* Update high watermark before we lower rss */ update_hiwater_rss(mm); if (PageHWPoison(subpage) && (flags & TTU_HWPOISON)) { pteval = swp_entry_to_pte(make_hwpoison_entry(subpage)); if (folio_test_hugetlb(folio)) { hugetlb_count_sub(folio_nr_pages(folio), mm); set_huge_pte_at(mm, address, pvmw.pte, pteval, hsz); } else { dec_mm_counter(mm, mm_counter(folio)); set_pte_at(mm, address, pvmw.pte, pteval); } } else if (likely(pte_present(pteval)) && pte_unused(pteval) && !userfaultfd_armed(vma)) { /* * The guest indicated that the page content is of no * interest anymore. Simply discard the pte, vmscan * will take care of the rest. * A future reference will then fault in a new zero * page. When userfaultfd is active, we must not drop * this page though, as its main user (postcopy * migration) will not expect userfaults on already * copied pages. */ dec_mm_counter(mm, mm_counter(folio)); } else if (folio_test_anon(folio)) { swp_entry_t entry = page_swap_entry(subpage); pte_t swp_pte; /* * Store the swap location in the pte. * See handle_pte_fault() ... */ if (unlikely(folio_test_swapbacked(folio) != folio_test_swapcache(folio))) { WARN_ON_ONCE(1); goto walk_abort; } /* MADV_FREE page check */ if (!folio_test_swapbacked(folio)) { int ref_count, map_count; /* * Synchronize with gup_pte_range(): * - clear PTE; barrier; read refcount * - inc refcount; barrier; read PTE */ smp_mb(); ref_count = folio_ref_count(folio); map_count = folio_mapcount(folio); /* * Order reads for page refcount and dirty flag * (see comments in __remove_mapping()). */ smp_rmb(); if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) { /* * redirtied either using the page table or a previously * obtained GUP reference. */ set_ptes(mm, address, pvmw.pte, pteval, nr_pages); folio_set_swapbacked(folio); goto walk_abort; } else if (ref_count != 1 + map_count) { /* * Additional reference. Could be a GUP reference or any * speculative reference. GUP users must mark the folio * dirty if there was a modification. This folio cannot be * reclaimed right now either way, so act just like nothing * happened. * We'll come back here later and detect if the folio was * dirtied when the additional reference is gone. 
*/ set_ptes(mm, address, pvmw.pte, pteval, nr_pages); goto walk_abort; } add_mm_counter(mm, MM_ANONPAGES, -nr_pages); goto discard; } if (swap_duplicate(entry) < 0) { set_pte_at(mm, address, pvmw.pte, pteval); goto walk_abort; } /* * arch_unmap_one() is expected to be a NOP on * architectures where we could have PFN swap PTEs, * so we'll not check/care. */ if (arch_unmap_one(mm, vma, address, pteval) < 0) { swap_free(entry); set_pte_at(mm, address, pvmw.pte, pteval); goto walk_abort; } /* See folio_try_share_anon_rmap(): clear PTE first. */ if (anon_exclusive && folio_try_share_anon_rmap_pte(folio, subpage)) { swap_free(entry); set_pte_at(mm, address, pvmw.pte, pteval); goto walk_abort; } if (list_empty(&mm->mmlist)) { spin_lock(&mmlist_lock); if (list_empty(&mm->mmlist)) list_add(&mm->mmlist, &init_mm.mmlist); spin_unlock(&mmlist_lock); } dec_mm_counter(mm, MM_ANONPAGES); inc_mm_counter(mm, MM_SWAPENTS); swp_pte = swp_entry_to_pte(entry); if (anon_exclusive) swp_pte = pte_swp_mkexclusive(swp_pte); if (likely(pte_present(pteval))) { if (pte_soft_dirty(pteval)) swp_pte = pte_swp_mksoft_dirty(swp_pte); if (pte_uffd_wp(pteval)) swp_pte = pte_swp_mkuffd_wp(swp_pte); } else { if (pte_swp_soft_dirty(pteval)) swp_pte = pte_swp_mksoft_dirty(swp_pte); if (pte_swp_uffd_wp(pteval)) swp_pte = pte_swp_mkuffd_wp(swp_pte); } set_pte_at(mm, address, pvmw.pte, swp_pte); } else { /* * This is a locked file-backed folio, * so it cannot be removed from the page * cache and replaced by a new folio before * mmu_notifier_invalidate_range_end, so no * concurrent thread might update its page table * to point at a new folio while a device is * still using this folio. * * See Documentation/mm/mmu_notifier.rst */ dec_mm_counter(mm, mm_counter_file(folio)); } discard: if (unlikely(folio_test_hugetlb(folio))) { hugetlb_remove_rmap(folio); } else { folio_remove_rmap_ptes(folio, subpage, nr_pages, vma); folio_ref_sub(folio, nr_pages - 1); } if (vma->vm_flags & VM_LOCKED) mlock_drain_local(); folio_put(folio); /* We have already batched the entire folio */ if (nr_pages > 1) goto walk_done; continue; walk_abort: ret = false; walk_done: page_vma_mapped_walk_done(&pvmw); break; } mmu_notifier_invalidate_range_end(&range); return ret; } static bool invalid_migration_vma(struct vm_area_struct *vma, void *arg) { return vma_is_temporary_stack(vma); } static int folio_not_mapped(struct folio *folio) { return !folio_mapped(folio); } /** * try_to_unmap - Try to remove all page table mappings to a folio. * @folio: The folio to unmap. * @flags: action and flags * * Tries to remove all the page table entries which are mapping this * folio. It is the caller's responsibility to check if the folio is * still mapped if needed (use TTU_SYNC to prevent accounting races). * * Context: Caller must hold the folio lock. */ void try_to_unmap(struct folio *folio, enum ttu_flags flags) { struct rmap_walk_control rwc = { .rmap_one = try_to_unmap_one, .arg = (void *)flags, .done = folio_not_mapped, .anon_lock = folio_lock_anon_vma_read, }; if (flags & TTU_RMAP_LOCKED) rmap_walk_locked(folio, &rwc); else rmap_walk(folio, &rwc); } /* * @arg: enum ttu_flags will be passed to this argument. * * If TTU_SPLIT_HUGE_PMD is specified any PMD mappings will be split into PTEs * containing migration entries. 
*/ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, unsigned long address, void *arg) { struct mm_struct *mm = vma->vm_mm; DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0); bool anon_exclusive, writable, ret = true; pte_t pteval; struct page *subpage; struct mmu_notifier_range range; enum ttu_flags flags = (enum ttu_flags)(long)arg; unsigned long pfn; unsigned long hsz = 0; /* * When racing against e.g. zap_pte_range() on another cpu, * in between its ptep_get_and_clear_full() and folio_remove_rmap_*(), * try_to_migrate() may return before page_mapped() has become false, * if page table locking is skipped: use TTU_SYNC to wait for that. */ if (flags & TTU_SYNC) pvmw.flags = PVMW_SYNC; /* * For THP, we have to assume the worse case ie pmd for invalidation. * For hugetlb, it could be much worse if we need to do pud * invalidation in the case of pmd sharing. * * Note that the page can not be free in this function as call of * try_to_unmap() must hold a reference on the page. */ range.end = vma_address_end(&pvmw); mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm, address, range.end); if (folio_test_hugetlb(folio)) { /* * If sharing is possible, start and end will be adjusted * accordingly. */ adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end); /* We need the huge page size for set_huge_pte_at() */ hsz = huge_page_size(hstate_vma(vma)); } mmu_notifier_invalidate_range_start(&range); while (page_vma_mapped_walk(&pvmw)) { /* PMD-mapped THP migration entry */ if (!pvmw.pte) { if (flags & TTU_SPLIT_HUGE_PMD) { split_huge_pmd_locked(vma, pvmw.address, pvmw.pmd, true); ret = false; page_vma_mapped_walk_done(&pvmw); break; } #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION subpage = folio_page(folio, pmd_pfn(*pvmw.pmd) - folio_pfn(folio)); VM_BUG_ON_FOLIO(folio_test_hugetlb(folio) || !folio_test_pmd_mappable(folio), folio); if (set_pmd_migration_entry(&pvmw, subpage)) { ret = false; page_vma_mapped_walk_done(&pvmw); break; } continue; #endif } /* Unexpected PMD-mapped THP? */ VM_BUG_ON_FOLIO(!pvmw.pte, folio); /* * Handle PFN swap PTEs, such as device-exclusive ones, that * actually map pages. */ pteval = ptep_get(pvmw.pte); if (likely(pte_present(pteval))) { pfn = pte_pfn(pteval); } else { pfn = swp_offset_pfn(pte_to_swp_entry(pteval)); VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio); } subpage = folio_page(folio, pfn - folio_pfn(folio)); address = pvmw.address; anon_exclusive = folio_test_anon(folio) && PageAnonExclusive(subpage); if (folio_test_hugetlb(folio)) { bool anon = folio_test_anon(folio); /* * huge_pmd_unshare may unmap an entire PMD page. * There is no way of knowing exactly which PMDs may * be cached for this mm, so we must flush them all. * start/end were already adjusted above to cover this * range. */ flush_cache_range(vma, range.start, range.end); /* * To call huge_pmd_unshare, i_mmap_rwsem must be * held in write mode. Caller needs to explicitly * do this outside rmap routines. * * We also must hold hugetlb vma_lock in write mode. * Lock order dictates acquiring vma_lock BEFORE * i_mmap_rwsem. We can only try lock here and * fail if unsuccessful. 
*/ if (!anon) { VM_BUG_ON(!(flags & TTU_RMAP_LOCKED)); if (!hugetlb_vma_trylock_write(vma)) { page_vma_mapped_walk_done(&pvmw); ret = false; break; } if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) { hugetlb_vma_unlock_write(vma); flush_tlb_range(vma, range.start, range.end); /* * The ref count of the PMD page was * dropped which is part of the way map * counting is done for shared PMDs. * Return 'true' here. When there is * no other sharing, huge_pmd_unshare * returns false and we will unmap the * actual page and drop map count * to zero. */ page_vma_mapped_walk_done(&pvmw); break; } hugetlb_vma_unlock_write(vma); } /* Nuke the hugetlb page table entry */ pteval = huge_ptep_clear_flush(vma, address, pvmw.pte); if (pte_dirty(pteval)) folio_mark_dirty(folio); writable = pte_write(pteval); } else if (likely(pte_present(pteval))) { flush_cache_page(vma, address, pfn); /* Nuke the page table entry. */ if (should_defer_flush(mm, flags)) { /* * We clear the PTE but do not flush so potentially * a remote CPU could still be writing to the folio. * If the entry was previously clean then the * architecture must guarantee that a clear->dirty * transition on a cached TLB entry is written through * and traps if the PTE is unmapped. */ pteval = ptep_get_and_clear(mm, address, pvmw.pte); set_tlb_ubc_flush_pending(mm, pteval, address, address + PAGE_SIZE); } else { pteval = ptep_clear_flush(vma, address, pvmw.pte); } if (pte_dirty(pteval)) folio_mark_dirty(folio); writable = pte_write(pteval); } else { pte_clear(mm, address, pvmw.pte); writable = is_writable_device_private_entry(pte_to_swp_entry(pteval)); } VM_WARN_ON_FOLIO(writable && folio_test_anon(folio) && !anon_exclusive, folio); /* Update high watermark before we lower rss */ update_hiwater_rss(mm); if (PageHWPoison(subpage)) { VM_WARN_ON_FOLIO(folio_is_device_private(folio), folio); pteval = swp_entry_to_pte(make_hwpoison_entry(subpage)); if (folio_test_hugetlb(folio)) { hugetlb_count_sub(folio_nr_pages(folio), mm); set_huge_pte_at(mm, address, pvmw.pte, pteval, hsz); } else { dec_mm_counter(mm, mm_counter(folio)); set_pte_at(mm, address, pvmw.pte, pteval); } } else if (likely(pte_present(pteval)) && pte_unused(pteval) && !userfaultfd_armed(vma)) { /* * The guest indicated that the page content is of no * interest anymore. Simply discard the pte, vmscan * will take care of the rest. * A future reference will then fault in a new zero * page. When userfaultfd is active, we must not drop * this page though, as its main user (postcopy * migration) will not expect userfaults on already * copied pages. */ dec_mm_counter(mm, mm_counter(folio)); } else { swp_entry_t entry; pte_t swp_pte; /* * arch_unmap_one() is expected to be a NOP on * architectures where we could have PFN swap PTEs, * so we'll not check/care. */ if (arch_unmap_one(mm, vma, address, pteval) < 0) { if (folio_test_hugetlb(folio)) set_huge_pte_at(mm, address, pvmw.pte, pteval, hsz); else set_pte_at(mm, address, pvmw.pte, pteval); ret = false; page_vma_mapped_walk_done(&pvmw); break; } /* See folio_try_share_anon_rmap_pte(): clear PTE first. */ if (folio_test_hugetlb(folio)) { if (anon_exclusive && hugetlb_try_share_anon_rmap(folio)) { set_huge_pte_at(mm, address, pvmw.pte, pteval, hsz); ret = false; page_vma_mapped_walk_done(&pvmw); break; } } else if (anon_exclusive && folio_try_share_anon_rmap_pte(folio, subpage)) { set_pte_at(mm, address, pvmw.pte, pteval); ret = false; page_vma_mapped_walk_done(&pvmw); break; } /* * Store the pfn of the page in a special migration * pte. 
do_swap_page() will wait until the migration * pte is removed and then restart fault handling. */ if (writable) entry = make_writable_migration_entry( page_to_pfn(subpage)); else if (anon_exclusive) entry = make_readable_exclusive_migration_entry( page_to_pfn(subpage)); else entry = make_readable_migration_entry( page_to_pfn(subpage)); if (likely(pte_present(pteval))) { if (pte_young(pteval)) entry = make_migration_entry_young(entry); if (pte_dirty(pteval)) entry = make_migration_entry_dirty(entry); swp_pte = swp_entry_to_pte(entry); if (pte_soft_dirty(pteval)) swp_pte = pte_swp_mksoft_dirty(swp_pte); if (pte_uffd_wp(pteval)) swp_pte = pte_swp_mkuffd_wp(swp_pte); } else { swp_pte = swp_entry_to_pte(entry); if (pte_swp_soft_dirty(pteval)) swp_pte = pte_swp_mksoft_dirty(swp_pte); if (pte_swp_uffd_wp(pteval)) swp_pte = pte_swp_mkuffd_wp(swp_pte); } if (folio_test_hugetlb(folio)) set_huge_pte_at(mm, address, pvmw.pte, swp_pte, hsz); else set_pte_at(mm, address, pvmw.pte, swp_pte); trace_set_migration_pte(address, pte_val(swp_pte), folio_order(folio)); /* * No need to invalidate here it will synchronize on * against the special swap migration pte. */ } if (unlikely(folio_test_hugetlb(folio))) hugetlb_remove_rmap(folio); else folio_remove_rmap_pte(folio, subpage, vma); if (vma->vm_flags & VM_LOCKED) mlock_drain_local(); folio_put(folio); } mmu_notifier_invalidate_range_end(&range); return ret; } /** * try_to_migrate - try to replace all page table mappings with swap entries * @folio: the folio to replace page table entries for * @flags: action and flags * * Tries to remove all the page table entries which are mapping this folio and * replace them with special swap entries. Caller must hold the folio lock. */ void try_to_migrate(struct folio *folio, enum ttu_flags flags) { struct rmap_walk_control rwc = { .rmap_one = try_to_migrate_one, .arg = (void *)flags, .done = folio_not_mapped, .anon_lock = folio_lock_anon_vma_read, }; /* * Migration always ignores mlock and only supports TTU_RMAP_LOCKED and * TTU_SPLIT_HUGE_PMD, TTU_SYNC, and TTU_BATCH_FLUSH flags. */ if (WARN_ON_ONCE(flags & ~(TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD | TTU_SYNC | TTU_BATCH_FLUSH))) return; if (folio_is_zone_device(folio) && (!folio_is_device_private(folio) && !folio_is_device_coherent(folio))) return; /* * During exec, a temporary VMA is setup and later moved. * The VMA is moved under the anon_vma lock but not the * page tables leading to a race where migration cannot * find the migration ptes. Rather than increasing the * locking requirements of exec(), migration skips * temporary VMAs until after exec() completes. */ if (!folio_test_ksm(folio) && folio_test_anon(folio)) rwc.invalid_vma = invalid_migration_vma; if (flags & TTU_RMAP_LOCKED) rmap_walk_locked(folio, &rwc); else rmap_walk(folio, &rwc); } #ifdef CONFIG_DEVICE_PRIVATE /** * make_device_exclusive() - Mark a page for exclusive use by a device * @mm: mm_struct of associated target process * @addr: the virtual address to mark for exclusive device access * @owner: passed to MMU_NOTIFY_EXCLUSIVE range notifier to allow filtering * @foliop: folio pointer will be stored here on success. * * This function looks up the page mapped at the given address, grabs a * folio reference, locks the folio and replaces the PTE with special * device-exclusive PFN swap entry, preventing access through the process * page tables. The function will return with the folio locked and referenced. 
* * On fault, the device-exclusive entries are replaced with the original PTE * under folio lock, after calling MMU notifiers. * * Only anonymous non-hugetlb folios are supported and the VMA must have * write permissions such that we can fault in the anonymous page writable * in order to mark it exclusive. The caller must hold the mmap_lock in read * mode. * * A driver using this to program access from a device must use a mmu notifier * critical section to hold a device specific lock during programming. Once * programming is complete it should drop the folio lock and reference after * which point CPU access to the page will revoke the exclusive access. * * Notes: * #. This function always operates on individual PTEs mapping individual * pages. PMD-sized THPs are first remapped to be mapped by PTEs before * the conversion happens on a single PTE corresponding to @addr. * #. While concurrent access through the process page tables is prevented, * concurrent access through other page references (e.g., earlier GUP * invocation) is not handled and not supported. * #. device-exclusive entries are considered "clean" and "old" by core-mm. * Device drivers must update the folio state when informed by MMU * notifiers. * * Returns: pointer to mapped page on success, otherwise a negative error. */ struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr, void *owner, struct folio **foliop) { struct mmu_notifier_range range; struct folio *folio, *fw_folio; struct vm_area_struct *vma; struct folio_walk fw; struct page *page; swp_entry_t entry; pte_t swp_pte; int ret; mmap_assert_locked(mm); addr = PAGE_ALIGN_DOWN(addr); /* * Fault in the page writable and try to lock it; note that if the * address would already be marked for exclusive use by a device, * the GUP call would undo that first by triggering a fault. * * If any other device would already map this page exclusively, the * fault will trigger a conversion to an ordinary * (non-device-exclusive) PTE and issue a MMU_NOTIFY_EXCLUSIVE. */ retry: page = get_user_page_vma_remote(mm, addr, FOLL_GET | FOLL_WRITE | FOLL_SPLIT_PMD, &vma); if (IS_ERR(page)) return page; folio = page_folio(page); if (!folio_test_anon(folio) || folio_test_hugetlb(folio)) { folio_put(folio); return ERR_PTR(-EOPNOTSUPP); } ret = folio_lock_killable(folio); if (ret) { folio_put(folio); return ERR_PTR(ret); } /* * Inform secondary MMUs that we are going to convert this PTE to * device-exclusive, such that they unmap it now. Note that the * caller must filter this event out to prevent livelocks. */ mmu_notifier_range_init_owner(&range, MMU_NOTIFY_EXCLUSIVE, 0, mm, addr, addr + PAGE_SIZE, owner); mmu_notifier_invalidate_range_start(&range); /* * Let's do a second walk and make sure we still find the same page * mapped writable. Note that any page of an anonymous folio can * only be mapped writable using exactly one PTE ("exclusive"), so * there cannot be other mappings. */ fw_folio = folio_walk_start(&fw, vma, addr, 0); if (fw_folio != folio || fw.page != page || fw.level != FW_LEVEL_PTE || !pte_write(fw.pte)) { if (fw_folio) folio_walk_end(&fw, vma); mmu_notifier_invalidate_range_end(&range); folio_unlock(folio); folio_put(folio); goto retry; } /* Nuke the page table entry so we get the uptodate dirty bit. */ flush_cache_page(vma, addr, page_to_pfn(page)); fw.pte = ptep_clear_flush(vma, addr, fw.ptep); /* Set the dirty flag on the folio now the PTE is gone. 
*/ if (pte_dirty(fw.pte)) folio_mark_dirty(folio); /* * Store the pfn of the page in a special device-exclusive PFN swap PTE. * do_swap_page() will trigger the conversion back while holding the * folio lock. */ entry = make_device_exclusive_entry(page_to_pfn(page)); swp_pte = swp_entry_to_pte(entry); if (pte_soft_dirty(fw.pte)) swp_pte = pte_swp_mksoft_dirty(swp_pte); /* The pte is writable, uffd-wp does not apply. */ set_pte_at(mm, addr, fw.ptep, swp_pte); folio_walk_end(&fw, vma); mmu_notifier_invalidate_range_end(&range); *foliop = folio; return page; } EXPORT_SYMBOL_GPL(make_device_exclusive); #endif void __put_anon_vma(struct anon_vma *anon_vma) { struct anon_vma *root = anon_vma->root; anon_vma_free(anon_vma); if (root != anon_vma && atomic_dec_and_test(&root->refcount)) anon_vma_free(root); } static struct anon_vma *rmap_walk_anon_lock(const struct folio *folio, struct rmap_walk_control *rwc) { struct anon_vma *anon_vma; if (rwc->anon_lock) return rwc->anon_lock(folio, rwc); /* * Note: remove_migration_ptes() cannot use folio_lock_anon_vma_read() * because that depends on page_mapped(); but not all its usages * are holding mmap_lock. Users without mmap_lock are required to * take a reference count to prevent the anon_vma disappearing */ anon_vma = folio_anon_vma(folio); if (!anon_vma) return NULL; if (anon_vma_trylock_read(anon_vma)) goto out; if (rwc->try_lock) { anon_vma = NULL; rwc->contended = true; goto out; } anon_vma_lock_read(anon_vma); out: return anon_vma; } /* * rmap_walk_anon - do something to anonymous page using the object-based * rmap method * @folio: the folio to be handled * @rwc: control variable according to each walk type * @locked: caller holds relevant rmap lock * * Find all the mappings of a folio using the mapping pointer and the vma * chains contained in the anon_vma struct it points to. */ static void rmap_walk_anon(struct folio *folio, struct rmap_walk_control *rwc, bool locked) { struct anon_vma *anon_vma; pgoff_t pgoff_start, pgoff_end; struct anon_vma_chain *avc; if (locked) { anon_vma = folio_anon_vma(folio); /* anon_vma disappear under us? */ VM_BUG_ON_FOLIO(!anon_vma, folio); } else { anon_vma = rmap_walk_anon_lock(folio, rwc); } if (!anon_vma) return; pgoff_start = folio_pgoff(folio); pgoff_end = pgoff_start + folio_nr_pages(folio) - 1; anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root, pgoff_start, pgoff_end) { struct vm_area_struct *vma = avc->vma; unsigned long address = vma_address(vma, pgoff_start, folio_nr_pages(folio)); VM_BUG_ON_VMA(address == -EFAULT, vma); cond_resched(); if (rwc->invalid_vma && rwc->invalid_vma(vma, rwc->arg)) continue; if (!rwc->rmap_one(folio, vma, address, rwc->arg)) break; if (rwc->done && rwc->done(folio)) break; } if (!locked) anon_vma_unlock_read(anon_vma); } /** * __rmap_walk_file() - Traverse the reverse mapping for a file-backed mapping * of a page mapped within a specified page cache object at a specified offset. * * @folio: Either the folio whose mappings to traverse, or if NULL, * the callbacks specified in @rwc will be configured such * as to be able to look up mappings correctly. * @mapping: The page cache object whose mapping VMAs we intend to * traverse. If @folio is non-NULL, this should be equal to * folio_mapping(folio). * @pgoff_start: The offset within @mapping of the page which we are * looking up. If @folio is non-NULL, this should be equal * to folio_pgoff(folio). * @nr_pages: The number of pages mapped by the mapping. If @folio is * non-NULL, this should be equal to folio_nr_pages(folio). 
* @rwc: The reverse mapping walk control object describing how * the traversal should proceed. * @locked: Is the @mapping already locked? If not, we acquire the * lock. */ static void __rmap_walk_file(struct folio *folio, struct address_space *mapping, pgoff_t pgoff_start, unsigned long nr_pages, struct rmap_walk_control *rwc, bool locked) { pgoff_t pgoff_end = pgoff_start + nr_pages - 1; struct vm_area_struct *vma; VM_WARN_ON_FOLIO(folio && mapping != folio_mapping(folio), folio); VM_WARN_ON_FOLIO(folio && pgoff_start != folio_pgoff(folio), folio); VM_WARN_ON_FOLIO(folio && nr_pages != folio_nr_pages(folio), folio); if (!locked) { if (i_mmap_trylock_read(mapping)) goto lookup; if (rwc->try_lock) { rwc->contended = true; return; } i_mmap_lock_read(mapping); } lookup: vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff_start, pgoff_end) { unsigned long address = vma_address(vma, pgoff_start, nr_pages); VM_BUG_ON_VMA(address == -EFAULT, vma); cond_resched(); if (rwc->invalid_vma && rwc->invalid_vma(vma, rwc->arg)) continue; if (!rwc->rmap_one(folio, vma, address, rwc->arg)) goto done; if (rwc->done && rwc->done(folio)) goto done; } done: if (!locked) i_mmap_unlock_read(mapping); } /* * rmap_walk_file - do something to file page using the object-based rmap method * @folio: the folio to be handled * @rwc: control variable according to each walk type * @locked: caller holds relevant rmap lock * * Find all the mappings of a folio using the mapping pointer and the vma chains * contained in the address_space struct it points to. */ static void rmap_walk_file(struct folio *folio, struct rmap_walk_control *rwc, bool locked) { /* * The folio lock not only makes sure that folio->mapping cannot * suddenly be NULLified by truncation, it makes sure that the structure * at mapping cannot be freed and reused yet, so we can safely take * mapping->i_mmap_rwsem. */ VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); if (!folio->mapping) return; __rmap_walk_file(folio, folio->mapping, folio->index, folio_nr_pages(folio), rwc, locked); } void rmap_walk(struct folio *folio, struct rmap_walk_control *rwc) { if (unlikely(folio_test_ksm(folio))) rmap_walk_ksm(folio, rwc); else if (folio_test_anon(folio)) rmap_walk_anon(folio, rwc, false); else rmap_walk_file(folio, rwc, false); } /* Like rmap_walk, but caller holds relevant rmap lock */ void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc) { /* no ksm support for now */ VM_BUG_ON_FOLIO(folio_test_ksm(folio), folio); if (folio_test_anon(folio)) rmap_walk_anon(folio, rwc, true); else rmap_walk_file(folio, rwc, true); } #ifdef CONFIG_HUGETLB_PAGE /* * The following two functions are for anonymous (private mapped) hugepages. * Unlike common anonymous pages, anonymous hugepages have no accounting code * and no lru code, because we handle hugepages differently from common pages. 
*/ void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma, unsigned long address, rmap_t flags) { VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio); VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio); atomic_inc(&folio->_entire_mapcount); atomic_inc(&folio->_large_mapcount); if (flags & RMAP_EXCLUSIVE) SetPageAnonExclusive(&folio->page); VM_WARN_ON_FOLIO(folio_entire_mapcount(folio) > 1 && PageAnonExclusive(&folio->page), folio); } void hugetlb_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma, unsigned long address) { VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio); BUG_ON(address < vma->vm_start || address >= vma->vm_end); /* increment count (starts at -1) */ atomic_set(&folio->_entire_mapcount, 0); atomic_set(&folio->_large_mapcount, 0); folio_clear_hugetlb_restore_reserve(folio); __folio_set_anon(folio, vma, address, true); SetPageAnonExclusive(&folio->page); } #endif /* CONFIG_HUGETLB_PAGE */
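/*
 * Illustrative sketch only -- not part of rmap.c. It shows the calling
 * convention documented above for a brand-new, exclusive anonymous folio:
 * the page table lock is held and the folio is not yet visible to any other
 * thread, so folio_add_new_anon_rmap() may be used instead of
 * folio_add_anon_rmap_ptes(). The helper name example_map_new_anon_folio()
 * and its simplified surroundings are hypothetical.
 */
static void example_map_new_anon_folio(struct vm_area_struct *vma,
				       unsigned long addr, pte_t *ptep,
				       struct folio *folio)
{
	pte_t entry = mk_pte(&folio->page, vma->vm_page_prot);

	/* New and exclusive: bypasses the inc-and-test, sets PageAnonExclusive. */
	folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
	/* Only then make the mapping visible through the page tables. */
	set_pte_at(vma->vm_mm, addr, ptep, entry);
}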
/* SPDX-License-Identifier: GPL-2.0-or-later */ /* * Copyright (c) International Business Machines Corp., 2006 * * Author: Artem Bityutskiy (Битюцкий Артём) */ #ifndef __UBI_DEBUG_H__ #define __UBI_DEBUG_H__ void ubi_dump_flash(struct ubi_device *ubi, int pnum, int offset, int len); void ubi_dump_ec_hdr(const struct ubi_ec_hdr *ec_hdr); void ubi_dump_vid_hdr(const struct ubi_vid_hdr *vid_hdr); #include <linux/random.h> #define ubi_assert(expr) do { \ if (unlikely(!(expr))) { \ pr_crit("UBI assert failed in %s at %u (pid %d)\n", \ __func__, __LINE__, current->pid); \ dump_stack(); \ } \ } while (0) #define ubi_dbg_print_hex_dump(l, ps, pt, r, g, b, len, a) \ print_hex_dump(l, ps, pt, r, g, b, len, a) #define ubi_dbg_msg(type, fmt, ...) \ pr_debug("UBI DBG " type " (pid %d): " fmt "\n", current->pid, \ ##__VA_ARGS__) /* General debugging messages */ #define dbg_gen(fmt, ...) ubi_dbg_msg("gen", fmt, ##__VA_ARGS__) /* Messages from the eraseblock association sub-system */ #define dbg_eba(fmt, ...) ubi_dbg_msg("eba", fmt, ##__VA_ARGS__) /* Messages from the wear-leveling sub-system */ #define dbg_wl(fmt, ...) ubi_dbg_msg("wl", fmt, ##__VA_ARGS__) /* Messages from the input/output sub-system */ #define dbg_io(fmt, ...) ubi_dbg_msg("io", fmt, ##__VA_ARGS__) /* Initialization and build messages */ #define dbg_bld(fmt, ...) ubi_dbg_msg("bld", fmt, ##__VA_ARGS__) void ubi_dump_vol_info(const struct ubi_volume *vol); void ubi_dump_vtbl_record(const struct ubi_vtbl_record *r, int idx); void ubi_dump_av(const struct ubi_ainf_volume *av); void ubi_dump_aeb(const struct ubi_ainf_peb *aeb, int type); void ubi_dump_mkvol_req(const struct ubi_mkvol_req *req); int ubi_self_check_all_ff(struct ubi_device *ubi, int pnum, int offset, int len); int ubi_debugfs_init(void); void ubi_debugfs_exit(void); int ubi_debugfs_init_dev(struct ubi_device *ubi); void ubi_debugfs_exit_dev(struct ubi_device *ubi); /** * The following function is a legacy implementation of UBI fault-injection * hook.
When using more powerful fault injection capabilities, the legacy * fault injection interface should be retained. */ int ubi_dbg_power_cut(struct ubi_device *ubi, int caller); static inline int ubi_dbg_bitflip(const struct ubi_device *ubi) { if (ubi->dbg.emulate_bitflips) return !get_random_u32_below(200); return 0; } static inline int ubi_dbg_write_failure(const struct ubi_device *ubi) { if (ubi->dbg.emulate_io_failures) return !get_random_u32_below(500); return 0; } static inline int ubi_dbg_erase_failure(const struct ubi_device *ubi) { if (ubi->dbg.emulate_io_failures) return !get_random_u32_below(400); return 0; } /** * MASK_XXX: Mask for emulate_failures in ubi_debug_info.The mask is used to * precisely control the type and process of fault injection. */ /* Emulate a power cut when writing EC/VID header */ #define MASK_POWER_CUT_EC (1 << 0) #define MASK_POWER_CUT_VID (1 << 1) /* Emulate a power cut when writing data*/ #define MASK_POWER_CUT_DATA (1 << 2) /* Emulate bit-flips */ #define MASK_BITFLIPS (1 << 3) /* Emulate ecc error */ #define MASK_ECCERR (1 << 4) /* Emulates -EIO during data read */ #define MASK_READ_FAILURE (1 << 5) #define MASK_READ_FAILURE_EC (1 << 6) #define MASK_READ_FAILURE_VID (1 << 7) /* Emulates -EIO during data write */ #define MASK_WRITE_FAILURE (1 << 8) /* Emulates -EIO during erase a PEB*/ #define MASK_ERASE_FAILURE (1 << 9) /* Return UBI_IO_FF when reading EC/VID header */ #define MASK_IO_FF_EC (1 << 10) #define MASK_IO_FF_VID (1 << 11) /* Return UBI_IO_FF_BITFLIPS when reading EC/VID header */ #define MASK_IO_FF_BITFLIPS_EC (1 << 12) #define MASK_IO_FF_BITFLIPS_VID (1 << 13) /* Return UBI_IO_BAD_HDR when reading EC/VID header */ #define MASK_BAD_HDR_EC (1 << 14) #define MASK_BAD_HDR_VID (1 << 15) /* Return UBI_IO_BAD_HDR_EBADMSG when reading EC/VID header */ #define MASK_BAD_HDR_EBADMSG_EC (1 << 16) #define MASK_BAD_HDR_EBADMSG_VID (1 << 17) #ifdef CONFIG_MTD_UBI_FAULT_INJECTION extern bool should_fail_eccerr(void); extern bool should_fail_bitflips(void); extern bool should_fail_read_failure(void); extern bool should_fail_write_failure(void); extern bool should_fail_erase_failure(void); extern bool should_fail_power_cut(void); extern bool should_fail_io_ff(void); extern bool should_fail_io_ff_bitflips(void); extern bool should_fail_bad_hdr(void); extern bool should_fail_bad_hdr_ebadmsg(void); static inline bool ubi_dbg_fail_bitflip(const struct ubi_device *ubi) { if (ubi->dbg.emulate_failures & MASK_BITFLIPS) return should_fail_bitflips(); return false; } static inline bool ubi_dbg_fail_write(const struct ubi_device *ubi) { if (ubi->dbg.emulate_failures & MASK_WRITE_FAILURE) return should_fail_write_failure(); return false; } static inline bool ubi_dbg_fail_erase(const struct ubi_device *ubi) { if (ubi->dbg.emulate_failures & MASK_ERASE_FAILURE) return should_fail_erase_failure(); return false; } static inline bool ubi_dbg_fail_power_cut(const struct ubi_device *ubi, unsigned int caller) { if (ubi->dbg.emulate_failures & caller) return should_fail_power_cut(); return false; } static inline bool ubi_dbg_fail_read(const struct ubi_device *ubi, unsigned int caller) { if (ubi->dbg.emulate_failures & caller) return should_fail_read_failure(); return false; } static inline bool ubi_dbg_fail_eccerr(const struct ubi_device *ubi) { if (ubi->dbg.emulate_failures & MASK_ECCERR) return should_fail_eccerr(); return false; } static inline bool ubi_dbg_fail_ff(const struct ubi_device *ubi, unsigned int caller) { if (ubi->dbg.emulate_failures & caller) return 
should_fail_io_ff(); return false; } static inline bool ubi_dbg_fail_ff_bitflips(const struct ubi_device *ubi, unsigned int caller) { if (ubi->dbg.emulate_failures & caller) return should_fail_io_ff_bitflips(); return false; } static inline bool ubi_dbg_fail_bad_hdr(const struct ubi_device *ubi, unsigned int caller) { if (ubi->dbg.emulate_failures & caller) return should_fail_bad_hdr(); return false; } static inline bool ubi_dbg_fail_bad_hdr_ebadmsg(const struct ubi_device *ubi, unsigned int caller) { if (ubi->dbg.emulate_failures & caller) return should_fail_bad_hdr_ebadmsg(); return false; } #else /* CONFIG_MTD_UBI_FAULT_INJECTION */ #define ubi_dbg_fail_bitflip(u) false #define ubi_dbg_fail_write(u) false #define ubi_dbg_fail_erase(u) false #define ubi_dbg_fail_power_cut(u, c) false #define ubi_dbg_fail_read(u, c) false #define ubi_dbg_fail_eccerr(u) false #define ubi_dbg_fail_ff(u, c) false #define ubi_dbg_fail_ff_bitflips(u, v) false #define ubi_dbg_fail_bad_hdr(u, c) false #define ubi_dbg_fail_bad_hdr_ebadmsg(u, c) false #endif /** * ubi_dbg_is_power_cut - if it is time to emulate power cut. * @ubi: UBI device description object * * Returns true if power cut should be emulated, otherwise returns false. */ static inline bool ubi_dbg_is_power_cut(struct ubi_device *ubi, unsigned int caller) { if (ubi_dbg_power_cut(ubi, caller)) return true; return ubi_dbg_fail_power_cut(ubi, caller); } /** * ubi_dbg_is_bitflip - if it is time to emulate a bit-flip. * @ubi: UBI device description object * * Returns true if a bit-flip should be emulated, otherwise returns false. */ static inline bool ubi_dbg_is_bitflip(const struct ubi_device *ubi) { if (ubi_dbg_bitflip(ubi)) return true; return ubi_dbg_fail_bitflip(ubi); } /** * ubi_dbg_is_write_failure - if it is time to emulate a write failure. * @ubi: UBI device description object * * Returns true if a write failure should be emulated, otherwise returns * false. */ static inline bool ubi_dbg_is_write_failure(const struct ubi_device *ubi) { if (ubi_dbg_write_failure(ubi)) return true; return ubi_dbg_fail_write(ubi); } /** * ubi_dbg_is_erase_failure - if its time to emulate an erase failure. * @ubi: UBI device description object * * Returns true if an erase failure should be emulated, otherwise returns * false. */ static inline bool ubi_dbg_is_erase_failure(const struct ubi_device *ubi) { if (ubi_dbg_erase_failure(ubi)) return true; return ubi_dbg_fail_erase(ubi); } /** * ubi_dbg_is_eccerr - if it is time to emulate ECC error. * @ubi: UBI device description object * * Returns true if a ECC error should be emulated, otherwise returns false. */ static inline bool ubi_dbg_is_eccerr(const struct ubi_device *ubi) { return ubi_dbg_fail_eccerr(ubi); } /** * ubi_dbg_is_read_failure - if it is time to emulate a read failure. * @ubi: UBI device description object * * Returns true if a read failure should be emulated, otherwise returns * false. */ static inline bool ubi_dbg_is_read_failure(const struct ubi_device *ubi, unsigned int caller) { return ubi_dbg_fail_read(ubi, caller); } /** * ubi_dbg_is_ff - if it is time to emulate that read region is only 0xFF. * @ubi: UBI device description object * * Returns true if read region should be emulated 0xFF, otherwise * returns false. 
*/ static inline bool ubi_dbg_is_ff(const struct ubi_device *ubi, unsigned int caller) { return ubi_dbg_fail_ff(ubi, caller); } /** * ubi_dbg_is_ff_bitflips - if it is time to emulate that read region is only 0xFF * with error reported by the MTD driver * * @ubi: UBI device description object * * Returns true if read region should be emulated 0xFF and error * reported by the MTD driver, otherwise returns false. */ static inline bool ubi_dbg_is_ff_bitflips(const struct ubi_device *ubi, unsigned int caller) { return ubi_dbg_fail_ff_bitflips(ubi, caller); } /** * ubi_dbg_is_bad_hdr - if it is time to emulate a bad header * @ubi: UBI device description object * * Returns true if a bad header error should be emulated, otherwise * returns false. */ static inline bool ubi_dbg_is_bad_hdr(const struct ubi_device *ubi, unsigned int caller) { return ubi_dbg_fail_bad_hdr(ubi, caller); } /** * ubi_dbg_is_bad_hdr_ebadmsg - if it is time to emulate a bad header with * ECC error. * * @ubi: UBI device description object * * Returns true if a bad header with ECC error should be emulated, otherwise * returns false. */ static inline bool ubi_dbg_is_bad_hdr_ebadmsg(const struct ubi_device *ubi, unsigned int caller) { return ubi_dbg_fail_bad_hdr_ebadmsg(ubi, caller); } /** * ubi_dbg_is_bgt_disabled - if the background thread is disabled. * @ubi: UBI device description object * * Returns non-zero if the UBI background thread is disabled for testing * purposes. */ static inline int ubi_dbg_is_bgt_disabled(const struct ubi_device *ubi) { return ubi->dbg.disable_bgt; } static inline int ubi_dbg_chk_io(const struct ubi_device *ubi) { return ubi->dbg.chk_io; } static inline int ubi_dbg_chk_gen(const struct ubi_device *ubi) { return ubi->dbg.chk_gen; } static inline int ubi_dbg_chk_fastmap(const struct ubi_device *ubi) { return ubi->dbg.chk_fastmap; } static inline void ubi_enable_dbg_chk_fastmap(struct ubi_device *ubi) { ubi->dbg.chk_fastmap = 1; } #endif /* !__UBI_DEBUG_H__ */ |
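/*
 * Illustrative sketch only -- not part of this header. It shows how an I/O
 * path could consult the fault-injection helpers declared above before
 * touching the flash. The function name example_dbg_checked_write(), the
 * chosen error codes, and the simplified control flow are hypothetical; the
 * real consumers live in drivers/mtd/ubi/io.c.
 */
static inline int example_dbg_checked_write(struct ubi_device *ubi,
					    int pnum, int offset, int len)
{
	/* Emulate an interrupted write (power cut) before any data hits flash. */
	if (ubi_dbg_is_power_cut(ubi, MASK_POWER_CUT_DATA))
		return -EROFS;

	/* Emulate a write failure as if the MTD driver had reported -EIO. */
	if (ubi_dbg_is_write_failure(ubi))
		return -EIO;

	/* ... the real mtd_write() call would go here ... */
	return 0;
}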
912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 | /* SPDX-License-Identifier: GPL-2.0 */ /* * linux/cgroup-defs.h - basic definitions for cgroup * * This file provides basic type and interface. Include this file directly * only if necessary to avoid cyclic dependencies. */ #ifndef _LINUX_CGROUP_DEFS_H #define _LINUX_CGROUP_DEFS_H #include <linux/limits.h> #include <linux/list.h> #include <linux/idr.h> #include <linux/wait.h> #include <linux/mutex.h> #include <linux/rcupdate.h> #include <linux/refcount.h> #include <linux/percpu-refcount.h> #include <linux/percpu-rwsem.h> #include <linux/u64_stats_sync.h> #include <linux/workqueue.h> #include <linux/bpf-cgroup-defs.h> #include <linux/psi_types.h> #ifdef CONFIG_CGROUPS struct cgroup; struct cgroup_root; struct cgroup_subsys; struct cgroup_taskset; struct kernfs_node; struct kernfs_ops; struct kernfs_open_file; struct seq_file; struct poll_table_struct; #define MAX_CGROUP_TYPE_NAMELEN 32 #define MAX_CGROUP_ROOT_NAMELEN 64 #define MAX_CFTYPE_NAME 64 /* define the enumeration of all cgroup subsystems */ #define SUBSYS(_x) _x ## _cgrp_id, enum cgroup_subsys_id { #include <linux/cgroup_subsys.h> CGROUP_SUBSYS_COUNT, }; #undef SUBSYS /* bits in struct cgroup_subsys_state flags field */ enum { CSS_NO_REF = (1 << 0), /* no reference counting for this css */ CSS_ONLINE = (1 << 1), /* between ->css_online() and ->css_offline() */ CSS_RELEASED = (1 << 2), /* refcnt reached zero, released */ CSS_VISIBLE = (1 << 3), /* css is visible to userland */ CSS_DYING = (1 << 4), /* css is dying */ }; /* bits in struct cgroup flags field */ enum { /* Control Group requires release notifications to userspace */ CGRP_NOTIFY_ON_RELEASE, /* * Clone the parent's configuration when creating a new child * cpuset cgroup. For historical reasons, this option can be * specified at mount time and thus is implemented here. */ CGRP_CPUSET_CLONE_CHILDREN, /* Control group has to be frozen. */ CGRP_FREEZE, /* Cgroup is frozen. */ CGRP_FROZEN, }; /* cgroup_root->flags */ enum { CGRP_ROOT_NOPREFIX = (1 << 1), /* mounted subsystems have no named prefix */ CGRP_ROOT_XATTR = (1 << 2), /* supports extended attributes */ /* * Consider namespaces as delegation boundaries. If this flag is * set, controller specific interface files in a namespace root * aren't writeable from inside the namespace. */ CGRP_ROOT_NS_DELEGATE = (1 << 3), /* * Reduce latencies on dynamic cgroup modifications such as task * migrations and controller on/offs by disabling percpu operation on * cgroup_threadgroup_rwsem. This makes hot path operations such as * forks and exits into the slow path and more expensive. * * The static usage pattern of creating a cgroup, enabling controllers, * and then seeding it with CLONE_INTO_CGROUP doesn't require write * locking cgroup_threadgroup_rwsem and thus doesn't benefit from * favordynmod. */ CGRP_ROOT_FAVOR_DYNMODS = (1 << 4), /* * Enable cpuset controller in v1 cgroup to use v2 behavior. */ CGRP_ROOT_CPUSET_V2_MODE = (1 << 16), /* * Enable legacy local memory.events. */ CGRP_ROOT_MEMORY_LOCAL_EVENTS = (1 << 17), /* * Enable recursive subtree protection */ CGRP_ROOT_MEMORY_RECURSIVE_PROT = (1 << 18), /* * Enable hugetlb accounting for the memory controller. */ CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING = (1 << 19), /* * Enable legacy local pids.events. 
*/ CGRP_ROOT_PIDS_LOCAL_EVENTS = (1 << 20), }; /* cftype->flags */ enum { CFTYPE_ONLY_ON_ROOT = (1 << 0), /* only create on root cgrp */ CFTYPE_NOT_ON_ROOT = (1 << 1), /* don't create on root cgrp */ CFTYPE_NS_DELEGATABLE = (1 << 2), /* writeable beyond delegation boundaries */ CFTYPE_NO_PREFIX = (1 << 3), /* (DON'T USE FOR NEW FILES) no subsys prefix */ CFTYPE_WORLD_WRITABLE = (1 << 4), /* (DON'T USE FOR NEW FILES) S_IWUGO */ CFTYPE_DEBUG = (1 << 5), /* create when cgroup_debug */ /* internal flags, do not use outside cgroup core proper */ __CFTYPE_ONLY_ON_DFL = (1 << 16), /* only on default hierarchy */ __CFTYPE_NOT_ON_DFL = (1 << 17), /* not on default hierarchy */ __CFTYPE_ADDED = (1 << 18), }; /* * cgroup_file is the handle for a file instance created in a cgroup which * is used, for example, to generate file changed notifications. This can * be obtained by setting cftype->file_offset. */ struct cgroup_file { /* do not access any fields from outside cgroup core */ struct kernfs_node *kn; unsigned long notified_at; struct timer_list notify_timer; }; /* * Per-subsystem/per-cgroup state maintained by the system. This is the * fundamental structural building block that controllers deal with. * * Fields marked with "PI:" are public and immutable and may be accessed * directly without synchronization. */ struct cgroup_subsys_state { /* PI: the cgroup that this css is attached to */ struct cgroup *cgroup; /* PI: the cgroup subsystem that this css is attached to */ struct cgroup_subsys *ss; /* reference count - access via css_[try]get() and css_put() */ struct percpu_ref refcnt; /* * Depending on the context, this field is initialized * via css_rstat_init() at different places: * * when css is associated with cgroup::self * when css->cgroup is the root cgroup * performed in cgroup_init() * when css->cgroup is not the root cgroup * performed in cgroup_create() * when css is associated with a subsystem * when css->cgroup is the root cgroup * performed in cgroup_init_subsys() in the non-early path * when css->cgroup is not the root cgroup * performed in css_create() */ struct css_rstat_cpu __percpu *rstat_cpu; /* * siblings list anchored at the parent's ->children * * linkage is protected by cgroup_mutex or RCU */ struct list_head sibling; struct list_head children; /* * PI: Subsys-unique ID. 0 is unused and root is always 1. The * matching css can be looked up using css_from_id(). */ int id; unsigned int flags; /* * Monotonically increasing unique serial number which defines a * uniform order among all csses. It's guaranteed that all * ->children lists are in the ascending order of ->serial_nr and * used to allow interrupting and resuming iterations. */ u64 serial_nr; /* * Incremented by online self and children. Used to guarantee that * parents are not offlined before their children. */ atomic_t online_cnt; /* percpu_ref killing and RCU release */ struct work_struct destroy_work; struct rcu_work destroy_rwork; /* * PI: the parent css. Placed here for cache proximity to following * fields of the containing structure. */ struct cgroup_subsys_state *parent; /* * Keep track of total numbers of visible descendant CSSes. * The total number of dying CSSes is tracked in * css->cgroup->nr_dying_subsys[ssid]. * Protected by cgroup_mutex. */ int nr_descendants; /* * A singly-linked list of css structures to be rstat flushed. * This is a scratch field to be used exclusively by * css_rstat_flush(). * * Protected by rstat_base_lock when css is cgroup::self. * Protected by css->ss->rstat_ss_lock otherwise. 
*/ struct cgroup_subsys_state *rstat_flush_next; }; /* * A css_set is a structure holding pointers to a set of * cgroup_subsys_state objects. This saves space in the task struct * object and speeds up fork()/exit(), since a single inc/dec and a * list_add()/del() can bump the reference count on the entire cgroup * set for a task. */ struct css_set { /* * Set of subsystem states, one for each subsystem. This array is * immutable after creation apart from the init_css_set during * subsystem registration (at boot time). */ struct cgroup_subsys_state *subsys[CGROUP_SUBSYS_COUNT]; /* reference count */ refcount_t refcount; /* * For a domain cgroup, the following points to self. If threaded, * to the matching cset of the nearest domain ancestor. The * dom_cset provides access to the domain cgroup and its csses to * which domain level resource consumptions should be charged. */ struct css_set *dom_cset; /* the default cgroup associated with this css_set */ struct cgroup *dfl_cgrp; /* internal task count, protected by css_set_lock */ int nr_tasks; /* * Lists running through all tasks using this cgroup group. * mg_tasks lists tasks which belong to this cset but are in the * process of being migrated out or in. Protected by * css_set_lock, but, during migration, once tasks are moved to * mg_tasks, it can be read safely while holding cgroup_mutex. */ struct list_head tasks; struct list_head mg_tasks; struct list_head dying_tasks; /* all css_task_iters currently walking this cset */ struct list_head task_iters; /* * On the default hierarchy, ->subsys[ssid] may point to a css * attached to an ancestor instead of the cgroup this css_set is * associated with. The following node is anchored at * ->subsys[ssid]->cgroup->e_csets[ssid] and provides a way to * iterate through all css's attached to a given cgroup. */ struct list_head e_cset_node[CGROUP_SUBSYS_COUNT]; /* all threaded csets whose ->dom_cset points to this cset */ struct list_head threaded_csets; struct list_head threaded_csets_node; /* * List running through all cgroup groups in the same hash * slot. Protected by css_set_lock */ struct hlist_node hlist; /* * List of cgrp_cset_links pointing at cgroups referenced from this * css_set. Protected by css_set_lock. */ struct list_head cgrp_links; /* * List of csets participating in the on-going migration either as * source or destination. Protected by cgroup_mutex. */ struct list_head mg_src_preload_node; struct list_head mg_dst_preload_node; struct list_head mg_node; /* * If this cset is acting as the source of migration the following * two fields are set. mg_src_cgrp and mg_dst_cgrp are * respectively the source and destination cgroups of the on-going * migration. mg_dst_cset is the destination cset the target tasks * on this cset should be migrated to. Protected by cgroup_mutex. */ struct cgroup *mg_src_cgrp; struct cgroup *mg_dst_cgrp; struct css_set *mg_dst_cset; /* dead and being drained, ignore for migration */ bool dead; /* For RCU-protected deletion */ struct rcu_head rcu_head; }; struct cgroup_base_stat { struct task_cputime cputime; #ifdef CONFIG_SCHED_CORE u64 forceidle_sum; #endif u64 ntime; }; /* * rstat - cgroup scalable recursive statistics. Accounting is done * per-cpu in css_rstat_cpu which is then lazily propagated up the * hierarchy on reads. * * When a stat gets updated, the css_rstat_cpu and its ancestors are * linked into the updated tree. On the following read, propagation only * considers and consumes the updated tree. 
This makes reading O(the * number of descendants which have been active since last read) instead of * O(the total number of descendants). * * This is important because there can be a lot of (draining) cgroups which * aren't active and stat may be read frequently. The combination can * become very expensive. By propagating selectively, increasing reading * frequency decreases the cost of each read. * * This struct hosts both the fields which implement the above - * updated_children and updated_next. */ struct css_rstat_cpu { /* * Child cgroups with stat updates on this cpu since the last read * are linked on the parent's ->updated_children through * ->updated_next. updated_children is terminated by its container css. * * In addition to being more compact, singly-linked list pointing to * the css makes it unnecessary for each per-cpu struct to point back * to the associated css. * * Protected by per-cpu css->ss->rstat_ss_cpu_lock. */ struct cgroup_subsys_state *updated_children; struct cgroup_subsys_state *updated_next; /* NULL if not on the list */ }; /* * This struct hosts the fields which track basic resource statistics on * top of it - bsync, bstat and last_bstat. */ struct cgroup_rstat_base_cpu { /* * ->bsync protects ->bstat. These are the only fields which get * updated in the hot path. */ struct u64_stats_sync bsync; struct cgroup_base_stat bstat; /* * Snapshots at the last reading. These are used to calculate the * deltas to propagate to the global counters. */ struct cgroup_base_stat last_bstat; /* * This field is used to record the cumulative per-cpu time of * the cgroup and its descendants. Currently it can be read via * eBPF/drgn etc, and we are still trying to determine how to * expose it in the cgroupfs interface. */ struct cgroup_base_stat subtree_bstat; /* * Snapshots at the last reading. These are used to calculate the * deltas to propagate to the per-cpu subtree_bstat. */ struct cgroup_base_stat last_subtree_bstat; }; struct cgroup_freezer_state { /* Should the cgroup and its descendants be frozen. */ bool freeze; /* Should the cgroup actually be frozen? */ bool e_freeze; /* Fields below are protected by css_set_lock */ /* Number of frozen descendant cgroups */ int nr_frozen_descendants; /* * Number of tasks, which are counted as frozen: * frozen, SIGSTOPped, and PTRACEd. */ int nr_frozen_tasks; }; struct cgroup { /* self css with NULL ->ss, points back to this cgroup */ struct cgroup_subsys_state self; unsigned long flags; /* "unsigned long" so bitops work */ /* * The depth this cgroup is at. The root is at depth zero and each * step down the hierarchy increments the level. This along with * ancestors[] can determine whether a given cgroup is a * descendant of another without traversing the hierarchy. */ int level; /* Maximum allowed descent tree depth */ int max_depth; /* * Keep track of total numbers of visible and dying descent cgroups. * Dying cgroups are cgroups which were deleted by a user, * but are still existing because someone else is holding a reference. * max_descendants is a maximum allowed number of descent cgroups. * * nr_descendants and nr_dying_descendants are protected * by cgroup_mutex and css_set_lock. It's fine to read them holding * any of cgroup_mutex and css_set_lock; for writing both locks * should be held. */ int nr_descendants; int nr_dying_descendants; int max_descendants; /* * Each non-empty css_set associated with this cgroup contributes * one to nr_populated_csets. The counter is zero iff this cgroup * doesn't have any tasks. 
* * All children which have non-zero nr_populated_csets and/or * nr_populated_children of their own contribute one to either * nr_populated_domain_children or nr_populated_threaded_children * depending on their type. Each counter is zero iff all cgroups * of the type in the subtree proper don't have any tasks. */ int nr_populated_csets; int nr_populated_domain_children; int nr_populated_threaded_children; int nr_threaded_children; /* # of live threaded child cgroups */ /* sequence number for cgroup.kill, serialized by css_set_lock. */ unsigned int kill_seq; struct kernfs_node *kn; /* cgroup kernfs entry */ struct cgroup_file procs_file; /* handle for "cgroup.procs" */ struct cgroup_file events_file; /* handle for "cgroup.events" */ /* handles for "{cpu,memory,io,irq}.pressure" */ struct cgroup_file psi_files[NR_PSI_RESOURCES]; /* * The bitmask of subsystems enabled on the child cgroups. * ->subtree_control is the one configured through * "cgroup.subtree_control" while ->subtree_ss_mask is the effective * one which may have more subsystems enabled. Controller knobs * are made available iff it's enabled in ->subtree_control. */ u16 subtree_control; u16 subtree_ss_mask; u16 old_subtree_control; u16 old_subtree_ss_mask; /* Private pointers for each registered subsystem */ struct cgroup_subsys_state __rcu *subsys[CGROUP_SUBSYS_COUNT]; /* * Keep track of total number of dying CSSes at and below this cgroup. * Protected by cgroup_mutex. */ int nr_dying_subsys[CGROUP_SUBSYS_COUNT]; struct cgroup_root *root; /* * List of cgrp_cset_links pointing at css_sets with tasks in this * cgroup. Protected by css_set_lock. */ struct list_head cset_links; /* * On the default hierarchy, a css_set for a cgroup with some * susbsys disabled will point to css's which are associated with * the closest ancestor which has the subsys enabled. The * following lists all css_sets which point to this cgroup's css * for the given subsystem. */ struct list_head e_csets[CGROUP_SUBSYS_COUNT]; /* * If !threaded, self. If threaded, it points to the nearest * domain ancestor. Inside a threaded subtree, cgroups are exempt * from process granularity and no-internal-task constraint. * Domain level resource consumptions which aren't tied to a * specific task are charged to the dom_cgrp. */ struct cgroup *dom_cgrp; struct cgroup *old_dom_cgrp; /* used while enabling threaded */ /* * Depending on the context, this field is initialized via * css_rstat_init() at different places: * * when cgroup is the root cgroup * performed in cgroup_setup_root() * otherwise * performed in cgroup_create() */ struct cgroup_rstat_base_cpu __percpu *rstat_base_cpu; /* * Add padding to keep the read mostly rstat per-cpu pointer on a * different cacheline than the following *bstat fields which can have * frequent updates. */ CACHELINE_PADDING(_pad_); /* cgroup basic resource statistics */ struct cgroup_base_stat last_bstat; struct cgroup_base_stat bstat; struct prev_cputime prev_cputime; /* for printing out cputime */ /* * list of pidlists, up to two for each namespace (one for procs, one * for tasks); created on demand. 
*/ struct list_head pidlists; struct mutex pidlist_mutex; /* used to wait for offlining of csses */ wait_queue_head_t offline_waitq; /* used to schedule release agent */ struct work_struct release_agent_work; /* used to track pressure stalls */ struct psi_group *psi; /* used to store eBPF programs */ struct cgroup_bpf bpf; /* Used to store internal freezer state */ struct cgroup_freezer_state freezer; #ifdef CONFIG_BPF_SYSCALL struct bpf_local_storage __rcu *bpf_cgrp_storage; #endif /* All ancestors including self */ struct cgroup *ancestors[]; }; /* * A cgroup_root represents the root of a cgroup hierarchy, and may be * associated with a kernfs_root to form an active hierarchy. This is * internal to cgroup core. Don't access directly from controllers. */ struct cgroup_root { struct kernfs_root *kf_root; /* The bitmask of subsystems attached to this hierarchy */ unsigned int subsys_mask; /* Unique id for this hierarchy. */ int hierarchy_id; /* A list running through the active hierarchies */ struct list_head root_list; struct rcu_head rcu; /* Must be near the top */ /* * The root cgroup. The containing cgroup_root will be destroyed on its * release. cgrp->ancestors[0] will be used overflowing into the * following field. cgrp_ancestor_storage must immediately follow. */ struct cgroup cgrp; /* must follow cgrp for cgrp->ancestors[0], see above */ struct cgroup *cgrp_ancestor_storage; /* Number of cgroups in the hierarchy, used only for /proc/cgroups */ atomic_t nr_cgrps; /* Hierarchy-specific flags */ unsigned int flags; /* The path to use for release notifications. */ char release_agent_path[PATH_MAX]; /* The name for this hierarchy - may be empty */ char name[MAX_CGROUP_ROOT_NAMELEN]; }; /* * struct cftype: handler definitions for cgroup control files * * When reading/writing to a file: * - the cgroup to use is file->f_path.dentry->d_parent->d_fsdata * - the 'cftype' of the file is file->f_path.dentry->d_fsdata */ struct cftype { /* * Name of the subsystem is prepended in cgroup_file_name(). * Zero length string indicates end of cftype array. */ char name[MAX_CFTYPE_NAME]; unsigned long private; /* * The maximum length of string, excluding trailing nul, that can * be passed to write. If < PAGE_SIZE-1, PAGE_SIZE-1 is assumed. */ size_t max_write_len; /* CFTYPE_* flags */ unsigned int flags; /* * If non-zero, should contain the offset from the start of css to * a struct cgroup_file field. cgroup will record the handle of * the created file into it. The recorded handle can be used as * long as the containing css remains accessible. */ unsigned int file_offset; /* * Fields used for internal bookkeeping. Initialized automatically * during registration. */ struct cgroup_subsys *ss; /* NULL for cgroup core files */ struct list_head node; /* anchored at ss->cfts */ struct kernfs_ops *kf_ops; int (*open)(struct kernfs_open_file *of); void (*release)(struct kernfs_open_file *of); /* * read_u64() is a shortcut for the common case of returning a * single integer. 
Use it in place of read() */ u64 (*read_u64)(struct cgroup_subsys_state *css, struct cftype *cft); /* * read_s64() is a signed version of read_u64() */ s64 (*read_s64)(struct cgroup_subsys_state *css, struct cftype *cft); /* generic seq_file read interface */ int (*seq_show)(struct seq_file *sf, void *v); /* optional ops, implement all or none */ void *(*seq_start)(struct seq_file *sf, loff_t *ppos); void *(*seq_next)(struct seq_file *sf, void *v, loff_t *ppos); void (*seq_stop)(struct seq_file *sf, void *v); /* * write_u64() is a shortcut for the common case of accepting * a single integer (as parsed by simple_strtoull) from * userspace. Use in place of write(); return 0 or error. */ int (*write_u64)(struct cgroup_subsys_state *css, struct cftype *cft, u64 val); /* * write_s64() is a signed version of write_u64() */ int (*write_s64)(struct cgroup_subsys_state *css, struct cftype *cft, s64 val); /* * write() is the generic write callback which maps directly to * kernfs write operation and overrides all other operations. * Maximum write size is determined by ->max_write_len. Use * of_css/cft() to access the associated css and cft. */ ssize_t (*write)(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off); __poll_t (*poll)(struct kernfs_open_file *of, struct poll_table_struct *pt); struct lock_class_key lockdep_key; }; /* * Control Group subsystem type. * See Documentation/admin-guide/cgroup-v1/cgroups.rst for details */ struct cgroup_subsys { struct cgroup_subsys_state *(*css_alloc)(struct cgroup_subsys_state *parent_css); int (*css_online)(struct cgroup_subsys_state *css); void (*css_offline)(struct cgroup_subsys_state *css); void (*css_released)(struct cgroup_subsys_state *css); void (*css_free)(struct cgroup_subsys_state *css); void (*css_reset)(struct cgroup_subsys_state *css); void (*css_killed)(struct cgroup_subsys_state *css); void (*css_rstat_flush)(struct cgroup_subsys_state *css, int cpu); int (*css_extra_stat_show)(struct seq_file *seq, struct cgroup_subsys_state *css); int (*css_local_stat_show)(struct seq_file *seq, struct cgroup_subsys_state *css); int (*can_attach)(struct cgroup_taskset *tset); void (*cancel_attach)(struct cgroup_taskset *tset); void (*attach)(struct cgroup_taskset *tset); void (*post_attach)(void); int (*can_fork)(struct task_struct *task, struct css_set *cset); void (*cancel_fork)(struct task_struct *task, struct css_set *cset); void (*fork)(struct task_struct *task); void (*exit)(struct task_struct *task); void (*release)(struct task_struct *task); void (*bind)(struct cgroup_subsys_state *root_css); bool early_init:1; /* * If %true, the controller, on the default hierarchy, doesn't show * up in "cgroup.controllers" or "cgroup.subtree_control", is * implicitly enabled on all cgroups on the default hierarchy, and * bypasses the "no internal process" constraint. This is for * utility type controllers which is transparent to userland. * * An implicit controller can be stolen from the default hierarchy * anytime and thus must be okay with offline csses from previous * hierarchies coexisting with csses for the current one. */ bool implicit_on_dfl:1; /* * If %true, the controller, supports threaded mode on the default * hierarchy. In a threaded subtree, both process granularity and * no-internal-process constraint are ignored and a threaded * controllers should be able to handle that. * * Note that as an implicit controller is automatically enabled on * all cgroups on the default hierarchy, it should also be * threaded. 
implicit && !threaded is not supported. */ bool threaded:1; /* the following two fields are initialized automatically during boot */ int id; const char *name; /* optional, initialized automatically during boot if not set */ const char *legacy_name; /* link to parent, protected by cgroup_lock() */ struct cgroup_root *root; /* idr for css->id */ struct idr css_idr; /* * List of cftypes. Each entry is the first entry of an array * terminated by zero length name. */ struct list_head cfts; /* * Base cftypes which are automatically registered. The two can * point to the same array. */ struct cftype *dfl_cftypes; /* for the default hierarchy */ struct cftype *legacy_cftypes; /* for the legacy hierarchies */ /* * A subsystem may depend on other subsystems. When such subsystem * is enabled on a cgroup, the depended-upon subsystems are enabled * together if available. Subsystems enabled due to dependency are * not visible to userland until explicitly enabled. The following * specifies the mask of subsystems that this one depends on. */ unsigned int depends_on; spinlock_t rstat_ss_lock; raw_spinlock_t __percpu *rstat_ss_cpu_lock; }; extern struct percpu_rw_semaphore cgroup_threadgroup_rwsem; struct cgroup_of_peak { unsigned long value; struct list_head list; }; /** * cgroup_threadgroup_change_begin - threadgroup exclusion for cgroups * @tsk: target task * * Allows cgroup operations to synchronize against threadgroup changes * using a percpu_rw_semaphore. */ static inline void cgroup_threadgroup_change_begin(struct task_struct *tsk) { percpu_down_read(&cgroup_threadgroup_rwsem); } /** * cgroup_threadgroup_change_end - threadgroup exclusion for cgroups * @tsk: target task * * Counterpart of cgroup_threadcgroup_change_begin(). */ static inline void cgroup_threadgroup_change_end(struct task_struct *tsk) { percpu_up_read(&cgroup_threadgroup_rwsem); } #else /* CONFIG_CGROUPS */ #define CGROUP_SUBSYS_COUNT 0 static inline void cgroup_threadgroup_change_begin(struct task_struct *tsk) { might_sleep(); } static inline void cgroup_threadgroup_change_end(struct task_struct *tsk) {} #endif /* CONFIG_CGROUPS */ #ifdef CONFIG_SOCK_CGROUP_DATA /* * sock_cgroup_data is embedded at sock->sk_cgrp_data and contains * per-socket cgroup information except for memcg association. * * On legacy hierarchies, net_prio and net_cls controllers directly * set attributes on each sock which can then be tested by the network * layer. On the default hierarchy, each sock is associated with the * cgroup it was created in and the networking layer can match the * cgroup directly. 
*/ struct sock_cgroup_data { struct cgroup *cgroup; /* v2 */ #ifdef CONFIG_CGROUP_NET_CLASSID u32 classid; /* v1 */ #endif #ifdef CONFIG_CGROUP_NET_PRIO u16 prioidx; /* v1 */ #endif }; static inline u16 sock_cgroup_prioidx(const struct sock_cgroup_data *skcd) { #ifdef CONFIG_CGROUP_NET_PRIO return READ_ONCE(skcd->prioidx); #else return 1; #endif } static inline u32 sock_cgroup_classid(const struct sock_cgroup_data *skcd) { #ifdef CONFIG_CGROUP_NET_CLASSID return READ_ONCE(skcd->classid); #else return 0; #endif } static inline void sock_cgroup_set_prioidx(struct sock_cgroup_data *skcd, u16 prioidx) { #ifdef CONFIG_CGROUP_NET_PRIO WRITE_ONCE(skcd->prioidx, prioidx); #endif } static inline void sock_cgroup_set_classid(struct sock_cgroup_data *skcd, u32 classid) { #ifdef CONFIG_CGROUP_NET_CLASSID WRITE_ONCE(skcd->classid, classid); #endif } #else /* CONFIG_SOCK_CGROUP_DATA */ struct sock_cgroup_data { }; #endif /* CONFIG_SOCK_CGROUP_DATA */ #endif /* _LINUX_CGROUP_DEFS_H */ |
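/*
 * Illustrative sketch only: a minimal controller built from the types above.
 * The "example" names are hypothetical; a real controller must also be listed
 * in linux/cgroup_subsys.h so that SUBSYS() generates its id and static keys.
 * Note that dfl_cftypes and legacy_cftypes may point to the same array, as
 * documented for struct cgroup_subsys.
 */
struct example_css {
	struct cgroup_subsys_state css;	/* embedded css, recovered via container_of() */
	u64 hits;
};

static struct cgroup_subsys_state *
example_css_alloc(struct cgroup_subsys_state *parent_css)
{
	struct example_css *ec = kzalloc(sizeof(*ec), GFP_KERNEL);

	return ec ? &ec->css : ERR_PTR(-ENOMEM);
}

static void example_css_free(struct cgroup_subsys_state *css)
{
	kfree(container_of(css, struct example_css, css));
}

static u64 example_hits_read(struct cgroup_subsys_state *css,
			     struct cftype *cft)
{
	return container_of(css, struct example_css, css)->hits;
}

static struct cftype example_files[] = {
	{
		.name = "hits",	/* presented as "example.hits": subsys name is prepended */
		.read_u64 = example_hits_read,
	},
	{ }	/* zero-length name terminates the array */
};

struct cgroup_subsys example_cgrp_subsys = {
	.css_alloc	= example_css_alloc,
	.css_free	= example_css_free,
	.dfl_cftypes	= example_files,
	.legacy_cftypes	= example_files,
};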
// SPDX-License-Identifier: GPL-2.0-or-later /* * IPV6 GSO/GRO offload support * Linux INET6 implementation * * TCPv6 GSO/GRO support */ #include <linux/indirect_call_wrapper.h> #include <linux/skbuff.h> #include <net/inet6_hashtables.h> #include <net/gro.h> #include <net/protocol.h> #include <net/tcp.h> #include <net/ip6_checksum.h> #include "ip6_offload.h" static void tcp6_check_fraglist_gro(struct list_head *head, struct sk_buff *skb, struct tcphdr *th) { #if IS_ENABLED(CONFIG_IPV6) const struct ipv6hdr *hdr; struct sk_buff *p; struct sock *sk; struct net *net; int iif, sdif; if (likely(!(skb->dev->features & NETIF_F_GRO_FRAGLIST))) return; p = tcp_gro_lookup(head, th); if (p) { NAPI_GRO_CB(skb)->is_flist = NAPI_GRO_CB(p)->is_flist; return; } inet6_get_iif_sdif(skb, &iif, &sdif); hdr = skb_gro_network_header(skb); net = dev_net_rcu(skb->dev); sk = __inet6_lookup_established(net, net->ipv4.tcp_death_row.hashinfo, &hdr->saddr, th->source, &hdr->daddr, ntohs(th->dest), iif, sdif); NAPI_GRO_CB(skb)->is_flist = !sk; if (sk) sock_gen_put(sk); #endif /* IS_ENABLED(CONFIG_IPV6) */ } INDIRECT_CALLABLE_SCOPE struct sk_buff *tcp6_gro_receive(struct list_head *head, struct sk_buff *skb) { struct tcphdr *th; /* Don't bother verifying checksum if we're going to flush anyway.
*/ if (!NAPI_GRO_CB(skb)->flush && skb_gro_checksum_validate(skb, IPPROTO_TCP, ip6_gro_compute_pseudo)) goto flush; th = tcp_gro_pull_header(skb); if (!th) goto flush; tcp6_check_fraglist_gro(head, skb, th); return tcp_gro_receive(head, skb, th); flush: NAPI_GRO_CB(skb)->flush = 1; return NULL; } INDIRECT_CALLABLE_SCOPE int tcp6_gro_complete(struct sk_buff *skb, int thoff) { const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation]; const struct ipv6hdr *iph = (struct ipv6hdr *)(skb->data + offset); struct tcphdr *th = tcp_hdr(skb); if (unlikely(NAPI_GRO_CB(skb)->is_flist)) { skb_shinfo(skb)->gso_type |= SKB_GSO_FRAGLIST | SKB_GSO_TCPV6; skb_shinfo(skb)->gso_segs = NAPI_GRO_CB(skb)->count; __skb_incr_checksum_unnecessary(skb); return 0; } th->check = ~tcp_v6_check(skb->len - thoff, &iph->saddr, &iph->daddr, 0); skb_shinfo(skb)->gso_type |= SKB_GSO_TCPV6; tcp_gro_complete(skb); return 0; } static void __tcpv6_gso_segment_csum(struct sk_buff *seg, struct in6_addr *oldip, const struct in6_addr *newip, __be16 *oldport, __be16 newport) { struct tcphdr *th = tcp_hdr(seg); if (!ipv6_addr_equal(oldip, newip)) { inet_proto_csum_replace16(&th->check, seg, oldip->s6_addr32, newip->s6_addr32, true); *oldip = *newip; } if (*oldport == newport) return; inet_proto_csum_replace2(&th->check, seg, *oldport, newport, false); *oldport = newport; } static struct sk_buff *__tcpv6_gso_segment_list_csum(struct sk_buff *segs) { const struct tcphdr *th; const struct ipv6hdr *iph; struct sk_buff *seg; struct tcphdr *th2; struct ipv6hdr *iph2; seg = segs; th = tcp_hdr(seg); iph = ipv6_hdr(seg); th2 = tcp_hdr(seg->next); iph2 = ipv6_hdr(seg->next); if (!(*(const u32 *)&th->source ^ *(const u32 *)&th2->source) && ipv6_addr_equal(&iph->saddr, &iph2->saddr) && ipv6_addr_equal(&iph->daddr, &iph2->daddr)) return segs; while ((seg = seg->next)) { th2 = tcp_hdr(seg); iph2 = ipv6_hdr(seg); __tcpv6_gso_segment_csum(seg, &iph2->saddr, &iph->saddr, &th2->source, th->source); __tcpv6_gso_segment_csum(seg, &iph2->daddr, &iph->daddr, &th2->dest, th->dest); } return segs; } static struct sk_buff *__tcp6_gso_segment_list(struct sk_buff *skb, netdev_features_t features) { skb = skb_segment_list(skb, features, skb_mac_header_len(skb)); if (IS_ERR(skb)) return skb; return __tcpv6_gso_segment_list_csum(skb); } static struct sk_buff *tcp6_gso_segment(struct sk_buff *skb, netdev_features_t features) { struct tcphdr *th; if (!(skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)) return ERR_PTR(-EINVAL); if (!pskb_may_pull(skb, sizeof(*th))) return ERR_PTR(-EINVAL); if (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) { struct tcphdr *th = tcp_hdr(skb); if (skb_pagelen(skb) - th->doff * 4 == skb_shinfo(skb)->gso_size) return __tcp6_gso_segment_list(skb, features); skb->ip_summed = CHECKSUM_NONE; } if (unlikely(skb->ip_summed != CHECKSUM_PARTIAL)) { const struct ipv6hdr *ipv6h = ipv6_hdr(skb); struct tcphdr *th = tcp_hdr(skb); /* Set up pseudo header, usually expect stack to have done * this. */ th->check = 0; skb->ip_summed = CHECKSUM_PARTIAL; __tcp_v6_send_check(skb, &ipv6h->saddr, &ipv6h->daddr); } return tcp_gso_segment(skb, features); } int __init tcpv6_offload_init(void) { net_hotdata.tcpv6_offload = (struct net_offload) { .callbacks = { .gso_segment = tcp6_gso_segment, .gro_receive = tcp6_gro_receive, .gro_complete = tcp6_gro_complete, }, }; return inet6_add_offload(&net_hotdata.tcpv6_offload, IPPROTO_TCP); } |
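/*
 * Illustrative sketch: the registration pattern used by tcpv6_offload_init()
 * above, reused for a hypothetical protocol.  IPPROTO_EXAMPLE and the
 * example_* handlers are placeholders (bodies omitted); struct net_offload
 * and inet6_add_offload() are the real interfaces, and the handler
 * prototypes mirror the tcp6_* callbacks above.
 */
static struct sk_buff *example_gso_segment(struct sk_buff *skb,
					   netdev_features_t features);
static struct sk_buff *example_gro_receive(struct list_head *head,
					   struct sk_buff *skb);
static int example_gro_complete(struct sk_buff *skb, int nhoff);

static const struct net_offload example6_offload = {
	.callbacks = {
		.gso_segment	= example_gso_segment,
		.gro_receive	= example_gro_receive,
		.gro_complete	= example_gro_complete,
	},
};

static int __init example6_offload_init(void)
{
	/* Hook the callbacks into the IPv6 offload table for this protocol. */
	return inet6_add_offload(&example6_offload, IPPROTO_EXAMPLE);
}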
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_CGROUP_H #define _LINUX_CGROUP_H /* * cgroup interface * * Copyright (C) 2003 BULL SA * Copyright (C) 2004-2006 Silicon Graphics, Inc. * */ #include <linux/sched.h> #include <linux/nodemask.h> #include <linux/list.h> #include <linux/rculist.h> #include <linux/cgroupstats.h> #include <linux/fs.h> #include <linux/seq_file.h> #include <linux/kernfs.h> #include <linux/jump_label.h> #include <linux/types.h> #include <linux/notifier.h> #include <linux/ns_common.h> #include <linux/nsproxy.h> #include <linux/user_namespace.h> #include <linux/refcount.h> #include <linux/kernel_stat.h> #include <linux/cgroup-defs.h> struct kernel_clone_args; /* * All weight knobs on the default hierarchy should use the following min, * default and max values. The default value is the logarithmic center of * MIN and MAX and allows 100x to be expressed in both directions. */ #define CGROUP_WEIGHT_MIN 1 #define CGROUP_WEIGHT_DFL 100 #define CGROUP_WEIGHT_MAX 10000 #ifdef CONFIG_CGROUPS enum css_task_iter_flags { CSS_TASK_ITER_PROCS = (1U << 0), /* walk only threadgroup leaders */ CSS_TASK_ITER_THREADED = (1U << 1), /* walk all threaded css_sets in the domain */ CSS_TASK_ITER_SKIPPED = (1U << 16), /* internal flags */ }; /* a css_task_iter should be treated as an opaque object */ struct css_task_iter { struct cgroup_subsys *ss; unsigned int flags; struct list_head *cset_pos; struct list_head *cset_head; struct list_head *tcset_pos; struct list_head *tcset_head; struct list_head *task_pos; struct list_head *cur_tasks_head; struct css_set *cur_cset; struct css_set *cur_dcset; struct task_struct *cur_task; struct list_head iters_node; /* css_set->task_iters */ }; enum cgroup_lifetime_events { CGROUP_LIFETIME_ONLINE, CGROUP_LIFETIME_OFFLINE, }; extern struct file_system_type cgroup_fs_type; extern struct cgroup_root cgrp_dfl_root; extern struct css_set init_css_set; extern spinlock_t css_set_lock; extern struct blocking_notifier_head cgroup_lifetime_notifier; #define SUBSYS(_x) extern struct cgroup_subsys _x ## _cgrp_subsys; #include <linux/cgroup_subsys.h> #undef SUBSYS #define SUBSYS(_x) \ extern struct static_key_true _x ## _cgrp_subsys_enabled_key; \ extern struct static_key_true _x ## _cgrp_subsys_on_dfl_key; #include <linux/cgroup_subsys.h> #undef SUBSYS /** * cgroup_subsys_enabled - fast test on whether a subsys is enabled * @ss: subsystem in question */ #define cgroup_subsys_enabled(ss) \ static_branch_likely(&ss ## _enabled_key) /** * cgroup_subsys_on_dfl - fast test on whether a subsys is on default hierarchy * @ss: subsystem in question */ #define cgroup_subsys_on_dfl(ss) \ static_branch_likely(&ss ## _on_dfl_key) bool css_has_online_children(struct cgroup_subsys_state *css); struct cgroup_subsys_state *css_from_id(int id, struct cgroup_subsys *ss); struct cgroup_subsys_state *cgroup_e_css(struct cgroup *cgroup, struct cgroup_subsys *ss); struct cgroup_subsys_state *cgroup_get_e_css(struct cgroup *cgroup, struct cgroup_subsys *ss); struct cgroup_subsys_state *css_tryget_online_from_dir(struct dentry *dentry, struct cgroup_subsys *ss); struct cgroup *cgroup_get_from_path(const char *path); struct cgroup *cgroup_get_from_fd(int fd); struct cgroup *cgroup_v1v2_get_from_fd(int fd); int cgroup_attach_task_all(struct task_struct *from, struct task_struct *); int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from); int cgroup_add_dfl_cftypes(struct cgroup_subsys *ss, struct cftype *cfts); int cgroup_add_legacy_cftypes(struct cgroup_subsys *ss, struct cftype *cfts); int cgroup_add_cftypes(struct 
cgroup_subsys *ss, struct cftype *cfts); int cgroup_rm_cftypes(struct cftype *cfts); void cgroup_file_notify(struct cgroup_file *cfile); void cgroup_file_show(struct cgroup_file *cfile, bool show); int cgroupstats_build(struct cgroupstats *stats, struct dentry *dentry); int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *tsk); void cgroup_fork(struct task_struct *p); extern int cgroup_can_fork(struct task_struct *p, struct kernel_clone_args *kargs); extern void cgroup_cancel_fork(struct task_struct *p, struct kernel_clone_args *kargs); extern void cgroup_post_fork(struct task_struct *p, struct kernel_clone_args *kargs); void cgroup_exit(struct task_struct *p); void cgroup_release(struct task_struct *p); void cgroup_free(struct task_struct *p); int cgroup_init_early(void); int cgroup_init(void); int cgroup_parse_float(const char *input, unsigned dec_shift, s64 *v); /* * Iteration helpers and macros. */ struct cgroup_subsys_state *css_next_child(struct cgroup_subsys_state *pos, struct cgroup_subsys_state *parent); struct cgroup_subsys_state *css_next_descendant_pre(struct cgroup_subsys_state *pos, struct cgroup_subsys_state *css); struct cgroup_subsys_state *css_rightmost_descendant(struct cgroup_subsys_state *pos); struct cgroup_subsys_state *css_next_descendant_post(struct cgroup_subsys_state *pos, struct cgroup_subsys_state *css); struct task_struct *cgroup_taskset_first(struct cgroup_taskset *tset, struct cgroup_subsys_state **dst_cssp); struct task_struct *cgroup_taskset_next(struct cgroup_taskset *tset, struct cgroup_subsys_state **dst_cssp); void css_task_iter_start(struct cgroup_subsys_state *css, unsigned int flags, struct css_task_iter *it); struct task_struct *css_task_iter_next(struct css_task_iter *it); void css_task_iter_end(struct css_task_iter *it); /** * css_for_each_child - iterate through children of a css * @pos: the css * to use as the loop cursor * @parent: css whose children to walk * * Walk @parent's children. Must be called under rcu_read_lock(). * * If a subsystem synchronizes ->css_online() and the start of iteration, a * css which finished ->css_online() is guaranteed to be visible in the * future iterations and will stay visible until the last reference is put. * A css which hasn't finished ->css_online() or already finished * ->css_offline() may show up during traversal. It's each subsystem's * responsibility to synchronize against on/offlining. * * It is allowed to temporarily drop RCU read lock during iteration. The * caller is responsible for ensuring that @pos remains accessible until * the start of the next iteration by, for example, bumping the css refcnt. */ #define css_for_each_child(pos, parent) \ for ((pos) = css_next_child(NULL, (parent)); (pos); \ (pos) = css_next_child((pos), (parent))) /** * css_for_each_descendant_pre - pre-order walk of a css's descendants * @pos: the css * to use as the loop cursor * @root: css whose descendants to walk * * Walk @root's descendants. @root is included in the iteration and the * first node to be visited. Must be called under rcu_read_lock(). * * If a subsystem synchronizes ->css_online() and the start of iteration, a * css which finished ->css_online() is guaranteed to be visible in the * future iterations and will stay visible until the last reference is put. * A css which hasn't finished ->css_online() or already finished * ->css_offline() may show up during traversal. It's each subsystem's * responsibility to synchronize against on/offlining. 
* * For example, the following guarantees that a descendant can't escape * state updates of its ancestors. * * my_online(@css) * { * Lock @css's parent and @css; * Inherit state from the parent; * Unlock both. * } * * my_update_state(@css) * { * css_for_each_descendant_pre(@pos, @css) { * Lock @pos; * if (@pos == @css) * Update @css's state; * else * Verify @pos is alive and inherit state from its parent; * Unlock @pos; * } * } * * As long as the inheriting step, including checking the parent state, is * enclosed inside @pos locking, double-locking the parent isn't necessary * while inheriting. The state update to the parent is guaranteed to be * visible by walking order and, as long as inheriting operations to the * same @pos are atomic to each other, multiple updates racing each other * still result in the correct state. It's guaranateed that at least one * inheritance happens for any css after the latest update to its parent. * * If checking parent's state requires locking the parent, each inheriting * iteration should lock and unlock both @pos->parent and @pos. * * Alternatively, a subsystem may choose to use a single global lock to * synchronize ->css_online() and ->css_offline() against tree-walking * operations. * * It is allowed to temporarily drop RCU read lock during iteration. The * caller is responsible for ensuring that @pos remains accessible until * the start of the next iteration by, for example, bumping the css refcnt. */ #define css_for_each_descendant_pre(pos, css) \ for ((pos) = css_next_descendant_pre(NULL, (css)); (pos); \ (pos) = css_next_descendant_pre((pos), (css))) /** * css_for_each_descendant_post - post-order walk of a css's descendants * @pos: the css * to use as the loop cursor * @css: css whose descendants to walk * * Similar to css_for_each_descendant_pre() but performs post-order * traversal instead. @root is included in the iteration and the last * node to be visited. * * If a subsystem synchronizes ->css_online() and the start of iteration, a * css which finished ->css_online() is guaranteed to be visible in the * future iterations and will stay visible until the last reference is put. * A css which hasn't finished ->css_online() or already finished * ->css_offline() may show up during traversal. It's each subsystem's * responsibility to synchronize against on/offlining. * * Note that the walk visibility guarantee example described in pre-order * walk doesn't apply the same to post-order walks. */ #define css_for_each_descendant_post(pos, css) \ for ((pos) = css_next_descendant_post(NULL, (css)); (pos); \ (pos) = css_next_descendant_post((pos), (css))) /** * cgroup_taskset_for_each - iterate cgroup_taskset * @task: the loop cursor * @dst_css: the destination css * @tset: taskset to iterate * * @tset may contain multiple tasks and they may belong to multiple * processes. * * On the v2 hierarchy, there may be tasks from multiple processes and they * may not share the source or destination csses. * * On traditional hierarchies, when there are multiple tasks in @tset, if a * task of a process is in @tset, all tasks of the process are in @tset. * Also, all are guaranteed to share the same source and destination csses. * * Iteration is not in any specific order. 
*/ #define cgroup_taskset_for_each(task, dst_css, tset) \ for ((task) = cgroup_taskset_first((tset), &(dst_css)); \ (task); \ (task) = cgroup_taskset_next((tset), &(dst_css))) /** * cgroup_taskset_for_each_leader - iterate group leaders in a cgroup_taskset * @leader: the loop cursor * @dst_css: the destination css * @tset: taskset to iterate * * Iterate threadgroup leaders of @tset. For single-task migrations, @tset * may not contain any. */ #define cgroup_taskset_for_each_leader(leader, dst_css, tset) \ for ((leader) = cgroup_taskset_first((tset), &(dst_css)); \ (leader); \ (leader) = cgroup_taskset_next((tset), &(dst_css))) \ if ((leader) != (leader)->group_leader) \ ; \ else /* * Inline functions. */ #ifdef CONFIG_DEBUG_CGROUP_REF void css_get(struct cgroup_subsys_state *css); void css_get_many(struct cgroup_subsys_state *css, unsigned int n); bool css_tryget(struct cgroup_subsys_state *css); bool css_tryget_online(struct cgroup_subsys_state *css); void css_put(struct cgroup_subsys_state *css); void css_put_many(struct cgroup_subsys_state *css, unsigned int n); #else #define CGROUP_REF_FN_ATTRS static inline #define CGROUP_REF_EXPORT(fn) #include <linux/cgroup_refcnt.h> #endif static inline u64 cgroup_id(const struct cgroup *cgrp) { return cgrp->kn->id; } /** * css_is_dying - test whether the specified css is dying * @css: target css * * Test whether @css is in the process of offlining or already offline. In * most cases, ->css_online() and ->css_offline() callbacks should be * enough; however, the actual offline operations are RCU delayed and this * test returns %true also when @css is scheduled to be offlined. * * This is useful, for example, when the use case requires synchronous * behavior with respect to cgroup removal. cgroup removal schedules css * offlining but the css can seem alive while the operation is being * delayed. If the delay affects user visible semantics, this test can be * used to resolve the situation. */ static inline bool css_is_dying(struct cgroup_subsys_state *css) { return css->flags & CSS_DYING; } static inline bool css_is_self(struct cgroup_subsys_state *css) { if (css == &css->cgroup->self) { /* cgroup::self should not have subsystem association */ WARN_ON(css->ss != NULL); return true; } return false; } static inline void cgroup_get(struct cgroup *cgrp) { css_get(&cgrp->self); } static inline bool cgroup_tryget(struct cgroup *cgrp) { return css_tryget(&cgrp->self); } static inline void cgroup_put(struct cgroup *cgrp) { css_put(&cgrp->self); } extern struct mutex cgroup_mutex; static inline void cgroup_lock(void) { mutex_lock(&cgroup_mutex); } static inline void cgroup_unlock(void) { mutex_unlock(&cgroup_mutex); } /** * task_css_set_check - obtain a task's css_set with extra access conditions * @task: the task to obtain css_set for * @__c: extra condition expression to be passed to rcu_dereference_check() * * A task's css_set is RCU protected, initialized and exited while holding * task_lock(), and can only be modified while holding both cgroup_mutex * and task_lock() while the task is alive. This macro verifies that the * caller is inside proper critical section and returns @task's css_set. * * The caller can also specify additional allowed conditions via @__c, such * as locks used during the cgroup_subsys::attach() methods. 
*/ #ifdef CONFIG_PROVE_RCU #define task_css_set_check(task, __c) \ rcu_dereference_check((task)->cgroups, \ rcu_read_lock_sched_held() || \ lockdep_is_held(&cgroup_mutex) || \ lockdep_is_held(&css_set_lock) || \ ((task)->flags & PF_EXITING) || (__c)) #else #define task_css_set_check(task, __c) \ rcu_dereference((task)->cgroups) #endif /** * task_css_check - obtain css for (task, subsys) w/ extra access conds * @task: the target task * @subsys_id: the target subsystem ID * @__c: extra condition expression to be passed to rcu_dereference_check() * * Return the cgroup_subsys_state for the (@task, @subsys_id) pair. The * synchronization rules are the same as task_css_set_check(). */ #define task_css_check(task, subsys_id, __c) \ task_css_set_check((task), (__c))->subsys[(subsys_id)] /** * task_css_set - obtain a task's css_set * @task: the task to obtain css_set for * * See task_css_set_check(). */ static inline struct css_set *task_css_set(struct task_struct *task) { return task_css_set_check(task, false); } /** * task_css - obtain css for (task, subsys) * @task: the target task * @subsys_id: the target subsystem ID * * See task_css_check(). */ static inline struct cgroup_subsys_state *task_css(struct task_struct *task, int subsys_id) { return task_css_check(task, subsys_id, false); } /** * task_get_css - find and get the css for (task, subsys) * @task: the target task * @subsys_id: the target subsystem ID * * Find the css for the (@task, @subsys_id) combination, increment a * reference on and return it. This function is guaranteed to return a * valid css. The returned css may already have been offlined. */ static inline struct cgroup_subsys_state * task_get_css(struct task_struct *task, int subsys_id) { struct cgroup_subsys_state *css; rcu_read_lock(); while (true) { css = task_css(task, subsys_id); /* * Can't use css_tryget_online() here. A task which has * PF_EXITING set may stay associated with an offline css. * If such task calls this function, css_tryget_online() * will keep failing. */ if (likely(css_tryget(css))) break; cpu_relax(); } rcu_read_unlock(); return css; } /** * task_css_is_root - test whether a task belongs to the root css * @task: the target task * @subsys_id: the target subsystem ID * * Test whether @task belongs to the root css on the specified subsystem. * May be invoked in any context. */ static inline bool task_css_is_root(struct task_struct *task, int subsys_id) { return task_css_check(task, subsys_id, true) == init_css_set.subsys[subsys_id]; } static inline struct cgroup *task_cgroup(struct task_struct *task, int subsys_id) { return task_css(task, subsys_id)->cgroup; } static inline struct cgroup *task_dfl_cgroup(struct task_struct *task) { return task_css_set(task)->dfl_cgrp; } static inline struct cgroup *cgroup_parent(struct cgroup *cgrp) { struct cgroup_subsys_state *parent_css = cgrp->self.parent; if (parent_css) return container_of(parent_css, struct cgroup, self); return NULL; } /** * cgroup_is_descendant - test ancestry * @cgrp: the cgroup to be tested * @ancestor: possible ancestor of @cgrp * * Test whether @cgrp is a descendant of @ancestor. It also returns %true * if @cgrp == @ancestor. This function is safe to call as long as @cgrp * and @ancestor are accessible. 
*/ static inline bool cgroup_is_descendant(struct cgroup *cgrp, struct cgroup *ancestor) { if (cgrp->root != ancestor->root || cgrp->level < ancestor->level) return false; return cgrp->ancestors[ancestor->level] == ancestor; } /** * cgroup_ancestor - find ancestor of cgroup * @cgrp: cgroup to find ancestor of * @ancestor_level: level of ancestor to find starting from root * * Find ancestor of cgroup at specified level starting from root if it exists * and return pointer to it. Return NULL if @cgrp doesn't have ancestor at * @ancestor_level. * * This function is safe to call as long as @cgrp is accessible. */ static inline struct cgroup *cgroup_ancestor(struct cgroup *cgrp, int ancestor_level) { if (ancestor_level < 0 || ancestor_level > cgrp->level) return NULL; return cgrp->ancestors[ancestor_level]; } /** * task_under_cgroup_hierarchy - test task's membership of cgroup ancestry * @task: the task to be tested * @ancestor: possible ancestor of @task's cgroup * * Tests whether @task's default cgroup hierarchy is a descendant of @ancestor. * It follows all the same rules as cgroup_is_descendant, and only applies * to the default hierarchy. */ static inline bool task_under_cgroup_hierarchy(struct task_struct *task, struct cgroup *ancestor) { struct css_set *cset = task_css_set(task); return cgroup_is_descendant(cset->dfl_cgrp, ancestor); } /* no synchronization, the result can only be used as a hint */ static inline bool cgroup_is_populated(struct cgroup *cgrp) { return cgrp->nr_populated_csets + cgrp->nr_populated_domain_children + cgrp->nr_populated_threaded_children; } /* returns ino associated with a cgroup */ static inline ino_t cgroup_ino(struct cgroup *cgrp) { return kernfs_ino(cgrp->kn); } /* cft/css accessors for cftype->write() operation */ static inline struct cftype *of_cft(struct kernfs_open_file *of) { return of->kn->priv; } struct cgroup_subsys_state *of_css(struct kernfs_open_file *of); /* cft/css accessors for cftype->seq_*() operations */ static inline struct cftype *seq_cft(struct seq_file *seq) { return of_cft(seq->private); } static inline struct cgroup_subsys_state *seq_css(struct seq_file *seq) { return of_css(seq->private); } /* * Name / path handling functions. All are thin wrappers around the kernfs * counterparts and can be called under any context. */ static inline int cgroup_name(struct cgroup *cgrp, char *buf, size_t buflen) { return kernfs_name(cgrp->kn, buf, buflen); } static inline int cgroup_path(struct cgroup *cgrp, char *buf, size_t buflen) { return kernfs_path(cgrp->kn, buf, buflen); } static inline void pr_cont_cgroup_name(struct cgroup *cgrp) { pr_cont_kernfs_name(cgrp->kn); } static inline void pr_cont_cgroup_path(struct cgroup *cgrp) { pr_cont_kernfs_path(cgrp->kn); } bool cgroup_psi_enabled(void); static inline void cgroup_init_kthreadd(void) { /* * kthreadd is inherited by all kthreads, keep it in the root so * that the new kthreads are guaranteed to stay in the root until * initialization is finished. */ current->no_cgroup_migration = 1; } static inline void cgroup_kthread_ready(void) { /* * This kthread finished initialization. The creator should have * set PF_NO_SETAFFINITY if this kthread should stay in the root. 
*/ current->no_cgroup_migration = 0; } void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen); struct cgroup *cgroup_get_from_id(u64 id); #else /* !CONFIG_CGROUPS */ struct cgroup_subsys_state; struct cgroup; static inline u64 cgroup_id(const struct cgroup *cgrp) { return 1; } static inline void css_get(struct cgroup_subsys_state *css) {} static inline void css_put(struct cgroup_subsys_state *css) {} static inline void cgroup_lock(void) {} static inline void cgroup_unlock(void) {} static inline int cgroup_attach_task_all(struct task_struct *from, struct task_struct *t) { return 0; } static inline int cgroupstats_build(struct cgroupstats *stats, struct dentry *dentry) { return -EINVAL; } static inline void cgroup_fork(struct task_struct *p) {} static inline int cgroup_can_fork(struct task_struct *p, struct kernel_clone_args *kargs) { return 0; } static inline void cgroup_cancel_fork(struct task_struct *p, struct kernel_clone_args *kargs) {} static inline void cgroup_post_fork(struct task_struct *p, struct kernel_clone_args *kargs) {} static inline void cgroup_exit(struct task_struct *p) {} static inline void cgroup_release(struct task_struct *p) {} static inline void cgroup_free(struct task_struct *p) {} static inline int cgroup_init_early(void) { return 0; } static inline int cgroup_init(void) { return 0; } static inline void cgroup_init_kthreadd(void) {} static inline void cgroup_kthread_ready(void) {} static inline struct cgroup *cgroup_parent(struct cgroup *cgrp) { return NULL; } static inline bool cgroup_psi_enabled(void) { return false; } static inline bool task_under_cgroup_hierarchy(struct task_struct *task, struct cgroup *ancestor) { return true; } static inline void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen) {} #endif /* !CONFIG_CGROUPS */ #ifdef CONFIG_CGROUPS /* * cgroup scalable recursive statistics. */ void css_rstat_updated(struct cgroup_subsys_state *css, int cpu); void css_rstat_flush(struct cgroup_subsys_state *css); /* * Basic resource stats. */ #ifdef CONFIG_CGROUP_CPUACCT void cpuacct_charge(struct task_struct *tsk, u64 cputime); void cpuacct_account_field(struct task_struct *tsk, int index, u64 val); #else static inline void cpuacct_charge(struct task_struct *tsk, u64 cputime) {} static inline void cpuacct_account_field(struct task_struct *tsk, int index, u64 val) {} #endif void __cgroup_account_cputime(struct cgroup *cgrp, u64 delta_exec); void __cgroup_account_cputime_field(struct cgroup *cgrp, enum cpu_usage_stat index, u64 delta_exec); static inline void cgroup_account_cputime(struct task_struct *task, u64 delta_exec) { struct cgroup *cgrp; cpuacct_charge(task, delta_exec); cgrp = task_dfl_cgroup(task); if (cgroup_parent(cgrp)) __cgroup_account_cputime(cgrp, delta_exec); } static inline void cgroup_account_cputime_field(struct task_struct *task, enum cpu_usage_stat index, u64 delta_exec) { struct cgroup *cgrp; cpuacct_account_field(task, index, delta_exec); cgrp = task_dfl_cgroup(task); if (cgroup_parent(cgrp)) __cgroup_account_cputime_field(cgrp, index, delta_exec); } #else /* CONFIG_CGROUPS */ static inline void cgroup_account_cputime(struct task_struct *task, u64 delta_exec) {} static inline void cgroup_account_cputime_field(struct task_struct *task, enum cpu_usage_stat index, u64 delta_exec) {} #endif /* CONFIG_CGROUPS */ /* * sock->sk_cgrp_data handling. For more info, see sock_cgroup_data * definition in cgroup-defs.h. 
*/ #ifdef CONFIG_SOCK_CGROUP_DATA void cgroup_sk_alloc(struct sock_cgroup_data *skcd); void cgroup_sk_clone(struct sock_cgroup_data *skcd); void cgroup_sk_free(struct sock_cgroup_data *skcd); static inline struct cgroup *sock_cgroup_ptr(struct sock_cgroup_data *skcd) { return skcd->cgroup; } #else /* CONFIG_CGROUP_DATA */ static inline void cgroup_sk_alloc(struct sock_cgroup_data *skcd) {} static inline void cgroup_sk_clone(struct sock_cgroup_data *skcd) {} static inline void cgroup_sk_free(struct sock_cgroup_data *skcd) {} #endif /* CONFIG_CGROUP_DATA */ struct cgroup_namespace { struct ns_common ns; struct user_namespace *user_ns; struct ucounts *ucounts; struct css_set *root_cset; }; extern struct cgroup_namespace init_cgroup_ns; #ifdef CONFIG_CGROUPS void free_cgroup_ns(struct cgroup_namespace *ns); struct cgroup_namespace *copy_cgroup_ns(unsigned long flags, struct user_namespace *user_ns, struct cgroup_namespace *old_ns); int cgroup_path_ns(struct cgroup *cgrp, char *buf, size_t buflen, struct cgroup_namespace *ns); static inline void get_cgroup_ns(struct cgroup_namespace *ns) { refcount_inc(&ns->ns.count); } static inline void put_cgroup_ns(struct cgroup_namespace *ns) { if (refcount_dec_and_test(&ns->ns.count)) free_cgroup_ns(ns); } #else /* !CONFIG_CGROUPS */ static inline void free_cgroup_ns(struct cgroup_namespace *ns) { } static inline struct cgroup_namespace * copy_cgroup_ns(unsigned long flags, struct user_namespace *user_ns, struct cgroup_namespace *old_ns) { return old_ns; } static inline void get_cgroup_ns(struct cgroup_namespace *ns) { } static inline void put_cgroup_ns(struct cgroup_namespace *ns) { } #endif /* !CONFIG_CGROUPS */ #ifdef CONFIG_CGROUPS void cgroup_enter_frozen(void); void cgroup_leave_frozen(bool always_leave); void cgroup_update_frozen(struct cgroup *cgrp); void cgroup_freeze(struct cgroup *cgrp, bool freeze); void cgroup_freezer_migrate_task(struct task_struct *task, struct cgroup *src, struct cgroup *dst); static inline bool cgroup_task_frozen(struct task_struct *task) { return task->frozen; } #else /* !CONFIG_CGROUPS */ static inline void cgroup_enter_frozen(void) { } static inline void cgroup_leave_frozen(bool always_leave) { } static inline bool cgroup_task_frozen(struct task_struct *task) { return false; } #endif /* !CONFIG_CGROUPS */ #ifdef CONFIG_CGROUP_BPF static inline void cgroup_bpf_get(struct cgroup *cgrp) { percpu_ref_get(&cgrp->bpf.refcnt); } static inline void cgroup_bpf_put(struct cgroup *cgrp) { percpu_ref_put(&cgrp->bpf.refcnt); } #else /* CONFIG_CGROUP_BPF */ static inline void cgroup_bpf_get(struct cgroup *cgrp) {} static inline void cgroup_bpf_put(struct cgroup *cgrp) {} #endif /* CONFIG_CGROUP_BPF */ struct cgroup *task_get_cgroup1(struct task_struct *tsk, int hierarchy_id); struct cgroup_of_peak *of_peak(struct kernfs_open_file *of); #endif /* _LINUX_CGROUP_H */ |
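As a quick illustration of the ancestry helpers declared above, here is a minimal, hypothetical sketch that walks a cgroup's ancestor chain with cgroup_ancestor() and sanity-checks each result with cgroup_is_descendant(); the function name is made up, and it assumes CONFIG_CGROUPS=y and that the caller keeps the cgroup accessible for the duration of the walk.

/* Illustrative only: dump the ancestry of @cgrp using the helpers above.
 * Level 0 is the root; level cgrp->level is @cgrp itself.
 */
static void example_dump_ancestry(struct cgroup *cgrp)
{
	int level;

	for (level = 0; level <= cgrp->level; level++) {
		struct cgroup *anc = cgroup_ancestor(cgrp, level);

		/* every ancestor returned must also pass the descendant test */
		WARN_ON(!anc || !cgroup_is_descendant(cgrp, anc));

		pr_info("ancestor at level %d: ", level);
		pr_cont_cgroup_name(anc);
		pr_cont("\n");
	}
}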
/* SPDX-License-Identifier: GPL-2.0 * * page_pool.c * Author: Jesper Dangaard Brouer <netoptimizer@brouer.com> * Copyright (C) 2016 Red Hat, Inc. */ #include <linux/error-injection.h> #include <linux/types.h> #include <linux/kernel.h> #include <linux/slab.h> #include <linux/device.h> #include <net/netdev_lock.h> #include <net/netdev_rx_queue.h> #include <net/page_pool/helpers.h> #include <net/page_pool/memory_provider.h> #include <net/xdp.h> #include <linux/dma-direction.h> #include <linux/dma-mapping.h> #include <linux/page-flags.h> #include <linux/mm.h> /* for put_page() */ #include <linux/poison.h> #include <linux/ethtool.h> #include <linux/netdevice.h> #include <trace/events/page_pool.h> #include "dev.h" #include "mp_dmabuf_devmem.h" #include "netmem_priv.h" #include "page_pool_priv.h" DEFINE_STATIC_KEY_FALSE(page_pool_mem_providers); #define DEFER_TIME (msecs_to_jiffies(1000)) #define DEFER_WARN_INTERVAL (60 * HZ) #define BIAS_MAX (LONG_MAX >> 1) #ifdef CONFIG_PAGE_POOL_STATS static DEFINE_PER_CPU(struct page_pool_recycle_stats, pp_system_recycle_stats); /* alloc_stat_inc is intended to be used in softirq context */ #define alloc_stat_inc(pool, __stat) (pool->alloc_stats.__stat++) /* recycle_stat_inc is safe to use when preemption is possible.
*/ #define recycle_stat_inc(pool, __stat) \ do { \ struct page_pool_recycle_stats __percpu *s = pool->recycle_stats; \ this_cpu_inc(s->__stat); \ } while (0) #define recycle_stat_add(pool, __stat, val) \ do { \ struct page_pool_recycle_stats __percpu *s = pool->recycle_stats; \ this_cpu_add(s->__stat, val); \ } while (0) static const char pp_stats[][ETH_GSTRING_LEN] = { "rx_pp_alloc_fast", "rx_pp_alloc_slow", "rx_pp_alloc_slow_ho", "rx_pp_alloc_empty", "rx_pp_alloc_refill", "rx_pp_alloc_waive", "rx_pp_recycle_cached", "rx_pp_recycle_cache_full", "rx_pp_recycle_ring", "rx_pp_recycle_ring_full", "rx_pp_recycle_released_ref", }; /** * page_pool_get_stats() - fetch page pool stats * @pool: pool from which page was allocated * @stats: struct page_pool_stats to fill in * * Retrieve statistics about the page_pool. This API is only available * if the kernel has been configured with ``CONFIG_PAGE_POOL_STATS=y``. * A pointer to a caller allocated struct page_pool_stats structure * is passed to this API which is filled in. The caller can then report * those stats to the user (perhaps via ethtool, debugfs, etc.). */ bool page_pool_get_stats(const struct page_pool *pool, struct page_pool_stats *stats) { int cpu = 0; if (!stats) return false; /* The caller is responsible to initialize stats. */ stats->alloc_stats.fast += pool->alloc_stats.fast; stats->alloc_stats.slow += pool->alloc_stats.slow; stats->alloc_stats.slow_high_order += pool->alloc_stats.slow_high_order; stats->alloc_stats.empty += pool->alloc_stats.empty; stats->alloc_stats.refill += pool->alloc_stats.refill; stats->alloc_stats.waive += pool->alloc_stats.waive; for_each_possible_cpu(cpu) { const struct page_pool_recycle_stats *pcpu = per_cpu_ptr(pool->recycle_stats, cpu); stats->recycle_stats.cached += pcpu->cached; stats->recycle_stats.cache_full += pcpu->cache_full; stats->recycle_stats.ring += pcpu->ring; stats->recycle_stats.ring_full += pcpu->ring_full; stats->recycle_stats.released_refcnt += pcpu->released_refcnt; } return true; } EXPORT_SYMBOL(page_pool_get_stats); u8 *page_pool_ethtool_stats_get_strings(u8 *data) { int i; for (i = 0; i < ARRAY_SIZE(pp_stats); i++) { memcpy(data, pp_stats[i], ETH_GSTRING_LEN); data += ETH_GSTRING_LEN; } return data; } EXPORT_SYMBOL(page_pool_ethtool_stats_get_strings); int page_pool_ethtool_stats_get_count(void) { return ARRAY_SIZE(pp_stats); } EXPORT_SYMBOL(page_pool_ethtool_stats_get_count); u64 *page_pool_ethtool_stats_get(u64 *data, const void *stats) { const struct page_pool_stats *pool_stats = stats; *data++ = pool_stats->alloc_stats.fast; *data++ = pool_stats->alloc_stats.slow; *data++ = pool_stats->alloc_stats.slow_high_order; *data++ = pool_stats->alloc_stats.empty; *data++ = pool_stats->alloc_stats.refill; *data++ = pool_stats->alloc_stats.waive; *data++ = pool_stats->recycle_stats.cached; *data++ = pool_stats->recycle_stats.cache_full; *data++ = pool_stats->recycle_stats.ring; *data++ = pool_stats->recycle_stats.ring_full; *data++ = pool_stats->recycle_stats.released_refcnt; return data; } EXPORT_SYMBOL(page_pool_ethtool_stats_get); #else #define alloc_stat_inc(...) do { } while (0) #define recycle_stat_inc(...) do { } while (0) #define recycle_stat_add(...) 
do { } while (0) #endif static bool page_pool_producer_lock(struct page_pool *pool) __acquires(&pool->ring.producer_lock) { bool in_softirq = in_softirq(); if (in_softirq) spin_lock(&pool->ring.producer_lock); else spin_lock_bh(&pool->ring.producer_lock); return in_softirq; } static void page_pool_producer_unlock(struct page_pool *pool, bool in_softirq) __releases(&pool->ring.producer_lock) { if (in_softirq) spin_unlock(&pool->ring.producer_lock); else spin_unlock_bh(&pool->ring.producer_lock); } static void page_pool_struct_check(void) { CACHELINE_ASSERT_GROUP_MEMBER(struct page_pool, frag, frag_users); CACHELINE_ASSERT_GROUP_MEMBER(struct page_pool, frag, frag_page); CACHELINE_ASSERT_GROUP_MEMBER(struct page_pool, frag, frag_offset); CACHELINE_ASSERT_GROUP_SIZE(struct page_pool, frag, PAGE_POOL_FRAG_GROUP_ALIGN); } static int page_pool_init(struct page_pool *pool, const struct page_pool_params *params, int cpuid) { unsigned int ring_qsize = 1024; /* Default */ struct netdev_rx_queue *rxq; int err; page_pool_struct_check(); memcpy(&pool->p, ¶ms->fast, sizeof(pool->p)); memcpy(&pool->slow, ¶ms->slow, sizeof(pool->slow)); pool->cpuid = cpuid; pool->dma_sync_for_cpu = true; /* Validate only known flags were used */ if (pool->slow.flags & ~PP_FLAG_ALL) return -EINVAL; if (pool->p.pool_size) ring_qsize = pool->p.pool_size; /* Sanity limit mem that can be pinned down */ if (ring_qsize > 32768) return -E2BIG; /* DMA direction is either DMA_FROM_DEVICE or DMA_BIDIRECTIONAL. * DMA_BIDIRECTIONAL is for allowing page used for DMA sending, * which is the XDP_TX use-case. */ if (pool->slow.flags & PP_FLAG_DMA_MAP) { if ((pool->p.dma_dir != DMA_FROM_DEVICE) && (pool->p.dma_dir != DMA_BIDIRECTIONAL)) return -EINVAL; pool->dma_map = true; } if (pool->slow.flags & PP_FLAG_DMA_SYNC_DEV) { /* In order to request DMA-sync-for-device the page * needs to be mapped */ if (!(pool->slow.flags & PP_FLAG_DMA_MAP)) return -EINVAL; if (!pool->p.max_len) return -EINVAL; pool->dma_sync = true; /* pool->p.offset has to be set according to the address * offset used by the DMA engine to start copying rx data */ } pool->has_init_callback = !!pool->slow.init_callback; #ifdef CONFIG_PAGE_POOL_STATS if (!(pool->slow.flags & PP_FLAG_SYSTEM_POOL)) { pool->recycle_stats = alloc_percpu(struct page_pool_recycle_stats); if (!pool->recycle_stats) return -ENOMEM; } else { /* For system page pool instance we use a singular stats object * instead of allocating a separate percpu variable for each * (also percpu) page pool instance. 
*/ pool->recycle_stats = &pp_system_recycle_stats; pool->system = true; } #endif if (ptr_ring_init(&pool->ring, ring_qsize, GFP_KERNEL) < 0) { #ifdef CONFIG_PAGE_POOL_STATS if (!pool->system) free_percpu(pool->recycle_stats); #endif return -ENOMEM; } atomic_set(&pool->pages_state_release_cnt, 0); /* Driver calling page_pool_create() also call page_pool_destroy() */ refcount_set(&pool->user_cnt, 1); xa_init_flags(&pool->dma_mapped, XA_FLAGS_ALLOC1); if (pool->slow.flags & PP_FLAG_ALLOW_UNREADABLE_NETMEM) { netdev_assert_locked(pool->slow.netdev); rxq = __netif_get_rx_queue(pool->slow.netdev, pool->slow.queue_idx); pool->mp_priv = rxq->mp_params.mp_priv; pool->mp_ops = rxq->mp_params.mp_ops; } if (pool->mp_ops) { if (!pool->dma_map || !pool->dma_sync) return -EOPNOTSUPP; if (WARN_ON(!is_kernel_rodata((unsigned long)pool->mp_ops))) { err = -EFAULT; goto free_ptr_ring; } err = pool->mp_ops->init(pool); if (err) { pr_warn("%s() mem-provider init failed %d\n", __func__, err); goto free_ptr_ring; } static_branch_inc(&page_pool_mem_providers); } return 0; free_ptr_ring: ptr_ring_cleanup(&pool->ring, NULL); #ifdef CONFIG_PAGE_POOL_STATS if (!pool->system) free_percpu(pool->recycle_stats); #endif return err; } static void page_pool_uninit(struct page_pool *pool) { ptr_ring_cleanup(&pool->ring, NULL); xa_destroy(&pool->dma_mapped); #ifdef CONFIG_PAGE_POOL_STATS if (!pool->system) free_percpu(pool->recycle_stats); #endif } /** * page_pool_create_percpu() - create a page pool for a given cpu. * @params: parameters, see struct page_pool_params * @cpuid: cpu identifier */ struct page_pool * page_pool_create_percpu(const struct page_pool_params *params, int cpuid) { struct page_pool *pool; int err; pool = kzalloc_node(sizeof(*pool), GFP_KERNEL, params->nid); if (!pool) return ERR_PTR(-ENOMEM); err = page_pool_init(pool, params, cpuid); if (err < 0) goto err_free; err = page_pool_list(pool); if (err) goto err_uninit; return pool; err_uninit: page_pool_uninit(pool); err_free: pr_warn("%s() gave up with errno %d\n", __func__, err); kfree(pool); return ERR_PTR(err); } EXPORT_SYMBOL(page_pool_create_percpu); /** * page_pool_create() - create a page pool * @params: parameters, see struct page_pool_params */ struct page_pool *page_pool_create(const struct page_pool_params *params) { return page_pool_create_percpu(params, -1); } EXPORT_SYMBOL(page_pool_create); static void page_pool_return_page(struct page_pool *pool, netmem_ref netmem); static noinline netmem_ref page_pool_refill_alloc_cache(struct page_pool *pool) { struct ptr_ring *r = &pool->ring; netmem_ref netmem; int pref_nid; /* preferred NUMA node */ /* Quicker fallback, avoid locks when ring is empty */ if (__ptr_ring_empty(r)) { alloc_stat_inc(pool, empty); return 0; } /* Softirq guarantee CPU and thus NUMA node is stable. This, * assumes CPU refilling driver RX-ring will also run RX-NAPI. */ #ifdef CONFIG_NUMA pref_nid = (pool->p.nid == NUMA_NO_NODE) ? numa_mem_id() : pool->p.nid; #else /* Ignore pool->p.nid setting if !CONFIG_NUMA, helps compiler */ pref_nid = numa_mem_id(); /* will be zero like page_to_nid() */ #endif /* Refill alloc array, but only if NUMA match */ do { netmem = (__force netmem_ref)__ptr_ring_consume(r); if (unlikely(!netmem)) break; if (likely(netmem_is_pref_nid(netmem, pref_nid))) { pool->alloc.cache[pool->alloc.count++] = netmem; } else { /* NUMA mismatch; * (1) release 1 page to page-allocator and * (2) break out to fallthrough to alloc_pages_node. * This limit stress on page buddy alloactor. 
*/ page_pool_return_page(pool, netmem); alloc_stat_inc(pool, waive); netmem = 0; break; } } while (pool->alloc.count < PP_ALLOC_CACHE_REFILL); /* Return last page */ if (likely(pool->alloc.count > 0)) { netmem = pool->alloc.cache[--pool->alloc.count]; alloc_stat_inc(pool, refill); } return netmem; } /* fast path */ static netmem_ref __page_pool_get_cached(struct page_pool *pool) { netmem_ref netmem; /* Caller MUST guarantee safe non-concurrent access, e.g. softirq */ if (likely(pool->alloc.count)) { /* Fast-path */ netmem = pool->alloc.cache[--pool->alloc.count]; alloc_stat_inc(pool, fast); } else { netmem = page_pool_refill_alloc_cache(pool); } return netmem; } static void __page_pool_dma_sync_for_device(const struct page_pool *pool, netmem_ref netmem, u32 dma_sync_size) { #if defined(CONFIG_HAS_DMA) && defined(CONFIG_DMA_NEED_SYNC) dma_addr_t dma_addr = page_pool_get_dma_addr_netmem(netmem); dma_sync_size = min(dma_sync_size, pool->p.max_len); __dma_sync_single_for_device(pool->p.dev, dma_addr + pool->p.offset, dma_sync_size, pool->p.dma_dir); #endif } static __always_inline void page_pool_dma_sync_for_device(const struct page_pool *pool, netmem_ref netmem, u32 dma_sync_size) { if (pool->dma_sync && dma_dev_need_sync(pool->p.dev)) { rcu_read_lock(); /* re-check under rcu_read_lock() to sync with page_pool_scrub() */ if (pool->dma_sync) __page_pool_dma_sync_for_device(pool, netmem, dma_sync_size); rcu_read_unlock(); } } static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem, gfp_t gfp) { dma_addr_t dma; int err; u32 id; /* Setup DMA mapping: use 'struct page' area for storing DMA-addr * since dma_addr_t can be either 32 or 64 bits and does not always fit * into page private data (i.e 32bit cpu with 64bit DMA caps) * This mapping is kept for lifetime of page, until leaving pool. 
*/ dma = dma_map_page_attrs(pool->p.dev, netmem_to_page(netmem), 0, (PAGE_SIZE << pool->p.order), pool->p.dma_dir, DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING); if (dma_mapping_error(pool->p.dev, dma)) return false; if (page_pool_set_dma_addr_netmem(netmem, dma)) { WARN_ONCE(1, "unexpected DMA address, please report to netdev@"); goto unmap_failed; } if (in_softirq()) err = xa_alloc(&pool->dma_mapped, &id, netmem_to_page(netmem), PP_DMA_INDEX_LIMIT, gfp); else err = xa_alloc_bh(&pool->dma_mapped, &id, netmem_to_page(netmem), PP_DMA_INDEX_LIMIT, gfp); if (err) { WARN_ONCE(err != -ENOMEM, "couldn't track DMA mapping, please report to netdev@"); goto unset_failed; } netmem_set_dma_index(netmem, id); page_pool_dma_sync_for_device(pool, netmem, pool->p.max_len); return true; unset_failed: page_pool_set_dma_addr_netmem(netmem, 0); unmap_failed: dma_unmap_page_attrs(pool->p.dev, dma, PAGE_SIZE << pool->p.order, pool->p.dma_dir, DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING); return false; } static struct page *__page_pool_alloc_page_order(struct page_pool *pool, gfp_t gfp) { struct page *page; gfp |= __GFP_COMP; page = alloc_pages_node(pool->p.nid, gfp, pool->p.order); if (unlikely(!page)) return NULL; if (pool->dma_map && unlikely(!page_pool_dma_map(pool, page_to_netmem(page), gfp))) { put_page(page); return NULL; } alloc_stat_inc(pool, slow_high_order); page_pool_set_pp_info(pool, page_to_netmem(page)); /* Track how many pages are held 'in-flight' */ pool->pages_state_hold_cnt++; trace_page_pool_state_hold(pool, page_to_netmem(page), pool->pages_state_hold_cnt); return page; } /* slow path */ static noinline netmem_ref __page_pool_alloc_pages_slow(struct page_pool *pool, gfp_t gfp) { const int bulk = PP_ALLOC_CACHE_REFILL; unsigned int pp_order = pool->p.order; bool dma_map = pool->dma_map; netmem_ref netmem; int i, nr_pages; /* Don't support bulk alloc for high-order pages */ if (unlikely(pp_order)) return page_to_netmem(__page_pool_alloc_page_order(pool, gfp)); /* Unnecessary as alloc cache is empty, but guarantees zero count */ if (unlikely(pool->alloc.count > 0)) return pool->alloc.cache[--pool->alloc.count]; /* Mark empty alloc.cache slots "empty" for alloc_pages_bulk */ memset(&pool->alloc.cache, 0, sizeof(void *) * bulk); nr_pages = alloc_pages_bulk_node(gfp, pool->p.nid, bulk, (struct page **)pool->alloc.cache); if (unlikely(!nr_pages)) return 0; /* Pages have been filled into alloc.cache array, but count is zero and * page element have not been (possibly) DMA mapped. */ for (i = 0; i < nr_pages; i++) { netmem = pool->alloc.cache[i]; if (dma_map && unlikely(!page_pool_dma_map(pool, netmem, gfp))) { put_page(netmem_to_page(netmem)); continue; } page_pool_set_pp_info(pool, netmem); pool->alloc.cache[pool->alloc.count++] = netmem; /* Track how many pages are held 'in-flight' */ pool->pages_state_hold_cnt++; trace_page_pool_state_hold(pool, netmem, pool->pages_state_hold_cnt); } /* Return last page */ if (likely(pool->alloc.count > 0)) { netmem = pool->alloc.cache[--pool->alloc.count]; alloc_stat_inc(pool, slow); } else { netmem = 0; } /* When page just alloc'ed is should/must have refcnt 1. */ return netmem; } /* For using page_pool replace: alloc_pages() API calls, but provide * synchronization guarantee for allocation side. 
*/ netmem_ref page_pool_alloc_netmems(struct page_pool *pool, gfp_t gfp) { netmem_ref netmem; /* Fast-path: Get a page from cache */ netmem = __page_pool_get_cached(pool); if (netmem) return netmem; /* Slow-path: cache empty, do real allocation */ if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_ops) netmem = pool->mp_ops->alloc_netmems(pool, gfp); else netmem = __page_pool_alloc_pages_slow(pool, gfp); return netmem; } EXPORT_SYMBOL(page_pool_alloc_netmems); ALLOW_ERROR_INJECTION(page_pool_alloc_netmems, NULL); struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp) { return netmem_to_page(page_pool_alloc_netmems(pool, gfp)); } EXPORT_SYMBOL(page_pool_alloc_pages); /* Calculate distance between two u32 values, valid if distance is below 2^(31) * https://en.wikipedia.org/wiki/Serial_number_arithmetic#General_Solution */ #define _distance(a, b) (s32)((a) - (b)) s32 page_pool_inflight(const struct page_pool *pool, bool strict) { u32 release_cnt = atomic_read(&pool->pages_state_release_cnt); u32 hold_cnt = READ_ONCE(pool->pages_state_hold_cnt); s32 inflight; inflight = _distance(hold_cnt, release_cnt); if (strict) { trace_page_pool_release(pool, inflight, hold_cnt, release_cnt); WARN(inflight < 0, "Negative(%d) inflight packet-pages", inflight); } else { inflight = max(0, inflight); } return inflight; } void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem) { netmem_set_pp(netmem, pool); netmem_or_pp_magic(netmem, PP_SIGNATURE); /* Ensuring all pages have been split into one fragment initially: * page_pool_set_pp_info() is only called once for every page when it * is allocated from the page allocator and page_pool_fragment_page() * is dirtying the same cache line as the page->pp_magic above, so * the overhead is negligible. */ page_pool_fragment_netmem(netmem, 1); if (pool->has_init_callback) pool->slow.init_callback(netmem, pool->slow.init_arg); } void page_pool_clear_pp_info(netmem_ref netmem) { netmem_clear_pp_magic(netmem); netmem_set_pp(netmem, NULL); } static __always_inline void __page_pool_release_page_dma(struct page_pool *pool, netmem_ref netmem) { struct page *old, *page = netmem_to_page(netmem); unsigned long id; dma_addr_t dma; if (!pool->dma_map) /* Always account for inflight pages, even if we didn't * map them */ return; id = netmem_get_dma_index(netmem); if (!id) return; if (in_softirq()) old = xa_cmpxchg(&pool->dma_mapped, id, page, NULL, 0); else old = xa_cmpxchg_bh(&pool->dma_mapped, id, page, NULL, 0); if (old != page) return; dma = page_pool_get_dma_addr_netmem(netmem); /* When page is unmapped, it cannot be returned to our pool */ dma_unmap_page_attrs(pool->p.dev, dma, PAGE_SIZE << pool->p.order, pool->p.dma_dir, DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING); page_pool_set_dma_addr_netmem(netmem, 0); netmem_set_dma_index(netmem, 0); } /* Disconnects a page (from a page_pool). API users can have a need * to disconnect a page (from a page_pool), to allow it to be used as * a regular page (that will eventually be returned to the normal * page-allocator via put_page). */ void page_pool_return_page(struct page_pool *pool, netmem_ref netmem) { int count; bool put; put = true; if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_ops) put = pool->mp_ops->release_netmem(pool, netmem); else __page_pool_release_page_dma(pool, netmem); /* This may be the last page returned, releasing the pool, so * it is not safe to reference pool afterwards. 
*/ count = atomic_inc_return_relaxed(&pool->pages_state_release_cnt); trace_page_pool_state_release(pool, netmem, count); if (put) { page_pool_clear_pp_info(netmem); put_page(netmem_to_page(netmem)); } /* An optimization would be to call __free_pages(page, pool->p.order) * knowing page is not part of page-cache (thus avoiding a * __page_cache_release() call). */ } static bool page_pool_recycle_in_ring(struct page_pool *pool, netmem_ref netmem) { bool in_softirq, ret; /* BH protection not needed if current is softirq */ in_softirq = page_pool_producer_lock(pool); ret = !__ptr_ring_produce(&pool->ring, (__force void *)netmem); if (ret) recycle_stat_inc(pool, ring); page_pool_producer_unlock(pool, in_softirq); return ret; } /* Only allow direct recycling in special circumstances, into the * alloc side cache. E.g. during RX-NAPI processing for XDP_DROP use-case. * * Caller must provide appropriate safe context. */ static bool page_pool_recycle_in_cache(netmem_ref netmem, struct page_pool *pool) { if (unlikely(pool->alloc.count == PP_ALLOC_CACHE_SIZE)) { recycle_stat_inc(pool, cache_full); return false; } /* Caller MUST have verified/know (page_ref_count(page) == 1) */ pool->alloc.cache[pool->alloc.count++] = netmem; recycle_stat_inc(pool, cached); return true; } static bool __page_pool_page_can_be_recycled(netmem_ref netmem) { return netmem_is_net_iov(netmem) || (page_ref_count(netmem_to_page(netmem)) == 1 && !page_is_pfmemalloc(netmem_to_page(netmem))); } /* If the page refcnt == 1, this will try to recycle the page. * If pool->dma_sync is set, we'll try to sync the DMA area for * the configured size min(dma_sync_size, pool->max_len). * If the page refcnt != 1, then the page will be returned to memory * subsystem. */ static __always_inline netmem_ref __page_pool_put_page(struct page_pool *pool, netmem_ref netmem, unsigned int dma_sync_size, bool allow_direct) { lockdep_assert_no_hardirq(); /* This allocator is optimized for the XDP mode that uses * one-frame-per-page, but have fallbacks that act like the * regular page allocator APIs. * * refcnt == 1 means page_pool owns page, and can recycle it. * * page is NOT reusable when allocated when system is under * some pressure. (page_is_pfmemalloc) */ if (likely(__page_pool_page_can_be_recycled(netmem))) { /* Read barrier done in page_ref_count / READ_ONCE */ page_pool_dma_sync_for_device(pool, netmem, dma_sync_size); if (allow_direct && page_pool_recycle_in_cache(netmem, pool)) return 0; /* Page found as candidate for recycling */ return netmem; } /* Fallback/non-XDP mode: API user have elevated refcnt. * * Many drivers split up the page into fragments, and some * want to keep doing this to save memory and do refcnt based * recycling. Support this use case too, to ease drivers * switching between XDP/non-XDP. * * In-case page_pool maintains the DMA mapping, API user must * call page_pool_put_page once. In this elevated refcnt * case, the DMA is unmapped/released, as driver is likely * doing refcnt based recycle tricks, meaning another process * will be invoking put_page. 
*/ recycle_stat_inc(pool, released_refcnt); page_pool_return_page(pool, netmem); return 0; } static bool page_pool_napi_local(const struct page_pool *pool) { const struct napi_struct *napi; u32 cpuid; /* On PREEMPT_RT the softirq can be preempted by the consumer */ if (IS_ENABLED(CONFIG_PREEMPT_RT)) return false; if (unlikely(!in_softirq())) return false; /* Allow direct recycle if we have reasons to believe that we are * in the same context as the consumer would run, so there's * no possible race. * __page_pool_put_page() makes sure we're not in hardirq context * and interrupts are enabled prior to accessing the cache. */ cpuid = smp_processor_id(); if (READ_ONCE(pool->cpuid) == cpuid) return true; napi = READ_ONCE(pool->p.napi); return napi && READ_ONCE(napi->list_owner) == cpuid; } void page_pool_put_unrefed_netmem(struct page_pool *pool, netmem_ref netmem, unsigned int dma_sync_size, bool allow_direct) { if (!allow_direct) allow_direct = page_pool_napi_local(pool); netmem = __page_pool_put_page(pool, netmem, dma_sync_size, allow_direct); if (netmem && !page_pool_recycle_in_ring(pool, netmem)) { /* Cache full, fallback to free pages */ recycle_stat_inc(pool, ring_full); page_pool_return_page(pool, netmem); } } EXPORT_SYMBOL(page_pool_put_unrefed_netmem); void page_pool_put_unrefed_page(struct page_pool *pool, struct page *page, unsigned int dma_sync_size, bool allow_direct) { page_pool_put_unrefed_netmem(pool, page_to_netmem(page), dma_sync_size, allow_direct); } EXPORT_SYMBOL(page_pool_put_unrefed_page); static void page_pool_recycle_ring_bulk(struct page_pool *pool, netmem_ref *bulk, u32 bulk_len) { bool in_softirq; u32 i; /* Bulk produce into ptr_ring page_pool cache */ in_softirq = page_pool_producer_lock(pool); for (i = 0; i < bulk_len; i++) { if (__ptr_ring_produce(&pool->ring, (__force void *)bulk[i])) { /* ring full */ recycle_stat_inc(pool, ring_full); break; } } page_pool_producer_unlock(pool, in_softirq); recycle_stat_add(pool, ring, i); /* Hopefully all pages were returned into ptr_ring */ if (likely(i == bulk_len)) return; /* * ptr_ring cache is full, free remaining pages outside producer lock * since put_page() with refcnt == 1 can be an expensive operation. */ for (; i < bulk_len; i++) page_pool_return_page(pool, bulk[i]); } /** * page_pool_put_netmem_bulk() - release references on multiple netmems * @data: array holding netmem references * @count: number of entries in @data * * Tries to refill a number of netmems into the ptr_ring cache holding ptr_ring * producer lock. If the ptr_ring is full, page_pool_put_netmem_bulk() * will release leftover netmems to the memory provider. * page_pool_put_netmem_bulk() is suitable to be run inside the driver NAPI tx * completion loop for the XDP_REDIRECT use case. * * Please note the caller must not use data area after running * page_pool_put_netmem_bulk(), as this function overwrites it. 
*/ void page_pool_put_netmem_bulk(netmem_ref *data, u32 count) { u32 bulk_len = 0; for (u32 i = 0; i < count; i++) { netmem_ref netmem = netmem_compound_head(data[i]); if (page_pool_unref_and_test(netmem)) data[bulk_len++] = netmem; } count = bulk_len; while (count) { netmem_ref bulk[XDP_BULK_QUEUE_SIZE]; struct page_pool *pool = NULL; bool allow_direct; u32 foreign = 0; bulk_len = 0; for (u32 i = 0; i < count; i++) { struct page_pool *netmem_pp; netmem_ref netmem = data[i]; netmem_pp = netmem_get_pp(netmem); if (unlikely(!pool)) { pool = netmem_pp; allow_direct = page_pool_napi_local(pool); } else if (netmem_pp != pool) { /* * If the netmem belongs to a different * page_pool, save it for another round. */ data[foreign++] = netmem; continue; } netmem = __page_pool_put_page(pool, netmem, -1, allow_direct); /* Approved for bulk recycling in ptr_ring cache */ if (netmem) bulk[bulk_len++] = netmem; } if (bulk_len) page_pool_recycle_ring_bulk(pool, bulk, bulk_len); count = foreign; } } EXPORT_SYMBOL(page_pool_put_netmem_bulk); static netmem_ref page_pool_drain_frag(struct page_pool *pool, netmem_ref netmem) { long drain_count = BIAS_MAX - pool->frag_users; /* Some user is still using the page frag */ if (likely(page_pool_unref_netmem(netmem, drain_count))) return 0; if (__page_pool_page_can_be_recycled(netmem)) { page_pool_dma_sync_for_device(pool, netmem, -1); return netmem; } page_pool_return_page(pool, netmem); return 0; } static void page_pool_free_frag(struct page_pool *pool) { long drain_count = BIAS_MAX - pool->frag_users; netmem_ref netmem = pool->frag_page; pool->frag_page = 0; if (!netmem || page_pool_unref_netmem(netmem, drain_count)) return; page_pool_return_page(pool, netmem); } netmem_ref page_pool_alloc_frag_netmem(struct page_pool *pool, unsigned int *offset, unsigned int size, gfp_t gfp) { unsigned int max_size = PAGE_SIZE << pool->p.order; netmem_ref netmem = pool->frag_page; if (WARN_ON(size > max_size)) return 0; size = ALIGN(size, dma_get_cache_alignment()); *offset = pool->frag_offset; if (netmem && *offset + size > max_size) { netmem = page_pool_drain_frag(pool, netmem); if (netmem) { recycle_stat_inc(pool, cached); alloc_stat_inc(pool, fast); goto frag_reset; } } if (!netmem) { netmem = page_pool_alloc_netmems(pool, gfp); if (unlikely(!netmem)) { pool->frag_page = 0; return 0; } pool->frag_page = netmem; frag_reset: pool->frag_users = 1; *offset = 0; pool->frag_offset = size; page_pool_fragment_netmem(netmem, BIAS_MAX); return netmem; } pool->frag_users++; pool->frag_offset = *offset + size; return netmem; } EXPORT_SYMBOL(page_pool_alloc_frag_netmem); struct page *page_pool_alloc_frag(struct page_pool *pool, unsigned int *offset, unsigned int size, gfp_t gfp) { return netmem_to_page(page_pool_alloc_frag_netmem(pool, offset, size, gfp)); } EXPORT_SYMBOL(page_pool_alloc_frag); static void page_pool_empty_ring(struct page_pool *pool) { netmem_ref netmem; /* Empty recycle ring */ while ((netmem = (__force netmem_ref)ptr_ring_consume_bh(&pool->ring))) { /* Verify the refcnt invariant of cached pages */ if (!(netmem_ref_count(netmem) == 1)) pr_crit("%s() page_pool refcnt %d violation\n", __func__, netmem_ref_count(netmem)); page_pool_return_page(pool, netmem); } } static void __page_pool_destroy(struct page_pool *pool) { if (pool->disconnect) pool->disconnect(pool); page_pool_unlist(pool); page_pool_uninit(pool); if (pool->mp_ops) { pool->mp_ops->destroy(pool); static_branch_dec(&page_pool_mem_providers); } kfree(pool); } static void page_pool_empty_alloc_cache_once(struct 
page_pool *pool) { netmem_ref netmem; if (pool->destroy_cnt) return; /* Empty alloc cache, assume caller made sure this is * no-longer in use, and page_pool_alloc_pages() cannot be * call concurrently. */ while (pool->alloc.count) { netmem = pool->alloc.cache[--pool->alloc.count]; page_pool_return_page(pool, netmem); } } static void page_pool_scrub(struct page_pool *pool) { unsigned long id; void *ptr; page_pool_empty_alloc_cache_once(pool); if (!pool->destroy_cnt++ && pool->dma_map) { if (pool->dma_sync) { /* Disable page_pool_dma_sync_for_device() */ pool->dma_sync = false; /* Make sure all concurrent returns that may see the old * value of dma_sync (and thus perform a sync) have * finished before doing the unmapping below. Skip the * wait if the device doesn't actually need syncing, or * if there are no outstanding mapped pages. */ if (dma_dev_need_sync(pool->p.dev) && !xa_empty(&pool->dma_mapped)) synchronize_net(); } xa_for_each(&pool->dma_mapped, id, ptr) __page_pool_release_page_dma(pool, page_to_netmem(ptr)); } /* No more consumers should exist, but producers could still * be in-flight. */ page_pool_empty_ring(pool); } static int page_pool_release(struct page_pool *pool) { bool in_softirq; int inflight; page_pool_scrub(pool); inflight = page_pool_inflight(pool, true); /* Acquire producer lock to make sure producers have exited. */ in_softirq = page_pool_producer_lock(pool); page_pool_producer_unlock(pool, in_softirq); if (!inflight) __page_pool_destroy(pool); return inflight; } static void page_pool_release_retry(struct work_struct *wq) { struct delayed_work *dwq = to_delayed_work(wq); struct page_pool *pool = container_of(dwq, typeof(*pool), release_dw); void *netdev; int inflight; inflight = page_pool_release(pool); /* In rare cases, a driver bug may cause inflight to go negative. * Don't reschedule release if inflight is 0 or negative. * - If 0, the page_pool has been destroyed * - if negative, we will never recover * in both cases no reschedule is necessary. */ if (inflight <= 0) return; /* Periodic warning for page pools the user can't see */ netdev = READ_ONCE(pool->slow.netdev); if (time_after_eq(jiffies, pool->defer_warn) && (!netdev || netdev == NET_PTR_POISON)) { int sec = (s32)((u32)jiffies - (u32)pool->defer_start) / HZ; pr_warn("%s() stalled pool shutdown: id %u, %d inflight %d sec\n", __func__, pool->user.id, inflight, sec); pool->defer_warn = jiffies + DEFER_WARN_INTERVAL; } /* Still not ready to be disconnected, retry later */ schedule_delayed_work(&pool->release_dw, DEFER_TIME); } void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *), const struct xdp_mem_info *mem) { refcount_inc(&pool->user_cnt); pool->disconnect = disconnect; pool->xdp_mem_id = mem->id; } void page_pool_disable_direct_recycling(struct page_pool *pool) { /* Disable direct recycling based on pool->cpuid. * Paired with READ_ONCE() in page_pool_napi_local(). 
*/ WRITE_ONCE(pool->cpuid, -1); if (!pool->p.napi) return; napi_assert_will_not_race(pool->p.napi); mutex_lock(&page_pools_lock); WRITE_ONCE(pool->p.napi, NULL); mutex_unlock(&page_pools_lock); } EXPORT_SYMBOL(page_pool_disable_direct_recycling); void page_pool_destroy(struct page_pool *pool) { if (!pool) return; if (!page_pool_put(pool)) return; page_pool_disable_direct_recycling(pool); page_pool_free_frag(pool); if (!page_pool_release(pool)) return; page_pool_detached(pool); pool->defer_start = jiffies; pool->defer_warn = jiffies + DEFER_WARN_INTERVAL; INIT_DELAYED_WORK(&pool->release_dw, page_pool_release_retry); schedule_delayed_work(&pool->release_dw, DEFER_TIME); } EXPORT_SYMBOL(page_pool_destroy); /* Caller must provide appropriate safe context, e.g. NAPI. */ void page_pool_update_nid(struct page_pool *pool, int new_nid) { netmem_ref netmem; trace_page_pool_update_nid(pool, new_nid); pool->p.nid = new_nid; /* Flush pool alloc cache, as refill will check NUMA node */ while (pool->alloc.count) { netmem = pool->alloc.cache[--pool->alloc.count]; page_pool_return_page(pool, netmem); } } EXPORT_SYMBOL(page_pool_update_nid); bool net_mp_niov_set_dma_addr(struct net_iov *niov, dma_addr_t addr) { return page_pool_set_dma_addr_netmem(net_iov_to_netmem(niov), addr); } /* Associate a niov with a page pool. Should follow with a matching * net_mp_niov_clear_page_pool() */ void net_mp_niov_set_page_pool(struct page_pool *pool, struct net_iov *niov) { netmem_ref netmem = net_iov_to_netmem(niov); page_pool_set_pp_info(pool, netmem); pool->pages_state_hold_cnt++; trace_page_pool_state_hold(pool, netmem, pool->pages_state_hold_cnt); } /* Disassociate a niov from a page pool. Should only be used in the * ->release_netmem() path. */ void net_mp_niov_clear_page_pool(struct net_iov *niov) { netmem_ref netmem = net_iov_to_netmem(niov); page_pool_clear_pp_info(netmem); } |
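To make the driver-facing surface of the code above concrete, below is a minimal, hypothetical sketch of how an RX queue might create a pool; the helper name and the concrete values (ring size, offset) are illustrative only, and the field layout is the struct page_pool_params that page_pool_init() copies into pool->p and pool->slow.

/* Illustrative only: one pool per RX ring, with DMA mapping and
 * DMA-sync-for-device handled by the page_pool core shown above.
 */
static struct page_pool *example_create_rx_pool(struct device *dma_dev,
						struct napi_struct *napi)
{
	struct page_pool_params pp = {
		.flags		= PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
		.order		= 0,
		.pool_size	= 256,		/* ptr_ring size; capped at 32768 by page_pool_init() */
		.nid		= NUMA_NO_NODE,
		.dev		= dma_dev,	/* required for PP_FLAG_DMA_MAP */
		.napi		= napi,		/* allows direct recycling from this NAPI context */
		.dma_dir	= DMA_FROM_DEVICE,
		.max_len	= PAGE_SIZE,	/* sync length; required for PP_FLAG_DMA_SYNC_DEV */
		.offset		= 0,
	};

	return page_pool_create(&pp);		/* ERR_PTR() on failure */
}

In NAPI context the driver would then refill descriptors with page_pool_alloc_pages(pool, GFP_ATOMIC), hand completed buffers back with page_pool_put_unrefed_page() (or let skb recycling do it), and call page_pool_destroy() when the ring is torn down.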
// SPDX-License-Identifier: GPL-2.0 /* * Driver for Meywa-Denki & KAYAC YUREX * * Copyright (C) 2010 Tomoki Sekiyama (tomoki.sekiyama@gmail.com) */ #include <linux/kernel.h> #include <linux/errno.h> #include <linux/slab.h> #include <linux/module.h> #include <linux/kref.h> #include <linux/mutex.h> #include <linux/uaccess.h> #include <linux/usb.h> #include <linux/hid.h> #define DRIVER_AUTHOR "Tomoki Sekiyama" #define DRIVER_DESC "Driver for Meywa-Denki & KAYAC YUREX" #define YUREX_VENDOR_ID 0x0c45 #define YUREX_PRODUCT_ID 0x1010 #define CMD_ACK '!'
#define CMD_ANIMATE 'A' #define CMD_COUNT 'C' #define CMD_LED 'L' #define CMD_READ 'R' #define CMD_SET 'S' #define CMD_VERSION 'V' #define CMD_EOF 0x0d #define CMD_PADDING 0xff #define YUREX_BUF_SIZE 8 #define YUREX_WRITE_TIMEOUT (HZ*2) /* table of devices that work with this driver */ static struct usb_device_id yurex_table[] = { { USB_DEVICE(YUREX_VENDOR_ID, YUREX_PRODUCT_ID) }, { } /* Terminating entry */ }; MODULE_DEVICE_TABLE(usb, yurex_table); #ifdef CONFIG_USB_DYNAMIC_MINORS #define YUREX_MINOR_BASE 0 #else #define YUREX_MINOR_BASE 192 #endif /* Structure to hold all of our device specific stuff */ struct usb_yurex { struct usb_device *udev; struct usb_interface *interface; __u8 int_in_endpointAddr; struct urb *urb; /* URB for interrupt in */ unsigned char *int_buffer; /* buffer for intterupt in */ struct urb *cntl_urb; /* URB for control msg */ struct usb_ctrlrequest *cntl_req; /* req for control msg */ unsigned char *cntl_buffer; /* buffer for control msg */ struct kref kref; struct mutex io_mutex; unsigned long disconnected:1; struct fasync_struct *async_queue; wait_queue_head_t waitq; spinlock_t lock; __s64 bbu; /* BBU from device */ }; #define to_yurex_dev(d) container_of(d, struct usb_yurex, kref) static struct usb_driver yurex_driver; static const struct file_operations yurex_fops; static void yurex_control_callback(struct urb *urb) { struct usb_yurex *dev = urb->context; int status = urb->status; if (status) { dev_err(&urb->dev->dev, "%s - control failed: %d\n", __func__, status); wake_up_interruptible(&dev->waitq); return; } /* on success, sender woken up by CMD_ACK int in, or timeout */ } static void yurex_delete(struct kref *kref) { struct usb_yurex *dev = to_yurex_dev(kref); dev_dbg(&dev->interface->dev, "%s\n", __func__); if (dev->cntl_urb) { usb_kill_urb(dev->cntl_urb); kfree(dev->cntl_req); usb_free_coherent(dev->udev, YUREX_BUF_SIZE, dev->cntl_buffer, dev->cntl_urb->transfer_dma); usb_free_urb(dev->cntl_urb); } if (dev->urb) { usb_kill_urb(dev->urb); usb_free_coherent(dev->udev, YUREX_BUF_SIZE, dev->int_buffer, dev->urb->transfer_dma); usb_free_urb(dev->urb); } usb_put_intf(dev->interface); usb_put_dev(dev->udev); kfree(dev); } /* * usb class driver info in order to get a minor number from the usb core, * and to have the device registered with the driver core */ static struct usb_class_driver yurex_class = { .name = "yurex%d", .fops = &yurex_fops, .minor_base = YUREX_MINOR_BASE, }; static void yurex_interrupt(struct urb *urb) { struct usb_yurex *dev = urb->context; unsigned char *buf = dev->int_buffer; int status = urb->status; unsigned long flags; int retval, i; switch (status) { case 0: /*success*/ break; /* The device is terminated or messed up, give up */ case -EOVERFLOW: dev_err(&dev->interface->dev, "%s - overflow with length %d, actual length is %d\n", __func__, YUREX_BUF_SIZE, dev->urb->actual_length); return; case -ECONNRESET: case -ENOENT: case -ESHUTDOWN: case -EILSEQ: case -EPROTO: case -ETIME: return; default: dev_err(&dev->interface->dev, "%s - unknown status received: %d\n", __func__, status); return; } /* handle received message */ switch (buf[0]) { case CMD_COUNT: case CMD_READ: if (buf[6] == CMD_EOF) { spin_lock_irqsave(&dev->lock, flags); dev->bbu = 0; for (i = 1; i < 6; i++) { dev->bbu += buf[i]; if (i != 5) dev->bbu <<= 8; } dev_dbg(&dev->interface->dev, "%s count: %lld\n", __func__, dev->bbu); spin_unlock_irqrestore(&dev->lock, flags); kill_fasync(&dev->async_queue, SIGIO, POLL_IN); } else dev_dbg(&dev->interface->dev, "data format error - no 
EOF\n"); break; case CMD_ACK: dev_dbg(&dev->interface->dev, "%s ack: %c\n", __func__, buf[1]); wake_up_interruptible(&dev->waitq); break; } retval = usb_submit_urb(dev->urb, GFP_ATOMIC); if (retval) { dev_err(&dev->interface->dev, "%s - usb_submit_urb failed: %d\n", __func__, retval); } } static int yurex_probe(struct usb_interface *interface, const struct usb_device_id *id) { struct usb_yurex *dev; struct usb_host_interface *iface_desc; struct usb_endpoint_descriptor *endpoint; int retval = -ENOMEM; DEFINE_WAIT(wait); int res; /* allocate memory for our device state and initialize it */ dev = kzalloc(sizeof(*dev), GFP_KERNEL); if (!dev) goto error; kref_init(&dev->kref); mutex_init(&dev->io_mutex); spin_lock_init(&dev->lock); init_waitqueue_head(&dev->waitq); dev->udev = usb_get_dev(interface_to_usbdev(interface)); dev->interface = usb_get_intf(interface); /* set up the endpoint information */ iface_desc = interface->cur_altsetting; res = usb_find_int_in_endpoint(iface_desc, &endpoint); if (res) { dev_err(&interface->dev, "Could not find endpoints\n"); retval = res; goto error; } dev->int_in_endpointAddr = endpoint->bEndpointAddress; /* allocate control URB */ dev->cntl_urb = usb_alloc_urb(0, GFP_KERNEL); if (!dev->cntl_urb) goto error; /* allocate buffer for control req */ dev->cntl_req = kmalloc(YUREX_BUF_SIZE, GFP_KERNEL); if (!dev->cntl_req) goto error; /* allocate buffer for control msg */ dev->cntl_buffer = usb_alloc_coherent(dev->udev, YUREX_BUF_SIZE, GFP_KERNEL, &dev->cntl_urb->transfer_dma); if (!dev->cntl_buffer) { dev_err(&interface->dev, "Could not allocate cntl_buffer\n"); goto error; } /* configure control URB */ dev->cntl_req->bRequestType = USB_DIR_OUT | USB_TYPE_CLASS | USB_RECIP_INTERFACE; dev->cntl_req->bRequest = HID_REQ_SET_REPORT; dev->cntl_req->wValue = cpu_to_le16((HID_OUTPUT_REPORT + 1) << 8); dev->cntl_req->wIndex = cpu_to_le16(iface_desc->desc.bInterfaceNumber); dev->cntl_req->wLength = cpu_to_le16(YUREX_BUF_SIZE); usb_fill_control_urb(dev->cntl_urb, dev->udev, usb_sndctrlpipe(dev->udev, 0), (void *)dev->cntl_req, dev->cntl_buffer, YUREX_BUF_SIZE, yurex_control_callback, dev); dev->cntl_urb->transfer_flags |= URB_NO_TRANSFER_DMA_MAP; /* allocate interrupt URB */ dev->urb = usb_alloc_urb(0, GFP_KERNEL); if (!dev->urb) goto error; /* allocate buffer for interrupt in */ dev->int_buffer = usb_alloc_coherent(dev->udev, YUREX_BUF_SIZE, GFP_KERNEL, &dev->urb->transfer_dma); if (!dev->int_buffer) { dev_err(&interface->dev, "Could not allocate int_buffer\n"); goto error; } /* configure interrupt URB */ usb_fill_int_urb(dev->urb, dev->udev, usb_rcvintpipe(dev->udev, dev->int_in_endpointAddr), dev->int_buffer, YUREX_BUF_SIZE, yurex_interrupt, dev, 1); dev->urb->transfer_flags |= URB_NO_TRANSFER_DMA_MAP; if (usb_submit_urb(dev->urb, GFP_KERNEL)) { retval = -EIO; dev_err(&interface->dev, "Could not submitting URB\n"); goto error; } /* save our data pointer in this interface device */ usb_set_intfdata(interface, dev); dev->bbu = -1; /* we can register the device now, as it is ready */ retval = usb_register_dev(interface, &yurex_class); if (retval) { dev_err(&interface->dev, "Not able to get a minor for this device.\n"); usb_set_intfdata(interface, NULL); goto error; } dev_info(&interface->dev, "USB YUREX device now attached to Yurex #%d\n", interface->minor); return 0; error: if (dev) /* this frees allocated memory */ kref_put(&dev->kref, yurex_delete); return retval; } static void yurex_disconnect(struct usb_interface *interface) { struct usb_yurex *dev; int minor = 
interface->minor; dev = usb_get_intfdata(interface); usb_set_intfdata(interface, NULL); /* give back our minor */ usb_deregister_dev(interface, &yurex_class); /* prevent more I/O from starting */ usb_poison_urb(dev->urb); usb_poison_urb(dev->cntl_urb); mutex_lock(&dev->io_mutex); dev->disconnected = 1; mutex_unlock(&dev->io_mutex); /* wakeup waiters */ kill_fasync(&dev->async_queue, SIGIO, POLL_IN); wake_up_interruptible(&dev->waitq); /* decrement our usage count */ kref_put(&dev->kref, yurex_delete); dev_info(&interface->dev, "USB YUREX #%d now disconnected\n", minor); } static struct usb_driver yurex_driver = { .name = "yurex", .probe = yurex_probe, .disconnect = yurex_disconnect, .id_table = yurex_table, }; static int yurex_fasync(int fd, struct file *file, int on) { struct usb_yurex *dev; dev = file->private_data; return fasync_helper(fd, file, on, &dev->async_queue); } static int yurex_open(struct inode *inode, struct file *file) { struct usb_yurex *dev; struct usb_interface *interface; int subminor; int retval = 0; subminor = iminor(inode); interface = usb_find_interface(&yurex_driver, subminor); if (!interface) { printk(KERN_ERR "%s - error, can't find device for minor %d", __func__, subminor); retval = -ENODEV; goto exit; } dev = usb_get_intfdata(interface); if (!dev) { retval = -ENODEV; goto exit; } /* increment our usage count for the device */ kref_get(&dev->kref); /* save our object in the file's private structure */ mutex_lock(&dev->io_mutex); file->private_data = dev; mutex_unlock(&dev->io_mutex); exit: return retval; } static int yurex_release(struct inode *inode, struct file *file) { struct usb_yurex *dev; dev = file->private_data; if (dev == NULL) return -ENODEV; /* decrement the count on our device */ kref_put(&dev->kref, yurex_delete); return 0; } static ssize_t yurex_read(struct file *file, char __user *buffer, size_t count, loff_t *ppos) { struct usb_yurex *dev; int len; char in_buffer[20]; unsigned long flags; dev = file->private_data; mutex_lock(&dev->io_mutex); if (dev->disconnected) { /* already disconnected */ mutex_unlock(&dev->io_mutex); return -ENODEV; } spin_lock_irqsave(&dev->lock, flags); len = snprintf(in_buffer, 20, "%lld\n", dev->bbu); spin_unlock_irqrestore(&dev->lock, flags); mutex_unlock(&dev->io_mutex); if (WARN_ON_ONCE(len >= sizeof(in_buffer))) return -EIO; return simple_read_from_buffer(buffer, count, ppos, in_buffer, len); } static ssize_t yurex_write(struct file *file, const char __user *user_buffer, size_t count, loff_t *ppos) { struct usb_yurex *dev; int i, set = 0, retval = 0; char buffer[16 + 1]; char *data = buffer; unsigned long long c, c2 = 0; signed long timeout = 0; DEFINE_WAIT(wait); count = min(sizeof(buffer) - 1, count); dev = file->private_data; /* verify that we actually have some data to write */ if (count == 0) goto error; retval = mutex_lock_interruptible(&dev->io_mutex); if (retval < 0) return -EINTR; if (dev->disconnected) { /* already disconnected */ mutex_unlock(&dev->io_mutex); retval = -ENODEV; goto error; } if (copy_from_user(buffer, user_buffer, count)) { mutex_unlock(&dev->io_mutex); retval = -EFAULT; goto error; } buffer[count] = 0; memset(dev->cntl_buffer, CMD_PADDING, YUREX_BUF_SIZE); switch (buffer[0]) { case CMD_ANIMATE: case CMD_LED: dev->cntl_buffer[0] = buffer[0]; dev->cntl_buffer[1] = buffer[1]; dev->cntl_buffer[2] = CMD_EOF; break; case CMD_READ: case CMD_VERSION: dev->cntl_buffer[0] = buffer[0]; dev->cntl_buffer[1] = 0x00; dev->cntl_buffer[2] = CMD_EOF; break; case CMD_SET: data++; fallthrough; case '0' ... 
'9': set = 1; c = c2 = simple_strtoull(data, NULL, 0); dev->cntl_buffer[0] = CMD_SET; for (i = 1; i < 6; i++) { dev->cntl_buffer[i] = (c>>32) & 0xff; c <<= 8; } buffer[6] = CMD_EOF; break; default: mutex_unlock(&dev->io_mutex); return -EINVAL; } /* send the data as the control msg */ prepare_to_wait(&dev->waitq, &wait, TASK_INTERRUPTIBLE); dev_dbg(&dev->interface->dev, "%s - submit %c\n", __func__, dev->cntl_buffer[0]); retval = usb_submit_urb(dev->cntl_urb, GFP_ATOMIC); if (retval >= 0) timeout = schedule_timeout(YUREX_WRITE_TIMEOUT); finish_wait(&dev->waitq, &wait); /* make sure URB is idle after timeout or (spurious) CMD_ACK */ usb_kill_urb(dev->cntl_urb); mutex_unlock(&dev->io_mutex); if (retval < 0) { dev_err(&dev->interface->dev, "%s - failed to send bulk msg, error %d\n", __func__, retval); goto error; } if (set && timeout) { spin_lock_irq(&dev->lock); dev->bbu = c2; spin_unlock_irq(&dev->lock); } return timeout ? count : -EIO; error: return retval; } static const struct file_operations yurex_fops = { .owner = THIS_MODULE, .read = yurex_read, .write = yurex_write, .open = yurex_open, .release = yurex_release, .fasync = yurex_fasync, .llseek = default_llseek, }; module_usb_driver(yurex_driver); MODULE_DESCRIPTION("USB YUREX driver support"); MODULE_LICENSE("GPL"); |
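For context on the character-device interface implemented by yurex_read()/yurex_write() above, here is a small, hypothetical userspace sketch; the device node path /dev/yurex0 is an assumption (it follows the "yurex%d" class name), and error handling is kept minimal.

/* Userspace illustration only: reset the counter and read it back. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char buf[32];
	ssize_t n;
	int fd = open("/dev/yurex0", O_RDWR);	/* path is an assumption */

	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* A bare decimal (or "S<number>") takes the CMD_SET path and sets the BBU count. */
	if (write(fd, "0", 1) < 0)
		perror("write");

	/* yurex_read() formats the current count as "%lld\n". */
	n = read(fd, buf, sizeof(buf) - 1);
	if (n > 0) {
		buf[n] = '\0';
		printf("BBU count: %s", buf);
	}

	close(fd);
	return 0;
}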
// SPDX-License-Identifier: GPL-2.0-only /* * net/sched/sch_mqprio.c * * Copyright (c) 2010 John Fastabend <john.r.fastabend@intel.com> */ #include <linux/ethtool_netlink.h> #include <linux/types.h> #include <linux/slab.h> #include <linux/kernel.h> #include <linux/string.h> #include <linux/errno.h> #include <linux/skbuff.h> #include <linux/module.h> #include <net/netlink.h> #include <net/pkt_sched.h> #include <net/sch_generic.h> #include <net/pkt_cls.h> #include
"sch_mqprio_lib.h" struct mqprio_sched { struct Qdisc **qdiscs; u16 mode; u16 shaper; int hw_offload; u32 flags; u64 min_rate[TC_QOPT_MAX_QUEUE]; u64 max_rate[TC_QOPT_MAX_QUEUE]; u32 fp[TC_QOPT_MAX_QUEUE]; }; static int mqprio_enable_offload(struct Qdisc *sch, const struct tc_mqprio_qopt *qopt, struct netlink_ext_ack *extack) { struct mqprio_sched *priv = qdisc_priv(sch); struct net_device *dev = qdisc_dev(sch); struct tc_mqprio_qopt_offload mqprio = { .qopt = *qopt, .extack = extack, }; int err, i; switch (priv->mode) { case TC_MQPRIO_MODE_DCB: if (priv->shaper != TC_MQPRIO_SHAPER_DCB) return -EINVAL; break; case TC_MQPRIO_MODE_CHANNEL: mqprio.flags = priv->flags; if (priv->flags & TC_MQPRIO_F_MODE) mqprio.mode = priv->mode; if (priv->flags & TC_MQPRIO_F_SHAPER) mqprio.shaper = priv->shaper; if (priv->flags & TC_MQPRIO_F_MIN_RATE) for (i = 0; i < mqprio.qopt.num_tc; i++) mqprio.min_rate[i] = priv->min_rate[i]; if (priv->flags & TC_MQPRIO_F_MAX_RATE) for (i = 0; i < mqprio.qopt.num_tc; i++) mqprio.max_rate[i] = priv->max_rate[i]; break; default: return -EINVAL; } mqprio_fp_to_offload(priv->fp, &mqprio); err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_QDISC_MQPRIO, &mqprio); if (err) return err; priv->hw_offload = mqprio.qopt.hw; return 0; } static void mqprio_disable_offload(struct Qdisc *sch) { struct tc_mqprio_qopt_offload mqprio = { { 0 } }; struct mqprio_sched *priv = qdisc_priv(sch); struct net_device *dev = qdisc_dev(sch); switch (priv->mode) { case TC_MQPRIO_MODE_DCB: case TC_MQPRIO_MODE_CHANNEL: dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_QDISC_MQPRIO, &mqprio); break; } } static void mqprio_destroy(struct Qdisc *sch) { struct net_device *dev = qdisc_dev(sch); struct mqprio_sched *priv = qdisc_priv(sch); unsigned int ntx; if (priv->qdiscs) { for (ntx = 0; ntx < dev->num_tx_queues && priv->qdiscs[ntx]; ntx++) qdisc_put(priv->qdiscs[ntx]); kfree(priv->qdiscs); } if (priv->hw_offload && dev->netdev_ops->ndo_setup_tc) mqprio_disable_offload(sch); else netdev_set_num_tc(dev, 0); } static int mqprio_parse_opt(struct net_device *dev, struct tc_mqprio_qopt *qopt, const struct tc_mqprio_caps *caps, struct netlink_ext_ack *extack) { int err; /* Limit qopt->hw to maximum supported offload value. Drivers have * the option of overriding this later if they don't support the a * given offload type. */ if (qopt->hw > TC_MQPRIO_HW_OFFLOAD_MAX) qopt->hw = TC_MQPRIO_HW_OFFLOAD_MAX; /* If hardware offload is requested, we will leave 3 options to the * device driver: * - populate the queue counts itself (and ignore what was requested) * - validate the provided queue counts by itself (and apply them) * - request queue count validation here (and apply them) */ err = mqprio_validate_qopt(dev, qopt, !qopt->hw || caps->validate_queue_counts, false, extack); if (err) return err; /* If ndo_setup_tc is not present then hardware doesn't support offload * and we should return an error. 
*/ if (qopt->hw && !dev->netdev_ops->ndo_setup_tc) { NL_SET_ERR_MSG(extack, "Device does not support hardware offload"); return -EINVAL; } return 0; } static const struct nla_policy mqprio_tc_entry_policy[TCA_MQPRIO_TC_ENTRY_MAX + 1] = { [TCA_MQPRIO_TC_ENTRY_INDEX] = NLA_POLICY_MAX(NLA_U32, TC_QOPT_MAX_QUEUE), [TCA_MQPRIO_TC_ENTRY_FP] = NLA_POLICY_RANGE(NLA_U32, TC_FP_EXPRESS, TC_FP_PREEMPTIBLE), }; static const struct nla_policy mqprio_policy[TCA_MQPRIO_MAX + 1] = { [TCA_MQPRIO_MODE] = { .len = sizeof(u16) }, [TCA_MQPRIO_SHAPER] = { .len = sizeof(u16) }, [TCA_MQPRIO_MIN_RATE64] = { .type = NLA_NESTED }, [TCA_MQPRIO_MAX_RATE64] = { .type = NLA_NESTED }, [TCA_MQPRIO_TC_ENTRY] = { .type = NLA_NESTED }, }; static int mqprio_parse_tc_entry(u32 fp[TC_QOPT_MAX_QUEUE], struct nlattr *opt, unsigned long *seen_tcs, struct netlink_ext_ack *extack) { struct nlattr *tb[TCA_MQPRIO_TC_ENTRY_MAX + 1]; int err, tc; err = nla_parse_nested(tb, TCA_MQPRIO_TC_ENTRY_MAX, opt, mqprio_tc_entry_policy, extack); if (err < 0) return err; if (NL_REQ_ATTR_CHECK(extack, opt, tb, TCA_MQPRIO_TC_ENTRY_INDEX)) { NL_SET_ERR_MSG(extack, "TC entry index missing"); return -EINVAL; } tc = nla_get_u32(tb[TCA_MQPRIO_TC_ENTRY_INDEX]); if (*seen_tcs & BIT(tc)) { NL_SET_ERR_MSG_ATTR(extack, tb[TCA_MQPRIO_TC_ENTRY_INDEX], "Duplicate tc entry"); return -EINVAL; } *seen_tcs |= BIT(tc); if (tb[TCA_MQPRIO_TC_ENTRY_FP]) fp[tc] = nla_get_u32(tb[TCA_MQPRIO_TC_ENTRY_FP]); return 0; } static int mqprio_parse_tc_entries(struct Qdisc *sch, struct nlattr *nlattr_opt, int nlattr_opt_len, struct netlink_ext_ack *extack) { struct mqprio_sched *priv = qdisc_priv(sch); struct net_device *dev = qdisc_dev(sch); bool have_preemption = false; unsigned long seen_tcs = 0; u32 fp[TC_QOPT_MAX_QUEUE]; struct nlattr *n; int tc, rem; int err = 0; for (tc = 0; tc < TC_QOPT_MAX_QUEUE; tc++) fp[tc] = priv->fp[tc]; nla_for_each_attr_type(n, TCA_MQPRIO_TC_ENTRY, nlattr_opt, nlattr_opt_len, rem) { err = mqprio_parse_tc_entry(fp, n, &seen_tcs, extack); if (err) goto out; } for (tc = 0; tc < TC_QOPT_MAX_QUEUE; tc++) { priv->fp[tc] = fp[tc]; if (fp[tc] == TC_FP_PREEMPTIBLE) have_preemption = true; } if (have_preemption && !ethtool_dev_mm_supported(dev)) { NL_SET_ERR_MSG(extack, "Device does not support preemption"); return -EOPNOTSUPP; } out: return err; } /* Parse the other netlink attributes that represent the payload of * TCA_OPTIONS, which are appended right after struct tc_mqprio_qopt. 
*/ static int mqprio_parse_nlattr(struct Qdisc *sch, struct tc_mqprio_qopt *qopt, struct nlattr *opt, struct netlink_ext_ack *extack) { struct nlattr *nlattr_opt = nla_data(opt) + NLA_ALIGN(sizeof(*qopt)); int nlattr_opt_len = nla_len(opt) - NLA_ALIGN(sizeof(*qopt)); struct mqprio_sched *priv = qdisc_priv(sch); struct nlattr *tb[TCA_MQPRIO_MAX + 1] = {}; struct nlattr *attr; int i, rem, err; if (nlattr_opt_len >= nla_attr_size(0)) { err = nla_parse_deprecated(tb, TCA_MQPRIO_MAX, nlattr_opt, nlattr_opt_len, mqprio_policy, NULL); if (err < 0) return err; } if (!qopt->hw) { NL_SET_ERR_MSG(extack, "mqprio TCA_OPTIONS can only contain netlink attributes in hardware mode"); return -EINVAL; } if (tb[TCA_MQPRIO_MODE]) { priv->flags |= TC_MQPRIO_F_MODE; priv->mode = nla_get_u16(tb[TCA_MQPRIO_MODE]); } if (tb[TCA_MQPRIO_SHAPER]) { priv->flags |= TC_MQPRIO_F_SHAPER; priv->shaper = nla_get_u16(tb[TCA_MQPRIO_SHAPER]); } if (tb[TCA_MQPRIO_MIN_RATE64]) { if (priv->shaper != TC_MQPRIO_SHAPER_BW_RATE) { NL_SET_ERR_MSG_ATTR(extack, tb[TCA_MQPRIO_MIN_RATE64], "min_rate accepted only when shaper is in bw_rlimit mode"); return -EINVAL; } i = 0; nla_for_each_nested(attr, tb[TCA_MQPRIO_MIN_RATE64], rem) { if (nla_type(attr) != TCA_MQPRIO_MIN_RATE64) { NL_SET_ERR_MSG_ATTR(extack, attr, "Attribute type expected to be TCA_MQPRIO_MIN_RATE64"); return -EINVAL; } if (nla_len(attr) != sizeof(u64)) { NL_SET_ERR_MSG_ATTR(extack, attr, "Attribute TCA_MQPRIO_MIN_RATE64 expected to have 8 bytes length"); return -EINVAL; } if (i >= qopt->num_tc) break; priv->min_rate[i] = nla_get_u64(attr); i++; } priv->flags |= TC_MQPRIO_F_MIN_RATE; } if (tb[TCA_MQPRIO_MAX_RATE64]) { if (priv->shaper != TC_MQPRIO_SHAPER_BW_RATE) { NL_SET_ERR_MSG_ATTR(extack, tb[TCA_MQPRIO_MAX_RATE64], "max_rate accepted only when shaper is in bw_rlimit mode"); return -EINVAL; } i = 0; nla_for_each_nested(attr, tb[TCA_MQPRIO_MAX_RATE64], rem) { if (nla_type(attr) != TCA_MQPRIO_MAX_RATE64) { NL_SET_ERR_MSG_ATTR(extack, attr, "Attribute type expected to be TCA_MQPRIO_MAX_RATE64"); return -EINVAL; } if (nla_len(attr) != sizeof(u64)) { NL_SET_ERR_MSG_ATTR(extack, attr, "Attribute TCA_MQPRIO_MAX_RATE64 expected to have 8 bytes length"); return -EINVAL; } if (i >= qopt->num_tc) break; priv->max_rate[i] = nla_get_u64(attr); i++; } priv->flags |= TC_MQPRIO_F_MAX_RATE; } if (tb[TCA_MQPRIO_TC_ENTRY]) { err = mqprio_parse_tc_entries(sch, nlattr_opt, nlattr_opt_len, extack); if (err) return err; } return 0; } static int mqprio_init(struct Qdisc *sch, struct nlattr *opt, struct netlink_ext_ack *extack) { struct net_device *dev = qdisc_dev(sch); struct mqprio_sched *priv = qdisc_priv(sch); struct netdev_queue *dev_queue; struct Qdisc *qdisc; int i, err = -EOPNOTSUPP; struct tc_mqprio_qopt *qopt = NULL; struct tc_mqprio_caps caps; int len, tc; BUILD_BUG_ON(TC_MAX_QUEUE != TC_QOPT_MAX_QUEUE); BUILD_BUG_ON(TC_BITMASK != TC_QOPT_BITMASK); if (sch->parent != TC_H_ROOT) return -EOPNOTSUPP; if (!netif_is_multiqueue(dev)) return -EOPNOTSUPP; /* make certain can allocate enough classids to handle queues */ if (dev->num_tx_queues >= TC_H_MIN_PRIORITY) return -ENOMEM; if (!opt || nla_len(opt) < sizeof(*qopt)) return -EINVAL; for (tc = 0; tc < TC_QOPT_MAX_QUEUE; tc++) priv->fp[tc] = TC_FP_EXPRESS; qdisc_offload_query_caps(dev, TC_SETUP_QDISC_MQPRIO, &caps, sizeof(caps)); qopt = nla_data(opt); if (mqprio_parse_opt(dev, qopt, &caps, extack)) return -EINVAL; len = nla_len(opt) - NLA_ALIGN(sizeof(*qopt)); if (len > 0) { err = mqprio_parse_nlattr(sch, qopt, opt, extack); if (err) return 
err; } /* pre-allocate qdisc, attachment can't fail */ priv->qdiscs = kcalloc(dev->num_tx_queues, sizeof(priv->qdiscs[0]), GFP_KERNEL); if (!priv->qdiscs) return -ENOMEM; for (i = 0; i < dev->num_tx_queues; i++) { dev_queue = netdev_get_tx_queue(dev, i); qdisc = qdisc_create_dflt(dev_queue, get_default_qdisc_ops(dev, i), TC_H_MAKE(TC_H_MAJ(sch->handle), TC_H_MIN(i + 1)), extack); if (!qdisc) return -ENOMEM; priv->qdiscs[i] = qdisc; qdisc->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT; } /* If the mqprio options indicate that hardware should own * the queue mapping then run ndo_setup_tc otherwise use the * supplied and verified mapping */ if (qopt->hw) { err = mqprio_enable_offload(sch, qopt, extack); if (err) return err; } else { netdev_set_num_tc(dev, qopt->num_tc); for (i = 0; i < qopt->num_tc; i++) netdev_set_tc_queue(dev, i, qopt->count[i], qopt->offset[i]); } /* Always use supplied priority mappings */ for (i = 0; i < TC_BITMASK + 1; i++) netdev_set_prio_tc_map(dev, i, qopt->prio_tc_map[i]); sch->flags |= TCQ_F_MQROOT; return 0; } static void mqprio_attach(struct Qdisc *sch) { struct net_device *dev = qdisc_dev(sch); struct mqprio_sched *priv = qdisc_priv(sch); struct Qdisc *qdisc, *old; unsigned int ntx; /* Attach underlying qdisc */ for (ntx = 0; ntx < dev->num_tx_queues; ntx++) { qdisc = priv->qdiscs[ntx]; old = dev_graft_qdisc(qdisc->dev_queue, qdisc); if (old) qdisc_put(old); if (ntx < dev->real_num_tx_queues) qdisc_hash_add(qdisc, false); } kfree(priv->qdiscs); priv->qdiscs = NULL; } static struct netdev_queue *mqprio_queue_get(struct Qdisc *sch, unsigned long cl) { struct net_device *dev = qdisc_dev(sch); unsigned long ntx = cl - 1; if (ntx >= dev->num_tx_queues) return NULL; return netdev_get_tx_queue(dev, ntx); } static int mqprio_graft(struct Qdisc *sch, unsigned long cl, struct Qdisc *new, struct Qdisc **old, struct netlink_ext_ack *extack) { struct net_device *dev = qdisc_dev(sch); struct netdev_queue *dev_queue = mqprio_queue_get(sch, cl); if (!dev_queue) return -EINVAL; if (dev->flags & IFF_UP) dev_deactivate(dev); *old = dev_graft_qdisc(dev_queue, new); if (new) new->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT; if (dev->flags & IFF_UP) dev_activate(dev); return 0; } static int dump_rates(struct mqprio_sched *priv, struct tc_mqprio_qopt *opt, struct sk_buff *skb) { struct nlattr *nest; int i; if (priv->flags & TC_MQPRIO_F_MIN_RATE) { nest = nla_nest_start_noflag(skb, TCA_MQPRIO_MIN_RATE64); if (!nest) goto nla_put_failure; for (i = 0; i < opt->num_tc; i++) { if (nla_put(skb, TCA_MQPRIO_MIN_RATE64, sizeof(priv->min_rate[i]), &priv->min_rate[i])) goto nla_put_failure; } nla_nest_end(skb, nest); } if (priv->flags & TC_MQPRIO_F_MAX_RATE) { nest = nla_nest_start_noflag(skb, TCA_MQPRIO_MAX_RATE64); if (!nest) goto nla_put_failure; for (i = 0; i < opt->num_tc; i++) { if (nla_put(skb, TCA_MQPRIO_MAX_RATE64, sizeof(priv->max_rate[i]), &priv->max_rate[i])) goto nla_put_failure; } nla_nest_end(skb, nest); } return 0; nla_put_failure: nla_nest_cancel(skb, nest); return -1; } static int mqprio_dump_tc_entries(struct mqprio_sched *priv, struct sk_buff *skb) { struct nlattr *n; int tc; for (tc = 0; tc < TC_QOPT_MAX_QUEUE; tc++) { n = nla_nest_start(skb, TCA_MQPRIO_TC_ENTRY); if (!n) return -EMSGSIZE; if (nla_put_u32(skb, TCA_MQPRIO_TC_ENTRY_INDEX, tc)) goto nla_put_failure; if (nla_put_u32(skb, TCA_MQPRIO_TC_ENTRY_FP, priv->fp[tc])) goto nla_put_failure; nla_nest_end(skb, n); } return 0; nla_put_failure: nla_nest_cancel(skb, n); return -EMSGSIZE; } static int mqprio_dump(struct Qdisc *sch, 
struct sk_buff *skb) { struct net_device *dev = qdisc_dev(sch); struct mqprio_sched *priv = qdisc_priv(sch); struct nlattr *nla = (struct nlattr *)skb_tail_pointer(skb); struct tc_mqprio_qopt opt = { 0 }; struct Qdisc *qdisc; unsigned int ntx; sch->q.qlen = 0; gnet_stats_basic_sync_init(&sch->bstats); memset(&sch->qstats, 0, sizeof(sch->qstats)); /* MQ supports lockless qdiscs. However, statistics accounting needs * to account for all, none, or a mix of locked and unlocked child * qdiscs. Percpu stats are added to counters in-band and locking * qdisc totals are added at end. */ for (ntx = 0; ntx < dev->num_tx_queues; ntx++) { qdisc = rtnl_dereference(netdev_get_tx_queue(dev, ntx)->qdisc_sleeping); spin_lock_bh(qdisc_lock(qdisc)); gnet_stats_add_basic(&sch->bstats, qdisc->cpu_bstats, &qdisc->bstats, false); gnet_stats_add_queue(&sch->qstats, qdisc->cpu_qstats, &qdisc->qstats); sch->q.qlen += qdisc_qlen(qdisc); spin_unlock_bh(qdisc_lock(qdisc)); } mqprio_qopt_reconstruct(dev, &opt); opt.hw = priv->hw_offload; if (nla_put(skb, TCA_OPTIONS, sizeof(opt), &opt)) goto nla_put_failure; if ((priv->flags & TC_MQPRIO_F_MODE) && nla_put_u16(skb, TCA_MQPRIO_MODE, priv->mode)) goto nla_put_failure; if ((priv->flags & TC_MQPRIO_F_SHAPER) && nla_put_u16(skb, TCA_MQPRIO_SHAPER, priv->shaper)) goto nla_put_failure; if ((priv->flags & TC_MQPRIO_F_MIN_RATE || priv->flags & TC_MQPRIO_F_MAX_RATE) && (dump_rates(priv, &opt, skb) != 0)) goto nla_put_failure; if (mqprio_dump_tc_entries(priv, skb)) goto nla_put_failure; return nla_nest_end(skb, nla); nla_put_failure: nlmsg_trim(skb, nla); return -1; } static struct Qdisc *mqprio_leaf(struct Qdisc *sch, unsigned long cl) { struct netdev_queue *dev_queue = mqprio_queue_get(sch, cl); if (!dev_queue) return NULL; return rtnl_dereference(dev_queue->qdisc_sleeping); } static unsigned long mqprio_find(struct Qdisc *sch, u32 classid) { struct net_device *dev = qdisc_dev(sch); unsigned int ntx = TC_H_MIN(classid); /* There are essentially two regions here that have valid classid * values. The first region will have a classid value of 1 through * num_tx_queues. All of these are backed by actual Qdiscs. */ if (ntx < TC_H_MIN_PRIORITY) return (ntx <= dev->num_tx_queues) ? ntx : 0; /* The second region represents the hardware traffic classes. These * are represented by classid values of TC_H_MIN_PRIORITY through * TC_H_MIN_PRIORITY + netdev_get_num_tc - 1 */ return ((ntx - TC_H_MIN_PRIORITY) < netdev_get_num_tc(dev)) ? ntx : 0; } static int mqprio_dump_class(struct Qdisc *sch, unsigned long cl, struct sk_buff *skb, struct tcmsg *tcm) { if (cl < TC_H_MIN_PRIORITY) { struct netdev_queue *dev_queue = mqprio_queue_get(sch, cl); struct net_device *dev = qdisc_dev(sch); int tc = netdev_txq_to_tc(dev, cl - 1); tcm->tcm_parent = (tc < 0) ? 
0 : TC_H_MAKE(TC_H_MAJ(sch->handle), TC_H_MIN(tc + TC_H_MIN_PRIORITY)); tcm->tcm_info = rtnl_dereference(dev_queue->qdisc_sleeping)->handle; } else { tcm->tcm_parent = TC_H_ROOT; tcm->tcm_info = 0; } tcm->tcm_handle |= TC_H_MIN(cl); return 0; } static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl, struct gnet_dump *d) __releases(d->lock) __acquires(d->lock) { if (cl >= TC_H_MIN_PRIORITY) { int i; __u32 qlen; struct gnet_stats_queue qstats = {0}; struct gnet_stats_basic_sync bstats; struct net_device *dev = qdisc_dev(sch); struct netdev_tc_txq tc = dev->tc_to_txq[cl & TC_BITMASK]; gnet_stats_basic_sync_init(&bstats); /* Drop lock here it will be reclaimed before touching * statistics this is required because the d->lock we * hold here is the look on dev_queue->qdisc_sleeping * also acquired below. */ if (d->lock) spin_unlock_bh(d->lock); for (i = tc.offset; i < tc.offset + tc.count; i++) { struct netdev_queue *q = netdev_get_tx_queue(dev, i); struct Qdisc *qdisc = rtnl_dereference(q->qdisc); spin_lock_bh(qdisc_lock(qdisc)); gnet_stats_add_basic(&bstats, qdisc->cpu_bstats, &qdisc->bstats, false); gnet_stats_add_queue(&qstats, qdisc->cpu_qstats, &qdisc->qstats); sch->q.qlen += qdisc_qlen(qdisc); spin_unlock_bh(qdisc_lock(qdisc)); } qlen = qdisc_qlen(sch) + qstats.qlen; /* Reclaim root sleeping lock before completing stats */ if (d->lock) spin_lock_bh(d->lock); if (gnet_stats_copy_basic(d, NULL, &bstats, false) < 0 || gnet_stats_copy_queue(d, NULL, &qstats, qlen) < 0) return -1; } else { struct netdev_queue *dev_queue = mqprio_queue_get(sch, cl); sch = rtnl_dereference(dev_queue->qdisc_sleeping); if (gnet_stats_copy_basic(d, sch->cpu_bstats, &sch->bstats, true) < 0 || qdisc_qstats_copy(d, sch) < 0) return -1; } return 0; } static void mqprio_walk(struct Qdisc *sch, struct qdisc_walker *arg) { struct net_device *dev = qdisc_dev(sch); unsigned long ntx; if (arg->stop) return; /* Walk hierarchy with a virtual class per tc */ arg->count = arg->skip; for (ntx = arg->skip; ntx < netdev_get_num_tc(dev); ntx++) { if (!tc_qdisc_stats_dump(sch, ntx + TC_H_MIN_PRIORITY, arg)) return; } /* Pad the values and skip over unused traffic classes */ if (ntx < TC_MAX_QUEUE) { arg->count = TC_MAX_QUEUE; ntx = TC_MAX_QUEUE; } /* Reset offset, sort out remaining per-queue qdiscs */ for (ntx -= TC_MAX_QUEUE; ntx < dev->num_tx_queues; ntx++) { if (arg->fn(sch, ntx + 1, arg) < 0) { arg->stop = 1; return; } arg->count++; } } static struct netdev_queue *mqprio_select_queue(struct Qdisc *sch, struct tcmsg *tcm) { return mqprio_queue_get(sch, TC_H_MIN(tcm->tcm_parent)); } static const struct Qdisc_class_ops mqprio_class_ops = { .graft = mqprio_graft, .leaf = mqprio_leaf, .find = mqprio_find, .walk = mqprio_walk, .dump = mqprio_dump_class, .dump_stats = mqprio_dump_class_stats, .select_queue = mqprio_select_queue, }; static struct Qdisc_ops mqprio_qdisc_ops __read_mostly = { .cl_ops = &mqprio_class_ops, .id = "mqprio", .priv_size = sizeof(struct mqprio_sched), .init = mqprio_init, .destroy = mqprio_destroy, .attach = mqprio_attach, .change_real_num_tx = mq_change_real_num_tx, .dump = mqprio_dump, .owner = THIS_MODULE, }; MODULE_ALIAS_NET_SCH("mqprio"); static int __init mqprio_module_init(void) { return register_qdisc(&mqprio_qdisc_ops); } static void __exit mqprio_module_exit(void) { unregister_qdisc(&mqprio_qdisc_ops); } module_init(mqprio_module_init); module_exit(mqprio_module_exit); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("Classful multiqueue prio qdisc"); |
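/*
 * Illustrative sketch, not part of sch_mqprio.c: the shape of a driver-side
 * ndo_setup_tc() handler for the TC_SETUP_QDISC_MQPRIO request built by
 * mqprio_enable_offload() above.  "foo_setup_tc" and "foo_setup_tc_mqprio"
 * are hypothetical names; a real driver would also validate the queue layout
 * against its hardware limits and handle whichever channel/shaper modes it
 * supports.
 */
#include <linux/netdevice.h>
#include <net/pkt_cls.h>

static int foo_setup_tc_mqprio(struct net_device *dev,
			       struct tc_mqprio_qopt_offload *mqprio)
{
	u8 num_tc = mqprio->qopt.num_tc;
	int tc;

	if (!num_tc) {
		/* A zeroed request (see mqprio_disable_offload) tears down */
		netdev_reset_tc(dev);
		return 0;
	}

	/* Only the default DCB mode is handled in this sketch */
	if (mqprio->mode != TC_MQPRIO_MODE_DCB)
		return -EOPNOTSUPP;

	netdev_set_num_tc(dev, num_tc);
	for (tc = 0; tc < num_tc; tc++)
		netdev_set_tc_queue(dev, tc, mqprio->qopt.count[tc],
				    mqprio->qopt.offset[tc]);

	/* Report the accepted offload level back (read into priv->hw_offload) */
	mqprio->qopt.hw = TC_MQPRIO_HW_OFFLOAD_TCS;
	return 0;
}

static int foo_setup_tc(struct net_device *dev, enum tc_setup_type type,
			void *type_data)
{
	switch (type) {
	case TC_SETUP_QDISC_MQPRIO:
		return foo_setup_tc_mqprio(dev, type_data);
	default:
		return -EOPNOTSUPP;
	}
}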
// SPDX-License-Identifier: GPL-2.0 /* * Copyright (C) 2020 Google LLC. */ #include <linux/filter.h> #include <linux/bpf.h> #include <linux/btf.h> #include <linux/binfmts.h> #include <linux/lsm_hooks.h> #include <linux/bpf_lsm.h> #include <linux/kallsyms.h> #include <net/bpf_sk_storage.h> #include <linux/bpf_local_storage.h> #include <linux/btf_ids.h> #include <linux/ima.h> #include <linux/bpf-cgroup.h> /* For every LSM hook that allows attachment of BPF programs, declare a nop * function where a BPF program can be attached. */ #define LSM_HOOK(RET, DEFAULT, NAME, ...) \ noinline RET bpf_lsm_##NAME(__VA_ARGS__) \ { \ return DEFAULT; \ } #include <linux/lsm_hook_defs.h> #undef LSM_HOOK #define LSM_HOOK(RET, DEFAULT, NAME, ...) BTF_ID(func, bpf_lsm_##NAME) BTF_SET_START(bpf_lsm_hooks) #include <linux/lsm_hook_defs.h> #undef LSM_HOOK BTF_SET_END(bpf_lsm_hooks) BTF_SET_START(bpf_lsm_disabled_hooks) BTF_ID(func, bpf_lsm_vm_enough_memory) BTF_ID(func, bpf_lsm_inode_need_killpriv) BTF_ID(func, bpf_lsm_inode_getsecurity) BTF_ID(func, bpf_lsm_inode_listsecurity) BTF_ID(func, bpf_lsm_inode_copy_up_xattr) BTF_ID(func, bpf_lsm_getselfattr) BTF_ID(func, bpf_lsm_getprocattr) BTF_ID(func, bpf_lsm_setprocattr) #ifdef CONFIG_KEYS BTF_ID(func, bpf_lsm_key_getsecurity) #endif #ifdef CONFIG_AUDIT BTF_ID(func, bpf_lsm_audit_rule_match) #endif BTF_ID(func, bpf_lsm_ismaclabel) BTF_SET_END(bpf_lsm_disabled_hooks) /* List of LSM hooks that should operate on 'current' cgroup regardless * of function signature. */ BTF_SET_START(bpf_lsm_current_hooks) /* operate on freshly allocated sk without any cgroup association */ #ifdef CONFIG_SECURITY_NETWORK BTF_ID(func, bpf_lsm_sk_alloc_security) BTF_ID(func, bpf_lsm_sk_free_security) #endif BTF_SET_END(bpf_lsm_current_hooks) /* List of LSM hooks that trigger while the socket is properly locked. 
*/ BTF_SET_START(bpf_lsm_locked_sockopt_hooks) #ifdef CONFIG_SECURITY_NETWORK BTF_ID(func, bpf_lsm_sock_graft) BTF_ID(func, bpf_lsm_inet_csk_clone) BTF_ID(func, bpf_lsm_inet_conn_established) #endif BTF_SET_END(bpf_lsm_locked_sockopt_hooks) /* List of LSM hooks that trigger while the socket is _not_ locked, * but it's ok to call bpf_{g,s}etsockopt because the socket is still * in the early init phase. */ BTF_SET_START(bpf_lsm_unlocked_sockopt_hooks) #ifdef CONFIG_SECURITY_NETWORK BTF_ID(func, bpf_lsm_socket_post_create) BTF_ID(func, bpf_lsm_socket_socketpair) #endif BTF_SET_END(bpf_lsm_unlocked_sockopt_hooks) #ifdef CONFIG_CGROUP_BPF void bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog, bpf_func_t *bpf_func) { const struct btf_param *args __maybe_unused; if (btf_type_vlen(prog->aux->attach_func_proto) < 1 || btf_id_set_contains(&bpf_lsm_current_hooks, prog->aux->attach_btf_id)) { *bpf_func = __cgroup_bpf_run_lsm_current; return; } #ifdef CONFIG_NET args = btf_params(prog->aux->attach_func_proto); if (args[0].type == btf_sock_ids[BTF_SOCK_TYPE_SOCKET]) *bpf_func = __cgroup_bpf_run_lsm_socket; else if (args[0].type == btf_sock_ids[BTF_SOCK_TYPE_SOCK]) *bpf_func = __cgroup_bpf_run_lsm_sock; else #endif *bpf_func = __cgroup_bpf_run_lsm_current; } #endif int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog, const struct bpf_prog *prog) { u32 btf_id = prog->aux->attach_btf_id; const char *func_name = prog->aux->attach_func_name; if (!prog->gpl_compatible) { bpf_log(vlog, "LSM programs must have a GPL compatible license\n"); return -EINVAL; } if (btf_id_set_contains(&bpf_lsm_disabled_hooks, btf_id)) { bpf_log(vlog, "attach_btf_id %u points to disabled hook %s\n", btf_id, func_name); return -EINVAL; } if (!btf_id_set_contains(&bpf_lsm_hooks, btf_id)) { bpf_log(vlog, "attach_btf_id %u points to wrong type name %s\n", btf_id, func_name); return -EINVAL; } return 0; } /* Mask for all the currently supported BPRM option flags */ #define BPF_F_BRPM_OPTS_MASK BPF_F_BPRM_SECUREEXEC BPF_CALL_2(bpf_bprm_opts_set, struct linux_binprm *, bprm, u64, flags) { if (flags & ~BPF_F_BRPM_OPTS_MASK) return -EINVAL; bprm->secureexec = (flags & BPF_F_BPRM_SECUREEXEC); return 0; } BTF_ID_LIST_SINGLE(bpf_bprm_opts_set_btf_ids, struct, linux_binprm) static const struct bpf_func_proto bpf_bprm_opts_set_proto = { .func = bpf_bprm_opts_set, .gpl_only = false, .ret_type = RET_INTEGER, .arg1_type = ARG_PTR_TO_BTF_ID, .arg1_btf_id = &bpf_bprm_opts_set_btf_ids[0], .arg2_type = ARG_ANYTHING, }; BPF_CALL_3(bpf_ima_inode_hash, struct inode *, inode, void *, dst, u32, size) { return ima_inode_hash(inode, dst, size); } static bool bpf_ima_inode_hash_allowed(const struct bpf_prog *prog) { return bpf_lsm_is_sleepable_hook(prog->aux->attach_btf_id); } BTF_ID_LIST_SINGLE(bpf_ima_inode_hash_btf_ids, struct, inode) static const struct bpf_func_proto bpf_ima_inode_hash_proto = { .func = bpf_ima_inode_hash, .gpl_only = false, .might_sleep = true, .ret_type = RET_INTEGER, .arg1_type = ARG_PTR_TO_BTF_ID, .arg1_btf_id = &bpf_ima_inode_hash_btf_ids[0], .arg2_type = ARG_PTR_TO_UNINIT_MEM, .arg3_type = ARG_CONST_SIZE, .allowed = bpf_ima_inode_hash_allowed, }; BPF_CALL_3(bpf_ima_file_hash, struct file *, file, void *, dst, u32, size) { return ima_file_hash(file, dst, size); } BTF_ID_LIST_SINGLE(bpf_ima_file_hash_btf_ids, struct, file) static const struct bpf_func_proto bpf_ima_file_hash_proto = { .func = bpf_ima_file_hash, .gpl_only = false, .might_sleep = true, .ret_type = RET_INTEGER, .arg1_type = ARG_PTR_TO_BTF_ID, .arg1_btf_id = 
&bpf_ima_file_hash_btf_ids[0], .arg2_type = ARG_PTR_TO_UNINIT_MEM, .arg3_type = ARG_CONST_SIZE, .allowed = bpf_ima_inode_hash_allowed, }; BPF_CALL_1(bpf_get_attach_cookie, void *, ctx) { struct bpf_trace_run_ctx *run_ctx; run_ctx = container_of(current->bpf_ctx, struct bpf_trace_run_ctx, run_ctx); return run_ctx->bpf_cookie; } static const struct bpf_func_proto bpf_get_attach_cookie_proto = { .func = bpf_get_attach_cookie, .gpl_only = false, .ret_type = RET_INTEGER, .arg1_type = ARG_PTR_TO_CTX, }; static const struct bpf_func_proto * bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) { const struct bpf_func_proto *func_proto; if (prog->expected_attach_type == BPF_LSM_CGROUP) { func_proto = cgroup_common_func_proto(func_id, prog); if (func_proto) return func_proto; } switch (func_id) { case BPF_FUNC_inode_storage_get: return &bpf_inode_storage_get_proto; case BPF_FUNC_inode_storage_delete: return &bpf_inode_storage_delete_proto; #ifdef CONFIG_NET case BPF_FUNC_sk_storage_get: return &bpf_sk_storage_get_proto; case BPF_FUNC_sk_storage_delete: return &bpf_sk_storage_delete_proto; #endif /* CONFIG_NET */ case BPF_FUNC_spin_lock: return &bpf_spin_lock_proto; case BPF_FUNC_spin_unlock: return &bpf_spin_unlock_proto; case BPF_FUNC_bprm_opts_set: return &bpf_bprm_opts_set_proto; case BPF_FUNC_ima_inode_hash: return &bpf_ima_inode_hash_proto; case BPF_FUNC_ima_file_hash: return &bpf_ima_file_hash_proto; case BPF_FUNC_get_attach_cookie: return bpf_prog_has_trampoline(prog) ? &bpf_get_attach_cookie_proto : NULL; #ifdef CONFIG_NET case BPF_FUNC_setsockopt: if (prog->expected_attach_type != BPF_LSM_CGROUP) return NULL; if (btf_id_set_contains(&bpf_lsm_locked_sockopt_hooks, prog->aux->attach_btf_id)) return &bpf_sk_setsockopt_proto; if (btf_id_set_contains(&bpf_lsm_unlocked_sockopt_hooks, prog->aux->attach_btf_id)) return &bpf_unlocked_sk_setsockopt_proto; return NULL; case BPF_FUNC_getsockopt: if (prog->expected_attach_type != BPF_LSM_CGROUP) return NULL; if (btf_id_set_contains(&bpf_lsm_locked_sockopt_hooks, prog->aux->attach_btf_id)) return &bpf_sk_getsockopt_proto; if (btf_id_set_contains(&bpf_lsm_unlocked_sockopt_hooks, prog->aux->attach_btf_id)) return &bpf_unlocked_sk_getsockopt_proto; return NULL; #endif default: return tracing_prog_func_proto(func_id, prog); } } /* The set of hooks which are called without pagefaults disabled and are allowed * to "sleep" and thus can be used for sleepable BPF programs. 
*/ BTF_SET_START(sleepable_lsm_hooks) BTF_ID(func, bpf_lsm_bpf) BTF_ID(func, bpf_lsm_bpf_map) BTF_ID(func, bpf_lsm_bpf_map_create) BTF_ID(func, bpf_lsm_bpf_map_free) BTF_ID(func, bpf_lsm_bpf_prog) BTF_ID(func, bpf_lsm_bpf_prog_load) BTF_ID(func, bpf_lsm_bpf_prog_free) BTF_ID(func, bpf_lsm_bpf_token_create) BTF_ID(func, bpf_lsm_bpf_token_free) BTF_ID(func, bpf_lsm_bpf_token_cmd) BTF_ID(func, bpf_lsm_bpf_token_capable) BTF_ID(func, bpf_lsm_bprm_check_security) BTF_ID(func, bpf_lsm_bprm_committed_creds) BTF_ID(func, bpf_lsm_bprm_committing_creds) BTF_ID(func, bpf_lsm_bprm_creds_for_exec) BTF_ID(func, bpf_lsm_bprm_creds_from_file) BTF_ID(func, bpf_lsm_capget) BTF_ID(func, bpf_lsm_capset) BTF_ID(func, bpf_lsm_cred_prepare) BTF_ID(func, bpf_lsm_file_ioctl) BTF_ID(func, bpf_lsm_file_lock) BTF_ID(func, bpf_lsm_file_open) BTF_ID(func, bpf_lsm_file_post_open) BTF_ID(func, bpf_lsm_file_receive) BTF_ID(func, bpf_lsm_inode_create) BTF_ID(func, bpf_lsm_inode_free_security) BTF_ID(func, bpf_lsm_inode_getattr) BTF_ID(func, bpf_lsm_inode_getxattr) BTF_ID(func, bpf_lsm_inode_mknod) BTF_ID(func, bpf_lsm_inode_need_killpriv) BTF_ID(func, bpf_lsm_inode_post_setxattr) BTF_ID(func, bpf_lsm_inode_post_removexattr) BTF_ID(func, bpf_lsm_inode_readlink) BTF_ID(func, bpf_lsm_inode_removexattr) BTF_ID(func, bpf_lsm_inode_rename) BTF_ID(func, bpf_lsm_inode_rmdir) BTF_ID(func, bpf_lsm_inode_setattr) BTF_ID(func, bpf_lsm_inode_setxattr) BTF_ID(func, bpf_lsm_inode_symlink) BTF_ID(func, bpf_lsm_inode_unlink) BTF_ID(func, bpf_lsm_kernel_module_request) BTF_ID(func, bpf_lsm_kernel_read_file) BTF_ID(func, bpf_lsm_kernfs_init_security) #ifdef CONFIG_SECURITY_PATH BTF_ID(func, bpf_lsm_path_unlink) BTF_ID(func, bpf_lsm_path_mkdir) BTF_ID(func, bpf_lsm_path_rmdir) BTF_ID(func, bpf_lsm_path_truncate) BTF_ID(func, bpf_lsm_path_symlink) BTF_ID(func, bpf_lsm_path_link) BTF_ID(func, bpf_lsm_path_rename) BTF_ID(func, bpf_lsm_path_chmod) BTF_ID(func, bpf_lsm_path_chown) #endif /* CONFIG_SECURITY_PATH */ BTF_ID(func, bpf_lsm_mmap_file) BTF_ID(func, bpf_lsm_netlink_send) BTF_ID(func, bpf_lsm_path_notify) BTF_ID(func, bpf_lsm_release_secctx) BTF_ID(func, bpf_lsm_sb_alloc_security) BTF_ID(func, bpf_lsm_sb_eat_lsm_opts) BTF_ID(func, bpf_lsm_sb_kern_mount) BTF_ID(func, bpf_lsm_sb_mount) BTF_ID(func, bpf_lsm_sb_remount) BTF_ID(func, bpf_lsm_sb_set_mnt_opts) BTF_ID(func, bpf_lsm_sb_show_options) BTF_ID(func, bpf_lsm_sb_statfs) BTF_ID(func, bpf_lsm_sb_umount) BTF_ID(func, bpf_lsm_settime) #ifdef CONFIG_SECURITY_NETWORK BTF_ID(func, bpf_lsm_inet_conn_established) BTF_ID(func, bpf_lsm_socket_accept) BTF_ID(func, bpf_lsm_socket_bind) BTF_ID(func, bpf_lsm_socket_connect) BTF_ID(func, bpf_lsm_socket_create) BTF_ID(func, bpf_lsm_socket_getpeername) BTF_ID(func, bpf_lsm_socket_getpeersec_dgram) BTF_ID(func, bpf_lsm_socket_getsockname) BTF_ID(func, bpf_lsm_socket_getsockopt) BTF_ID(func, bpf_lsm_socket_listen) BTF_ID(func, bpf_lsm_socket_post_create) BTF_ID(func, bpf_lsm_socket_recvmsg) BTF_ID(func, bpf_lsm_socket_sendmsg) BTF_ID(func, bpf_lsm_socket_shutdown) BTF_ID(func, bpf_lsm_socket_socketpair) #endif /* CONFIG_SECURITY_NETWORK */ BTF_ID(func, bpf_lsm_syslog) BTF_ID(func, bpf_lsm_task_alloc) BTF_ID(func, bpf_lsm_task_prctl) BTF_ID(func, bpf_lsm_task_setscheduler) BTF_ID(func, bpf_lsm_task_to_inode) BTF_ID(func, bpf_lsm_userns_create) BTF_SET_END(sleepable_lsm_hooks) BTF_SET_START(untrusted_lsm_hooks) BTF_ID(func, bpf_lsm_bpf_map_free) BTF_ID(func, bpf_lsm_bpf_prog_free) BTF_ID(func, bpf_lsm_file_alloc_security) BTF_ID(func, 
bpf_lsm_file_free_security) #ifdef CONFIG_SECURITY_NETWORK BTF_ID(func, bpf_lsm_sk_alloc_security) BTF_ID(func, bpf_lsm_sk_free_security) #endif /* CONFIG_SECURITY_NETWORK */ BTF_ID(func, bpf_lsm_task_free) BTF_SET_END(untrusted_lsm_hooks) bool bpf_lsm_is_sleepable_hook(u32 btf_id) { return btf_id_set_contains(&sleepable_lsm_hooks, btf_id); } bool bpf_lsm_is_trusted(const struct bpf_prog *prog) { return !btf_id_set_contains(&untrusted_lsm_hooks, prog->aux->attach_btf_id); } const struct bpf_prog_ops lsm_prog_ops = { }; const struct bpf_verifier_ops lsm_verifier_ops = { .get_func_proto = bpf_lsm_func_proto, .is_valid_access = btf_ctx_access, }; /* hooks return 0 or 1 */ BTF_SET_START(bool_lsm_hooks) #ifdef CONFIG_SECURITY_NETWORK_XFRM BTF_ID(func, bpf_lsm_xfrm_state_pol_flow_match) #endif #ifdef CONFIG_AUDIT BTF_ID(func, bpf_lsm_audit_rule_known) #endif BTF_ID(func, bpf_lsm_inode_xattr_skipcap) BTF_SET_END(bool_lsm_hooks) int bpf_lsm_get_retval_range(const struct bpf_prog *prog, struct bpf_retval_range *retval_range) { /* no return value range for void hooks */ if (!prog->aux->attach_func_proto->type) return -EINVAL; if (btf_id_set_contains(&bool_lsm_hooks, prog->aux->attach_btf_id)) { retval_range->minval = 0; retval_range->maxval = 1; } else { /* All other available LSM hooks, except task_prctl, return 0 * on success and negative error code on failure. * To keep things simple, we only allow bpf progs to return 0 * or negative errno for task_prctl too. */ retval_range->minval = -MAX_ERRNO; retval_range->maxval = 0; } return 0; }
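// SPDX-License-Identifier: GPL-2.0
/*
 * Illustrative sketch, not part of bpf_lsm.c: a minimal BPF LSM program,
 * written against libbpf conventions, attaching to the bpf_lsm_file_open()
 * stub generated by the LSM_HOOK() expansion above.  The UID-based policy is
 * hypothetical and only demonstrates the 0-or-negative-errno return contract
 * checked by bpf_lsm_get_retval_range(); file_open is also listed in
 * sleepable_lsm_hooks, so SEC("lsm.s/file_open") would work as well.
 */
#include "vmlinux.h"	/* assumed to be generated via bpftool btf dump */
#include <errno.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";	/* LSM programs must be GPL compatible */

SEC("lsm/file_open")
int BPF_PROG(restrict_open, struct file *file)
{
	/* Hypothetical policy: refuse opens of files owned by UID 1000 */
	if (file->f_inode->i_uid.val == 1000)
		return -EPERM;

	return 0;
}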
// SPDX-License-Identifier: GPL-2.0+ /* * drivers/usb/class/usbtmc.c - USB Test & Measurement class driver * * Copyright (C) 2007 Stefan Kopp, Gechingen, Germany * Copyright (C) 2008 Novell, Inc. * Copyright (C) 2008 Greg Kroah-Hartman <gregkh@suse.de> * Copyright (C) 2018 IVI Foundation, Inc. */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <linux/module.h> #include <linux/kernel.h> #include <linux/fs.h> #include <linux/uaccess.h> #include <linux/kref.h> #include <linux/slab.h> #include <linux/poll.h> #include <linux/mutex.h> #include <linux/usb.h> #include <linux/compat.h> #include <linux/usb/tmc.h> /* Increment API VERSION when changing tmc.h with new flags or ioctls * or when changing a significant behavior of the driver. */ #define USBTMC_API_VERSION (3) #define USBTMC_HEADER_SIZE 12 #define USBTMC_MINOR_BASE 176 /* Minimum USB timeout (in milliseconds) */ #define USBTMC_MIN_TIMEOUT 100 /* Default USB timeout (in milliseconds) */ #define USBTMC_TIMEOUT 5000 /* Max number of urbs used in write transfers */ #define MAX_URBS_IN_FLIGHT 16 /* I/O buffer size used in generic read/write functions */ #define USBTMC_BUFSIZE (4096) /* * Maximum number of read cycles to empty bulk in endpoint during CLEAR and * ABORT_BULK_IN requests. Ends the loop if (for whatever reason) a short * packet is never read. */ #define USBTMC_MAX_READS_TO_CLEAR_BULK_IN 100 static const struct usb_device_id usbtmc_devices[] = { { USB_INTERFACE_INFO(USB_CLASS_APP_SPEC, 3, 0), }, { USB_INTERFACE_INFO(USB_CLASS_APP_SPEC, 3, 1), }, { 0, } /* terminating entry */ }; MODULE_DEVICE_TABLE(usb, usbtmc_devices); /* * This structure is the capabilities for the device * See section 4.2.1.8 of the USBTMC specification, * and section 4.2.2 of the USBTMC usb488 subclass * specification for details. */ struct usbtmc_dev_capabilities { __u8 interface_capabilities; __u8 device_capabilities; __u8 usb488_interface_capabilities; __u8 usb488_device_capabilities; }; /* This structure holds private data for each USBTMC device. One copy is * allocated for each USBTMC device in the driver's probe function. 
*/ struct usbtmc_device_data { const struct usb_device_id *id; struct usb_device *usb_dev; struct usb_interface *intf; struct list_head file_list; unsigned int bulk_in; unsigned int bulk_out; u8 bTag; u8 bTag_last_write; /* needed for abort */ u8 bTag_last_read; /* needed for abort */ /* packet size of IN bulk */ u16 wMaxPacketSize; /* data for interrupt in endpoint handling */ u8 bNotify1; u8 bNotify2; u16 ifnum; u8 iin_bTag; u8 *iin_buffer; atomic_t iin_data_valid; unsigned int iin_ep; int iin_ep_present; int iin_interval; struct urb *iin_urb; u16 iin_wMaxPacketSize; /* coalesced usb488_caps from usbtmc_dev_capabilities */ __u8 usb488_caps; bool zombie; /* fd of disconnected device */ struct usbtmc_dev_capabilities capabilities; struct kref kref; struct mutex io_mutex; /* only one i/o function running at a time */ wait_queue_head_t waitq; struct fasync_struct *fasync; spinlock_t dev_lock; /* lock for file_list */ }; #define to_usbtmc_data(d) container_of(d, struct usbtmc_device_data, kref) /* * This structure holds private data for each USBTMC file handle. */ struct usbtmc_file_data { struct usbtmc_device_data *data; struct list_head file_elem; u32 timeout; u8 srq_byte; atomic_t srq_asserted; atomic_t closing; u8 bmTransferAttributes; /* member of DEV_DEP_MSG_IN */ u8 eom_val; u8 term_char; bool term_char_enabled; bool auto_abort; spinlock_t err_lock; /* lock for errors */ struct usb_anchor submitted; /* data for generic_write */ struct semaphore limit_write_sem; u32 out_transfer_size; int out_status; /* data for generic_read */ u32 in_transfer_size; int in_status; int in_urbs_used; struct usb_anchor in_anchor; wait_queue_head_t wait_bulk_in; }; /* Forward declarations */ static struct usb_driver usbtmc_driver; static void usbtmc_draw_down(struct usbtmc_file_data *file_data); static void usbtmc_delete(struct kref *kref) { struct usbtmc_device_data *data = to_usbtmc_data(kref); usb_put_dev(data->usb_dev); kfree(data); } static int usbtmc_open(struct inode *inode, struct file *filp) { struct usb_interface *intf; struct usbtmc_device_data *data; struct usbtmc_file_data *file_data; intf = usb_find_interface(&usbtmc_driver, iminor(inode)); if (!intf) { pr_err("can not find device for minor %d", iminor(inode)); return -ENODEV; } file_data = kzalloc(sizeof(*file_data), GFP_KERNEL); if (!file_data) return -ENOMEM; spin_lock_init(&file_data->err_lock); sema_init(&file_data->limit_write_sem, MAX_URBS_IN_FLIGHT); init_usb_anchor(&file_data->submitted); init_usb_anchor(&file_data->in_anchor); init_waitqueue_head(&file_data->wait_bulk_in); data = usb_get_intfdata(intf); /* Protect reference to data from file structure until release */ kref_get(&data->kref); mutex_lock(&data->io_mutex); file_data->data = data; atomic_set(&file_data->closing, 0); file_data->timeout = USBTMC_TIMEOUT; file_data->term_char = '\n'; file_data->term_char_enabled = 0; file_data->auto_abort = 0; file_data->eom_val = 1; INIT_LIST_HEAD(&file_data->file_elem); spin_lock_irq(&data->dev_lock); list_add_tail(&file_data->file_elem, &data->file_list); spin_unlock_irq(&data->dev_lock); mutex_unlock(&data->io_mutex); /* Store pointer in file structure's private data field */ filp->private_data = file_data; return 0; } /* * usbtmc_flush - called before file handle is closed */ static int usbtmc_flush(struct file *file, fl_owner_t id) { struct usbtmc_file_data *file_data; struct usbtmc_device_data *data; file_data = file->private_data; if (file_data == NULL) return -ENODEV; atomic_set(&file_data->closing, 1); data = file_data->data; /* 
wait for io to stop */ mutex_lock(&data->io_mutex); usbtmc_draw_down(file_data); spin_lock_irq(&file_data->err_lock); file_data->in_status = 0; file_data->in_transfer_size = 0; file_data->in_urbs_used = 0; file_data->out_status = 0; file_data->out_transfer_size = 0; spin_unlock_irq(&file_data->err_lock); wake_up_interruptible_all(&data->waitq); mutex_unlock(&data->io_mutex); return 0; } static int usbtmc_release(struct inode *inode, struct file *file) { struct usbtmc_file_data *file_data = file->private_data; /* prevent IO _AND_ usbtmc_interrupt */ mutex_lock(&file_data->data->io_mutex); spin_lock_irq(&file_data->data->dev_lock); list_del(&file_data->file_elem); spin_unlock_irq(&file_data->data->dev_lock); mutex_unlock(&file_data->data->io_mutex); kref_put(&file_data->data->kref, usbtmc_delete); file_data->data = NULL; kfree(file_data); return 0; } static int usbtmc_ioctl_abort_bulk_in_tag(struct usbtmc_device_data *data, u8 tag) { u8 *buffer; struct device *dev; int rv; int n; int actual; dev = &data->intf->dev; buffer = kmalloc(USBTMC_BUFSIZE, GFP_KERNEL); if (!buffer) return -ENOMEM; rv = usb_control_msg(data->usb_dev, usb_rcvctrlpipe(data->usb_dev, 0), USBTMC_REQUEST_INITIATE_ABORT_BULK_IN, USB_DIR_IN | USB_TYPE_CLASS | USB_RECIP_ENDPOINT, tag, data->bulk_in, buffer, 2, USB_CTRL_GET_TIMEOUT); if (rv < 0) { dev_err(dev, "usb_control_msg returned %d\n", rv); goto exit; } dev_dbg(dev, "INITIATE_ABORT_BULK_IN returned %x with tag %02x\n", buffer[0], buffer[1]); if (buffer[0] == USBTMC_STATUS_FAILED) { /* No transfer in progress and the Bulk-OUT FIFO is empty. */ rv = 0; goto exit; } if (buffer[0] == USBTMC_STATUS_TRANSFER_NOT_IN_PROGRESS) { /* The device returns this status if either: * - There is a transfer in progress, but the specified bTag * does not match. * - There is no transfer in progress, but the Bulk-OUT FIFO * is not empty. */ rv = -ENOMSG; goto exit; } if (buffer[0] != USBTMC_STATUS_SUCCESS) { dev_err(dev, "INITIATE_ABORT_BULK_IN returned %x\n", buffer[0]); rv = -EPERM; goto exit; } n = 0; usbtmc_abort_bulk_in_status: dev_dbg(dev, "Reading from bulk in EP\n"); /* Data must be present. So use low timeout 300 ms */ actual = 0; rv = usb_bulk_msg(data->usb_dev, usb_rcvbulkpipe(data->usb_dev, data->bulk_in), buffer, USBTMC_BUFSIZE, &actual, 300); print_hex_dump_debug("usbtmc ", DUMP_PREFIX_NONE, 16, 1, buffer, actual, true); n++; if (rv < 0) { dev_err(dev, "usb_bulk_msg returned %d\n", rv); if (rv != -ETIMEDOUT) goto exit; } if (actual == USBTMC_BUFSIZE) goto usbtmc_abort_bulk_in_status; if (n >= USBTMC_MAX_READS_TO_CLEAR_BULK_IN) { dev_err(dev, "Couldn't clear device buffer within %d cycles\n", USBTMC_MAX_READS_TO_CLEAR_BULK_IN); rv = -EPERM; goto exit; } rv = usb_control_msg(data->usb_dev, usb_rcvctrlpipe(data->usb_dev, 0), USBTMC_REQUEST_CHECK_ABORT_BULK_IN_STATUS, USB_DIR_IN | USB_TYPE_CLASS | USB_RECIP_ENDPOINT, 0, data->bulk_in, buffer, 0x08, USB_CTRL_GET_TIMEOUT); if (rv < 0) { dev_err(dev, "usb_control_msg returned %d\n", rv); goto exit; } dev_dbg(dev, "CHECK_ABORT_BULK_IN returned %x\n", buffer[0]); if (buffer[0] == USBTMC_STATUS_SUCCESS) { rv = 0; goto exit; } if (buffer[0] != USBTMC_STATUS_PENDING) { dev_err(dev, "CHECK_ABORT_BULK_IN returned %x\n", buffer[0]); rv = -EPERM; goto exit; } if ((buffer[1] & 1) > 0) { /* The device has 1 or more queued packets the Host can read */ goto usbtmc_abort_bulk_in_status; } /* The Host must send CHECK_ABORT_BULK_IN_STATUS at a later time. 
*/ rv = -EAGAIN; exit: kfree(buffer); return rv; } static int usbtmc_ioctl_abort_bulk_in(struct usbtmc_device_data *data) { return usbtmc_ioctl_abort_bulk_in_tag(data, data->bTag_last_read); } static int usbtmc_ioctl_abort_bulk_out_tag(struct usbtmc_device_data *data, u8 tag) { struct device *dev; u8 *buffer; int rv; int n; dev = &data->intf->dev; buffer = kmalloc(8, GFP_KERNEL); if (!buffer) return -ENOMEM; rv = usb_control_msg(data->usb_dev, usb_rcvctrlpipe(data->usb_dev, 0), USBTMC_REQUEST_INITIATE_ABORT_BULK_OUT, USB_DIR_IN | USB_TYPE_CLASS | USB_RECIP_ENDPOINT, tag, data->bulk_out, buffer, 2, USB_CTRL_GET_TIMEOUT); if (rv < 0) { dev_err(dev, "usb_control_msg returned %d\n", rv); goto exit; } dev_dbg(dev, "INITIATE_ABORT_BULK_OUT returned %x\n", buffer[0]); if (buffer[0] != USBTMC_STATUS_SUCCESS) { dev_err(dev, "INITIATE_ABORT_BULK_OUT returned %x\n", buffer[0]); rv = -EPERM; goto exit; } n = 0; usbtmc_abort_bulk_out_check_status: /* do not stress device with subsequent requests */ msleep(50); rv = usb_control_msg(data->usb_dev, usb_rcvctrlpipe(data->usb_dev, 0), USBTMC_REQUEST_CHECK_ABORT_BULK_OUT_STATUS, USB_DIR_IN | USB_TYPE_CLASS | USB_RECIP_ENDPOINT, 0, data->bulk_out, buffer, 0x08, USB_CTRL_GET_TIMEOUT); n++; if (rv < 0) { dev_err(dev, "usb_control_msg returned %d\n", rv); goto exit; } dev_dbg(dev, "CHECK_ABORT_BULK_OUT returned %x\n", buffer[0]); if (buffer[0] == USBTMC_STATUS_SUCCESS) goto usbtmc_abort_bulk_out_clear_halt; if ((buffer[0] == USBTMC_STATUS_PENDING) && (n < USBTMC_MAX_READS_TO_CLEAR_BULK_IN)) goto usbtmc_abort_bulk_out_check_status; rv = -EPERM; goto exit; usbtmc_abort_bulk_out_clear_halt: rv = usb_clear_halt(data->usb_dev, usb_sndbulkpipe(data->usb_dev, data->bulk_out)); if (rv < 0) { dev_err(dev, "usb_control_msg returned %d\n", rv); goto exit; } rv = 0; exit: kfree(buffer); return rv; } static int usbtmc_ioctl_abort_bulk_out(struct usbtmc_device_data *data) { return usbtmc_ioctl_abort_bulk_out_tag(data, data->bTag_last_write); } static int usbtmc_get_stb(struct usbtmc_file_data *file_data, __u8 *stb) { struct usbtmc_device_data *data = file_data->data; struct device *dev = &data->intf->dev; u8 *buffer; u8 tag; int rv; long wait_rv; unsigned long expire; dev_dbg(dev, "Enter ioctl_read_stb iin_ep_present: %d\n", data->iin_ep_present); buffer = kmalloc(8, GFP_KERNEL); if (!buffer) return -ENOMEM; atomic_set(&data->iin_data_valid, 0); rv = usb_control_msg(data->usb_dev, usb_rcvctrlpipe(data->usb_dev, 0), USBTMC488_REQUEST_READ_STATUS_BYTE, USB_DIR_IN | USB_TYPE_CLASS | USB_RECIP_INTERFACE, data->iin_bTag, data->ifnum, buffer, 0x03, USB_CTRL_GET_TIMEOUT); if (rv < 0) { dev_err(dev, "stb usb_control_msg returned %d\n", rv); goto exit; } if (buffer[0] != USBTMC_STATUS_SUCCESS) { dev_err(dev, "control status returned %x\n", buffer[0]); rv = -EIO; goto exit; } if (data->iin_ep_present) { expire = msecs_to_jiffies(file_data->timeout); wait_rv = wait_event_interruptible_timeout( data->waitq, atomic_read(&data->iin_data_valid) != 0, expire); if (wait_rv < 0) { dev_dbg(dev, "wait interrupted %ld\n", wait_rv); rv = wait_rv; goto exit; } if (wait_rv == 0) { dev_dbg(dev, "wait timed out\n"); rv = -ETIMEDOUT; goto exit; } tag = data->bNotify1 & 0x7f; if (tag != data->iin_bTag) { dev_err(dev, "expected bTag %x got %x\n", data->iin_bTag, tag); } *stb = data->bNotify2; } else { *stb = buffer[2]; } dev_dbg(dev, "stb:0x%02x received %d\n", (unsigned int)*stb, rv); rv = 0; exit: /* bump interrupt bTag */ data->iin_bTag += 1; if (data->iin_bTag > 127) /* 1 is for SRQ see USBTMC-USB488 
subclass spec section 4.3.1 */ data->iin_bTag = 2; kfree(buffer); return rv; } static int usbtmc488_ioctl_read_stb(struct usbtmc_file_data *file_data, void __user *arg) { int srq_asserted = 0; __u8 stb; int rv; rv = usbtmc_get_stb(file_data, &stb); if (rv < 0) return rv; srq_asserted = atomic_xchg(&file_data->srq_asserted, srq_asserted); if (srq_asserted) stb |= 0x40; /* Set RQS bit */ rv = put_user(stb, (__u8 __user *)arg); return rv; } static int usbtmc_ioctl_get_srq_stb(struct usbtmc_file_data *file_data, void __user *arg) { struct usbtmc_device_data *data = file_data->data; struct device *dev = &data->intf->dev; int srq_asserted = 0; __u8 stb = 0; int rv; spin_lock_irq(&data->dev_lock); srq_asserted = atomic_xchg(&file_data->srq_asserted, srq_asserted); if (srq_asserted) { stb = file_data->srq_byte; spin_unlock_irq(&data->dev_lock); rv = put_user(stb, (__u8 __user *)arg); } else { spin_unlock_irq(&data->dev_lock); rv = -ENOMSG; } dev_dbg(dev, "stb:0x%02x with srq received %d\n", (unsigned int)stb, rv); return rv; } static int usbtmc488_ioctl_wait_srq(struct usbtmc_file_data *file_data, __u32 __user *arg) { struct usbtmc_device_data *data = file_data->data; struct device *dev = &data->intf->dev; u32 timeout; unsigned long expire; long wait_rv; if (!data->iin_ep_present) { dev_dbg(dev, "no interrupt endpoint present\n"); return -EFAULT; } if (get_user(timeout, arg)) return -EFAULT; expire = msecs_to_jiffies(timeout); mutex_unlock(&data->io_mutex); wait_rv = wait_event_interruptible_timeout( data->waitq, atomic_read(&file_data->srq_asserted) != 0 || atomic_read(&file_data->closing), expire); mutex_lock(&data->io_mutex); /* Note! disconnect or close could be called in the meantime */ if (atomic_read(&file_data->closing) || data->zombie) return -ENODEV; if (wait_rv < 0) { dev_dbg(dev, "%s - wait interrupted %ld\n", __func__, wait_rv); return wait_rv; } if (wait_rv == 0) { dev_dbg(dev, "%s - wait timed out\n", __func__); return -ETIMEDOUT; } dev_dbg(dev, "%s - srq asserted\n", __func__); return 0; } static int usbtmc488_ioctl_simple(struct usbtmc_device_data *data, void __user *arg, unsigned int cmd) { struct device *dev = &data->intf->dev; __u8 val; u8 *buffer; u16 wValue; int rv; if (!(data->usb488_caps & USBTMC488_CAPABILITY_SIMPLE)) return -EINVAL; buffer = kmalloc(8, GFP_KERNEL); if (!buffer) return -ENOMEM; if (cmd == USBTMC488_REQUEST_REN_CONTROL) { rv = copy_from_user(&val, arg, sizeof(val)); if (rv) { rv = -EFAULT; goto exit; } wValue = val ? 1 : 0; } else { wValue = 0; } rv = usb_control_msg(data->usb_dev, usb_rcvctrlpipe(data->usb_dev, 0), cmd, USB_DIR_IN | USB_TYPE_CLASS | USB_RECIP_INTERFACE, wValue, data->ifnum, buffer, 0x01, USB_CTRL_GET_TIMEOUT); if (rv < 0) { dev_err(dev, "simple usb_control_msg failed %d\n", rv); goto exit; } else if (rv != 1) { dev_warn(dev, "simple usb_control_msg returned %d\n", rv); rv = -EIO; goto exit; } if (buffer[0] != USBTMC_STATUS_SUCCESS) { dev_err(dev, "simple control status returned %x\n", buffer[0]); rv = -EIO; goto exit; } rv = 0; exit: kfree(buffer); return rv; } /* * Sends a TRIGGER Bulk-OUT command message * See the USBTMC-USB488 specification, Table 2. * * Also updates bTag_last_write. 
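 *
 * The TRIGGER request is a bare USBTMC_HEADER_SIZE Bulk-OUT header carrying
 * MsgID 128 and the usual bTag/~bTag pair; it has no data payload.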
*/ static int usbtmc488_ioctl_trigger(struct usbtmc_file_data *file_data) { struct usbtmc_device_data *data = file_data->data; int retval; u8 *buffer; int actual; buffer = kzalloc(USBTMC_HEADER_SIZE, GFP_KERNEL); if (!buffer) return -ENOMEM; buffer[0] = 128; buffer[1] = data->bTag; buffer[2] = ~data->bTag; retval = usb_bulk_msg(data->usb_dev, usb_sndbulkpipe(data->usb_dev, data->bulk_out), buffer, USBTMC_HEADER_SIZE, &actual, file_data->timeout); /* Store bTag (in case we need to abort) */ data->bTag_last_write = data->bTag; /* Increment bTag -- and increment again if zero */ data->bTag++; if (!data->bTag) data->bTag++; kfree(buffer); if (retval < 0) { dev_err(&data->intf->dev, "%s returned %d\n", __func__, retval); return retval; } return 0; } static struct urb *usbtmc_create_urb(void) { const size_t bufsize = USBTMC_BUFSIZE; u8 *dmabuf = NULL; struct urb *urb = usb_alloc_urb(0, GFP_KERNEL); if (!urb) return NULL; dmabuf = kzalloc(bufsize, GFP_KERNEL); if (!dmabuf) { usb_free_urb(urb); return NULL; } urb->transfer_buffer = dmabuf; urb->transfer_buffer_length = bufsize; urb->transfer_flags |= URB_FREE_BUFFER; return urb; } static void usbtmc_read_bulk_cb(struct urb *urb) { struct usbtmc_file_data *file_data = urb->context; int status = urb->status; unsigned long flags; /* sync/async unlink faults aren't errors */ if (status) { if (!(/* status == -ENOENT || */ status == -ECONNRESET || status == -EREMOTEIO || /* Short packet */ status == -ESHUTDOWN)) dev_err(&file_data->data->intf->dev, "%s - nonzero read bulk status received: %d\n", __func__, status); spin_lock_irqsave(&file_data->err_lock, flags); if (!file_data->in_status) file_data->in_status = status; spin_unlock_irqrestore(&file_data->err_lock, flags); } spin_lock_irqsave(&file_data->err_lock, flags); file_data->in_transfer_size += urb->actual_length; dev_dbg(&file_data->data->intf->dev, "%s - total size: %u current: %d status: %d\n", __func__, file_data->in_transfer_size, urb->actual_length, status); spin_unlock_irqrestore(&file_data->err_lock, flags); usb_anchor_urb(urb, &file_data->in_anchor); wake_up_interruptible(&file_data->wait_bulk_in); wake_up_interruptible(&file_data->data->waitq); } static inline bool usbtmc_do_transfer(struct usbtmc_file_data *file_data) { bool data_or_error; spin_lock_irq(&file_data->err_lock); data_or_error = !usb_anchor_empty(&file_data->in_anchor) || file_data->in_status; spin_unlock_irq(&file_data->err_lock); dev_dbg(&file_data->data->intf->dev, "%s: returns %d\n", __func__, data_or_error); return data_or_error; } static ssize_t usbtmc_generic_read(struct usbtmc_file_data *file_data, void __user *user_buffer, u32 transfer_size, u32 *transferred, u32 flags) { struct usbtmc_device_data *data = file_data->data; struct device *dev = &data->intf->dev; u32 done = 0; u32 remaining; const u32 bufsize = USBTMC_BUFSIZE; int retval = 0; u32 max_transfer_size; unsigned long expire; int bufcount = 1; int again = 0; long wait_rv; /* mutex already locked */ *transferred = done; max_transfer_size = transfer_size; if (flags & USBTMC_FLAG_IGNORE_TRAILER) { /* The device may send extra alignment bytes (up to * wMaxPacketSize – 1) to avoid sending a zero-length * packet */ remaining = transfer_size; if ((max_transfer_size % data->wMaxPacketSize) == 0) max_transfer_size += (data->wMaxPacketSize - 1); } else { /* round down to bufsize to avoid truncated data left */ if (max_transfer_size > bufsize) { max_transfer_size = roundup(max_transfer_size + 1 - bufsize, bufsize); } remaining = max_transfer_size; } 
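	/*
	 * Pick up any error already reported by a previous asynchronous
	 * transfer, then work out how many bulk-in URBs to queue for this
	 * request, capped so that in_urbs_used never exceeds
	 * MAX_URBS_IN_FLIGHT.
	 */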
spin_lock_irq(&file_data->err_lock); if (file_data->in_status) { /* return the very first error */ retval = file_data->in_status; spin_unlock_irq(&file_data->err_lock); goto error; } if (flags & USBTMC_FLAG_ASYNC) { if (usb_anchor_empty(&file_data->in_anchor)) again = 1; if (file_data->in_urbs_used == 0) { file_data->in_transfer_size = 0; file_data->in_status = 0; } } else { file_data->in_transfer_size = 0; file_data->in_status = 0; } if (max_transfer_size == 0) { bufcount = 0; } else { bufcount = roundup(max_transfer_size, bufsize) / bufsize; if (bufcount > file_data->in_urbs_used) bufcount -= file_data->in_urbs_used; else bufcount = 0; if (bufcount + file_data->in_urbs_used > MAX_URBS_IN_FLIGHT) { bufcount = MAX_URBS_IN_FLIGHT - file_data->in_urbs_used; } } spin_unlock_irq(&file_data->err_lock); dev_dbg(dev, "%s: requested=%u flags=0x%X size=%u bufs=%d used=%d\n", __func__, transfer_size, flags, max_transfer_size, bufcount, file_data->in_urbs_used); while (bufcount > 0) { u8 *dmabuf = NULL; struct urb *urb = usbtmc_create_urb(); if (!urb) { retval = -ENOMEM; goto error; } dmabuf = urb->transfer_buffer; usb_fill_bulk_urb(urb, data->usb_dev, usb_rcvbulkpipe(data->usb_dev, data->bulk_in), dmabuf, bufsize, usbtmc_read_bulk_cb, file_data); usb_anchor_urb(urb, &file_data->submitted); retval = usb_submit_urb(urb, GFP_KERNEL); /* urb is anchored. We can release our reference. */ usb_free_urb(urb); if (unlikely(retval)) { usb_unanchor_urb(urb); goto error; } file_data->in_urbs_used++; bufcount--; } if (again) { dev_dbg(dev, "%s: ret=again\n", __func__); return -EAGAIN; } if (user_buffer == NULL) return -EINVAL; expire = msecs_to_jiffies(file_data->timeout); while (max_transfer_size > 0) { u32 this_part; struct urb *urb = NULL; if (!(flags & USBTMC_FLAG_ASYNC)) { dev_dbg(dev, "%s: before wait time %lu\n", __func__, expire); wait_rv = wait_event_interruptible_timeout( file_data->wait_bulk_in, usbtmc_do_transfer(file_data), expire); dev_dbg(dev, "%s: wait returned %ld\n", __func__, wait_rv); if (wait_rv < 0) { retval = wait_rv; goto error; } if (wait_rv == 0) { retval = -ETIMEDOUT; goto error; } } urb = usb_get_from_anchor(&file_data->in_anchor); if (!urb) { if (!(flags & USBTMC_FLAG_ASYNC)) { /* synchronous case: must not happen */ retval = -EFAULT; goto error; } /* asynchronous case: ready, do not block or wait */ *transferred = done; dev_dbg(dev, "%s: (async) done=%u ret=0\n", __func__, done); return 0; } file_data->in_urbs_used--; if (max_transfer_size > urb->actual_length) max_transfer_size -= urb->actual_length; else max_transfer_size = 0; if (remaining > urb->actual_length) this_part = urb->actual_length; else this_part = remaining; print_hex_dump_debug("usbtmc ", DUMP_PREFIX_NONE, 16, 1, urb->transfer_buffer, urb->actual_length, true); if (copy_to_user(user_buffer + done, urb->transfer_buffer, this_part)) { usb_free_urb(urb); retval = -EFAULT; goto error; } remaining -= this_part; done += this_part; spin_lock_irq(&file_data->err_lock); if (urb->status) { /* return the very first error */ retval = file_data->in_status; spin_unlock_irq(&file_data->err_lock); usb_free_urb(urb); goto error; } spin_unlock_irq(&file_data->err_lock); if (urb->actual_length < bufsize) { /* short packet or ZLP received => ready */ usb_free_urb(urb); retval = 1; break; } if (!(flags & USBTMC_FLAG_ASYNC) && max_transfer_size > (bufsize * file_data->in_urbs_used)) { /* resubmit, since other buffers still not enough */ usb_anchor_urb(urb, &file_data->submitted); retval = usb_submit_urb(urb, GFP_KERNEL); if 
(unlikely(retval)) { usb_unanchor_urb(urb); usb_free_urb(urb); goto error; } file_data->in_urbs_used++; } usb_free_urb(urb); retval = 0; } error: *transferred = done; dev_dbg(dev, "%s: before kill\n", __func__); /* Attention: killing urbs can take long time (2 ms) */ usb_kill_anchored_urbs(&file_data->submitted); dev_dbg(dev, "%s: after kill\n", __func__); usb_scuttle_anchored_urbs(&file_data->in_anchor); file_data->in_urbs_used = 0; file_data->in_status = 0; /* no spinlock needed here */ dev_dbg(dev, "%s: done=%u ret=%d\n", __func__, done, retval); return retval; } static ssize_t usbtmc_ioctl_generic_read(struct usbtmc_file_data *file_data, void __user *arg) { struct usbtmc_message msg; ssize_t retval = 0; /* mutex already locked */ if (copy_from_user(&msg, arg, sizeof(struct usbtmc_message))) return -EFAULT; retval = usbtmc_generic_read(file_data, msg.message, msg.transfer_size, &msg.transferred, msg.flags); if (put_user(msg.transferred, &((struct usbtmc_message __user *)arg)->transferred)) return -EFAULT; return retval; } static void usbtmc_write_bulk_cb(struct urb *urb) { struct usbtmc_file_data *file_data = urb->context; int wakeup = 0; unsigned long flags; spin_lock_irqsave(&file_data->err_lock, flags); file_data->out_transfer_size += urb->actual_length; /* sync/async unlink faults aren't errors */ if (urb->status) { if (!(urb->status == -ENOENT || urb->status == -ECONNRESET || urb->status == -ESHUTDOWN)) dev_err(&file_data->data->intf->dev, "%s - nonzero write bulk status received: %d\n", __func__, urb->status); if (!file_data->out_status) { file_data->out_status = urb->status; wakeup = 1; } } spin_unlock_irqrestore(&file_data->err_lock, flags); dev_dbg(&file_data->data->intf->dev, "%s - write bulk total size: %u\n", __func__, file_data->out_transfer_size); up(&file_data->limit_write_sem); if (usb_anchor_empty(&file_data->submitted) || wakeup) wake_up_interruptible(&file_data->data->waitq); } static ssize_t usbtmc_generic_write(struct usbtmc_file_data *file_data, const void __user *user_buffer, u32 transfer_size, u32 *transferred, u32 flags) { struct usbtmc_device_data *data = file_data->data; struct device *dev; u32 done = 0; u32 remaining; unsigned long expire; const u32 bufsize = USBTMC_BUFSIZE; struct urb *urb = NULL; int retval = 0; u32 timeout; *transferred = 0; /* Get pointer to private data structure */ dev = &data->intf->dev; dev_dbg(dev, "%s: size=%u flags=0x%X sema=%u\n", __func__, transfer_size, flags, file_data->limit_write_sem.count); if (flags & USBTMC_FLAG_APPEND) { spin_lock_irq(&file_data->err_lock); retval = file_data->out_status; spin_unlock_irq(&file_data->err_lock); if (retval < 0) return retval; } else { spin_lock_irq(&file_data->err_lock); file_data->out_transfer_size = 0; file_data->out_status = 0; spin_unlock_irq(&file_data->err_lock); } remaining = transfer_size; if (remaining > INT_MAX) remaining = INT_MAX; timeout = file_data->timeout; expire = msecs_to_jiffies(timeout); while (remaining > 0) { u32 this_part, aligned; u8 *buffer = NULL; if (flags & USBTMC_FLAG_ASYNC) { if (down_trylock(&file_data->limit_write_sem)) { retval = (done)?(0):(-EAGAIN); goto exit; } } else { retval = down_timeout(&file_data->limit_write_sem, expire); if (retval < 0) { retval = -ETIMEDOUT; goto error; } } spin_lock_irq(&file_data->err_lock); retval = file_data->out_status; spin_unlock_irq(&file_data->err_lock); if (retval < 0) { up(&file_data->limit_write_sem); goto error; } /* prepare next urb to send */ urb = usbtmc_create_urb(); if (!urb) { retval = -ENOMEM; 
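		/* URB allocation failed: give back the write credit taken on
		 * limit_write_sem above before jumping to the error path.
		 */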
up(&file_data->limit_write_sem); goto error; } buffer = urb->transfer_buffer; if (remaining > bufsize) this_part = bufsize; else this_part = remaining; if (copy_from_user(buffer, user_buffer + done, this_part)) { retval = -EFAULT; up(&file_data->limit_write_sem); goto error; } print_hex_dump_debug("usbtmc ", DUMP_PREFIX_NONE, 16, 1, buffer, this_part, true); /* fill bulk with 32 bit alignment to meet USBTMC specification * (size + 3 & ~3) rounds up and simplifies user code */ aligned = (this_part + 3) & ~3; dev_dbg(dev, "write(size:%u align:%u done:%u)\n", (unsigned int)this_part, (unsigned int)aligned, (unsigned int)done); usb_fill_bulk_urb(urb, data->usb_dev, usb_sndbulkpipe(data->usb_dev, data->bulk_out), urb->transfer_buffer, aligned, usbtmc_write_bulk_cb, file_data); usb_anchor_urb(urb, &file_data->submitted); retval = usb_submit_urb(urb, GFP_KERNEL); if (unlikely(retval)) { usb_unanchor_urb(urb); up(&file_data->limit_write_sem); goto error; } usb_free_urb(urb); urb = NULL; /* urb will be finally released by usb driver */ remaining -= this_part; done += this_part; } /* All urbs are on the fly */ if (!(flags & USBTMC_FLAG_ASYNC)) { if (!usb_wait_anchor_empty_timeout(&file_data->submitted, timeout)) { retval = -ETIMEDOUT; goto error; } } retval = 0; goto exit; error: usb_kill_anchored_urbs(&file_data->submitted); exit: usb_free_urb(urb); spin_lock_irq(&file_data->err_lock); if (!(flags & USBTMC_FLAG_ASYNC)) done = file_data->out_transfer_size; if (!retval && file_data->out_status) retval = file_data->out_status; spin_unlock_irq(&file_data->err_lock); *transferred = done; dev_dbg(dev, "%s: done=%u, retval=%d, urbstat=%d\n", __func__, done, retval, file_data->out_status); return retval; } static ssize_t usbtmc_ioctl_generic_write(struct usbtmc_file_data *file_data, void __user *arg) { struct usbtmc_message msg; ssize_t retval = 0; /* mutex already locked */ if (copy_from_user(&msg, arg, sizeof(struct usbtmc_message))) return -EFAULT; retval = usbtmc_generic_write(file_data, msg.message, msg.transfer_size, &msg.transferred, msg.flags); if (put_user(msg.transferred, &((struct usbtmc_message __user *)arg)->transferred)) return -EFAULT; return retval; } /* * Get the generic write result */ static ssize_t usbtmc_ioctl_write_result(struct usbtmc_file_data *file_data, void __user *arg) { u32 transferred; int retval; spin_lock_irq(&file_data->err_lock); transferred = file_data->out_transfer_size; retval = file_data->out_status; spin_unlock_irq(&file_data->err_lock); if (put_user(transferred, (__u32 __user *)arg)) return -EFAULT; return retval; } /* * Sends a REQUEST_DEV_DEP_MSG_IN message on the Bulk-OUT endpoint. * @transfer_size: number of bytes to request from the device. * * See the USBTMC specification, Table 4. * * Also updates bTag_last_write. */ static int send_request_dev_dep_msg_in(struct usbtmc_file_data *file_data, u32 transfer_size) { struct usbtmc_device_data *data = file_data->data; int retval; u8 *buffer; int actual; buffer = kmalloc(USBTMC_HEADER_SIZE, GFP_KERNEL); if (!buffer) return -ENOMEM; /* Setup IO buffer for REQUEST_DEV_DEP_MSG_IN message * Refer to class specs for details */ buffer[0] = 2; buffer[1] = data->bTag; buffer[2] = ~data->bTag; buffer[3] = 0; /* Reserved */ buffer[4] = transfer_size >> 0; buffer[5] = transfer_size >> 8; buffer[6] = transfer_size >> 16; buffer[7] = transfer_size >> 24; buffer[8] = file_data->term_char_enabled * 2; /* Use term character? 
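 * (REQUEST_DEV_DEP_MSG_IN bmTransferAttributes, bit 1 = TermCharEnabled)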
*/ buffer[9] = file_data->term_char; buffer[10] = 0; /* Reserved */ buffer[11] = 0; /* Reserved */ /* Send bulk URB */ retval = usb_bulk_msg(data->usb_dev, usb_sndbulkpipe(data->usb_dev, data->bulk_out), buffer, USBTMC_HEADER_SIZE, &actual, file_data->timeout); /* Store bTag (in case we need to abort) */ data->bTag_last_write = data->bTag; /* Increment bTag -- and increment again if zero */ data->bTag++; if (!data->bTag) data->bTag++; kfree(buffer); if (retval < 0) dev_err(&data->intf->dev, "%s returned %d\n", __func__, retval); return retval; } static ssize_t usbtmc_read(struct file *filp, char __user *buf, size_t count, loff_t *f_pos) { struct usbtmc_file_data *file_data; struct usbtmc_device_data *data; struct device *dev; const u32 bufsize = USBTMC_BUFSIZE; u32 n_characters; u8 *buffer; int actual; u32 done = 0; u32 remaining; int retval; /* Get pointer to private data structure */ file_data = filp->private_data; data = file_data->data; dev = &data->intf->dev; buffer = kmalloc(bufsize, GFP_KERNEL); if (!buffer) return -ENOMEM; retval = mutex_lock_interruptible(&data->io_mutex); if (retval < 0) goto exit_nolock; if (data->zombie) { retval = -ENODEV; goto exit; } if (count > INT_MAX) count = INT_MAX; dev_dbg(dev, "%s(count:%zu)\n", __func__, count); retval = send_request_dev_dep_msg_in(file_data, count); if (retval < 0) { if (file_data->auto_abort) usbtmc_ioctl_abort_bulk_out(data); goto exit; } /* Loop until we have fetched everything we requested */ remaining = count; actual = 0; /* Send bulk URB */ retval = usb_bulk_msg(data->usb_dev, usb_rcvbulkpipe(data->usb_dev, data->bulk_in), buffer, bufsize, &actual, file_data->timeout); dev_dbg(dev, "%s: bulk_msg retval(%u), actual(%d)\n", __func__, retval, actual); /* Store bTag (in case we need to abort) */ data->bTag_last_read = data->bTag; if (retval < 0) { if (file_data->auto_abort) usbtmc_ioctl_abort_bulk_in(data); goto exit; } /* Sanity checks for the header */ if (actual < USBTMC_HEADER_SIZE) { dev_err(dev, "Device sent too small first packet: %u < %u\n", actual, USBTMC_HEADER_SIZE); if (file_data->auto_abort) usbtmc_ioctl_abort_bulk_in(data); goto exit; } if (buffer[0] != 2) { dev_err(dev, "Device sent reply with wrong MsgID: %u != 2\n", buffer[0]); if (file_data->auto_abort) usbtmc_ioctl_abort_bulk_in(data); goto exit; } if (buffer[1] != data->bTag_last_write) { dev_err(dev, "Device sent reply with wrong bTag: %u != %u\n", buffer[1], data->bTag_last_write); if (file_data->auto_abort) usbtmc_ioctl_abort_bulk_in(data); goto exit; } /* How many characters did the instrument send? 
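 * (TransferSize: bytes 4..7 of the DEV_DEP_MSG_IN response header,
 * little-endian)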
*/ n_characters = buffer[4] + (buffer[5] << 8) + (buffer[6] << 16) + (buffer[7] << 24); file_data->bmTransferAttributes = buffer[8]; dev_dbg(dev, "Bulk-IN header: N_characters(%u), bTransAttr(%u)\n", n_characters, buffer[8]); if (n_characters > remaining) { dev_err(dev, "Device wants to return more data than requested: %u > %zu\n", n_characters, count); if (file_data->auto_abort) usbtmc_ioctl_abort_bulk_in(data); goto exit; } print_hex_dump_debug("usbtmc ", DUMP_PREFIX_NONE, 16, 1, buffer, actual, true); remaining = n_characters; /* Remove the USBTMC header */ actual -= USBTMC_HEADER_SIZE; /* Remove padding if it exists */ if (actual > remaining) actual = remaining; remaining -= actual; /* Copy buffer to user space */ if (copy_to_user(buf, &buffer[USBTMC_HEADER_SIZE], actual)) { /* There must have been an addressing problem */ retval = -EFAULT; goto exit; } if ((actual + USBTMC_HEADER_SIZE) == bufsize) { retval = usbtmc_generic_read(file_data, buf + actual, remaining, &done, USBTMC_FLAG_IGNORE_TRAILER); if (retval < 0) goto exit; } done += actual; /* Update file position value */ *f_pos = *f_pos + done; retval = done; exit: mutex_unlock(&data->io_mutex); exit_nolock: kfree(buffer); return retval; } static ssize_t usbtmc_write(struct file *filp, const char __user *buf, size_t count, loff_t *f_pos) { struct usbtmc_file_data *file_data; struct usbtmc_device_data *data; struct urb *urb = NULL; ssize_t retval = 0; u8 *buffer; u32 remaining, done; u32 transfersize, aligned, buflen; file_data = filp->private_data; data = file_data->data; mutex_lock(&data->io_mutex); if (data->zombie) { retval = -ENODEV; goto exit; } done = 0; spin_lock_irq(&file_data->err_lock); file_data->out_transfer_size = 0; file_data->out_status = 0; spin_unlock_irq(&file_data->err_lock); if (!count) goto exit; if (down_trylock(&file_data->limit_write_sem)) { /* previous calls were async */ retval = -EBUSY; goto exit; } urb = usbtmc_create_urb(); if (!urb) { retval = -ENOMEM; up(&file_data->limit_write_sem); goto exit; } buffer = urb->transfer_buffer; buflen = urb->transfer_buffer_length; if (count > INT_MAX) { transfersize = INT_MAX; buffer[8] = 0; } else { transfersize = count; buffer[8] = file_data->eom_val; } /* Setup IO buffer for DEV_DEP_MSG_OUT message */ buffer[0] = 1; buffer[1] = data->bTag; buffer[2] = ~data->bTag; buffer[3] = 0; /* Reserved */ buffer[4] = transfersize >> 0; buffer[5] = transfersize >> 8; buffer[6] = transfersize >> 16; buffer[7] = transfersize >> 24; /* buffer[8] is set above... 
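 * (DEV_DEP_MSG_OUT bmTransferAttributes: bit 0 = EOM, cleared when the
 * message is split because count exceeds INT_MAX)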
*/ buffer[9] = 0; /* Reserved */ buffer[10] = 0; /* Reserved */ buffer[11] = 0; /* Reserved */ remaining = transfersize; if (transfersize + USBTMC_HEADER_SIZE > buflen) { transfersize = buflen - USBTMC_HEADER_SIZE; aligned = buflen; } else { aligned = (transfersize + (USBTMC_HEADER_SIZE + 3)) & ~3; } if (copy_from_user(&buffer[USBTMC_HEADER_SIZE], buf, transfersize)) { retval = -EFAULT; up(&file_data->limit_write_sem); goto exit; } dev_dbg(&data->intf->dev, "%s(size:%u align:%u)\n", __func__, (unsigned int)transfersize, (unsigned int)aligned); print_hex_dump_debug("usbtmc ", DUMP_PREFIX_NONE, 16, 1, buffer, aligned, true); usb_fill_bulk_urb(urb, data->usb_dev, usb_sndbulkpipe(data->usb_dev, data->bulk_out), urb->transfer_buffer, aligned, usbtmc_write_bulk_cb, file_data); usb_anchor_urb(urb, &file_data->submitted); retval = usb_submit_urb(urb, GFP_KERNEL); if (unlikely(retval)) { usb_unanchor_urb(urb); up(&file_data->limit_write_sem); goto exit; } remaining -= transfersize; data->bTag_last_write = data->bTag; data->bTag++; if (!data->bTag) data->bTag++; /* call generic_write even when remaining = 0 */ retval = usbtmc_generic_write(file_data, buf + transfersize, remaining, &done, USBTMC_FLAG_APPEND); /* truncate alignment bytes */ if (done > remaining) done = remaining; /*add size of first urb*/ done += transfersize; if (retval < 0) { usb_kill_anchored_urbs(&file_data->submitted); dev_err(&data->intf->dev, "Unable to send data, error %d\n", (int)retval); if (file_data->auto_abort) usbtmc_ioctl_abort_bulk_out(data); goto exit; } retval = done; exit: usb_free_urb(urb); mutex_unlock(&data->io_mutex); return retval; } static int usbtmc_ioctl_clear(struct usbtmc_device_data *data) { struct device *dev; u8 *buffer; int rv; int n; int actual = 0; dev = &data->intf->dev; dev_dbg(dev, "Sending INITIATE_CLEAR request\n"); buffer = kmalloc(USBTMC_BUFSIZE, GFP_KERNEL); if (!buffer) return -ENOMEM; rv = usb_control_msg(data->usb_dev, usb_rcvctrlpipe(data->usb_dev, 0), USBTMC_REQUEST_INITIATE_CLEAR, USB_DIR_IN | USB_TYPE_CLASS | USB_RECIP_INTERFACE, 0, 0, buffer, 1, USB_CTRL_GET_TIMEOUT); if (rv < 0) { dev_err(dev, "usb_control_msg returned %d\n", rv); goto exit; } dev_dbg(dev, "INITIATE_CLEAR returned %x\n", buffer[0]); if (buffer[0] != USBTMC_STATUS_SUCCESS) { dev_err(dev, "INITIATE_CLEAR returned %x\n", buffer[0]); rv = -EPERM; goto exit; } n = 0; usbtmc_clear_check_status: dev_dbg(dev, "Sending CHECK_CLEAR_STATUS request\n"); rv = usb_control_msg(data->usb_dev, usb_rcvctrlpipe(data->usb_dev, 0), USBTMC_REQUEST_CHECK_CLEAR_STATUS, USB_DIR_IN | USB_TYPE_CLASS | USB_RECIP_INTERFACE, 0, 0, buffer, 2, USB_CTRL_GET_TIMEOUT); if (rv < 0) { dev_err(dev, "usb_control_msg returned %d\n", rv); goto exit; } dev_dbg(dev, "CHECK_CLEAR_STATUS returned %x\n", buffer[0]); if (buffer[0] == USBTMC_STATUS_SUCCESS) goto usbtmc_clear_bulk_out_halt; if (buffer[0] != USBTMC_STATUS_PENDING) { dev_err(dev, "CHECK_CLEAR_STATUS returned %x\n", buffer[0]); rv = -EPERM; goto exit; } if ((buffer[1] & 1) != 0) { do { dev_dbg(dev, "Reading from bulk in EP\n"); actual = 0; rv = usb_bulk_msg(data->usb_dev, usb_rcvbulkpipe(data->usb_dev, data->bulk_in), buffer, USBTMC_BUFSIZE, &actual, USB_CTRL_GET_TIMEOUT); print_hex_dump_debug("usbtmc ", DUMP_PREFIX_NONE, 16, 1, buffer, actual, true); n++; if (rv < 0) { dev_err(dev, "usb_control_msg returned %d\n", rv); goto exit; } } while ((actual == USBTMC_BUFSIZE) && (n < USBTMC_MAX_READS_TO_CLEAR_BULK_IN)); } else { /* do not stress device with subsequent requests */ msleep(50); n++; } if (n >= 
USBTMC_MAX_READS_TO_CLEAR_BULK_IN) { dev_err(dev, "Couldn't clear device buffer within %d cycles\n", USBTMC_MAX_READS_TO_CLEAR_BULK_IN); rv = -EPERM; goto exit; } goto usbtmc_clear_check_status; usbtmc_clear_bulk_out_halt: rv = usb_clear_halt(data->usb_dev, usb_sndbulkpipe(data->usb_dev, data->bulk_out)); if (rv < 0) { dev_err(dev, "usb_clear_halt returned %d\n", rv); goto exit; } rv = 0; exit: kfree(buffer); return rv; } static int usbtmc_ioctl_clear_out_halt(struct usbtmc_device_data *data) { int rv; rv = usb_clear_halt(data->usb_dev, usb_sndbulkpipe(data->usb_dev, data->bulk_out)); if (rv < 0) dev_err(&data->usb_dev->dev, "%s returned %d\n", __func__, rv); return rv; } static int usbtmc_ioctl_clear_in_halt(struct usbtmc_device_data *data) { int rv; rv = usb_clear_halt(data->usb_dev, usb_rcvbulkpipe(data->usb_dev, data->bulk_in)); if (rv < 0) dev_err(&data->usb_dev->dev, "%s returned %d\n", __func__, rv); return rv; } static int usbtmc_ioctl_cancel_io(struct usbtmc_file_data *file_data) { spin_lock_irq(&file_data->err_lock); file_data->in_status = -ECANCELED; file_data->out_status = -ECANCELED; spin_unlock_irq(&file_data->err_lock); usb_kill_anchored_urbs(&file_data->submitted); return 0; } static int usbtmc_ioctl_cleanup_io(struct usbtmc_file_data *file_data) { usb_kill_anchored_urbs(&file_data->submitted); usb_scuttle_anchored_urbs(&file_data->in_anchor); spin_lock_irq(&file_data->err_lock); file_data->in_status = 0; file_data->in_transfer_size = 0; file_data->out_status = 0; file_data->out_transfer_size = 0; spin_unlock_irq(&file_data->err_lock); file_data->in_urbs_used = 0; return 0; } static int get_capabilities(struct usbtmc_device_data *data) { struct device *dev = &data->usb_dev->dev; char *buffer; int rv = 0; buffer = kmalloc(0x18, GFP_KERNEL); if (!buffer) return -ENOMEM; rv = usb_control_msg(data->usb_dev, usb_rcvctrlpipe(data->usb_dev, 0), USBTMC_REQUEST_GET_CAPABILITIES, USB_DIR_IN | USB_TYPE_CLASS | USB_RECIP_INTERFACE, 0, 0, buffer, 0x18, USB_CTRL_GET_TIMEOUT); if (rv < 0) { dev_err(dev, "usb_control_msg returned %d\n", rv); goto err_out; } dev_dbg(dev, "GET_CAPABILITIES returned %x\n", buffer[0]); if (buffer[0] != USBTMC_STATUS_SUCCESS) { dev_err(dev, "GET_CAPABILITIES returned %x\n", buffer[0]); rv = -EPERM; goto err_out; } dev_dbg(dev, "Interface capabilities are %x\n", buffer[4]); dev_dbg(dev, "Device capabilities are %x\n", buffer[5]); dev_dbg(dev, "USB488 interface capabilities are %x\n", buffer[14]); dev_dbg(dev, "USB488 device capabilities are %x\n", buffer[15]); data->capabilities.interface_capabilities = buffer[4]; data->capabilities.device_capabilities = buffer[5]; data->capabilities.usb488_interface_capabilities = buffer[14]; data->capabilities.usb488_device_capabilities = buffer[15]; data->usb488_caps = (buffer[14] & 0x07) | ((buffer[15] & 0x0f) << 4); rv = 0; err_out: kfree(buffer); return rv; } #define capability_attribute(name) \ static ssize_t name##_show(struct device *dev, \ struct device_attribute *attr, char *buf) \ { \ struct usb_interface *intf = to_usb_interface(dev); \ struct usbtmc_device_data *data = usb_get_intfdata(intf); \ \ return sprintf(buf, "%d\n", data->capabilities.name); \ } \ static DEVICE_ATTR_RO(name) capability_attribute(interface_capabilities); capability_attribute(device_capabilities); capability_attribute(usb488_interface_capabilities); capability_attribute(usb488_device_capabilities); static struct attribute *usbtmc_attrs[] = { &dev_attr_interface_capabilities.attr, &dev_attr_device_capabilities.attr, 
&dev_attr_usb488_interface_capabilities.attr, &dev_attr_usb488_device_capabilities.attr, NULL, }; ATTRIBUTE_GROUPS(usbtmc); static int usbtmc_ioctl_indicator_pulse(struct usbtmc_device_data *data) { struct device *dev; u8 *buffer; int rv; dev = &data->intf->dev; buffer = kmalloc(2, GFP_KERNEL); if (!buffer) return -ENOMEM; rv = usb_control_msg(data->usb_dev, usb_rcvctrlpipe(data->usb_dev, 0), USBTMC_REQUEST_INDICATOR_PULSE, USB_DIR_IN | USB_TYPE_CLASS | USB_RECIP_INTERFACE, 0, 0, buffer, 0x01, USB_CTRL_GET_TIMEOUT); if (rv < 0) { dev_err(dev, "usb_control_msg returned %d\n", rv); goto exit; } dev_dbg(dev, "INDICATOR_PULSE returned %x\n", buffer[0]); if (buffer[0] != USBTMC_STATUS_SUCCESS) { dev_err(dev, "INDICATOR_PULSE returned %x\n", buffer[0]); rv = -EPERM; goto exit; } rv = 0; exit: kfree(buffer); return rv; } static int usbtmc_ioctl_request(struct usbtmc_device_data *data, void __user *arg) { struct device *dev = &data->intf->dev; struct usbtmc_ctrlrequest request; u8 *buffer = NULL; int rv; unsigned int is_in, pipe; unsigned long res; res = copy_from_user(&request, arg, sizeof(struct usbtmc_ctrlrequest)); if (res) return -EFAULT; if (request.req.wLength > USBTMC_BUFSIZE) return -EMSGSIZE; if (request.req.wLength == 0) /* Length-0 requests are never IN */ request.req.bRequestType &= ~USB_DIR_IN; is_in = request.req.bRequestType & USB_DIR_IN; if (request.req.wLength) { buffer = kmalloc(request.req.wLength, GFP_KERNEL); if (!buffer) return -ENOMEM; if (!is_in) { /* Send control data to device */ res = copy_from_user(buffer, request.data, request.req.wLength); if (res) { rv = -EFAULT; goto exit; } } } if (is_in) pipe = usb_rcvctrlpipe(data->usb_dev, 0); else pipe = usb_sndctrlpipe(data->usb_dev, 0); rv = usb_control_msg(data->usb_dev, pipe, request.req.bRequest, request.req.bRequestType, request.req.wValue, request.req.wIndex, buffer, request.req.wLength, USB_CTRL_GET_TIMEOUT); if (rv < 0) { dev_err(dev, "%s failed %d\n", __func__, rv); goto exit; } if (rv && is_in) { /* Read control data from device */ res = copy_to_user(request.data, buffer, rv); if (res) rv = -EFAULT; } exit: kfree(buffer); return rv; } /* * Get the usb timeout value */ static int usbtmc_ioctl_get_timeout(struct usbtmc_file_data *file_data, void __user *arg) { u32 timeout; timeout = file_data->timeout; return put_user(timeout, (__u32 __user *)arg); } /* * Set the usb timeout value */ static int usbtmc_ioctl_set_timeout(struct usbtmc_file_data *file_data, void __user *arg) { u32 timeout; if (get_user(timeout, (__u32 __user *)arg)) return -EFAULT; /* Note that timeout = 0 means * MAX_SCHEDULE_TIMEOUT in usb_control_msg */ if (timeout < USBTMC_MIN_TIMEOUT) return -EINVAL; file_data->timeout = timeout; return 0; } /* * enables/disables sending EOM on write */ static int usbtmc_ioctl_eom_enable(struct usbtmc_file_data *file_data, void __user *arg) { u8 eom_enable; if (copy_from_user(&eom_enable, arg, sizeof(eom_enable))) return -EFAULT; if (eom_enable > 1) return -EINVAL; file_data->eom_val = eom_enable; return 0; } /* * Configure termination character for read() */ static int usbtmc_ioctl_config_termc(struct usbtmc_file_data *file_data, void __user *arg) { struct usbtmc_termchar termc; if (copy_from_user(&termc, arg, sizeof(termc))) return -EFAULT; if ((termc.term_char_enabled > 1) || (termc.term_char_enabled && !(file_data->data->capabilities.device_capabilities & 1))) return -EINVAL; file_data->term_char = termc.term_char; file_data->term_char_enabled = termc.term_char_enabled; return 0; } static long 
usbtmc_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { struct usbtmc_file_data *file_data; struct usbtmc_device_data *data; int retval = -EBADRQC; __u8 tmp_byte; file_data = file->private_data; data = file_data->data; mutex_lock(&data->io_mutex); if (data->zombie) { retval = -ENODEV; goto skip_io_on_zombie; } switch (cmd) { case USBTMC_IOCTL_CLEAR_OUT_HALT: retval = usbtmc_ioctl_clear_out_halt(data); break; case USBTMC_IOCTL_CLEAR_IN_HALT: retval = usbtmc_ioctl_clear_in_halt(data); break; case USBTMC_IOCTL_INDICATOR_PULSE: retval = usbtmc_ioctl_indicator_pulse(data); break; case USBTMC_IOCTL_CLEAR: retval = usbtmc_ioctl_clear(data); break; case USBTMC_IOCTL_ABORT_BULK_OUT: retval = usbtmc_ioctl_abort_bulk_out(data); break; case USBTMC_IOCTL_ABORT_BULK_IN: retval = usbtmc_ioctl_abort_bulk_in(data); break; case USBTMC_IOCTL_CTRL_REQUEST: retval = usbtmc_ioctl_request(data, (void __user *)arg); break; case USBTMC_IOCTL_GET_TIMEOUT: retval = usbtmc_ioctl_get_timeout(file_data, (void __user *)arg); break; case USBTMC_IOCTL_SET_TIMEOUT: retval = usbtmc_ioctl_set_timeout(file_data, (void __user *)arg); break; case USBTMC_IOCTL_EOM_ENABLE: retval = usbtmc_ioctl_eom_enable(file_data, (void __user *)arg); break; case USBTMC_IOCTL_CONFIG_TERMCHAR: retval = usbtmc_ioctl_config_termc(file_data, (void __user *)arg); break; case USBTMC_IOCTL_WRITE: retval = usbtmc_ioctl_generic_write(file_data, (void __user *)arg); break; case USBTMC_IOCTL_READ: retval = usbtmc_ioctl_generic_read(file_data, (void __user *)arg); break; case USBTMC_IOCTL_WRITE_RESULT: retval = usbtmc_ioctl_write_result(file_data, (void __user *)arg); break; case USBTMC_IOCTL_API_VERSION: retval = put_user(USBTMC_API_VERSION, (__u32 __user *)arg); break; case USBTMC488_IOCTL_GET_CAPS: retval = put_user(data->usb488_caps, (unsigned char __user *)arg); break; case USBTMC488_IOCTL_READ_STB: retval = usbtmc488_ioctl_read_stb(file_data, (void __user *)arg); break; case USBTMC488_IOCTL_REN_CONTROL: retval = usbtmc488_ioctl_simple(data, (void __user *)arg, USBTMC488_REQUEST_REN_CONTROL); break; case USBTMC488_IOCTL_GOTO_LOCAL: retval = usbtmc488_ioctl_simple(data, (void __user *)arg, USBTMC488_REQUEST_GOTO_LOCAL); break; case USBTMC488_IOCTL_LOCAL_LOCKOUT: retval = usbtmc488_ioctl_simple(data, (void __user *)arg, USBTMC488_REQUEST_LOCAL_LOCKOUT); break; case USBTMC488_IOCTL_TRIGGER: retval = usbtmc488_ioctl_trigger(file_data); break; case USBTMC488_IOCTL_WAIT_SRQ: retval = usbtmc488_ioctl_wait_srq(file_data, (__u32 __user *)arg); break; case USBTMC_IOCTL_MSG_IN_ATTR: retval = put_user(file_data->bmTransferAttributes, (__u8 __user *)arg); break; case USBTMC_IOCTL_AUTO_ABORT: retval = get_user(tmp_byte, (unsigned char __user *)arg); if (retval == 0) file_data->auto_abort = !!tmp_byte; break; case USBTMC_IOCTL_GET_STB: retval = usbtmc_get_stb(file_data, &tmp_byte); if (!retval) retval = put_user(tmp_byte, (__u8 __user *)arg); break; case USBTMC_IOCTL_GET_SRQ_STB: retval = usbtmc_ioctl_get_srq_stb(file_data, (void __user *)arg); break; case USBTMC_IOCTL_CANCEL_IO: retval = usbtmc_ioctl_cancel_io(file_data); break; case USBTMC_IOCTL_CLEANUP_IO: retval = usbtmc_ioctl_cleanup_io(file_data); break; } skip_io_on_zombie: mutex_unlock(&data->io_mutex); return retval; } static int usbtmc_fasync(int fd, struct file *file, int on) { struct usbtmc_file_data *file_data = file->private_data; return fasync_helper(fd, file, on, &file_data->data->fasync); } static __poll_t usbtmc_poll(struct file *file, poll_table *wait) { struct usbtmc_file_data 
*file_data = file->private_data; struct usbtmc_device_data *data = file_data->data; __poll_t mask; mutex_lock(&data->io_mutex); if (data->zombie) { mask = EPOLLHUP | EPOLLERR; goto no_poll; } poll_wait(file, &data->waitq, wait); /* Note that EPOLLPRI is now assigned to SRQ, and * EPOLLIN|EPOLLRDNORM to normal read data. */ mask = 0; if (atomic_read(&file_data->srq_asserted)) mask |= EPOLLPRI; /* Note that the anchor submitted includes all urbs for BULK IN * and OUT. So EPOLLOUT is signaled when BULK OUT is empty and * all BULK IN urbs are completed and moved to in_anchor. */ if (usb_anchor_empty(&file_data->submitted)) mask |= (EPOLLOUT | EPOLLWRNORM); if (!usb_anchor_empty(&file_data->in_anchor)) mask |= (EPOLLIN | EPOLLRDNORM); spin_lock_irq(&file_data->err_lock); if (file_data->in_status || file_data->out_status) mask |= EPOLLERR; spin_unlock_irq(&file_data->err_lock); dev_dbg(&data->intf->dev, "poll mask = %x\n", mask); no_poll: mutex_unlock(&data->io_mutex); return mask; } static const struct file_operations fops = { .owner = THIS_MODULE, .read = usbtmc_read, .write = usbtmc_write, .open = usbtmc_open, .release = usbtmc_release, .flush = usbtmc_flush, .unlocked_ioctl = usbtmc_ioctl, .compat_ioctl = compat_ptr_ioctl, .fasync = usbtmc_fasync, .poll = usbtmc_poll, .llseek = default_llseek, }; static struct usb_class_driver usbtmc_class = { .name = "usbtmc%d", .fops = &fops, .minor_base = USBTMC_MINOR_BASE, }; static void usbtmc_interrupt(struct urb *urb) { struct usbtmc_device_data *data = urb->context; struct device *dev = &data->intf->dev; int status = urb->status; int rv; dev_dbg(&data->intf->dev, "int status: %d len %d\n", status, urb->actual_length); switch (status) { case 0: /* SUCCESS */ /* check for valid STB notification */ if (data->iin_buffer[0] > 0x81) { data->bNotify1 = data->iin_buffer[0]; data->bNotify2 = data->iin_buffer[1]; atomic_set(&data->iin_data_valid, 1); wake_up_interruptible(&data->waitq); goto exit; } /* check for SRQ notification */ if (data->iin_buffer[0] == 0x81) { unsigned long flags; struct list_head *elem; if (data->fasync) kill_fasync(&data->fasync, SIGIO, POLL_PRI); spin_lock_irqsave(&data->dev_lock, flags); list_for_each(elem, &data->file_list) { struct usbtmc_file_data *file_data; file_data = list_entry(elem, struct usbtmc_file_data, file_elem); file_data->srq_byte = data->iin_buffer[1]; atomic_set(&file_data->srq_asserted, 1); } spin_unlock_irqrestore(&data->dev_lock, flags); dev_dbg(dev, "srq received bTag %x stb %x\n", (unsigned int)data->iin_buffer[0], (unsigned int)data->iin_buffer[1]); wake_up_interruptible_all(&data->waitq); goto exit; } dev_warn(dev, "invalid notification: %x\n", data->iin_buffer[0]); break; case -EOVERFLOW: dev_err(dev, "overflow with length %d, actual length is %d\n", data->iin_wMaxPacketSize, urb->actual_length); fallthrough; default: /* urb terminated, clean up */ dev_dbg(dev, "urb terminated, status: %d\n", status); return; } exit: rv = usb_submit_urb(urb, GFP_ATOMIC); if (rv) dev_err(dev, "usb_submit_urb failed: %d\n", rv); } static void usbtmc_free_int(struct usbtmc_device_data *data) { if (!data->iin_ep_present || !data->iin_urb) return; usb_kill_urb(data->iin_urb); kfree(data->iin_buffer); data->iin_buffer = NULL; usb_free_urb(data->iin_urb); data->iin_urb = NULL; kref_put(&data->kref, usbtmc_delete); } static int usbtmc_probe(struct usb_interface *intf, const struct usb_device_id *id) { struct usbtmc_device_data *data; struct usb_host_interface *iface_desc; struct usb_endpoint_descriptor *bulk_in, *bulk_out, *int_in; 
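	/* Probe flow: locate the bulk IN/OUT endpoints and the optional
	 * interrupt IN endpoint, read GET_CAPABILITIES, submit the interrupt
	 * URB when present, and register the usbtmc character device.
	 */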
int retcode; dev_dbg(&intf->dev, "%s called\n", __func__); data = kzalloc(sizeof(*data), GFP_KERNEL); if (!data) return -ENOMEM; data->intf = intf; data->id = id; data->usb_dev = usb_get_dev(interface_to_usbdev(intf)); usb_set_intfdata(intf, data); kref_init(&data->kref); mutex_init(&data->io_mutex); init_waitqueue_head(&data->waitq); atomic_set(&data->iin_data_valid, 0); INIT_LIST_HEAD(&data->file_list); spin_lock_init(&data->dev_lock); data->zombie = 0; /* Initialize USBTMC bTag and other fields */ data->bTag = 1; /* 2 <= bTag <= 127 USBTMC-USB488 subclass specification 4.3.1 */ data->iin_bTag = 2; /* USBTMC devices have only one setting, so use that */ iface_desc = data->intf->cur_altsetting; data->ifnum = iface_desc->desc.bInterfaceNumber; /* Find bulk endpoints */ retcode = usb_find_common_endpoints(iface_desc, &bulk_in, &bulk_out, NULL, NULL); if (retcode) { dev_err(&intf->dev, "bulk endpoints not found\n"); goto err_put; } retcode = -EINVAL; data->bulk_in = bulk_in->bEndpointAddress; data->wMaxPacketSize = usb_endpoint_maxp(bulk_in); if (!data->wMaxPacketSize) goto err_put; dev_dbg(&intf->dev, "Found bulk in endpoint at %u\n", data->bulk_in); data->bulk_out = bulk_out->bEndpointAddress; dev_dbg(&intf->dev, "Found Bulk out endpoint at %u\n", data->bulk_out); /* Find int endpoint */ retcode = usb_find_int_in_endpoint(iface_desc, &int_in); if (!retcode) { data->iin_ep_present = 1; data->iin_ep = int_in->bEndpointAddress; data->iin_wMaxPacketSize = usb_endpoint_maxp(int_in); data->iin_interval = int_in->bInterval; dev_dbg(&intf->dev, "Found Int in endpoint at %u\n", data->iin_ep); } retcode = get_capabilities(data); if (retcode) dev_err(&intf->dev, "can't read capabilities\n"); if (data->iin_ep_present) { /* allocate int urb */ data->iin_urb = usb_alloc_urb(0, GFP_KERNEL); if (!data->iin_urb) { retcode = -ENOMEM; goto error_register; } /* Protect interrupt in endpoint data until iin_urb is freed */ kref_get(&data->kref); /* allocate buffer for interrupt in */ data->iin_buffer = kmalloc(data->iin_wMaxPacketSize, GFP_KERNEL); if (!data->iin_buffer) { retcode = -ENOMEM; goto error_register; } /* fill interrupt urb */ usb_fill_int_urb(data->iin_urb, data->usb_dev, usb_rcvintpipe(data->usb_dev, data->iin_ep), data->iin_buffer, data->iin_wMaxPacketSize, usbtmc_interrupt, data, data->iin_interval); retcode = usb_submit_urb(data->iin_urb, GFP_KERNEL); if (retcode) { dev_err(&intf->dev, "Failed to submit iin_urb\n"); goto error_register; } } retcode = usb_register_dev(intf, &usbtmc_class); if (retcode) { dev_err(&intf->dev, "Not able to get a minor (base %u, slice default): %d\n", USBTMC_MINOR_BASE, retcode); goto error_register; } dev_dbg(&intf->dev, "Using minor number %d\n", intf->minor); return 0; error_register: usbtmc_free_int(data); err_put: kref_put(&data->kref, usbtmc_delete); return retcode; } static void usbtmc_disconnect(struct usb_interface *intf) { struct usbtmc_device_data *data = usb_get_intfdata(intf); struct list_head *elem; usb_deregister_dev(intf, &usbtmc_class); mutex_lock(&data->io_mutex); data->zombie = 1; wake_up_interruptible_all(&data->waitq); list_for_each(elem, &data->file_list) { struct usbtmc_file_data *file_data; file_data = list_entry(elem, struct usbtmc_file_data, file_elem); usb_kill_anchored_urbs(&file_data->submitted); usb_scuttle_anchored_urbs(&file_data->in_anchor); } mutex_unlock(&data->io_mutex); usbtmc_free_int(data); kref_put(&data->kref, usbtmc_delete); } static void usbtmc_draw_down(struct usbtmc_file_data *file_data) { int time; time = 
usb_wait_anchor_empty_timeout(&file_data->submitted, 1000); if (!time) usb_kill_anchored_urbs(&file_data->submitted); usb_scuttle_anchored_urbs(&file_data->in_anchor); } static int usbtmc_suspend(struct usb_interface *intf, pm_message_t message) { struct usbtmc_device_data *data = usb_get_intfdata(intf); struct list_head *elem; if (!data) return 0; mutex_lock(&data->io_mutex); list_for_each(elem, &data->file_list) { struct usbtmc_file_data *file_data; file_data = list_entry(elem, struct usbtmc_file_data, file_elem); usbtmc_draw_down(file_data); } if (data->iin_ep_present && data->iin_urb) usb_kill_urb(data->iin_urb); mutex_unlock(&data->io_mutex); return 0; } static int usbtmc_resume(struct usb_interface *intf) { struct usbtmc_device_data *data = usb_get_intfdata(intf); int retcode = 0; if (data->iin_ep_present && data->iin_urb) retcode = usb_submit_urb(data->iin_urb, GFP_KERNEL); if (retcode) dev_err(&intf->dev, "Failed to submit iin_urb\n"); return retcode; } static int usbtmc_pre_reset(struct usb_interface *intf) { struct usbtmc_device_data *data = usb_get_intfdata(intf); struct list_head *elem; if (!data) return 0; mutex_lock(&data->io_mutex); list_for_each(elem, &data->file_list) { struct usbtmc_file_data *file_data; file_data = list_entry(elem, struct usbtmc_file_data, file_elem); usbtmc_ioctl_cancel_io(file_data); } return 0; } static int usbtmc_post_reset(struct usb_interface *intf) { struct usbtmc_device_data *data = usb_get_intfdata(intf); mutex_unlock(&data->io_mutex); return 0; } static struct usb_driver usbtmc_driver = { .name = "usbtmc", .id_table = usbtmc_devices, .probe = usbtmc_probe, .disconnect = usbtmc_disconnect, .suspend = usbtmc_suspend, .resume = usbtmc_resume, .pre_reset = usbtmc_pre_reset, .post_reset = usbtmc_post_reset, .dev_groups = usbtmc_groups, }; module_usb_driver(usbtmc_driver); MODULE_DESCRIPTION("USB Test & Measurement class driver"); MODULE_LICENSE("GPL"); |
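/*
 * Example (not part of this driver): a minimal user-space sketch showing how
 * an application typically talks to an instrument through this driver.  The
 * device node name /dev/usbtmc0 and the SCPI "*IDN?" query are assumptions
 * made for illustration; the ioctl names come from the driver's uapi header
 * <linux/usb/tmc.h>.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/usb/tmc.h>

int main(void)
{
	__u32 timeout = 2000;	/* ms, handled by usbtmc_ioctl_set_timeout() */
	unsigned char stb;
	char buf[256];
	ssize_t n;
	int fd = open("/dev/usbtmc0", O_RDWR);	/* hypothetical device node */

	if (fd < 0)
		return 1;

	ioctl(fd, USBTMC_IOCTL_SET_TIMEOUT, &timeout);

	/* write() sends a DEV_DEP_MSG_OUT, read() issues REQUEST_DEV_DEP_MSG_IN */
	if (write(fd, "*IDN?\n", 6) == 6) {
		n = read(fd, buf, sizeof(buf) - 1);
		if (n > 0) {
			buf[n] = '\0';
			printf("IDN: %s", buf);
		}
	}

	/* Read the IEEE 488 status byte through the driver's READ_STB path */
	if (!ioctl(fd, USBTMC488_IOCTL_READ_STB, &stb))
		printf("STB: 0x%02x\n", stb);

	close(fd);
	return 0;
}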
#include <linux/atomic.h> #include <linux/export.h> #include <linux/generic-radix-tree.h> #include <linux/gfp.h> #include <linux/kmemleak.h> /* * Returns pointer to the specified byte @offset within @radix, or NULL if not * allocated */ void *__genradix_ptr(struct __genradix *radix, size_t offset) { return __genradix_ptr_inlined(radix, offset); } EXPORT_SYMBOL(__genradix_ptr); /* * Returns pointer to the specified byte @offset within @radix, allocating it if * necessary - newly allocated slots are always zeroed out: */ void *__genradix_ptr_alloc(struct __genradix *radix, size_t offset, struct genradix_node **preallocated, gfp_t gfp_mask) { struct genradix_root *v = READ_ONCE(radix->root); struct genradix_node *n, *new_node = NULL; unsigned level; if (preallocated) swap(new_node, *preallocated); /* Increase tree depth if necessary: */ while (1) { struct genradix_root *r = v, *new_root; n = genradix_root_to_node(r); level = genradix_root_to_depth(r); if (n && ilog2(offset) < genradix_depth_shift(level)) break; if (!new_node) { new_node = genradix_alloc_node(gfp_mask); if (!new_node) return NULL; } new_node->children[0] = n; new_root = ((struct genradix_root *) ((unsigned long) new_node | (n ?
level + 1 : 0))); if ((v = cmpxchg_release(&radix->root, r, new_root)) == r) { v = new_root; new_node = NULL; } else { new_node->children[0] = NULL; } } while (level--) { struct genradix_node **p = &n->children[offset >> genradix_depth_shift(level)]; offset &= genradix_depth_size(level) - 1; n = READ_ONCE(*p); if (!n) { if (!new_node) { new_node = genradix_alloc_node(gfp_mask); if (!new_node) return NULL; } if (!(n = cmpxchg_release(p, NULL, new_node))) swap(n, new_node); } } if (new_node) genradix_free_node(new_node); return &n->data[offset]; } EXPORT_SYMBOL(__genradix_ptr_alloc); void *__genradix_iter_peek(struct genradix_iter *iter, struct __genradix *radix, size_t objs_per_page) { struct genradix_root *r; struct genradix_node *n; unsigned level, i; if (iter->offset == SIZE_MAX) return NULL; restart: r = READ_ONCE(radix->root); if (!r) return NULL; n = genradix_root_to_node(r); level = genradix_root_to_depth(r); if (ilog2(iter->offset) >= genradix_depth_shift(level)) return NULL; while (level) { level--; i = (iter->offset >> genradix_depth_shift(level)) & (GENRADIX_ARY - 1); while (!n->children[i]) { size_t objs_per_ptr = genradix_depth_size(level); if (iter->offset + objs_per_ptr < iter->offset) { iter->offset = SIZE_MAX; iter->pos = SIZE_MAX; return NULL; } i++; iter->offset = round_down(iter->offset + objs_per_ptr, objs_per_ptr); iter->pos = (iter->offset >> GENRADIX_NODE_SHIFT) * objs_per_page; if (i == GENRADIX_ARY) goto restart; } n = n->children[i]; } return &n->data[iter->offset & (GENRADIX_NODE_SIZE - 1)]; } EXPORT_SYMBOL(__genradix_iter_peek); void *__genradix_iter_peek_prev(struct genradix_iter *iter, struct __genradix *radix, size_t objs_per_page, size_t obj_size_plus_page_remainder) { struct genradix_root *r; struct genradix_node *n; unsigned level, i; if (iter->offset == SIZE_MAX) return NULL; restart: r = READ_ONCE(radix->root); if (!r) return NULL; n = genradix_root_to_node(r); level = genradix_root_to_depth(r); if (ilog2(iter->offset) >= genradix_depth_shift(level)) { iter->offset = genradix_depth_size(level); iter->pos = (iter->offset >> GENRADIX_NODE_SHIFT) * objs_per_page; iter->offset -= obj_size_plus_page_remainder; iter->pos--; } while (level) { level--; i = (iter->offset >> genradix_depth_shift(level)) & (GENRADIX_ARY - 1); while (!n->children[i]) { size_t objs_per_ptr = genradix_depth_size(level); iter->offset = round_down(iter->offset, objs_per_ptr); iter->pos = (iter->offset >> GENRADIX_NODE_SHIFT) * objs_per_page; if (!iter->offset) return NULL; iter->offset -= obj_size_plus_page_remainder; iter->pos--; if (!i) goto restart; --i; } n = n->children[i]; } return &n->data[iter->offset & (GENRADIX_NODE_SIZE - 1)]; } EXPORT_SYMBOL(__genradix_iter_peek_prev); static void genradix_free_recurse(struct genradix_node *n, unsigned level) { if (level) { unsigned i; for (i = 0; i < GENRADIX_ARY; i++) if (n->children[i]) genradix_free_recurse(n->children[i], level - 1); } genradix_free_node(n); } int __genradix_prealloc(struct __genradix *radix, size_t size, gfp_t gfp_mask) { size_t offset; for (offset = 0; offset < size; offset += GENRADIX_NODE_SIZE) if (!__genradix_ptr_alloc(radix, offset, NULL, gfp_mask)) return -ENOMEM; return 0; } EXPORT_SYMBOL(__genradix_prealloc); void __genradix_free(struct __genradix *radix) { struct genradix_root *r = xchg(&radix->root, NULL); genradix_free_recurse(genradix_root_to_node(r), genradix_root_to_depth(r)); } EXPORT_SYMBOL(__genradix_free); |
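/*
 * Example (not part of this file): a minimal sketch of how the genradix API
 * declared in <linux/generic-radix-tree.h> is typically used.  The struct foo
 * type, the entry count and the example_fill() helper are made up for
 * illustration only.
 */
#include <linux/generic-radix-tree.h>
#include <linux/slab.h>
#include <linux/types.h>

struct foo {
	u64	id;
	u32	flags;
};

static int example_fill(void)
{
	GENRADIX(struct foo) entries;
	struct genradix_iter iter;
	struct foo *p;
	size_t i;
	int ret = 0;

	genradix_init(&entries);

	/* Slots are allocated on demand and returned zeroed */
	for (i = 0; i < 1000; i++) {
		p = genradix_ptr_alloc(&entries, i, GFP_KERNEL);
		if (!p) {
			ret = -ENOMEM;
			goto out;
		}
		p->id = i;
	}

	/* genradix_ptr() only returns slots that have already been allocated */
	p = genradix_ptr(&entries, 42);
	if (p)
		pr_debug("entry 42 has id %llu\n", p->id);

	/* Walk all allocated objects in index order */
	genradix_for_each(&entries, iter, p)
		p->flags = 0;
out:
	genradix_free(&entries);
	return ret;
}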
// SPDX-License-Identifier: GPL-2.0 /* * Zoned block device handling * * Copyright (c) 2015, Hannes Reinecke * Copyright (c) 2015, SUSE Linux GmbH * * Copyright (c) 2016, Damien Le Moal * Copyright (c) 2016, Western Digital * Copyright (c) 2024, Western Digital Corporation or its affiliates. */ #include <linux/kernel.h> #include <linux/blkdev.h> #include <linux/blk-mq.h> #include <linux/spinlock.h> #include <linux/refcount.h> #include <linux/mempool.h> #include "blk.h" #include "blk-mq-sched.h" #include "blk-mq-debugfs.h" #define ZONE_COND_NAME(name) [BLK_ZONE_COND_##name] = #name static const char *const zone_cond_name[] = { ZONE_COND_NAME(NOT_WP), ZONE_COND_NAME(EMPTY), ZONE_COND_NAME(IMP_OPEN), ZONE_COND_NAME(EXP_OPEN), ZONE_COND_NAME(CLOSED), ZONE_COND_NAME(READONLY), ZONE_COND_NAME(FULL), ZONE_COND_NAME(OFFLINE), }; #undef ZONE_COND_NAME /* * Per-zone write plug. * @node: hlist_node structure for managing the plug using a hash table. * @ref: Zone write plug reference counter. A zone write plug reference is * always at least 1 when the plug is hashed in the disk plug hash table. * The reference is incremented whenever a new BIO needing plugging is * submitted and when a function needs to manipulate a plug. The * reference count is decremented whenever a plugged BIO completes and * when a function that referenced the plug returns. The initial * reference is dropped whenever the zone of the zone write plug is reset, * finished and when the zone becomes full (last write BIO to the zone * completes). * @lock: Spinlock to atomically manipulate the plug. * @flags: Flags indicating the plug state. * @zone_no: The number of the zone the plug is managing. * @wp_offset: The zone write pointer location relative to the start of the zone * as a number of 512B sectors. * @bio_list: The list of BIOs that are currently plugged. * @bio_work: Work struct to handle issuing of plugged BIOs * @rcu_head: RCU head to free zone write plugs with an RCU grace period. * @disk: The gendisk the plug belongs to.
*/ struct blk_zone_wplug { struct hlist_node node; refcount_t ref; spinlock_t lock; unsigned int flags; unsigned int zone_no; unsigned int wp_offset; struct bio_list bio_list; struct work_struct bio_work; struct rcu_head rcu_head; struct gendisk *disk; }; /* * Zone write plug flags bits: * - BLK_ZONE_WPLUG_PLUGGED: Indicates that the zone write plug is plugged, * that is, that write BIOs are being throttled due to a write BIO already * being executed or the zone write plug bio list is not empty. * - BLK_ZONE_WPLUG_NEED_WP_UPDATE: Indicates that we lost track of a zone * write pointer offset and need to update it. * - BLK_ZONE_WPLUG_UNHASHED: Indicates that the zone write plug was removed * from the disk hash table and that the initial reference to the zone * write plug set when the plug was first added to the hash table has been * dropped. This flag is set when a zone is reset, finished or become full, * to prevent new references to the zone write plug to be taken for * newly incoming BIOs. A zone write plug flagged with this flag will be * freed once all remaining references from BIOs or functions are dropped. */ #define BLK_ZONE_WPLUG_PLUGGED (1U << 0) #define BLK_ZONE_WPLUG_NEED_WP_UPDATE (1U << 1) #define BLK_ZONE_WPLUG_UNHASHED (1U << 2) /** * blk_zone_cond_str - Return string XXX in BLK_ZONE_COND_XXX. * @zone_cond: BLK_ZONE_COND_XXX. * * Description: Centralize block layer function to convert BLK_ZONE_COND_XXX * into string format. Useful in the debugging and tracing zone conditions. For * invalid BLK_ZONE_COND_XXX it returns string "UNKNOWN". */ const char *blk_zone_cond_str(enum blk_zone_cond zone_cond) { static const char *zone_cond_str = "UNKNOWN"; if (zone_cond < ARRAY_SIZE(zone_cond_name) && zone_cond_name[zone_cond]) zone_cond_str = zone_cond_name[zone_cond]; return zone_cond_str; } EXPORT_SYMBOL_GPL(blk_zone_cond_str); struct disk_report_zones_cb_args { struct gendisk *disk; report_zones_cb user_cb; void *user_data; }; static void disk_zone_wplug_sync_wp_offset(struct gendisk *disk, struct blk_zone *zone); static int disk_report_zones_cb(struct blk_zone *zone, unsigned int idx, void *data) { struct disk_report_zones_cb_args *args = data; struct gendisk *disk = args->disk; if (disk->zone_wplugs_hash) disk_zone_wplug_sync_wp_offset(disk, zone); if (!args->user_cb) return 0; return args->user_cb(zone, idx, args->user_data); } /** * blkdev_report_zones - Get zones information * @bdev: Target block device * @sector: Sector from which to report zones * @nr_zones: Maximum number of zones to report * @cb: Callback function called for each reported zone * @data: Private data for the callback * * Description: * Get zone information starting from the zone containing @sector for at most * @nr_zones, and call @cb for each zone reported by the device. * To report all zones in a device starting from @sector, the BLK_ALL_ZONES * constant can be passed to @nr_zones. * Returns the number of zones reported by the device, or a negative errno * value in case of failure. * * Note: The caller must use memalloc_noXX_save/restore() calls to control * memory allocations done within this function. 
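 *
 * Example (illustrative sketch only, not part of this interface; the helper
 * names my_count_zones_cb() and my_count_zones() are hypothetical): count the
 * zones of a device with a trivial callback:
 *
 *	static int my_count_zones_cb(struct blk_zone *zone, unsigned int idx,
 *				     void *data)
 *	{
 *		unsigned int *nr_zones = data;
 *
 *		(*nr_zones)++;
 *		return 0;
 *	}
 *
 *	static int my_count_zones(struct block_device *bdev)
 *	{
 *		unsigned int nr_zones = 0;
 *		int ret;
 *
 *		ret = blkdev_report_zones(bdev, 0, BLK_ALL_ZONES,
 *					  my_count_zones_cb, &nr_zones);
 *		return ret < 0 ? ret : nr_zones;
 *	}
 *
 * (A real caller may also need memalloc_noio_save/restore() as noted above.)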
*/ int blkdev_report_zones(struct block_device *bdev, sector_t sector, unsigned int nr_zones, report_zones_cb cb, void *data) { struct gendisk *disk = bdev->bd_disk; sector_t capacity = get_capacity(disk); struct disk_report_zones_cb_args args = { .disk = disk, .user_cb = cb, .user_data = data, }; if (!bdev_is_zoned(bdev) || WARN_ON_ONCE(!disk->fops->report_zones)) return -EOPNOTSUPP; if (!nr_zones || sector >= capacity) return 0; return disk->fops->report_zones(disk, sector, nr_zones, disk_report_zones_cb, &args); } EXPORT_SYMBOL_GPL(blkdev_report_zones); static int blkdev_zone_reset_all(struct block_device *bdev) { struct bio bio; bio_init(&bio, bdev, NULL, 0, REQ_OP_ZONE_RESET_ALL | REQ_SYNC); return submit_bio_wait(&bio); } /** * blkdev_zone_mgmt - Execute a zone management operation on a range of zones * @bdev: Target block device * @op: Operation to be performed on the zones * @sector: Start sector of the first zone to operate on * @nr_sectors: Number of sectors, should be at least the length of one zone and * must be zone size aligned. * * Description: * Perform the specified operation on the range of zones specified by * @sector..@sector+@nr_sectors. Specifying the entire disk sector range * is valid, but the specified range should not contain conventional zones. * The operation to execute on each zone can be a zone reset, open, close * or finish request. */ int blkdev_zone_mgmt(struct block_device *bdev, enum req_op op, sector_t sector, sector_t nr_sectors) { sector_t zone_sectors = bdev_zone_sectors(bdev); sector_t capacity = bdev_nr_sectors(bdev); sector_t end_sector = sector + nr_sectors; struct bio *bio = NULL; int ret = 0; if (!bdev_is_zoned(bdev)) return -EOPNOTSUPP; if (bdev_read_only(bdev)) return -EPERM; if (!op_is_zone_mgmt(op)) return -EOPNOTSUPP; if (end_sector <= sector || end_sector > capacity) /* Out of range */ return -EINVAL; /* Check alignment (handle eventual smaller last zone) */ if (!bdev_is_zone_start(bdev, sector)) return -EINVAL; if (!bdev_is_zone_start(bdev, nr_sectors) && end_sector != capacity) return -EINVAL; /* * In the case of a zone reset operation over all zones, use * REQ_OP_ZONE_RESET_ALL. */ if (op == REQ_OP_ZONE_RESET && sector == 0 && nr_sectors == capacity) return blkdev_zone_reset_all(bdev); while (sector < end_sector) { bio = blk_next_bio(bio, bdev, 0, op | REQ_SYNC, GFP_KERNEL); bio->bi_iter.bi_sector = sector; sector += zone_sectors; /* This may take a while, so be nice to others */ cond_resched(); } ret = submit_bio_wait(bio); bio_put(bio); return ret; } EXPORT_SYMBOL_GPL(blkdev_zone_mgmt); struct zone_report_args { struct blk_zone __user *zones; }; static int blkdev_copy_zone_to_user(struct blk_zone *zone, unsigned int idx, void *data) { struct zone_report_args *args = data; if (copy_to_user(&args->zones[idx], zone, sizeof(struct blk_zone))) return -EFAULT; return 0; } /* * BLKREPORTZONE ioctl processing. * Called from blkdev_ioctl. 
*/ int blkdev_report_zones_ioctl(struct block_device *bdev, unsigned int cmd, unsigned long arg) { void __user *argp = (void __user *)arg; struct zone_report_args args; struct blk_zone_report rep; int ret; if (!argp) return -EINVAL; if (!bdev_is_zoned(bdev)) return -ENOTTY; if (copy_from_user(&rep, argp, sizeof(struct blk_zone_report))) return -EFAULT; if (!rep.nr_zones) return -EINVAL; args.zones = argp + sizeof(struct blk_zone_report); ret = blkdev_report_zones(bdev, rep.sector, rep.nr_zones, blkdev_copy_zone_to_user, &args); if (ret < 0) return ret; rep.nr_zones = ret; rep.flags = BLK_ZONE_REP_CAPACITY; if (copy_to_user(argp, &rep, sizeof(struct blk_zone_report))) return -EFAULT; return 0; } static int blkdev_truncate_zone_range(struct block_device *bdev, blk_mode_t mode, const struct blk_zone_range *zrange) { loff_t start, end; if (zrange->sector + zrange->nr_sectors <= zrange->sector || zrange->sector + zrange->nr_sectors > get_capacity(bdev->bd_disk)) /* Out of range */ return -EINVAL; start = zrange->sector << SECTOR_SHIFT; end = ((zrange->sector + zrange->nr_sectors) << SECTOR_SHIFT) - 1; return truncate_bdev_range(bdev, mode, start, end); } /* * BLKRESETZONE, BLKOPENZONE, BLKCLOSEZONE and BLKFINISHZONE ioctl processing. * Called from blkdev_ioctl. */ int blkdev_zone_mgmt_ioctl(struct block_device *bdev, blk_mode_t mode, unsigned int cmd, unsigned long arg) { void __user *argp = (void __user *)arg; struct blk_zone_range zrange; enum req_op op; int ret; if (!argp) return -EINVAL; if (!bdev_is_zoned(bdev)) return -ENOTTY; if (!(mode & BLK_OPEN_WRITE)) return -EBADF; if (copy_from_user(&zrange, argp, sizeof(struct blk_zone_range))) return -EFAULT; switch (cmd) { case BLKRESETZONE: op = REQ_OP_ZONE_RESET; /* Invalidate the page cache, including dirty pages. */ inode_lock(bdev->bd_mapping->host); filemap_invalidate_lock(bdev->bd_mapping); ret = blkdev_truncate_zone_range(bdev, mode, &zrange); if (ret) goto fail; break; case BLKOPENZONE: op = REQ_OP_ZONE_OPEN; break; case BLKCLOSEZONE: op = REQ_OP_ZONE_CLOSE; break; case BLKFINISHZONE: op = REQ_OP_ZONE_FINISH; break; default: return -ENOTTY; } ret = blkdev_zone_mgmt(bdev, op, zrange.sector, zrange.nr_sectors); fail: if (cmd == BLKRESETZONE) { filemap_invalidate_unlock(bdev->bd_mapping); inode_unlock(bdev->bd_mapping->host); } return ret; } static bool disk_zone_is_last(struct gendisk *disk, struct blk_zone *zone) { return zone->start + zone->len >= get_capacity(disk); } static bool disk_zone_is_full(struct gendisk *disk, unsigned int zno, unsigned int offset_in_zone) { if (zno < disk->nr_zones - 1) return offset_in_zone >= disk->zone_capacity; return offset_in_zone >= disk->last_zone_capacity; } static bool disk_zone_wplug_is_full(struct gendisk *disk, struct blk_zone_wplug *zwplug) { return disk_zone_is_full(disk, zwplug->zone_no, zwplug->wp_offset); } static bool disk_insert_zone_wplug(struct gendisk *disk, struct blk_zone_wplug *zwplug) { struct blk_zone_wplug *zwplg; unsigned long flags; unsigned int idx = hash_32(zwplug->zone_no, disk->zone_wplugs_hash_bits); /* * Add the new zone write plug to the hash table, but carefully as we * are racing with other submission context, so we may already have a * zone write plug for the same zone. 
*/ spin_lock_irqsave(&disk->zone_wplugs_lock, flags); hlist_for_each_entry_rcu(zwplg, &disk->zone_wplugs_hash[idx], node) { if (zwplg->zone_no == zwplug->zone_no) { spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags); return false; } } hlist_add_head_rcu(&zwplug->node, &disk->zone_wplugs_hash[idx]); atomic_inc(&disk->nr_zone_wplugs); spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags); return true; } static struct blk_zone_wplug *disk_get_hashed_zone_wplug(struct gendisk *disk, sector_t sector) { unsigned int zno = disk_zone_no(disk, sector); unsigned int idx = hash_32(zno, disk->zone_wplugs_hash_bits); struct blk_zone_wplug *zwplug; rcu_read_lock(); hlist_for_each_entry_rcu(zwplug, &disk->zone_wplugs_hash[idx], node) { if (zwplug->zone_no == zno && refcount_inc_not_zero(&zwplug->ref)) { rcu_read_unlock(); return zwplug; } } rcu_read_unlock(); return NULL; } static inline struct blk_zone_wplug *disk_get_zone_wplug(struct gendisk *disk, sector_t sector) { if (!atomic_read(&disk->nr_zone_wplugs)) return NULL; return disk_get_hashed_zone_wplug(disk, sector); } static void disk_free_zone_wplug_rcu(struct rcu_head *rcu_head) { struct blk_zone_wplug *zwplug = container_of(rcu_head, struct blk_zone_wplug, rcu_head); mempool_free(zwplug, zwplug->disk->zone_wplugs_pool); } static inline void disk_put_zone_wplug(struct blk_zone_wplug *zwplug) { if (refcount_dec_and_test(&zwplug->ref)) { WARN_ON_ONCE(!bio_list_empty(&zwplug->bio_list)); WARN_ON_ONCE(zwplug->flags & BLK_ZONE_WPLUG_PLUGGED); WARN_ON_ONCE(!(zwplug->flags & BLK_ZONE_WPLUG_UNHASHED)); call_rcu(&zwplug->rcu_head, disk_free_zone_wplug_rcu); } } static inline bool disk_should_remove_zone_wplug(struct gendisk *disk, struct blk_zone_wplug *zwplug) { lockdep_assert_held(&zwplug->lock); /* If the zone write plug was already removed, we are done. */ if (zwplug->flags & BLK_ZONE_WPLUG_UNHASHED) return false; /* If the zone write plug is still plugged, it cannot be removed. */ if (zwplug->flags & BLK_ZONE_WPLUG_PLUGGED) return false; /* * Completions of BIOs with blk_zone_write_plug_bio_endio() may * happen after handling a request completion with * blk_zone_write_plug_finish_request() (e.g. with split BIOs * that are chained). In such case, disk_zone_wplug_unplug_bio() * should not attempt to remove the zone write plug until all BIO * completions are seen. Check by looking at the zone write plug * reference count, which is 2 when the plug is unused (one reference * taken when the plug was allocated and another reference taken by the * caller context). */ if (refcount_read(&zwplug->ref) > 2) return false; /* We can remove zone write plugs for zones that are empty or full. */ return !zwplug->wp_offset || disk_zone_wplug_is_full(disk, zwplug); } static void disk_remove_zone_wplug(struct gendisk *disk, struct blk_zone_wplug *zwplug) { unsigned long flags; /* If the zone write plug was already removed, we have nothing to do. */ if (zwplug->flags & BLK_ZONE_WPLUG_UNHASHED) return; /* * Mark the zone write plug as unhashed and drop the extra reference we * took when the plug was inserted in the hash table. */ zwplug->flags |= BLK_ZONE_WPLUG_UNHASHED; spin_lock_irqsave(&disk->zone_wplugs_lock, flags); hlist_del_init_rcu(&zwplug->node); atomic_dec(&disk->nr_zone_wplugs); spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags); disk_put_zone_wplug(zwplug); } static void blk_zone_wplug_bio_work(struct work_struct *work); /* * Get a reference on the write plug for the zone containing @sector. * If the plug does not exist, it is allocated and hashed. 
* Return a pointer to the zone write plug with the plug spinlock held. */ static struct blk_zone_wplug *disk_get_and_lock_zone_wplug(struct gendisk *disk, sector_t sector, gfp_t gfp_mask, unsigned long *flags) { unsigned int zno = disk_zone_no(disk, sector); struct blk_zone_wplug *zwplug; again: zwplug = disk_get_zone_wplug(disk, sector); if (zwplug) { /* * Check that a BIO completion or a zone reset or finish * operation has not already removed the zone write plug from * the hash table and dropped its reference count. In such case, * we need to get a new plug so start over from the beginning. */ spin_lock_irqsave(&zwplug->lock, *flags); if (zwplug->flags & BLK_ZONE_WPLUG_UNHASHED) { spin_unlock_irqrestore(&zwplug->lock, *flags); disk_put_zone_wplug(zwplug); goto again; } return zwplug; } /* * Allocate and initialize a zone write plug with an extra reference * so that it is not freed when the zone write plug becomes idle without * the zone being full. */ zwplug = mempool_alloc(disk->zone_wplugs_pool, gfp_mask); if (!zwplug) return NULL; INIT_HLIST_NODE(&zwplug->node); refcount_set(&zwplug->ref, 2); spin_lock_init(&zwplug->lock); zwplug->flags = 0; zwplug->zone_no = zno; zwplug->wp_offset = bdev_offset_from_zone_start(disk->part0, sector); bio_list_init(&zwplug->bio_list); INIT_WORK(&zwplug->bio_work, blk_zone_wplug_bio_work); zwplug->disk = disk; spin_lock_irqsave(&zwplug->lock, *flags); /* * Insert the new zone write plug in the hash table. This can fail only * if another context already inserted a plug. Retry from the beginning * in such case. */ if (!disk_insert_zone_wplug(disk, zwplug)) { spin_unlock_irqrestore(&zwplug->lock, *flags); mempool_free(zwplug, disk->zone_wplugs_pool); goto again; } return zwplug; } static inline void blk_zone_wplug_bio_io_error(struct blk_zone_wplug *zwplug, struct bio *bio) { struct request_queue *q = zwplug->disk->queue; bio_clear_flag(bio, BIO_ZONE_WRITE_PLUGGING); bio_io_error(bio); disk_put_zone_wplug(zwplug); /* Drop the reference taken by disk_zone_wplug_add_bio(() */ blk_queue_exit(q); } /* * Abort (fail) all plugged BIOs of a zone write plug. */ static void disk_zone_wplug_abort(struct blk_zone_wplug *zwplug) { struct bio *bio; if (bio_list_empty(&zwplug->bio_list)) return; pr_warn_ratelimited("%s: zone %u: Aborting plugged BIOs\n", zwplug->disk->disk_name, zwplug->zone_no); while ((bio = bio_list_pop(&zwplug->bio_list))) blk_zone_wplug_bio_io_error(zwplug, bio); } /* * Set a zone write plug write pointer offset to the specified value. * This aborts all plugged BIOs, which is fine as this function is called for * a zone reset operation, a zone finish operation or if the zone needs a wp * update from a report zone after a write error. */ static void disk_zone_wplug_set_wp_offset(struct gendisk *disk, struct blk_zone_wplug *zwplug, unsigned int wp_offset) { lockdep_assert_held(&zwplug->lock); /* Update the zone write pointer and abort all plugged BIOs. */ zwplug->flags &= ~BLK_ZONE_WPLUG_NEED_WP_UPDATE; zwplug->wp_offset = wp_offset; disk_zone_wplug_abort(zwplug); /* * The zone write plug now has no BIO plugged: remove it from the * hash table so that it cannot be seen. The plug will be freed * when the last reference is dropped. 
*/ if (disk_should_remove_zone_wplug(disk, zwplug)) disk_remove_zone_wplug(disk, zwplug); } static unsigned int blk_zone_wp_offset(struct blk_zone *zone) { switch (zone->cond) { case BLK_ZONE_COND_IMP_OPEN: case BLK_ZONE_COND_EXP_OPEN: case BLK_ZONE_COND_CLOSED: return zone->wp - zone->start; case BLK_ZONE_COND_FULL: return zone->len; case BLK_ZONE_COND_EMPTY: return 0; case BLK_ZONE_COND_NOT_WP: case BLK_ZONE_COND_OFFLINE: case BLK_ZONE_COND_READONLY: default: /* * Conventional, offline and read-only zones do not have a valid * write pointer. */ return UINT_MAX; } } static void disk_zone_wplug_sync_wp_offset(struct gendisk *disk, struct blk_zone *zone) { struct blk_zone_wplug *zwplug; unsigned long flags; zwplug = disk_get_zone_wplug(disk, zone->start); if (!zwplug) return; spin_lock_irqsave(&zwplug->lock, flags); if (zwplug->flags & BLK_ZONE_WPLUG_NEED_WP_UPDATE) disk_zone_wplug_set_wp_offset(disk, zwplug, blk_zone_wp_offset(zone)); spin_unlock_irqrestore(&zwplug->lock, flags); disk_put_zone_wplug(zwplug); } static int disk_zone_sync_wp_offset(struct gendisk *disk, sector_t sector) { struct disk_report_zones_cb_args args = { .disk = disk, }; return disk->fops->report_zones(disk, sector, 1, disk_report_zones_cb, &args); } static bool blk_zone_wplug_handle_reset_or_finish(struct bio *bio, unsigned int wp_offset) { struct gendisk *disk = bio->bi_bdev->bd_disk; sector_t sector = bio->bi_iter.bi_sector; struct blk_zone_wplug *zwplug; unsigned long flags; /* Conventional zones cannot be reset nor finished. */ if (!bdev_zone_is_seq(bio->bi_bdev, sector)) { bio_io_error(bio); return true; } /* * No-wait reset or finish BIOs do not make much sense as the callers * issue these as blocking operations in most cases. To avoid issues * the BIO execution potentially failing with BLK_STS_AGAIN, warn about * REQ_NOWAIT being set and ignore that flag. */ if (WARN_ON_ONCE(bio->bi_opf & REQ_NOWAIT)) bio->bi_opf &= ~REQ_NOWAIT; /* * If we have a zone write plug, set its write pointer offset to 0 * (reset case) or to the zone size (finish case). This will abort all * BIOs plugged for the target zone. It is fine as resetting or * finishing zones while writes are still in-flight will result in the * writes failing anyway. */ zwplug = disk_get_zone_wplug(disk, sector); if (zwplug) { spin_lock_irqsave(&zwplug->lock, flags); disk_zone_wplug_set_wp_offset(disk, zwplug, wp_offset); spin_unlock_irqrestore(&zwplug->lock, flags); disk_put_zone_wplug(zwplug); } return false; } static bool blk_zone_wplug_handle_reset_all(struct bio *bio) { struct gendisk *disk = bio->bi_bdev->bd_disk; struct blk_zone_wplug *zwplug; unsigned long flags; sector_t sector; /* * Set the write pointer offset of all zone write plugs to 0. This will * abort all plugged BIOs. It is fine as resetting zones while writes * are still in-flight will result in the writes failing anyway. */ for (sector = 0; sector < get_capacity(disk); sector += disk->queue->limits.chunk_sectors) { zwplug = disk_get_zone_wplug(disk, sector); if (zwplug) { spin_lock_irqsave(&zwplug->lock, flags); disk_zone_wplug_set_wp_offset(disk, zwplug, 0); spin_unlock_irqrestore(&zwplug->lock, flags); disk_put_zone_wplug(zwplug); } } return false; } static void disk_zone_wplug_schedule_bio_work(struct gendisk *disk, struct blk_zone_wplug *zwplug) { /* * Take a reference on the zone write plug and schedule the submission * of the next plugged BIO. blk_zone_wplug_bio_work() will release the * reference we take here. 
*/ WARN_ON_ONCE(!(zwplug->flags & BLK_ZONE_WPLUG_PLUGGED)); refcount_inc(&zwplug->ref); queue_work(disk->zone_wplugs_wq, &zwplug->bio_work); } static inline void disk_zone_wplug_add_bio(struct gendisk *disk, struct blk_zone_wplug *zwplug, struct bio *bio, unsigned int nr_segs) { bool schedule_bio_work = false; /* * Grab an extra reference on the BIO request queue usage counter. * This reference will be reused to submit a request for the BIO for * blk-mq devices and dropped when the BIO is failed and after * it is issued in the case of BIO-based devices. */ percpu_ref_get(&bio->bi_bdev->bd_disk->queue->q_usage_counter); /* * The BIO is being plugged and thus will have to wait for the on-going * write and for all other writes already plugged. So polling makes * no sense. */ bio_clear_polled(bio); /* * REQ_NOWAIT BIOs are always handled using the zone write plug BIO * work, which can block. So clear the REQ_NOWAIT flag and schedule the * work if this is the first BIO we are plugging. */ if (bio->bi_opf & REQ_NOWAIT) { schedule_bio_work = !(zwplug->flags & BLK_ZONE_WPLUG_PLUGGED); bio->bi_opf &= ~REQ_NOWAIT; } /* * Reuse the poll cookie field to store the number of segments when * split to the hardware limits. */ bio->__bi_nr_segments = nr_segs; /* * We always receive BIOs after they are split and ready to be issued. * The block layer passes the parts of a split BIO in order, and the * user must also issue write sequentially. So simply add the new BIO * at the tail of the list to preserve the sequential write order. */ bio_list_add(&zwplug->bio_list, bio); zwplug->flags |= BLK_ZONE_WPLUG_PLUGGED; if (schedule_bio_work) disk_zone_wplug_schedule_bio_work(disk, zwplug); } /* * Called from bio_attempt_back_merge() when a BIO was merged with a request. */ void blk_zone_write_plug_bio_merged(struct bio *bio) { struct blk_zone_wplug *zwplug; unsigned long flags; /* * If the BIO was already plugged, then we were called through * blk_zone_write_plug_init_request() -> blk_attempt_bio_merge(). * For this case, we already hold a reference on the zone write plug for * the BIO and blk_zone_write_plug_init_request() will handle the * zone write pointer offset update. */ if (bio_flagged(bio, BIO_ZONE_WRITE_PLUGGING)) return; bio_set_flag(bio, BIO_ZONE_WRITE_PLUGGING); /* * Get a reference on the zone write plug of the target zone and advance * the zone write pointer offset. Given that this is a merge, we already * have at least one request and one BIO referencing the zone write * plug. So this should not fail. */ zwplug = disk_get_zone_wplug(bio->bi_bdev->bd_disk, bio->bi_iter.bi_sector); if (WARN_ON_ONCE(!zwplug)) return; spin_lock_irqsave(&zwplug->lock, flags); zwplug->wp_offset += bio_sectors(bio); spin_unlock_irqrestore(&zwplug->lock, flags); } /* * Attempt to merge plugged BIOs with a newly prepared request for a BIO that * already went through zone write plugging (either a new BIO or one that was * unplugged). */ void blk_zone_write_plug_init_request(struct request *req) { sector_t req_back_sector = blk_rq_pos(req) + blk_rq_sectors(req); struct request_queue *q = req->q; struct gendisk *disk = q->disk; struct blk_zone_wplug *zwplug = disk_get_zone_wplug(disk, blk_rq_pos(req)); unsigned long flags; struct bio *bio; if (WARN_ON_ONCE(!zwplug)) return; /* * Indicate that completion of this request needs to be handled with * blk_zone_write_plug_finish_request(), which will drop the reference * on the zone write plug we took above on entry to this function. 
*/ req->rq_flags |= RQF_ZONE_WRITE_PLUGGING; if (blk_queue_nomerges(q)) return; /* * Walk through the list of plugged BIOs to check if they can be merged * into the back of the request. */ spin_lock_irqsave(&zwplug->lock, flags); while (!disk_zone_wplug_is_full(disk, zwplug)) { bio = bio_list_peek(&zwplug->bio_list); if (!bio) break; if (bio->bi_iter.bi_sector != req_back_sector || !blk_rq_merge_ok(req, bio)) break; WARN_ON_ONCE(bio_op(bio) != REQ_OP_WRITE_ZEROES && !bio->__bi_nr_segments); bio_list_pop(&zwplug->bio_list); if (bio_attempt_back_merge(req, bio, bio->__bi_nr_segments) != BIO_MERGE_OK) { bio_list_add_head(&zwplug->bio_list, bio); break; } /* Drop the reference taken by disk_zone_wplug_add_bio(). */ blk_queue_exit(q); zwplug->wp_offset += bio_sectors(bio); req_back_sector += bio_sectors(bio); } spin_unlock_irqrestore(&zwplug->lock, flags); } /* * Check and prepare a BIO for submission by incrementing the write pointer * offset of its zone write plug and changing zone append operations into * regular write when zone append emulation is needed. */ static bool blk_zone_wplug_prepare_bio(struct blk_zone_wplug *zwplug, struct bio *bio) { struct gendisk *disk = bio->bi_bdev->bd_disk; lockdep_assert_held(&zwplug->lock); /* * If we lost track of the zone write pointer due to a write error, * the user must either execute a report zones, reset the zone or finish * the zone to recover a reliable write pointer position. Fail BIOs if the * user did not do that as we cannot handle emulated zone append * otherwise. */ if (zwplug->flags & BLK_ZONE_WPLUG_NEED_WP_UPDATE) return false; /* * Check that the user is not attempting to write to a full zone. * We know such BIO will fail, and that would potentially overflow our * write pointer offset beyond the end of the zone. */ if (disk_zone_wplug_is_full(disk, zwplug)) return false; if (bio_op(bio) == REQ_OP_ZONE_APPEND) { /* * Use a regular write starting at the current write pointer. * Similarly to native zone append operations, do not allow * merging. */ bio->bi_opf &= ~REQ_OP_MASK; bio->bi_opf |= REQ_OP_WRITE | REQ_NOMERGE; bio->bi_iter.bi_sector += zwplug->wp_offset; /* * Remember that this BIO is in fact a zone append operation * so that we can restore its operation code on completion. */ bio_set_flag(bio, BIO_EMULATES_ZONE_APPEND); } else { /* * Check for non-sequential writes early as we know that BIOs * with a start sector not aligned to the zone write pointer * will fail. */ if (bio_offset_from_zone_start(bio) != zwplug->wp_offset) return false; } /* Advance the zone write pointer offset. */ zwplug->wp_offset += bio_sectors(bio); return true; } static bool blk_zone_wplug_handle_write(struct bio *bio, unsigned int nr_segs) { struct gendisk *disk = bio->bi_bdev->bd_disk; sector_t sector = bio->bi_iter.bi_sector; struct blk_zone_wplug *zwplug; gfp_t gfp_mask = GFP_NOIO; unsigned long flags; /* * BIOs must be fully contained within a zone so that we use the correct * zone write plug for the entire BIO. For blk-mq devices, the block * layer should already have done any splitting required to ensure this * and this BIO should thus not be straddling zone boundaries. For * BIO-based devices, it is the responsibility of the driver to split * the bio before submitting it. */ if (WARN_ON_ONCE(bio_straddles_zones(bio))) { bio_io_error(bio); return true; } /* Conventional zones do not need write plugging. */ if (!bdev_zone_is_seq(bio->bi_bdev, sector)) { /* Zone append to conventional zones is not allowed.
*/ if (bio_op(bio) == REQ_OP_ZONE_APPEND) { bio_io_error(bio); return true; } return false; } if (bio->bi_opf & REQ_NOWAIT) gfp_mask = GFP_NOWAIT; zwplug = disk_get_and_lock_zone_wplug(disk, sector, gfp_mask, &flags); if (!zwplug) { if (bio->bi_opf & REQ_NOWAIT) bio_wouldblock_error(bio); else bio_io_error(bio); return true; } /* Indicate that this BIO is being handled using zone write plugging. */ bio_set_flag(bio, BIO_ZONE_WRITE_PLUGGING); /* * If the zone is already plugged, add the BIO to the plug BIO list. * Do the same for REQ_NOWAIT BIOs to ensure that we will not see a * BLK_STS_AGAIN failure if we let the BIO execute. * Otherwise, plug and let the BIO execute. */ if ((zwplug->flags & BLK_ZONE_WPLUG_PLUGGED) || (bio->bi_opf & REQ_NOWAIT)) goto plug; if (!blk_zone_wplug_prepare_bio(zwplug, bio)) { spin_unlock_irqrestore(&zwplug->lock, flags); bio_io_error(bio); return true; } zwplug->flags |= BLK_ZONE_WPLUG_PLUGGED; spin_unlock_irqrestore(&zwplug->lock, flags); return false; plug: disk_zone_wplug_add_bio(disk, zwplug, bio, nr_segs); spin_unlock_irqrestore(&zwplug->lock, flags); return true; } static void blk_zone_wplug_handle_native_zone_append(struct bio *bio) { struct gendisk *disk = bio->bi_bdev->bd_disk; struct blk_zone_wplug *zwplug; unsigned long flags; /* * We have native support for zone append operations, so we are not * going to handle @bio through plugging. However, we may already have a * zone write plug for the target zone if that zone was previously * partially written using regular writes. In such case, we risk leaving * the plug in the disk hash table if the zone is fully written using * zone append operations. Avoid this by removing the zone write plug. */ zwplug = disk_get_zone_wplug(disk, bio->bi_iter.bi_sector); if (likely(!zwplug)) return; spin_lock_irqsave(&zwplug->lock, flags); /* * We are about to remove the zone write plug. But if the user * (mistakenly) has issued regular writes together with native zone * append, we must aborts the writes as otherwise the plugged BIOs would * not be executed by the plug BIO work as disk_get_zone_wplug() will * return NULL after the plug is removed. Aborting the plugged write * BIOs is consistent with the fact that these writes will most likely * fail anyway as there is no ordering guarantees between zone append * operations and regular write operations. */ if (!bio_list_empty(&zwplug->bio_list)) { pr_warn_ratelimited("%s: zone %u: Invalid mix of zone append and regular writes\n", disk->disk_name, zwplug->zone_no); disk_zone_wplug_abort(zwplug); } disk_remove_zone_wplug(disk, zwplug); spin_unlock_irqrestore(&zwplug->lock, flags); disk_put_zone_wplug(zwplug); } /** * blk_zone_plug_bio - Handle a zone write BIO with zone write plugging * @bio: The BIO being submitted * @nr_segs: The number of physical segments of @bio * * Handle write, write zeroes and zone append operations requiring emulation * using zone write plugging. * * Return true whenever @bio execution needs to be delayed through the zone * write plug. Otherwise, return false to let the submission path process * @bio normally. */ bool blk_zone_plug_bio(struct bio *bio, unsigned int nr_segs) { struct block_device *bdev = bio->bi_bdev; if (!bdev->bd_disk->zone_wplugs_hash) return false; /* * If the BIO already has the plugging flag set, then it was already * handled through this path and this is a submission from the zone * plug bio submit work. 
*/ if (bio_flagged(bio, BIO_ZONE_WRITE_PLUGGING)) return false; /* * We do not need to do anything special for empty flush BIOs, e.g. * BIOs such as those issued by blkdev_issue_flush(). This is because it is * the responsibility of the user to first wait for the completion of * write operations for flush to have any effect on the persistence of * the written data. */ if (op_is_flush(bio->bi_opf) && !bio_sectors(bio)) return false; /* * Regular writes and write zeroes need to be handled through the target * zone write plug. This includes writes with REQ_FUA | REQ_PREFLUSH * which may need to go through the flush machinery depending on the * target device capabilities. Plugging such writes is fine as the flush * machinery operates at the request level, below the plug, and * completion of the flush sequence will go through the regular BIO * completion, which will handle zone write plugging. * Zone append operations for devices that requested emulation must * also be plugged so that these BIOs can be changed into regular * write BIOs. * Zone reset, reset all and finish commands need special treatment * to correctly track the write pointer offset of zones. These commands * are not plugged as we do not need serialization with write * operations. It is the responsibility of the user to not issue reset * and finish commands when write operations are in flight. */ switch (bio_op(bio)) { case REQ_OP_ZONE_APPEND: if (!bdev_emulates_zone_append(bdev)) { blk_zone_wplug_handle_native_zone_append(bio); return false; } fallthrough; case REQ_OP_WRITE: case REQ_OP_WRITE_ZEROES: return blk_zone_wplug_handle_write(bio, nr_segs); case REQ_OP_ZONE_RESET: return blk_zone_wplug_handle_reset_or_finish(bio, 0); case REQ_OP_ZONE_FINISH: return blk_zone_wplug_handle_reset_or_finish(bio, bdev_zone_sectors(bdev)); case REQ_OP_ZONE_RESET_ALL: return blk_zone_wplug_handle_reset_all(bio); default: return false; } return false; } EXPORT_SYMBOL_GPL(blk_zone_plug_bio); static void disk_zone_wplug_unplug_bio(struct gendisk *disk, struct blk_zone_wplug *zwplug) { unsigned long flags; spin_lock_irqsave(&zwplug->lock, flags); /* Schedule submission of the next plugged BIO if we have one. */ if (!bio_list_empty(&zwplug->bio_list)) { disk_zone_wplug_schedule_bio_work(disk, zwplug); spin_unlock_irqrestore(&zwplug->lock, flags); return; } zwplug->flags &= ~BLK_ZONE_WPLUG_PLUGGED; /* * If the zone is full (it was fully written or finished), or empty * (it was reset), remove its zone write plug from the hash table. */ if (disk_should_remove_zone_wplug(disk, zwplug)) disk_remove_zone_wplug(disk, zwplug); spin_unlock_irqrestore(&zwplug->lock, flags); } void blk_zone_write_plug_bio_endio(struct bio *bio) { struct gendisk *disk = bio->bi_bdev->bd_disk; struct blk_zone_wplug *zwplug = disk_get_zone_wplug(disk, bio->bi_iter.bi_sector); unsigned long flags; if (WARN_ON_ONCE(!zwplug)) return; /* Make sure we do not see this BIO again by clearing the plug flag. */ bio_clear_flag(bio, BIO_ZONE_WRITE_PLUGGING); /* * If this is a regular write emulating a zone append operation, * restore the original operation code. */ if (bio_flagged(bio, BIO_EMULATES_ZONE_APPEND)) { bio->bi_opf &= ~REQ_OP_MASK; bio->bi_opf |= REQ_OP_ZONE_APPEND; } /* * If the BIO failed, abort all plugged BIOs and mark the plug as * needing a write pointer update.
*/ if (bio->bi_status != BLK_STS_OK) { spin_lock_irqsave(&zwplug->lock, flags); disk_zone_wplug_abort(zwplug); zwplug->flags |= BLK_ZONE_WPLUG_NEED_WP_UPDATE; spin_unlock_irqrestore(&zwplug->lock, flags); } /* Drop the reference we took when the BIO was issued. */ disk_put_zone_wplug(zwplug); /* * For BIO-based devices, blk_zone_write_plug_finish_request() * is not called. So we need to schedule execution of the next * plugged BIO here. */ if (bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO)) disk_zone_wplug_unplug_bio(disk, zwplug); /* Drop the reference we took when entering this function. */ disk_put_zone_wplug(zwplug); } void blk_zone_write_plug_finish_request(struct request *req) { struct gendisk *disk = req->q->disk; struct blk_zone_wplug *zwplug; zwplug = disk_get_zone_wplug(disk, req->__sector); if (WARN_ON_ONCE(!zwplug)) return; req->rq_flags &= ~RQF_ZONE_WRITE_PLUGGING; /* * Drop the reference we took when the request was initialized in * blk_zone_write_plug_init_request(). */ disk_put_zone_wplug(zwplug); disk_zone_wplug_unplug_bio(disk, zwplug); /* Drop the reference we took when entering this function. */ disk_put_zone_wplug(zwplug); } static void blk_zone_wplug_bio_work(struct work_struct *work) { struct blk_zone_wplug *zwplug = container_of(work, struct blk_zone_wplug, bio_work); struct block_device *bdev; unsigned long flags; struct bio *bio; /* * Submit the next plugged BIO. If we do not have any, clear * the plugged flag. */ spin_lock_irqsave(&zwplug->lock, flags); again: bio = bio_list_pop(&zwplug->bio_list); if (!bio) { zwplug->flags &= ~BLK_ZONE_WPLUG_PLUGGED; spin_unlock_irqrestore(&zwplug->lock, flags); goto put_zwplug; } if (!blk_zone_wplug_prepare_bio(zwplug, bio)) { blk_zone_wplug_bio_io_error(zwplug, bio); goto again; } spin_unlock_irqrestore(&zwplug->lock, flags); bdev = bio->bi_bdev; submit_bio_noacct_nocheck(bio); /* * blk-mq devices will reuse the extra reference on the request queue * usage counter we took when the BIO was plugged, but the submission * path for BIO-based devices will not do that. So drop this extra * reference here. */ if (bdev_test_flag(bdev, BD_HAS_SUBMIT_BIO)) blk_queue_exit(bdev->bd_disk->queue); put_zwplug: /* Drop the reference we took in disk_zone_wplug_schedule_bio_work(). */ disk_put_zone_wplug(zwplug); } static inline unsigned int disk_zone_wplugs_hash_size(struct gendisk *disk) { return 1U << disk->zone_wplugs_hash_bits; } void disk_init_zone_resources(struct gendisk *disk) { spin_lock_init(&disk->zone_wplugs_lock); } /* * For the size of a disk zone write plug hash table, use the size of the * zone write plug mempool, which is the maximum of the disk open zones and * active zones limits. But do not exceed 4KB (512 hlist head entries), that is, * 9 bits. For a disk that has no limits, mempool size defaults to 128. 
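 *
 * For example (illustrative numbers): with the default pool size of 128
 * plugs, ilog2(128) + 1 = 8 hash bits are used, i.e. 256 hlist head entries
 * (2KB of pointers on 64-bit); a device advertising 1024 open zones is
 * capped at 9 bits, i.e. 512 entries (4KB).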
*/ #define BLK_ZONE_WPLUG_MAX_HASH_BITS 9 #define BLK_ZONE_WPLUG_DEFAULT_POOL_SIZE 128 static int disk_alloc_zone_resources(struct gendisk *disk, unsigned int pool_size) { unsigned int i; atomic_set(&disk->nr_zone_wplugs, 0); disk->zone_wplugs_hash_bits = min(ilog2(pool_size) + 1, BLK_ZONE_WPLUG_MAX_HASH_BITS); disk->zone_wplugs_hash = kcalloc(disk_zone_wplugs_hash_size(disk), sizeof(struct hlist_head), GFP_KERNEL); if (!disk->zone_wplugs_hash) return -ENOMEM; for (i = 0; i < disk_zone_wplugs_hash_size(disk); i++) INIT_HLIST_HEAD(&disk->zone_wplugs_hash[i]); disk->zone_wplugs_pool = mempool_create_kmalloc_pool(pool_size, sizeof(struct blk_zone_wplug)); if (!disk->zone_wplugs_pool) goto free_hash; disk->zone_wplugs_wq = alloc_workqueue("%s_zwplugs", WQ_MEM_RECLAIM | WQ_HIGHPRI, pool_size, disk->disk_name); if (!disk->zone_wplugs_wq) goto destroy_pool; return 0; destroy_pool: mempool_destroy(disk->zone_wplugs_pool); disk->zone_wplugs_pool = NULL; free_hash: kfree(disk->zone_wplugs_hash); disk->zone_wplugs_hash = NULL; disk->zone_wplugs_hash_bits = 0; return -ENOMEM; } static void disk_destroy_zone_wplugs_hash_table(struct gendisk *disk) { struct blk_zone_wplug *zwplug; unsigned int i; if (!disk->zone_wplugs_hash) return; /* Free all the zone write plugs we have. */ for (i = 0; i < disk_zone_wplugs_hash_size(disk); i++) { while (!hlist_empty(&disk->zone_wplugs_hash[i])) { zwplug = hlist_entry(disk->zone_wplugs_hash[i].first, struct blk_zone_wplug, node); refcount_inc(&zwplug->ref); disk_remove_zone_wplug(disk, zwplug); disk_put_zone_wplug(zwplug); } } WARN_ON_ONCE(atomic_read(&disk->nr_zone_wplugs)); kfree(disk->zone_wplugs_hash); disk->zone_wplugs_hash = NULL; disk->zone_wplugs_hash_bits = 0; } static unsigned int disk_set_conv_zones_bitmap(struct gendisk *disk, unsigned long *bitmap) { unsigned int nr_conv_zones = 0; unsigned long flags; spin_lock_irqsave(&disk->zone_wplugs_lock, flags); if (bitmap) nr_conv_zones = bitmap_weight(bitmap, disk->nr_zones); bitmap = rcu_replace_pointer(disk->conv_zones_bitmap, bitmap, lockdep_is_held(&disk->zone_wplugs_lock)); spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags); kfree_rcu_mightsleep(bitmap); return nr_conv_zones; } void disk_free_zone_resources(struct gendisk *disk) { if (!disk->zone_wplugs_pool) return; if (disk->zone_wplugs_wq) { destroy_workqueue(disk->zone_wplugs_wq); disk->zone_wplugs_wq = NULL; } disk_destroy_zone_wplugs_hash_table(disk); /* * Wait for the zone write plugs to be RCU-freed before * destorying the mempool. */ rcu_barrier(); mempool_destroy(disk->zone_wplugs_pool); disk->zone_wplugs_pool = NULL; disk_set_conv_zones_bitmap(disk, NULL); disk->zone_capacity = 0; disk->last_zone_capacity = 0; disk->nr_zones = 0; } static inline bool disk_need_zone_resources(struct gendisk *disk) { /* * All mq zoned devices need zone resources so that the block layer * can automatically handle write BIO plugging. BIO-based device drivers * (e.g. DM devices) are normally responsible for handling zone write * ordering and do not need zone resources, unless the driver requires * zone append emulation. */ return queue_is_mq(disk->queue) || queue_emulates_zone_append(disk->queue); } static int disk_revalidate_zone_resources(struct gendisk *disk, unsigned int nr_zones) { struct queue_limits *lim = &disk->queue->limits; unsigned int pool_size; if (!disk_need_zone_resources(disk)) return 0; /* * If the device has no limit on the maximum number of open and active * zones, use BLK_ZONE_WPLUG_DEFAULT_POOL_SIZE. 
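 *
 * For example (illustrative values): a device reporting max_open_zones = 14
 * gets pool_size = 14, while a device with no open/active zone limits and
 * 16384 zones gets pool_size = min(BLK_ZONE_WPLUG_DEFAULT_POOL_SIZE, 16384)
 * = 128.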
*/ pool_size = max(lim->max_open_zones, lim->max_active_zones); if (!pool_size) pool_size = min(BLK_ZONE_WPLUG_DEFAULT_POOL_SIZE, nr_zones); if (!disk->zone_wplugs_hash) return disk_alloc_zone_resources(disk, pool_size); return 0; } struct blk_revalidate_zone_args { struct gendisk *disk; unsigned long *conv_zones_bitmap; unsigned int nr_zones; unsigned int zone_capacity; unsigned int last_zone_capacity; sector_t sector; }; /* * Update the disk zone resources information and device queue limits. * The disk queue is frozen when this is executed. */ static int disk_update_zone_resources(struct gendisk *disk, struct blk_revalidate_zone_args *args) { struct request_queue *q = disk->queue; unsigned int nr_seq_zones, nr_conv_zones; unsigned int pool_size; struct queue_limits lim; disk->nr_zones = args->nr_zones; disk->zone_capacity = args->zone_capacity; disk->last_zone_capacity = args->last_zone_capacity; nr_conv_zones = disk_set_conv_zones_bitmap(disk, args->conv_zones_bitmap); if (nr_conv_zones >= disk->nr_zones) { pr_warn("%s: Invalid number of conventional zones %u / %u\n", disk->disk_name, nr_conv_zones, disk->nr_zones); return -ENODEV; } lim = queue_limits_start_update(q); /* * Some devices can advertize zone resource limits that are larger than * the number of sequential zones of the zoned block device, e.g. a * small ZNS namespace. For such case, assume that the zoned device has * no zone resource limits. */ nr_seq_zones = disk->nr_zones - nr_conv_zones; if (lim.max_open_zones >= nr_seq_zones) lim.max_open_zones = 0; if (lim.max_active_zones >= nr_seq_zones) lim.max_active_zones = 0; if (!disk->zone_wplugs_pool) goto commit; /* * If the device has no limit on the maximum number of open and active * zones, set its max open zone limit to the mempool size to indicate * to the user that there is a potential performance impact due to * dynamic zone write plug allocation when simultaneously writing to * more zones than the size of the mempool. */ pool_size = max(lim.max_open_zones, lim.max_active_zones); if (!pool_size) pool_size = min(BLK_ZONE_WPLUG_DEFAULT_POOL_SIZE, nr_seq_zones); mempool_resize(disk->zone_wplugs_pool, pool_size); if (!lim.max_open_zones && !lim.max_active_zones) { if (pool_size < nr_seq_zones) lim.max_open_zones = pool_size; else lim.max_open_zones = 0; } commit: return queue_limits_commit_update_frozen(q, &lim); } static int blk_revalidate_conv_zone(struct blk_zone *zone, unsigned int idx, struct blk_revalidate_zone_args *args) { struct gendisk *disk = args->disk; if (zone->capacity != zone->len) { pr_warn("%s: Invalid conventional zone capacity\n", disk->disk_name); return -ENODEV; } if (disk_zone_is_last(disk, zone)) args->last_zone_capacity = zone->capacity; if (!disk_need_zone_resources(disk)) return 0; if (!args->conv_zones_bitmap) { args->conv_zones_bitmap = bitmap_zalloc(args->nr_zones, GFP_NOIO); if (!args->conv_zones_bitmap) return -ENOMEM; } set_bit(idx, args->conv_zones_bitmap); return 0; } static int blk_revalidate_seq_zone(struct blk_zone *zone, unsigned int idx, struct blk_revalidate_zone_args *args) { struct gendisk *disk = args->disk; struct blk_zone_wplug *zwplug; unsigned int wp_offset; unsigned long flags; /* * Remember the capacity of the first sequential zone and check * if it is constant for all zones, ignoring the last zone as it can be * smaller. 
*/ if (!args->zone_capacity) args->zone_capacity = zone->capacity; if (disk_zone_is_last(disk, zone)) { args->last_zone_capacity = zone->capacity; } else if (zone->capacity != args->zone_capacity) { pr_warn("%s: Invalid variable zone capacity\n", disk->disk_name); return -ENODEV; } /* * If the device needs zone append emulation, we need to track the * write pointer of all zones that are not empty nor full. So make sure * we have a zone write plug for such zone if the device has a zone * write plug hash table. */ if (!queue_emulates_zone_append(disk->queue) || !disk->zone_wplugs_hash) return 0; disk_zone_wplug_sync_wp_offset(disk, zone); wp_offset = blk_zone_wp_offset(zone); if (!wp_offset || wp_offset >= zone->capacity) return 0; zwplug = disk_get_and_lock_zone_wplug(disk, zone->wp, GFP_NOIO, &flags); if (!zwplug) return -ENOMEM; spin_unlock_irqrestore(&zwplug->lock, flags); disk_put_zone_wplug(zwplug); return 0; } /* * Helper function to check the validity of zones of a zoned block device. */ static int blk_revalidate_zone_cb(struct blk_zone *zone, unsigned int idx, void *data) { struct blk_revalidate_zone_args *args = data; struct gendisk *disk = args->disk; sector_t zone_sectors = disk->queue->limits.chunk_sectors; int ret; /* Check for bad zones and holes in the zone report */ if (zone->start != args->sector) { pr_warn("%s: Zone gap at sectors %llu..%llu\n", disk->disk_name, args->sector, zone->start); return -ENODEV; } if (zone->start >= get_capacity(disk) || !zone->len) { pr_warn("%s: Invalid zone start %llu, length %llu\n", disk->disk_name, zone->start, zone->len); return -ENODEV; } /* * All zones must have the same size, with the exception on an eventual * smaller last zone. */ if (!disk_zone_is_last(disk, zone)) { if (zone->len != zone_sectors) { pr_warn("%s: Invalid zoned device with non constant zone size\n", disk->disk_name); return -ENODEV; } } else if (zone->len > zone_sectors) { pr_warn("%s: Invalid zoned device with larger last zone size\n", disk->disk_name); return -ENODEV; } if (!zone->capacity || zone->capacity > zone->len) { pr_warn("%s: Invalid zone capacity\n", disk->disk_name); return -ENODEV; } /* Check zone type */ switch (zone->type) { case BLK_ZONE_TYPE_CONVENTIONAL: ret = blk_revalidate_conv_zone(zone, idx, args); break; case BLK_ZONE_TYPE_SEQWRITE_REQ: ret = blk_revalidate_seq_zone(zone, idx, args); break; case BLK_ZONE_TYPE_SEQWRITE_PREF: default: pr_warn("%s: Invalid zone type 0x%x at sectors %llu\n", disk->disk_name, (int)zone->type, zone->start); ret = -ENODEV; } if (!ret) args->sector += zone->len; return ret; } /** * blk_revalidate_disk_zones - (re)allocate and initialize zone write plugs * @disk: Target disk * * Helper function for low-level device drivers to check, (re) allocate and * initialize resources used for managing zoned disks. This function should * normally be called by blk-mq based drivers when a zoned gendisk is probed * and when the zone configuration of the gendisk changes (e.g. after a format). * Before calling this function, the device driver must already have set the * device zone size (chunk_sector limit) and the max zone append limit. * BIO based drivers can also use this function as long as the device queue * can be safely frozen. 
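 *
 * Illustrative driver-side sketch (hypothetical code, not part of this file;
 * dev_zone_sectors is an assumed driver variable), showing the usual order of
 * setting the limits before revalidating:
 *
 *	struct queue_limits lim = queue_limits_start_update(disk->queue);
 *
 *	lim.chunk_sectors = dev_zone_sectors;
 *	... also set the max zone append limit in lim here ...
 *	ret = queue_limits_commit_update(disk->queue, &lim);
 *	if (!ret)
 *		ret = blk_revalidate_disk_zones(disk);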
*/ int blk_revalidate_disk_zones(struct gendisk *disk) { struct request_queue *q = disk->queue; sector_t zone_sectors = q->limits.chunk_sectors; sector_t capacity = get_capacity(disk); struct blk_revalidate_zone_args args = { }; unsigned int noio_flag; int ret = -ENOMEM; if (WARN_ON_ONCE(!blk_queue_is_zoned(q))) return -EIO; if (!capacity) return -ENODEV; /* * Checks that the device driver indicated a valid zone size and that * the max zone append limit is set. */ if (!zone_sectors || !is_power_of_2(zone_sectors)) { pr_warn("%s: Invalid non power of two zone size (%llu)\n", disk->disk_name, zone_sectors); return -ENODEV; } /* * Ensure that all memory allocations in this context are done as if * GFP_NOIO was specified. */ args.disk = disk; args.nr_zones = (capacity + zone_sectors - 1) >> ilog2(zone_sectors); noio_flag = memalloc_noio_save(); ret = disk_revalidate_zone_resources(disk, args.nr_zones); if (ret) { memalloc_noio_restore(noio_flag); return ret; } ret = disk->fops->report_zones(disk, 0, UINT_MAX, blk_revalidate_zone_cb, &args); if (!ret) { pr_warn("%s: No zones reported\n", disk->disk_name); ret = -ENODEV; } memalloc_noio_restore(noio_flag); /* * If zones where reported, make sure that the entire disk capacity * has been checked. */ if (ret > 0 && args.sector != capacity) { pr_warn("%s: Missing zones from sector %llu\n", disk->disk_name, args.sector); ret = -ENODEV; } /* * Set the new disk zone parameters only once the queue is frozen and * all I/Os are completed. */ if (ret > 0) ret = disk_update_zone_resources(disk, &args); else pr_warn("%s: failed to revalidate zones\n", disk->disk_name); if (ret) { unsigned int memflags = blk_mq_freeze_queue(q); disk_free_zone_resources(disk); blk_mq_unfreeze_queue(q, memflags); } return ret; } EXPORT_SYMBOL_GPL(blk_revalidate_disk_zones); /** * blk_zone_issue_zeroout - zero-fill a block range in a zone * @bdev: blockdev to write * @sector: start sector * @nr_sects: number of sectors to write * @gfp_mask: memory allocation flags (for bio_alloc) * * Description: * Zero-fill a block range in a zone (@sector must be equal to the zone write * pointer), handling potential errors due to the (initially unknown) lack of * hardware offload (See blkdev_issue_zeroout()). */ int blk_zone_issue_zeroout(struct block_device *bdev, sector_t sector, sector_t nr_sects, gfp_t gfp_mask) { int ret; if (WARN_ON_ONCE(!bdev_is_zoned(bdev))) return -EIO; ret = blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask, BLKDEV_ZERO_NOFALLBACK); if (ret != -EOPNOTSUPP) return ret; /* * The failed call to blkdev_issue_zeroout() advanced the zone write * pointer. Undo this using a report zone to update the zone write * pointer to the correct current value. */ ret = disk_zone_sync_wp_offset(bdev->bd_disk, sector); if (ret != 1) return ret < 0 ? ret : -EIO; /* * Retry without BLKDEV_ZERO_NOFALLBACK to force the fallback to a * regular write with zero-pages. 
*/ return blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask, 0); } EXPORT_SYMBOL_GPL(blk_zone_issue_zeroout); #ifdef CONFIG_BLK_DEBUG_FS static void queue_zone_wplug_show(struct blk_zone_wplug *zwplug, struct seq_file *m) { unsigned int zwp_wp_offset, zwp_flags; unsigned int zwp_zone_no, zwp_ref; unsigned int zwp_bio_list_size; unsigned long flags; spin_lock_irqsave(&zwplug->lock, flags); zwp_zone_no = zwplug->zone_no; zwp_flags = zwplug->flags; zwp_ref = refcount_read(&zwplug->ref); zwp_wp_offset = zwplug->wp_offset; zwp_bio_list_size = bio_list_size(&zwplug->bio_list); spin_unlock_irqrestore(&zwplug->lock, flags); seq_printf(m, "%u 0x%x %u %u %u\n", zwp_zone_no, zwp_flags, zwp_ref, zwp_wp_offset, zwp_bio_list_size); } int queue_zone_wplugs_show(void *data, struct seq_file *m) { struct request_queue *q = data; struct gendisk *disk = q->disk; struct blk_zone_wplug *zwplug; unsigned int i; if (!disk->zone_wplugs_hash) return 0; rcu_read_lock(); for (i = 0; i < disk_zone_wplugs_hash_size(disk); i++) hlist_for_each_entry_rcu(zwplug, &disk->zone_wplugs_hash[i], node) queue_zone_wplug_show(zwplug, m); rcu_read_unlock(); return 0; } #endif |
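A minimal user-space sketch of the BLKREPORTZONE interface serviced by blkdev_report_zones_ioctl() above (illustrative only, a hypothetical stand-alone tool, not part of the kernel sources; it assumes the UAPI definitions from <linux/blkzoned.h>):

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/blkzoned.h>

int main(int argc, char **argv)
{
	unsigned int i, nr = 16;	/* report at most 16 zones */
	struct blk_zone_report *rep;
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <zoned block device>\n", argv[0]);
		return 1;
	}

	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* The zone array immediately follows the report header. */
	rep = calloc(1, sizeof(*rep) + nr * sizeof(struct blk_zone));
	if (!rep)
		return 1;
	rep->sector = 0;
	rep->nr_zones = nr;

	if (ioctl(fd, BLKREPORTZONE, rep) < 0) {
		perror("BLKREPORTZONE");
		return 1;
	}

	/* The kernel updates nr_zones to the number of zones reported. */
	for (i = 0; i < rep->nr_zones; i++)
		printf("zone %u: start %llu, len %llu, wp %llu, cond %u\n", i,
		       (unsigned long long)rep->zones[i].start,
		       (unsigned long long)rep->zones[i].len,
		       (unsigned long long)rep->zones[i].wp,
		       (unsigned int)rep->zones[i].cond);

	free(rep);
	close(fd);
	return 0;
}

The zone management ioctls (BLKRESETZONE, BLKOPENZONE, BLKCLOSEZONE, BLKFINISHZONE) handled by blkdev_zone_mgmt_ioctl() above take a struct blk_zone_range argument in the same manner.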
// SPDX-License-Identifier: GPL-2.0 #include <linux/mount.h> #include <linux/pseudo_fs.h> #include <linux/file.h> #include <linux/fs.h> #include <linux/proc_fs.h> #include <linux/proc_ns.h> #include <linux/magic.h> #include <linux/ktime.h> #include <linux/seq_file.h> #include <linux/pid_namespace.h> #include <linux/user_namespace.h> #include <linux/nsfs.h> #include <linux/uaccess.h> #include <linux/mnt_namespace.h> #include "mount.h" #include "internal.h" static struct vfsmount *nsfs_mnt; static long ns_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg); static const struct file_operations ns_file_operations = { .unlocked_ioctl = ns_ioctl, .compat_ioctl = compat_ptr_ioctl, }; static char *ns_dname(struct dentry *dentry, char *buffer, int buflen) { struct inode *inode = d_inode(dentry); struct ns_common *ns = inode->i_private; const struct proc_ns_operations *ns_ops = ns->ops; return dynamic_dname(buffer, buflen, "%s:[%lu]", ns_ops->name, inode->i_ino); } const struct dentry_operations ns_dentry_operations = { .d_dname = ns_dname, .d_prune = stashed_dentry_prune, }; static void nsfs_evict(struct inode *inode) { struct ns_common *ns = inode->i_private; clear_inode(inode); ns->ops->put(ns); } int ns_get_path_cb(struct path *path, ns_get_path_helper_t *ns_get_cb, void *private_data) { struct ns_common *ns; ns = ns_get_cb(private_data); if (!ns) return -ENOENT; return path_from_stashed(&ns->stashed, nsfs_mnt, ns, path); } struct ns_get_path_task_args { const struct proc_ns_operations *ns_ops; struct task_struct *task; }; static struct ns_common *ns_get_path_task(void *private_data) { struct ns_get_path_task_args *args = private_data; return args->ns_ops->get(args->task); } int ns_get_path(struct path
*path, struct task_struct *task, const struct proc_ns_operations *ns_ops) { struct ns_get_path_task_args args = { .ns_ops = ns_ops, .task = task, }; return ns_get_path_cb(path, ns_get_path_task, &args); } /** * open_namespace - open a namespace * @ns: the namespace to open * * This will consume a reference to @ns indendent of success or failure. * * Return: A file descriptor on success or a negative error code on failure. */ int open_namespace(struct ns_common *ns) { struct path path __free(path_put) = {}; struct file *f; int err; /* call first to consume reference */ err = path_from_stashed(&ns->stashed, nsfs_mnt, ns, &path); if (err < 0) return err; CLASS(get_unused_fd, fd)(O_CLOEXEC); if (fd < 0) return fd; f = dentry_open(&path, O_RDONLY, current_cred()); if (IS_ERR(f)) return PTR_ERR(f); fd_install(fd, f); return take_fd(fd); } int open_related_ns(struct ns_common *ns, struct ns_common *(*get_ns)(struct ns_common *ns)) { struct ns_common *relative; relative = get_ns(ns); if (IS_ERR(relative)) return PTR_ERR(relative); return open_namespace(relative); } EXPORT_SYMBOL_GPL(open_related_ns); static int copy_ns_info_to_user(const struct mnt_namespace *mnt_ns, struct mnt_ns_info __user *uinfo, size_t usize, struct mnt_ns_info *kinfo) { /* * If userspace and the kernel have the same struct size it can just * be copied. If userspace provides an older struct, only the bits that * userspace knows about will be copied. If userspace provides a new * struct, only the bits that the kernel knows aobut will be copied and * the size value will be set to the size the kernel knows about. */ kinfo->size = min(usize, sizeof(*kinfo)); kinfo->mnt_ns_id = mnt_ns->seq; kinfo->nr_mounts = READ_ONCE(mnt_ns->nr_mounts); /* Subtract the root mount of the mount namespace. */ if (kinfo->nr_mounts) kinfo->nr_mounts--; if (copy_to_user(uinfo, kinfo, kinfo->size)) return -EFAULT; return 0; } static bool nsfs_ioctl_valid(unsigned int cmd) { switch (cmd) { case NS_GET_USERNS: case NS_GET_PARENT: case NS_GET_NSTYPE: case NS_GET_OWNER_UID: case NS_GET_MNTNS_ID: case NS_GET_PID_FROM_PIDNS: case NS_GET_TGID_FROM_PIDNS: case NS_GET_PID_IN_PIDNS: case NS_GET_TGID_IN_PIDNS: return (_IOC_TYPE(cmd) == _IOC_TYPE(cmd)); } /* Extensible ioctls require some extra handling. 
*/ switch (_IOC_NR(cmd)) { case _IOC_NR(NS_MNT_GET_INFO): case _IOC_NR(NS_MNT_GET_NEXT): case _IOC_NR(NS_MNT_GET_PREV): return (_IOC_TYPE(cmd) == _IOC_TYPE(cmd)); } return false; } static long ns_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { struct user_namespace *user_ns; struct pid_namespace *pid_ns; struct task_struct *tsk; struct ns_common *ns; struct mnt_namespace *mnt_ns; bool previous = false; uid_t __user *argp; uid_t uid; int ret; if (!nsfs_ioctl_valid(ioctl)) return -ENOIOCTLCMD; ns = get_proc_ns(file_inode(filp)); switch (ioctl) { case NS_GET_USERNS: return open_related_ns(ns, ns_get_owner); case NS_GET_PARENT: if (!ns->ops->get_parent) return -EINVAL; return open_related_ns(ns, ns->ops->get_parent); case NS_GET_NSTYPE: return ns->ops->type; case NS_GET_OWNER_UID: if (ns->ops->type != CLONE_NEWUSER) return -EINVAL; user_ns = container_of(ns, struct user_namespace, ns); argp = (uid_t __user *) arg; uid = from_kuid_munged(current_user_ns(), user_ns->owner); return put_user(uid, argp); case NS_GET_MNTNS_ID: { __u64 __user *idp; __u64 id; if (ns->ops->type != CLONE_NEWNS) return -EINVAL; mnt_ns = container_of(ns, struct mnt_namespace, ns); idp = (__u64 __user *)arg; id = mnt_ns->seq; return put_user(id, idp); } case NS_GET_PID_FROM_PIDNS: fallthrough; case NS_GET_TGID_FROM_PIDNS: fallthrough; case NS_GET_PID_IN_PIDNS: fallthrough; case NS_GET_TGID_IN_PIDNS: { if (ns->ops->type != CLONE_NEWPID) return -EINVAL; ret = -ESRCH; pid_ns = container_of(ns, struct pid_namespace, ns); guard(rcu)(); if (ioctl == NS_GET_PID_IN_PIDNS || ioctl == NS_GET_TGID_IN_PIDNS) tsk = find_task_by_vpid(arg); else tsk = find_task_by_pid_ns(arg, pid_ns); if (!tsk) break; switch (ioctl) { case NS_GET_PID_FROM_PIDNS: ret = task_pid_vnr(tsk); break; case NS_GET_TGID_FROM_PIDNS: ret = task_tgid_vnr(tsk); break; case NS_GET_PID_IN_PIDNS: ret = task_pid_nr_ns(tsk, pid_ns); break; case NS_GET_TGID_IN_PIDNS: ret = task_tgid_nr_ns(tsk, pid_ns); break; default: ret = 0; break; } if (!ret) ret = -ESRCH; return ret; } } /* extensible ioctls */ switch (_IOC_NR(ioctl)) { case _IOC_NR(NS_MNT_GET_INFO): { struct mnt_ns_info kinfo = {}; struct mnt_ns_info __user *uinfo = (struct mnt_ns_info __user *)arg; size_t usize = _IOC_SIZE(ioctl); if (ns->ops->type != CLONE_NEWNS) return -EINVAL; if (!uinfo) return -EINVAL; if (usize < MNT_NS_INFO_SIZE_VER0) return -EINVAL; return copy_ns_info_to_user(to_mnt_ns(ns), uinfo, usize, &kinfo); } case _IOC_NR(NS_MNT_GET_PREV): previous = true; fallthrough; case _IOC_NR(NS_MNT_GET_NEXT): { struct mnt_ns_info kinfo = {}; struct mnt_ns_info __user *uinfo = (struct mnt_ns_info __user *)arg; struct path path __free(path_put) = {}; struct file *f __free(fput) = NULL; size_t usize = _IOC_SIZE(ioctl); if (ns->ops->type != CLONE_NEWNS) return -EINVAL; if (usize < MNT_NS_INFO_SIZE_VER0) return -EINVAL; mnt_ns = get_sequential_mnt_ns(to_mnt_ns(ns), previous); if (IS_ERR(mnt_ns)) return PTR_ERR(mnt_ns); ns = to_ns_common(mnt_ns); /* Transfer ownership of @mnt_ns reference to @path. */ ret = path_from_stashed(&ns->stashed, nsfs_mnt, ns, &path); if (ret) return ret; CLASS(get_unused_fd, fd)(O_CLOEXEC); if (fd < 0) return fd; f = dentry_open(&path, O_RDONLY, current_cred()); if (IS_ERR(f)) return PTR_ERR(f); if (uinfo) { /* * If @uinfo is passed return all information about the * mount namespace as well. */ ret = copy_ns_info_to_user(to_mnt_ns(ns), uinfo, usize, &kinfo); if (ret) return ret; } /* Transfer reference of @f to caller's fdtable. 
*/ fd_install(fd, no_free_ptr(f)); /* File descriptor is live so hand it off to the caller. */ return take_fd(fd); } default: ret = -ENOTTY; } return ret; } int ns_get_name(char *buf, size_t size, struct task_struct *task, const struct proc_ns_operations *ns_ops) { struct ns_common *ns; int res = -ENOENT; const char *name; ns = ns_ops->get(task); if (ns) { name = ns_ops->real_ns_name ? : ns_ops->name; res = snprintf(buf, size, "%s:[%u]", name, ns->inum); ns_ops->put(ns); } return res; } bool proc_ns_file(const struct file *file) { return file->f_op == &ns_file_operations; } /** * ns_match() - Returns true if current namespace matches dev/ino provided. * @ns: current namespace * @dev: dev_t from nsfs that will be matched against current nsfs * @ino: ino_t from nsfs that will be matched against current nsfs * * Return: true if dev and ino matches the current nsfs. */ bool ns_match(const struct ns_common *ns, dev_t dev, ino_t ino) { return (ns->inum == ino) && (nsfs_mnt->mnt_sb->s_dev == dev); } static int nsfs_show_path(struct seq_file *seq, struct dentry *dentry) { struct inode *inode = d_inode(dentry); const struct ns_common *ns = inode->i_private; const struct proc_ns_operations *ns_ops = ns->ops; seq_printf(seq, "%s:[%lu]", ns_ops->name, inode->i_ino); return 0; } static const struct super_operations nsfs_ops = { .statfs = simple_statfs, .evict_inode = nsfs_evict, .show_path = nsfs_show_path, }; static int nsfs_init_inode(struct inode *inode, void *data) { struct ns_common *ns = data; inode->i_private = data; inode->i_mode |= S_IRUGO; inode->i_fop = &ns_file_operations; inode->i_ino = ns->inum; return 0; } static void nsfs_put_data(void *data) { struct ns_common *ns = data; ns->ops->put(ns); } static const struct stashed_operations nsfs_stashed_ops = { .init_inode = nsfs_init_inode, .put_data = nsfs_put_data, }; static int nsfs_init_fs_context(struct fs_context *fc) { struct pseudo_fs_context *ctx = init_pseudo(fc, NSFS_MAGIC); if (!ctx) return -ENOMEM; ctx->ops = &nsfs_ops; ctx->dops = &ns_dentry_operations; fc->s_fs_info = (void *)&nsfs_stashed_ops; return 0; } static struct file_system_type nsfs = { .name = "nsfs", .init_fs_context = nsfs_init_fs_context, .kill_sb = kill_anon_super, }; void __init nsfs_init(void) { nsfs_mnt = kern_mount(&nsfs); if (IS_ERR(nsfs_mnt)) panic("can't set nsfs up\n"); nsfs_mnt->mnt_sb->s_flags &= ~SB_NOUSER; } |
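The ns_ioctl() handler above is what backs the ioctl_ns(2) interface on namespace file descriptors such as /proc/<pid>/ns/*. A hedged userspace sketch follows, exercising NS_GET_NSTYPE, NS_GET_USERNS and NS_GET_OWNER_UID (constants from the UAPI <linux/nsfs.h>); error handling is trimmed to the essentials.

/*
 * Userspace sketch, not part of the kernel source above. NS_GET_NSTYPE
 * returns the CLONE_NEW* constant for the namespace, NS_GET_USERNS opens
 * a new fd for the owning user namespace, and NS_GET_OWNER_UID (valid on
 * user namespace fds only) stores the owner uid through its argument.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/nsfs.h>

int main(void)
{
	int nsfd = open("/proc/self/ns/uts", O_RDONLY);
	int userns_fd, nstype;
	uid_t owner;

	if (nsfd < 0)
		return 1;

	nstype = ioctl(nsfd, NS_GET_NSTYPE);	/* e.g. CLONE_NEWUTS */
	userns_fd = ioctl(nsfd, NS_GET_USERNS);
	if (userns_fd >= 0 && ioctl(userns_fd, NS_GET_OWNER_UID, &owner) == 0)
		printf("nstype=%#x owner uid=%u\n",
		       (unsigned int)nstype, (unsigned int)owner);

	if (userns_fd >= 0)
		close(userns_fd);
	close(nsfd);
	return 0;
}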
| 14 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 | // SPDX-License-Identifier: GPL-2.0+ /* * Fast-charge control for Apple "MFi" devices * * Copyright (C) 2019 Bastien Nocera <hadess@hadess.net> */ /* Standard include files */ #include <linux/module.h> #include <linux/power_supply.h> #include <linux/slab.h> #include <linux/usb.h> MODULE_AUTHOR("Bastien Nocera <hadess@hadess.net>"); MODULE_DESCRIPTION("Fast-charge control for Apple \"MFi\" devices"); MODULE_LICENSE("GPL"); #define TRICKLE_CURRENT_MA 0 #define FAST_CURRENT_MA 2500 #define APPLE_VENDOR_ID 0x05ac /* Apple */ /* The product ID is defined as starting with 0x12nn, as per the * "Choosing an Apple Device USB Configuration" section in * release R9 (2012) of the "MFi Accessory Hardware Specification" * * To distinguish an Apple device, a USB host can check the device * descriptor of attached USB devices for the following fields: * ■ Vendor ID: 0x05AC * ■ Product ID: 0x12nn * * Those checks will be done in .match() and .probe(). 
*/ static const struct usb_device_id mfi_fc_id_table[] = { { .idVendor = APPLE_VENDOR_ID, .match_flags = USB_DEVICE_ID_MATCH_VENDOR }, {}, }; MODULE_DEVICE_TABLE(usb, mfi_fc_id_table); /* Driver-local specific stuff */ struct mfi_device { struct usb_device *udev; struct power_supply *battery; int charge_type; }; static int apple_mfi_fc_set_charge_type(struct mfi_device *mfi, const union power_supply_propval *val) { int current_ma; int retval; __u8 request_type; if (mfi->charge_type == val->intval) { dev_dbg(&mfi->udev->dev, "charge type %d already set\n", mfi->charge_type); return 0; } switch (val->intval) { case POWER_SUPPLY_CHARGE_TYPE_TRICKLE: current_ma = TRICKLE_CURRENT_MA; break; case POWER_SUPPLY_CHARGE_TYPE_FAST: current_ma = FAST_CURRENT_MA; break; default: return -EINVAL; } request_type = USB_DIR_OUT | USB_TYPE_VENDOR | USB_RECIP_DEVICE; retval = usb_control_msg(mfi->udev, usb_sndctrlpipe(mfi->udev, 0), 0x40, /* Vendor‐defined power request */ request_type, current_ma, /* wValue, current offset */ current_ma, /* wIndex, current offset */ NULL, 0, USB_CTRL_GET_TIMEOUT); if (retval) { dev_dbg(&mfi->udev->dev, "retval = %d\n", retval); return retval; } mfi->charge_type = val->intval; return 0; } static int apple_mfi_fc_get_property(struct power_supply *psy, enum power_supply_property psp, union power_supply_propval *val) { struct mfi_device *mfi = power_supply_get_drvdata(psy); dev_dbg(&mfi->udev->dev, "prop: %d\n", psp); switch (psp) { case POWER_SUPPLY_PROP_CHARGE_TYPE: val->intval = mfi->charge_type; break; case POWER_SUPPLY_PROP_SCOPE: val->intval = POWER_SUPPLY_SCOPE_DEVICE; break; default: return -ENODATA; } return 0; } static int apple_mfi_fc_set_property(struct power_supply *psy, enum power_supply_property psp, const union power_supply_propval *val) { struct mfi_device *mfi = power_supply_get_drvdata(psy); int ret; dev_dbg(&mfi->udev->dev, "prop: %d\n", psp); ret = pm_runtime_get_sync(&mfi->udev->dev); if (ret < 0) { pm_runtime_put_noidle(&mfi->udev->dev); return ret; } switch (psp) { case POWER_SUPPLY_PROP_CHARGE_TYPE: ret = apple_mfi_fc_set_charge_type(mfi, val); break; default: ret = -EINVAL; } pm_runtime_mark_last_busy(&mfi->udev->dev); pm_runtime_put_autosuspend(&mfi->udev->dev); return ret; } static int apple_mfi_fc_property_is_writeable(struct power_supply *psy, enum power_supply_property psp) { switch (psp) { case POWER_SUPPLY_PROP_CHARGE_TYPE: return 1; default: return 0; } } static enum power_supply_property apple_mfi_fc_properties[] = { POWER_SUPPLY_PROP_CHARGE_TYPE, POWER_SUPPLY_PROP_SCOPE }; static const struct power_supply_desc apple_mfi_fc_desc = { .name = "apple_mfi_fastcharge", .type = POWER_SUPPLY_TYPE_BATTERY, .properties = apple_mfi_fc_properties, .num_properties = ARRAY_SIZE(apple_mfi_fc_properties), .get_property = apple_mfi_fc_get_property, .set_property = apple_mfi_fc_set_property, .property_is_writeable = apple_mfi_fc_property_is_writeable }; static bool mfi_fc_match(struct usb_device *udev) { int idProduct; idProduct = le16_to_cpu(udev->descriptor.idProduct); /* See comment above mfi_fc_id_table[] */ return (idProduct >= 0x1200 && idProduct <= 0x12ff); } static int mfi_fc_probe(struct usb_device *udev) { struct power_supply_config battery_cfg = {}; struct mfi_device *mfi = NULL; int err; if (!mfi_fc_match(udev)) return -ENODEV; mfi = kzalloc(sizeof(struct mfi_device), GFP_KERNEL); if (!mfi) return -ENOMEM; battery_cfg.drv_data = mfi; mfi->charge_type = POWER_SUPPLY_CHARGE_TYPE_TRICKLE; mfi->battery = power_supply_register(&udev->dev, 
&apple_mfi_fc_desc, &battery_cfg); if (IS_ERR(mfi->battery)) { dev_err(&udev->dev, "Can't register battery\n"); err = PTR_ERR(mfi->battery); kfree(mfi); return err; } mfi->udev = usb_get_dev(udev); dev_set_drvdata(&udev->dev, mfi); return 0; } static void mfi_fc_disconnect(struct usb_device *udev) { struct mfi_device *mfi; mfi = dev_get_drvdata(&udev->dev); if (mfi->battery) power_supply_unregister(mfi->battery); dev_set_drvdata(&udev->dev, NULL); usb_put_dev(mfi->udev); kfree(mfi); } static struct usb_device_driver mfi_fc_driver = { .name = "apple-mfi-fastcharge", .probe = mfi_fc_probe, .disconnect = mfi_fc_disconnect, .id_table = mfi_fc_id_table, .match = mfi_fc_match, .generic_subclass = 1, }; static int __init mfi_fc_driver_init(void) { return usb_register_device_driver(&mfi_fc_driver, THIS_MODULE); } static void __exit mfi_fc_driver_exit(void) { usb_deregister_device_driver(&mfi_fc_driver); } module_init(mfi_fc_driver_init); module_exit(mfi_fc_driver_exit); |
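The driver registers a power_supply named "apple_mfi_fastcharge" whose charge_type property is writable, so the charge mode is normally switched through the power-supply sysfs class rather than by calling into the driver directly. A minimal sketch follows; the sysfs path is an assumption derived from the power_supply class layout and the name in apple_mfi_fc_desc.

/*
 * Userspace sketch, not part of the driver above: request fast charge by
 * writing the class attribute. "Fast" selects POWER_SUPPLY_CHARGE_TYPE_FAST,
 * which apple_mfi_fc_set_charge_type() translates into FAST_CURRENT_MA.
 */
#include <stdio.h>

int main(void)
{
	/* Path assumed from the power_supply class and the registered name. */
	FILE *f = fopen("/sys/class/power_supply/apple_mfi_fastcharge/charge_type", "w");

	if (!f)
		return 1;
	fputs("Fast", f);
	return fclose(f) == 0 ? 0 : 1;
}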
| 21 22 22 1 1 1 1 1 1 1 1 1 18 18 18 18 18 18 17 18 17 18 18 18 18 18 18 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 | // SPDX-License-Identifier: GPL-2.0-or-later /* * drm gem framebuffer helper functions * * Copyright (C) 2017 Noralf Trønnes */ #include <linux/slab.h> #include <linux/module.h> #include <drm/drm_damage_helper.h> #include <drm/drm_drv.h> #include <drm/drm_fourcc.h> #include <drm/drm_framebuffer.h> #include <drm/drm_gem.h> #include <drm/drm_gem_framebuffer_helper.h> #include <drm/drm_modeset_helper.h> #include "drm_internal.h" MODULE_IMPORT_NS("DMA_BUF"); #define AFBC_HEADER_SIZE 16 #define AFBC_TH_LAYOUT_ALIGNMENT 8 #define AFBC_HDR_ALIGN 64 #define AFBC_SUPERBLOCK_PIXELS 256 #define AFBC_SUPERBLOCK_ALIGNMENT 128 #define AFBC_TH_BODY_START_ALIGNMENT 4096 /** * DOC: overview * * This library provides helpers for drivers that don't subclass * &drm_framebuffer and use &drm_gem_object for their backing storage. * * Drivers without additional needs to validate framebuffers can simply use * drm_gem_fb_create() and everything is wired up automatically. Other drivers * can use all parts independently. 
*/ /** * drm_gem_fb_get_obj() - Get GEM object backing the framebuffer * @fb: Framebuffer * @plane: Plane index * * No additional reference is taken beyond the one that the &drm_frambuffer * already holds. * * Returns: * Pointer to &drm_gem_object for the given framebuffer and plane index or NULL * if it does not exist. */ struct drm_gem_object *drm_gem_fb_get_obj(struct drm_framebuffer *fb, unsigned int plane) { struct drm_device *dev = fb->dev; if (drm_WARN_ON_ONCE(dev, plane >= ARRAY_SIZE(fb->obj))) return NULL; else if (drm_WARN_ON_ONCE(dev, !fb->obj[plane])) return NULL; return fb->obj[plane]; } EXPORT_SYMBOL_GPL(drm_gem_fb_get_obj); static int drm_gem_fb_init(struct drm_device *dev, struct drm_framebuffer *fb, const struct drm_mode_fb_cmd2 *mode_cmd, struct drm_gem_object **obj, unsigned int num_planes, const struct drm_framebuffer_funcs *funcs) { unsigned int i; int ret; drm_helper_mode_fill_fb_struct(dev, fb, mode_cmd); for (i = 0; i < num_planes; i++) fb->obj[i] = obj[i]; ret = drm_framebuffer_init(dev, fb, funcs); if (ret) drm_err(dev, "Failed to init framebuffer: %d\n", ret); return ret; } /** * drm_gem_fb_destroy - Free GEM backed framebuffer * @fb: Framebuffer * * Frees a GEM backed framebuffer with its backing buffer(s) and the structure * itself. Drivers can use this as their &drm_framebuffer_funcs->destroy * callback. */ void drm_gem_fb_destroy(struct drm_framebuffer *fb) { unsigned int i; for (i = 0; i < fb->format->num_planes; i++) drm_gem_object_put(fb->obj[i]); drm_framebuffer_cleanup(fb); kfree(fb); } EXPORT_SYMBOL(drm_gem_fb_destroy); /** * drm_gem_fb_create_handle - Create handle for GEM backed framebuffer * @fb: Framebuffer * @file: DRM file to register the handle for * @handle: Pointer to return the created handle * * This function creates a handle for the GEM object backing the framebuffer. * Drivers can use this as their &drm_framebuffer_funcs->create_handle * callback. The GETFB IOCTL calls into this callback. * * Returns: * 0 on success or a negative error code on failure. */ int drm_gem_fb_create_handle(struct drm_framebuffer *fb, struct drm_file *file, unsigned int *handle) { return drm_gem_handle_create(file, fb->obj[0], handle); } EXPORT_SYMBOL(drm_gem_fb_create_handle); /** * drm_gem_fb_init_with_funcs() - Helper function for implementing * &drm_mode_config_funcs.fb_create * callback in cases when the driver * allocates a subclass of * struct drm_framebuffer * @dev: DRM device * @fb: framebuffer object * @file: DRM file that holds the GEM handle(s) backing the framebuffer * @mode_cmd: Metadata from the userspace framebuffer creation request * @funcs: vtable to be used for the new framebuffer object * * This function can be used to set &drm_framebuffer_funcs for drivers that need * custom framebuffer callbacks. Use drm_gem_fb_create() if you don't need to * change &drm_framebuffer_funcs. The function does buffer size validation. * The buffer size validation is for a general case, though, so users should * pay attention to the checks being appropriate for them or, at least, * non-conflicting. * * Returns: * Zero or a negative error code. 
*/ int drm_gem_fb_init_with_funcs(struct drm_device *dev, struct drm_framebuffer *fb, struct drm_file *file, const struct drm_mode_fb_cmd2 *mode_cmd, const struct drm_framebuffer_funcs *funcs) { const struct drm_format_info *info; struct drm_gem_object *objs[DRM_FORMAT_MAX_PLANES]; unsigned int i; int ret; info = drm_get_format_info(dev, mode_cmd); if (!info) { drm_dbg_kms(dev, "Failed to get FB format info\n"); return -EINVAL; } if (drm_drv_uses_atomic_modeset(dev) && !drm_any_plane_has_format(dev, mode_cmd->pixel_format, mode_cmd->modifier[0])) { drm_dbg_kms(dev, "Unsupported pixel format %p4cc / modifier 0x%llx\n", &mode_cmd->pixel_format, mode_cmd->modifier[0]); return -EINVAL; } for (i = 0; i < info->num_planes; i++) { unsigned int width = mode_cmd->width / (i ? info->hsub : 1); unsigned int height = mode_cmd->height / (i ? info->vsub : 1); unsigned int min_size; objs[i] = drm_gem_object_lookup(file, mode_cmd->handles[i]); if (!objs[i]) { drm_dbg_kms(dev, "Failed to lookup GEM object\n"); ret = -ENOENT; goto err_gem_object_put; } min_size = (height - 1) * mode_cmd->pitches[i] + drm_format_info_min_pitch(info, i, width) + mode_cmd->offsets[i]; if (objs[i]->size < min_size) { drm_dbg_kms(dev, "GEM object size (%zu) smaller than minimum size (%u) for plane %d\n", objs[i]->size, min_size, i); drm_gem_object_put(objs[i]); ret = -EINVAL; goto err_gem_object_put; } } ret = drm_gem_fb_init(dev, fb, mode_cmd, objs, i, funcs); if (ret) goto err_gem_object_put; return 0; err_gem_object_put: while (i > 0) { --i; drm_gem_object_put(objs[i]); } return ret; } EXPORT_SYMBOL_GPL(drm_gem_fb_init_with_funcs); /** * drm_gem_fb_create_with_funcs() - Helper function for the * &drm_mode_config_funcs.fb_create * callback * @dev: DRM device * @file: DRM file that holds the GEM handle(s) backing the framebuffer * @mode_cmd: Metadata from the userspace framebuffer creation request * @funcs: vtable to be used for the new framebuffer object * * This function can be used to set &drm_framebuffer_funcs for drivers that need * custom framebuffer callbacks. Use drm_gem_fb_create() if you don't need to * change &drm_framebuffer_funcs. The function does buffer size validation. * * Returns: * Pointer to a &drm_framebuffer on success or an error pointer on failure. */ struct drm_framebuffer * drm_gem_fb_create_with_funcs(struct drm_device *dev, struct drm_file *file, const struct drm_mode_fb_cmd2 *mode_cmd, const struct drm_framebuffer_funcs *funcs) { struct drm_framebuffer *fb; int ret; fb = kzalloc(sizeof(*fb), GFP_KERNEL); if (!fb) return ERR_PTR(-ENOMEM); ret = drm_gem_fb_init_with_funcs(dev, fb, file, mode_cmd, funcs); if (ret) { kfree(fb); return ERR_PTR(ret); } return fb; } EXPORT_SYMBOL_GPL(drm_gem_fb_create_with_funcs); static const struct drm_framebuffer_funcs drm_gem_fb_funcs = { .destroy = drm_gem_fb_destroy, .create_handle = drm_gem_fb_create_handle, }; /** * drm_gem_fb_create() - Helper function for the * &drm_mode_config_funcs.fb_create callback * @dev: DRM device * @file: DRM file that holds the GEM handle(s) backing the framebuffer * @mode_cmd: Metadata from the userspace framebuffer creation request * * This function creates a new framebuffer object described by * &drm_mode_fb_cmd2. This description includes handles for the buffer(s) * backing the framebuffer. * * If your hardware has special alignment or pitch requirements these should be * checked before calling this function. The function does buffer size * validation. Use drm_gem_fb_create_with_dirty() if you need framebuffer * flushing. 
* * Drivers can use this as their &drm_mode_config_funcs.fb_create callback. * The ADDFB2 IOCTL calls into this callback. * * Returns: * Pointer to a &drm_framebuffer on success or an error pointer on failure. */ struct drm_framebuffer * drm_gem_fb_create(struct drm_device *dev, struct drm_file *file, const struct drm_mode_fb_cmd2 *mode_cmd) { return drm_gem_fb_create_with_funcs(dev, file, mode_cmd, &drm_gem_fb_funcs); } EXPORT_SYMBOL_GPL(drm_gem_fb_create); static const struct drm_framebuffer_funcs drm_gem_fb_funcs_dirtyfb = { .destroy = drm_gem_fb_destroy, .create_handle = drm_gem_fb_create_handle, .dirty = drm_atomic_helper_dirtyfb, }; /** * drm_gem_fb_create_with_dirty() - Helper function for the * &drm_mode_config_funcs.fb_create callback * @dev: DRM device * @file: DRM file that holds the GEM handle(s) backing the framebuffer * @mode_cmd: Metadata from the userspace framebuffer creation request * * This function creates a new framebuffer object described by * &drm_mode_fb_cmd2. This description includes handles for the buffer(s) * backing the framebuffer. drm_atomic_helper_dirtyfb() is used for the dirty * callback giving framebuffer flushing through the atomic machinery. Use * drm_gem_fb_create() if you don't need the dirty callback. * The function does buffer size validation. * * Drivers should also call drm_plane_enable_fb_damage_clips() on all planes * to enable userspace to use damage clips also with the ATOMIC IOCTL. * * Drivers can use this as their &drm_mode_config_funcs.fb_create callback. * The ADDFB2 IOCTL calls into this callback. * * Returns: * Pointer to a &drm_framebuffer on success or an error pointer on failure. */ struct drm_framebuffer * drm_gem_fb_create_with_dirty(struct drm_device *dev, struct drm_file *file, const struct drm_mode_fb_cmd2 *mode_cmd) { return drm_gem_fb_create_with_funcs(dev, file, mode_cmd, &drm_gem_fb_funcs_dirtyfb); } EXPORT_SYMBOL_GPL(drm_gem_fb_create_with_dirty); /** * drm_gem_fb_vmap - maps all framebuffer BOs into kernel address space * @fb: the framebuffer * @map: returns the mapping's address for each BO * @data: returns the data address for each BO, can be NULL * * This function maps all buffer objects of the given framebuffer into * kernel address space and stores them in struct iosys_map. If the * mapping operation fails for one of the BOs, the function unmaps the * already established mappings automatically. * * Callers that want to access a BO's stored data should pass @data. * The argument returns the addresses of the data stored in each BO. This * is different from @map if the framebuffer's offsets field is non-zero. * * Both, @map and @data, must each refer to arrays with at least * fb->format->num_planes elements. * * See drm_gem_fb_vunmap() for unmapping. * * Returns: * 0 on success, or a negative errno code otherwise. 
*/ int drm_gem_fb_vmap(struct drm_framebuffer *fb, struct iosys_map *map, struct iosys_map *data) { struct drm_gem_object *obj; unsigned int i; int ret; for (i = 0; i < fb->format->num_planes; ++i) { obj = drm_gem_fb_get_obj(fb, i); if (!obj) { ret = -EINVAL; goto err_drm_gem_vunmap; } ret = drm_gem_vmap(obj, &map[i]); if (ret) goto err_drm_gem_vunmap; } if (data) { for (i = 0; i < fb->format->num_planes; ++i) { memcpy(&data[i], &map[i], sizeof(data[i])); if (iosys_map_is_null(&data[i])) continue; iosys_map_incr(&data[i], fb->offsets[i]); } } return 0; err_drm_gem_vunmap: while (i) { --i; obj = drm_gem_fb_get_obj(fb, i); if (!obj) continue; drm_gem_vunmap(obj, &map[i]); } return ret; } EXPORT_SYMBOL(drm_gem_fb_vmap); /** * drm_gem_fb_vunmap - unmaps framebuffer BOs from kernel address space * @fb: the framebuffer * @map: mapping addresses as returned by drm_gem_fb_vmap() * * This function unmaps all buffer objects of the given framebuffer. * * See drm_gem_fb_vmap() for more information. */ void drm_gem_fb_vunmap(struct drm_framebuffer *fb, struct iosys_map *map) { unsigned int i = fb->format->num_planes; struct drm_gem_object *obj; while (i) { --i; obj = drm_gem_fb_get_obj(fb, i); if (!obj) continue; if (iosys_map_is_null(&map[i])) continue; drm_gem_vunmap(obj, &map[i]); } } EXPORT_SYMBOL(drm_gem_fb_vunmap); static void __drm_gem_fb_end_cpu_access(struct drm_framebuffer *fb, enum dma_data_direction dir, unsigned int num_planes) { struct drm_gem_object *obj; int ret; while (num_planes) { --num_planes; obj = drm_gem_fb_get_obj(fb, num_planes); if (!obj) continue; if (!drm_gem_is_imported(obj)) continue; ret = dma_buf_end_cpu_access(obj->dma_buf, dir); if (ret) drm_err(fb->dev, "dma_buf_end_cpu_access(%u, %d) failed: %d\n", ret, num_planes, dir); } } /** * drm_gem_fb_begin_cpu_access - prepares GEM buffer objects for CPU access * @fb: the framebuffer * @dir: access mode * * Prepares a framebuffer's GEM buffer objects for CPU access. This function * must be called before accessing the BO data within the kernel. For imported * BOs, the function calls dma_buf_begin_cpu_access(). * * See drm_gem_fb_end_cpu_access() for signalling the end of CPU access. * * Returns: * 0 on success, or a negative errno code otherwise. */ int drm_gem_fb_begin_cpu_access(struct drm_framebuffer *fb, enum dma_data_direction dir) { struct drm_gem_object *obj; unsigned int i; int ret; for (i = 0; i < fb->format->num_planes; ++i) { obj = drm_gem_fb_get_obj(fb, i); if (!obj) { ret = -EINVAL; goto err___drm_gem_fb_end_cpu_access; } if (!drm_gem_is_imported(obj)) continue; ret = dma_buf_begin_cpu_access(obj->dma_buf, dir); if (ret) goto err___drm_gem_fb_end_cpu_access; } return 0; err___drm_gem_fb_end_cpu_access: __drm_gem_fb_end_cpu_access(fb, dir, i); return ret; } EXPORT_SYMBOL(drm_gem_fb_begin_cpu_access); /** * drm_gem_fb_end_cpu_access - signals end of CPU access to GEM buffer objects * @fb: the framebuffer * @dir: access mode * * Signals the end of CPU access to the given framebuffer's GEM buffer objects. This * function must be paired with a corresponding call to drm_gem_fb_begin_cpu_access(). * For imported BOs, the function calls dma_buf_end_cpu_access(). * * See also drm_gem_fb_begin_cpu_access(). 
*/ void drm_gem_fb_end_cpu_access(struct drm_framebuffer *fb, enum dma_data_direction dir) { __drm_gem_fb_end_cpu_access(fb, dir, fb->format->num_planes); } EXPORT_SYMBOL(drm_gem_fb_end_cpu_access); // TODO Drop this function and replace by drm_format_info_bpp() once all // DRM_FORMAT_* provide proper block info in drivers/gpu/drm/drm_fourcc.c static __u32 drm_gem_afbc_get_bpp(struct drm_device *dev, const struct drm_mode_fb_cmd2 *mode_cmd) { const struct drm_format_info *info; info = drm_get_format_info(dev, mode_cmd); switch (info->format) { case DRM_FORMAT_YUV420_8BIT: return 12; case DRM_FORMAT_YUV420_10BIT: return 15; case DRM_FORMAT_VUY101010: return 30; default: return drm_format_info_bpp(info, 0); } } static int drm_gem_afbc_min_size(struct drm_device *dev, const struct drm_mode_fb_cmd2 *mode_cmd, struct drm_afbc_framebuffer *afbc_fb) { __u32 n_blocks, w_alignment, h_alignment, hdr_alignment; /* remove bpp when all users properly encode cpp in drm_format_info */ __u32 bpp; switch (mode_cmd->modifier[0] & AFBC_FORMAT_MOD_BLOCK_SIZE_MASK) { case AFBC_FORMAT_MOD_BLOCK_SIZE_16x16: afbc_fb->block_width = 16; afbc_fb->block_height = 16; break; case AFBC_FORMAT_MOD_BLOCK_SIZE_32x8: afbc_fb->block_width = 32; afbc_fb->block_height = 8; break; /* no user exists yet - fall through */ case AFBC_FORMAT_MOD_BLOCK_SIZE_64x4: case AFBC_FORMAT_MOD_BLOCK_SIZE_32x8_64x4: default: drm_dbg_kms(dev, "Invalid AFBC_FORMAT_MOD_BLOCK_SIZE: %lld.\n", mode_cmd->modifier[0] & AFBC_FORMAT_MOD_BLOCK_SIZE_MASK); return -EINVAL; } /* tiled header afbc */ w_alignment = afbc_fb->block_width; h_alignment = afbc_fb->block_height; hdr_alignment = AFBC_HDR_ALIGN; if (mode_cmd->modifier[0] & AFBC_FORMAT_MOD_TILED) { w_alignment *= AFBC_TH_LAYOUT_ALIGNMENT; h_alignment *= AFBC_TH_LAYOUT_ALIGNMENT; hdr_alignment = AFBC_TH_BODY_START_ALIGNMENT; } afbc_fb->aligned_width = ALIGN(mode_cmd->width, w_alignment); afbc_fb->aligned_height = ALIGN(mode_cmd->height, h_alignment); afbc_fb->offset = mode_cmd->offsets[0]; bpp = drm_gem_afbc_get_bpp(dev, mode_cmd); if (!bpp) { drm_dbg_kms(dev, "Invalid AFBC bpp value: %d\n", bpp); return -EINVAL; } n_blocks = (afbc_fb->aligned_width * afbc_fb->aligned_height) / AFBC_SUPERBLOCK_PIXELS; afbc_fb->afbc_size = ALIGN(n_blocks * AFBC_HEADER_SIZE, hdr_alignment); afbc_fb->afbc_size += n_blocks * ALIGN(bpp * AFBC_SUPERBLOCK_PIXELS / 8, AFBC_SUPERBLOCK_ALIGNMENT); return 0; } /** * drm_gem_fb_afbc_init() - Helper function for drivers using afbc to * fill and validate all the afbc-specific * struct drm_afbc_framebuffer members * * @dev: DRM device * @afbc_fb: afbc-specific framebuffer * @mode_cmd: Metadata from the userspace framebuffer creation request * @afbc_fb: afbc framebuffer * * This function can be used by drivers which support afbc to complete * the preparation of struct drm_afbc_framebuffer. It must be called after * allocating the said struct and calling drm_gem_fb_init_with_funcs(). * It is caller's responsibility to put afbc_fb->base.obj objects in case * the call is unsuccessful. * * Returns: * Zero on success or a negative error value on failure. 
*/ int drm_gem_fb_afbc_init(struct drm_device *dev, const struct drm_mode_fb_cmd2 *mode_cmd, struct drm_afbc_framebuffer *afbc_fb) { const struct drm_format_info *info; struct drm_gem_object **objs; int ret; objs = afbc_fb->base.obj; info = drm_get_format_info(dev, mode_cmd); if (!info) return -EINVAL; ret = drm_gem_afbc_min_size(dev, mode_cmd, afbc_fb); if (ret < 0) return ret; if (objs[0]->size < afbc_fb->afbc_size) return -EINVAL; return 0; } EXPORT_SYMBOL_GPL(drm_gem_fb_afbc_init); |
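As the overview DOC comment notes, drivers that need no extra framebuffer validation can plug drm_gem_fb_create() straight into their mode-setting configuration. A minimal sketch of that wiring is shown below for an atomic-helper based driver; the foo_* name is a placeholder, not part of the helper library.

/*
 * Sketch only: drm_gem_fb_create() serves directly as the
 * &drm_mode_config_funcs.fb_create callback. Drivers that want framebuffer
 * flushing through the atomic machinery would use
 * drm_gem_fb_create_with_dirty() here instead.
 */
#include <drm/drm_atomic_helper.h>
#include <drm/drm_gem_framebuffer_helper.h>
#include <drm/drm_mode_config.h>

static const struct drm_mode_config_funcs foo_mode_config_funcs = {
	.fb_create	= drm_gem_fb_create,
	.atomic_check	= drm_atomic_helper_check,
	.atomic_commit	= drm_atomic_helper_commit,
};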
| 104 2 11 10 19 3 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 | /* SPDX-License-Identifier: GPL-2.0-or-later */ /* * INET An implementation of the TCP/IP protocol suite for the LINUX * operating system. INET is implemented using the BSD Socket * interface as the means of communication with the user level. * * Definitions for the UDP protocol. * * Version: @(#)udp.h 1.0.2 04/28/93 * * Author: Fred N. van Kempen, <waltje@uWalt.NL.Mugnet.ORG> */ #ifndef _LINUX_UDP_H #define _LINUX_UDP_H #include <net/inet_sock.h> #include <linux/skbuff.h> #include <net/netns/hash.h> #include <uapi/linux/udp.h> static inline struct udphdr *udp_hdr(const struct sk_buff *skb) { return (struct udphdr *)skb_transport_header(skb); } #define UDP_HTABLE_SIZE_MIN_PERNET 128 #define UDP_HTABLE_SIZE_MIN (IS_ENABLED(CONFIG_BASE_SMALL) ? 128 : 256) #define UDP_HTABLE_SIZE_MAX 65536 static inline u32 udp_hashfn(const struct net *net, u32 num, u32 mask) { return (num + net_hash_mix(net)) & mask; } enum { UDP_FLAGS_CORK, /* Cork is required */ UDP_FLAGS_NO_CHECK6_TX, /* Send zero UDP6 checksums on TX? */ UDP_FLAGS_NO_CHECK6_RX, /* Allow zero UDP6 checksums on RX? */ UDP_FLAGS_GRO_ENABLED, /* Request GRO aggregation */ UDP_FLAGS_ACCEPT_FRAGLIST, UDP_FLAGS_ACCEPT_L4, UDP_FLAGS_ENCAP_ENABLED, /* This socket enabled encap */ UDP_FLAGS_UDPLITE_SEND_CC, /* set via udplite setsockopt */ UDP_FLAGS_UDPLITE_RECV_CC, /* set via udplite setsockopt */ }; struct udp_sock { /* inet_sock has to be the first member */ struct inet_sock inet; #define udp_port_hash inet.sk.__sk_common.skc_u16hashes[0] #define udp_portaddr_hash inet.sk.__sk_common.skc_u16hashes[1] #define udp_portaddr_node inet.sk.__sk_common.skc_portaddr_node unsigned long udp_flags; int pending; /* Any pending frames ? */ __u8 encap_type; /* Is this an Encapsulation socket? */ #if !IS_ENABLED(CONFIG_BASE_SMALL) /* For UDP 4-tuple hash */ __u16 udp_lrpa_hash; struct hlist_nulls_node udp_lrpa_node; #endif /* * Following member retains the information to create a UDP header * when the socket is uncorked. */ __u16 len; /* total length of pending frames */ __u16 gso_size; /* * Fields specific to UDP-Lite. */ __u16 pcslen; __u16 pcrlen; /* * For encapsulation sockets. 
*/ int (*encap_rcv)(struct sock *sk, struct sk_buff *skb); void (*encap_err_rcv)(struct sock *sk, struct sk_buff *skb, int err, __be16 port, u32 info, u8 *payload); int (*encap_err_lookup)(struct sock *sk, struct sk_buff *skb); void (*encap_destroy)(struct sock *sk); /* GRO functions for UDP socket */ struct sk_buff * (*gro_receive)(struct sock *sk, struct list_head *head, struct sk_buff *skb); int (*gro_complete)(struct sock *sk, struct sk_buff *skb, int nhoff); /* udp_recvmsg try to use this before splicing sk_receive_queue */ struct sk_buff_head reader_queue ____cacheline_aligned_in_smp; /* This field is dirtied by udp_recvmsg() */ int forward_deficit; /* This fields follows rcvbuf value, and is touched by udp_recvmsg */ int forward_threshold; /* Cache friendly copy of sk->sk_peek_off >= 0 */ bool peeking_with_offset; /* * Accounting for the tunnel GRO fastpath. * Unprotected by compilers guard, as it uses space available in * the last UDP socket cacheline. */ struct hlist_node tunnel_list; }; #define udp_test_bit(nr, sk) \ test_bit(UDP_FLAGS_##nr, &udp_sk(sk)->udp_flags) #define udp_set_bit(nr, sk) \ set_bit(UDP_FLAGS_##nr, &udp_sk(sk)->udp_flags) #define udp_test_and_set_bit(nr, sk) \ test_and_set_bit(UDP_FLAGS_##nr, &udp_sk(sk)->udp_flags) #define udp_clear_bit(nr, sk) \ clear_bit(UDP_FLAGS_##nr, &udp_sk(sk)->udp_flags) #define udp_assign_bit(nr, sk, val) \ assign_bit(UDP_FLAGS_##nr, &udp_sk(sk)->udp_flags, val) #define UDP_MAX_SEGMENTS (1 << 7UL) #define udp_sk(ptr) container_of_const(ptr, struct udp_sock, inet.sk) static inline int udp_set_peek_off(struct sock *sk, int val) { sk_set_peek_off(sk, val); WRITE_ONCE(udp_sk(sk)->peeking_with_offset, val >= 0); return 0; } static inline void udp_set_no_check6_tx(struct sock *sk, bool val) { udp_assign_bit(NO_CHECK6_TX, sk, val); } static inline void udp_set_no_check6_rx(struct sock *sk, bool val) { udp_assign_bit(NO_CHECK6_RX, sk, val); } static inline bool udp_get_no_check6_tx(const struct sock *sk) { return udp_test_bit(NO_CHECK6_TX, sk); } static inline bool udp_get_no_check6_rx(const struct sock *sk) { return udp_test_bit(NO_CHECK6_RX, sk); } static inline void udp_cmsg_recv(struct msghdr *msg, struct sock *sk, struct sk_buff *skb) { int gso_size; if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) { gso_size = skb_shinfo(skb)->gso_size; put_cmsg(msg, SOL_UDP, UDP_GRO, sizeof(gso_size), &gso_size); } } DECLARE_STATIC_KEY_FALSE(udp_encap_needed_key); #if IS_ENABLED(CONFIG_IPV6) DECLARE_STATIC_KEY_FALSE(udpv6_encap_needed_key); #endif static inline bool udp_encap_needed(void) { if (static_branch_unlikely(&udp_encap_needed_key)) return true; #if IS_ENABLED(CONFIG_IPV6) if (static_branch_unlikely(&udpv6_encap_needed_key)) return true; #endif return false; } static inline bool udp_unexpected_gso(struct sock *sk, struct sk_buff *skb) { if (!skb_is_gso(skb)) return false; if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4 && !udp_test_bit(ACCEPT_L4, sk)) return true; if (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST && !udp_test_bit(ACCEPT_FRAGLIST, sk)) return true; /* GSO packets lacking the SKB_GSO_UDP_TUNNEL/_CSUM bits might still * land in a tunnel as the socket check in udp_gro_receive cannot be * foolproof. 
*/ if (udp_encap_needed() && READ_ONCE(udp_sk(sk)->encap_rcv) && !(skb_shinfo(skb)->gso_type & (SKB_GSO_UDP_TUNNEL | SKB_GSO_UDP_TUNNEL_CSUM))) return true; return false; } static inline void udp_allow_gso(struct sock *sk) { udp_set_bit(ACCEPT_L4, sk); udp_set_bit(ACCEPT_FRAGLIST, sk); } #define udp_portaddr_for_each_entry(__sk, list) \ hlist_for_each_entry(__sk, list, __sk_common.skc_portaddr_node) #define udp_portaddr_for_each_entry_from(__sk) \ hlist_for_each_entry_from(__sk, __sk_common.skc_portaddr_node) #define udp_portaddr_for_each_entry_rcu(__sk, list) \ hlist_for_each_entry_rcu(__sk, list, __sk_common.skc_portaddr_node) #if !IS_ENABLED(CONFIG_BASE_SMALL) #define udp_lrpa_for_each_entry_rcu(__up, node, list) \ hlist_nulls_for_each_entry_rcu(__up, node, list, udp_lrpa_node) #endif #define IS_UDPLITE(__sk) (__sk->sk_protocol == IPPROTO_UDPLITE) static inline struct sock *udp_tunnel_sk(const struct net *net, bool is_ipv6) { #if IS_ENABLED(CONFIG_NET_UDP_TUNNEL) return rcu_dereference(net->ipv4.udp_tunnel_gro[is_ipv6].sk); #else return NULL; #endif } #endif /* _LINUX_UDP_H */ |
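udp_cmsg_recv() above is the piece that surfaces GRO aggregation to receivers: when a datagram was built from an SKB_GSO_UDP_L4 skb, the per-segment size is delivered as a UDP_GRO control message. The userspace sketch below enables UDP_GRO and reads that cmsg; it assumes the UDP_GRO constant is visible via the UAPI header (a fallback definition is included), and bind()/error handling are trimmed.

/*
 * Userspace sketch, not part of the header above. SOL_UDP has the same
 * value as IPPROTO_UDP (17), which is the level udp_cmsg_recv() uses.
 */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <linux/udp.h>

#ifndef UDP_GRO
#define UDP_GRO 104	/* from the UAPI header, for older toolchains */
#endif

static char buf[65535];

int main(void)
{
	int one = 1, fd = socket(AF_INET, SOCK_DGRAM, 0);
	char cbuf[CMSG_SPACE(sizeof(int))];
	struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = cbuf, .msg_controllen = sizeof(cbuf),
	};
	struct cmsghdr *cm;
	ssize_t len;

	if (fd < 0)
		return 1;
	setsockopt(fd, IPPROTO_UDP, UDP_GRO, &one, sizeof(one));
	/* ... bind() to a local port and wait for traffic ... */
	len = recvmsg(fd, &msg, 0);
	for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
		if (cm->cmsg_level == IPPROTO_UDP && cm->cmsg_type == UDP_GRO) {
			int gso_size;

			memcpy(&gso_size, CMSG_DATA(cm), sizeof(gso_size));
			printf("got %zd bytes as %d-byte segments\n", len, gso_size);
		}
	}
	return 0;
}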
// SPDX-License-Identifier: GPL-2.0 /* * Memory Migration functionality - linux/mm/migrate.c * * Copyright (C) 2006 Silicon Graphics, Inc., Christoph Lameter * * Page migration was first developed in the context of the memory hotplug * project.
The main authors of the migration code are: * * IWAMOTO Toshihiro <iwamoto@valinux.co.jp> * Hirokazu Takahashi <taka@valinux.co.jp> * Dave Hansen <haveblue@us.ibm.com> * Christoph Lameter */ #include <linux/migrate.h> #include <linux/export.h> #include <linux/swap.h> #include <linux/swapops.h> #include <linux/pagemap.h> #include <linux/buffer_head.h> #include <linux/mm_inline.h> #include <linux/ksm.h> #include <linux/rmap.h> #include <linux/topology.h> #include <linux/cpu.h> #include <linux/cpuset.h> #include <linux/writeback.h> #include <linux/mempolicy.h> #include <linux/vmalloc.h> #include <linux/security.h> #include <linux/backing-dev.h> #include <linux/compaction.h> #include <linux/syscalls.h> #include <linux/compat.h> #include <linux/hugetlb.h> #include <linux/gfp.h> #include <linux/pfn_t.h> #include <linux/page_idle.h> #include <linux/page_owner.h> #include <linux/sched/mm.h> #include <linux/ptrace.h> #include <linux/memory.h> #include <linux/sched/sysctl.h> #include <linux/memory-tiers.h> #include <linux/pagewalk.h> #include <asm/tlbflush.h> #include <trace/events/migrate.h> #include "internal.h" #include "swap.h" bool isolate_movable_page(struct page *page, isolate_mode_t mode) { struct folio *folio = folio_get_nontail_page(page); const struct movable_operations *mops; /* * Avoid burning cycles with pages that are yet under __free_pages(), * or just got freed under us. * * In case we 'win' a race for a movable page being freed under us and * raise its refcount preventing __free_pages() from doing its job * the put_page() at the end of this block will take care of * release this page, thus avoiding a nasty leakage. */ if (!folio) goto out; /* * Check movable flag before taking the page lock because * we use non-atomic bitops on newly allocated page flags so * unconditionally grabbing the lock ruins page's owner side. */ if (unlikely(!__folio_test_movable(folio))) goto out_putfolio; /* * As movable pages are not isolated from LRU lists, concurrent * compaction threads can race against page migration functions * as well as race against the releasing a page. * * In order to avoid having an already isolated movable page * being (wrongly) re-isolated while it is under migration, * or to avoid attempting to isolate pages being released, * lets be sure we have the page lock * before proceeding with the movable page isolation steps. */ if (unlikely(!folio_trylock(folio))) goto out_putfolio; if (!folio_test_movable(folio) || folio_test_isolated(folio)) goto out_no_isolated; mops = folio_movable_ops(folio); VM_BUG_ON_FOLIO(!mops, folio); if (!mops->isolate_page(&folio->page, mode)) goto out_no_isolated; /* Driver shouldn't use the isolated flag */ WARN_ON_ONCE(folio_test_isolated(folio)); folio_set_isolated(folio); folio_unlock(folio); return true; out_no_isolated: folio_unlock(folio); out_putfolio: folio_put(folio); out: return false; } static void putback_movable_folio(struct folio *folio) { const struct movable_operations *mops = folio_movable_ops(folio); mops->putback_page(&folio->page); folio_clear_isolated(folio); } /* * Put previously isolated pages back onto the appropriate lists * from where they were once taken off for compaction/migration. * * This function shall be used whenever the isolated pageset has been * built from lru, balloon, hugetlbfs page. See isolate_migratepages_range() * and folio_isolate_hugetlb(). 
*/ void putback_movable_pages(struct list_head *l) { struct folio *folio; struct folio *folio2; list_for_each_entry_safe(folio, folio2, l, lru) { if (unlikely(folio_test_hugetlb(folio))) { folio_putback_hugetlb(folio); continue; } list_del(&folio->lru); /* * We isolated non-lru movable folio so here we can use * __folio_test_movable because LRU folio's mapping cannot * have PAGE_MAPPING_MOVABLE. */ if (unlikely(__folio_test_movable(folio))) { VM_BUG_ON_FOLIO(!folio_test_isolated(folio), folio); folio_lock(folio); if (folio_test_movable(folio)) putback_movable_folio(folio); else folio_clear_isolated(folio); folio_unlock(folio); folio_put(folio); } else { node_stat_mod_folio(folio, NR_ISOLATED_ANON + folio_is_file_lru(folio), -folio_nr_pages(folio)); folio_putback_lru(folio); } } } /* Must be called with an elevated refcount on the non-hugetlb folio */ bool isolate_folio_to_list(struct folio *folio, struct list_head *list) { bool isolated, lru; if (folio_test_hugetlb(folio)) return folio_isolate_hugetlb(folio, list); lru = !__folio_test_movable(folio); if (lru) isolated = folio_isolate_lru(folio); else isolated = isolate_movable_page(&folio->page, ISOLATE_UNEVICTABLE); if (!isolated) return false; list_add(&folio->lru, list); if (lru) node_stat_add_folio(folio, NR_ISOLATED_ANON + folio_is_file_lru(folio)); return true; } static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvmw, struct folio *folio, unsigned long idx) { struct page *page = folio_page(folio, idx); bool contains_data; pte_t newpte; void *addr; if (PageCompound(page)) return false; VM_BUG_ON_PAGE(!PageAnon(page), page); VM_BUG_ON_PAGE(!PageLocked(page), page); VM_BUG_ON_PAGE(pte_present(ptep_get(pvmw->pte)), page); if (folio_test_mlocked(folio) || (pvmw->vma->vm_flags & VM_LOCKED) || mm_forbids_zeropage(pvmw->vma->vm_mm)) return false; /* * The pmd entry mapping the old thp was flushed and the pte mapping * this subpage has been non present. If the subpage is only zero-filled * then map it to the shared zeropage. 
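 * (memchr_inv() returns NULL when every byte of the page is zero, so
 * contains_data below is true, and the zeropage mapping is skipped, only
 * when the page still holds non-zero data.)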
*/ addr = kmap_local_page(page); contains_data = memchr_inv(addr, 0, PAGE_SIZE); kunmap_local(addr); if (contains_data) return false; newpte = pte_mkspecial(pfn_pte(my_zero_pfn(pvmw->address), pvmw->vma->vm_page_prot)); set_pte_at(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, newpte); dec_mm_counter(pvmw->vma->vm_mm, mm_counter(folio)); return true; } struct rmap_walk_arg { struct folio *folio; bool map_unused_to_zeropage; }; /* * Restore a potential migration pte to a working pte entry */ static bool remove_migration_pte(struct folio *folio, struct vm_area_struct *vma, unsigned long addr, void *arg) { struct rmap_walk_arg *rmap_walk_arg = arg; DEFINE_FOLIO_VMA_WALK(pvmw, rmap_walk_arg->folio, vma, addr, PVMW_SYNC | PVMW_MIGRATION); while (page_vma_mapped_walk(&pvmw)) { rmap_t rmap_flags = RMAP_NONE; pte_t old_pte; pte_t pte; swp_entry_t entry; struct page *new; unsigned long idx = 0; /* pgoff is invalid for ksm pages, but they are never large */ if (folio_test_large(folio) && !folio_test_hugetlb(folio)) idx = linear_page_index(vma, pvmw.address) - pvmw.pgoff; new = folio_page(folio, idx); #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION /* PMD-mapped THP migration entry */ if (!pvmw.pte) { VM_BUG_ON_FOLIO(folio_test_hugetlb(folio) || !folio_test_pmd_mappable(folio), folio); remove_migration_pmd(&pvmw, new); continue; } #endif if (rmap_walk_arg->map_unused_to_zeropage && try_to_map_unused_to_zeropage(&pvmw, folio, idx)) continue; folio_get(folio); pte = mk_pte(new, READ_ONCE(vma->vm_page_prot)); old_pte = ptep_get(pvmw.pte); entry = pte_to_swp_entry(old_pte); if (!is_migration_entry_young(entry)) pte = pte_mkold(pte); if (folio_test_dirty(folio) && is_migration_entry_dirty(entry)) pte = pte_mkdirty(pte); if (pte_swp_soft_dirty(old_pte)) pte = pte_mksoft_dirty(pte); else pte = pte_clear_soft_dirty(pte); if (is_writable_migration_entry(entry)) pte = pte_mkwrite(pte, vma); else if (pte_swp_uffd_wp(old_pte)) pte = pte_mkuffd_wp(pte); if (folio_test_anon(folio) && !is_readable_migration_entry(entry)) rmap_flags |= RMAP_EXCLUSIVE; if (unlikely(is_device_private_page(new))) { if (pte_write(pte)) entry = make_writable_device_private_entry( page_to_pfn(new)); else entry = make_readable_device_private_entry( page_to_pfn(new)); pte = swp_entry_to_pte(entry); if (pte_swp_soft_dirty(old_pte)) pte = pte_swp_mksoft_dirty(pte); if (pte_swp_uffd_wp(old_pte)) pte = pte_swp_mkuffd_wp(pte); } #ifdef CONFIG_HUGETLB_PAGE if (folio_test_hugetlb(folio)) { struct hstate *h = hstate_vma(vma); unsigned int shift = huge_page_shift(h); unsigned long psize = huge_page_size(h); pte = arch_make_huge_pte(pte, shift, vma->vm_flags); if (folio_test_anon(folio)) hugetlb_add_anon_rmap(folio, vma, pvmw.address, rmap_flags); else hugetlb_add_file_rmap(folio); set_huge_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte, psize); } else #endif { if (folio_test_anon(folio)) folio_add_anon_rmap_pte(folio, new, vma, pvmw.address, rmap_flags); else folio_add_file_rmap_pte(folio, new, vma); set_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte); } if (READ_ONCE(vma->vm_flags) & VM_LOCKED) mlock_drain_local(); trace_remove_migration_pte(pvmw.address, pte_val(pte), compound_order(new)); /* No need to invalidate - it was non-present before */ update_mmu_cache(vma, pvmw.address, pvmw.pte); } return true; } /* * Get rid of all migration entries and replace them by * references to the indicated page. 
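 *
 * On successful migration this runs with the old folio as @src and the new
 * folio as @dst; when a migration attempt is undone it is called with
 * src == dst to restore the original PTEs (see migrate_folio_undo_src()).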
*/ void remove_migration_ptes(struct folio *src, struct folio *dst, int flags) { struct rmap_walk_arg rmap_walk_arg = { .folio = src, .map_unused_to_zeropage = flags & RMP_USE_SHARED_ZEROPAGE, }; struct rmap_walk_control rwc = { .rmap_one = remove_migration_pte, .arg = &rmap_walk_arg, }; VM_BUG_ON_FOLIO((flags & RMP_USE_SHARED_ZEROPAGE) && (src != dst), src); if (flags & RMP_LOCKED) rmap_walk_locked(dst, &rwc); else rmap_walk(dst, &rwc); } /* * Something used the pte of a page under migration. We need to * get to the page and wait until migration is finished. * When we return from this function the fault will be retried. */ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, unsigned long address) { spinlock_t *ptl; pte_t *ptep; pte_t pte; swp_entry_t entry; ptep = pte_offset_map_lock(mm, pmd, address, &ptl); if (!ptep) return; pte = ptep_get(ptep); pte_unmap(ptep); if (!is_swap_pte(pte)) goto out; entry = pte_to_swp_entry(pte); if (!is_migration_entry(entry)) goto out; migration_entry_wait_on_locked(entry, ptl); return; out: spin_unlock(ptl); } #ifdef CONFIG_HUGETLB_PAGE /* * The vma read lock must be held upon entry. Holding that lock prevents either * the pte or the ptl from being freed. * * This function will release the vma lock before returning. */ void migration_entry_wait_huge(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep) { spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), vma->vm_mm, ptep); pte_t pte; hugetlb_vma_assert_locked(vma); spin_lock(ptl); pte = huge_ptep_get(vma->vm_mm, addr, ptep); if (unlikely(!is_hugetlb_entry_migration(pte))) { spin_unlock(ptl); hugetlb_vma_unlock_read(vma); } else { /* * If migration entry existed, safe to release vma lock * here because the pgtable page won't be freed without the * pgtable lock released. See comment right above pgtable * lock release in migration_entry_wait_on_locked(). */ hugetlb_vma_unlock_read(vma); migration_entry_wait_on_locked(pte_to_swp_entry(pte), ptl); } } #endif #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd) { spinlock_t *ptl; ptl = pmd_lock(mm, pmd); if (!is_pmd_migration_entry(*pmd)) goto unlock; migration_entry_wait_on_locked(pmd_to_swp_entry(*pmd), ptl); return; unlock: spin_unlock(ptl); } #endif /* * Replace the folio in the mapping. * * The number of remaining references must be: * 1 for anonymous folios without a mapping * 2 for folios with a mapping * 3 for folios with a mapping and the private flag set. 
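 *
 * For example, an unmapped order-0 pagecache folio without private data is
 * expected to be referenced exactly twice at this point: once by the page
 * cache and once by the caller (the "2 for folios with a mapping" case).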
*/ static int __folio_migrate_mapping(struct address_space *mapping, struct folio *newfolio, struct folio *folio, int expected_count) { XA_STATE(xas, &mapping->i_pages, folio_index(folio)); struct zone *oldzone, *newzone; int dirty; long nr = folio_nr_pages(folio); long entries, i; if (!mapping) { /* Take off deferred split queue while frozen and memcg set */ if (folio_test_large(folio) && folio_test_large_rmappable(folio)) { if (!folio_ref_freeze(folio, expected_count)) return -EAGAIN; folio_unqueue_deferred_split(folio); folio_ref_unfreeze(folio, expected_count); } /* No turning back from here */ newfolio->index = folio->index; newfolio->mapping = folio->mapping; if (folio_test_anon(folio) && folio_test_large(folio)) mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON, 1); if (folio_test_swapbacked(folio)) __folio_set_swapbacked(newfolio); return MIGRATEPAGE_SUCCESS; } oldzone = folio_zone(folio); newzone = folio_zone(newfolio); xas_lock_irq(&xas); if (!folio_ref_freeze(folio, expected_count)) { xas_unlock_irq(&xas); return -EAGAIN; } /* Take off deferred split queue while frozen and memcg set */ folio_unqueue_deferred_split(folio); /* * Now we know that no one else is looking at the folio: * no turning back from here. */ newfolio->index = folio->index; newfolio->mapping = folio->mapping; if (folio_test_anon(folio) && folio_test_large(folio)) mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON, 1); folio_ref_add(newfolio, nr); /* add cache reference */ if (folio_test_swapbacked(folio)) __folio_set_swapbacked(newfolio); if (folio_test_swapcache(folio)) { folio_set_swapcache(newfolio); newfolio->private = folio_get_private(folio); entries = nr; } else { entries = 1; } /* Move dirty while folio refs frozen and newfolio not yet exposed */ dirty = folio_test_dirty(folio); if (dirty) { folio_clear_dirty(folio); folio_set_dirty(newfolio); } /* Swap cache still stores N entries instead of a high-order entry */ for (i = 0; i < entries; i++) { xas_store(&xas, newfolio); xas_next(&xas); } /* * Drop cache reference from old folio by unfreezing * to one less reference. * We know this isn't the last reference. */ folio_ref_unfreeze(folio, expected_count - nr); xas_unlock(&xas); /* Leave irq disabled to prevent preemption while updating stats */ /* * If moved to a different zone then also account * the folio for that zone. Other VM counters will be * taken care of when we establish references to the * new folio and drop references to the old folio. * * Note that anonymous folios are accounted for * via NR_FILE_PAGES and NR_ANON_MAPPED if they * are mapped to swap space. 
*/ if (newzone != oldzone) { struct lruvec *old_lruvec, *new_lruvec; struct mem_cgroup *memcg; memcg = folio_memcg(folio); old_lruvec = mem_cgroup_lruvec(memcg, oldzone->zone_pgdat); new_lruvec = mem_cgroup_lruvec(memcg, newzone->zone_pgdat); __mod_lruvec_state(old_lruvec, NR_FILE_PAGES, -nr); __mod_lruvec_state(new_lruvec, NR_FILE_PAGES, nr); if (folio_test_swapbacked(folio) && !folio_test_swapcache(folio)) { __mod_lruvec_state(old_lruvec, NR_SHMEM, -nr); __mod_lruvec_state(new_lruvec, NR_SHMEM, nr); if (folio_test_pmd_mappable(folio)) { __mod_lruvec_state(old_lruvec, NR_SHMEM_THPS, -nr); __mod_lruvec_state(new_lruvec, NR_SHMEM_THPS, nr); } } #ifdef CONFIG_SWAP if (folio_test_swapcache(folio)) { __mod_lruvec_state(old_lruvec, NR_SWAPCACHE, -nr); __mod_lruvec_state(new_lruvec, NR_SWAPCACHE, nr); } #endif if (dirty && mapping_can_writeback(mapping)) { __mod_lruvec_state(old_lruvec, NR_FILE_DIRTY, -nr); __mod_zone_page_state(oldzone, NR_ZONE_WRITE_PENDING, -nr); __mod_lruvec_state(new_lruvec, NR_FILE_DIRTY, nr); __mod_zone_page_state(newzone, NR_ZONE_WRITE_PENDING, nr); } } local_irq_enable(); return MIGRATEPAGE_SUCCESS; } int folio_migrate_mapping(struct address_space *mapping, struct folio *newfolio, struct folio *folio, int extra_count) { int expected_count = folio_expected_ref_count(folio) + extra_count + 1; if (folio_ref_count(folio) != expected_count) return -EAGAIN; return __folio_migrate_mapping(mapping, newfolio, folio, expected_count); } EXPORT_SYMBOL(folio_migrate_mapping); /* * The expected number of remaining references is the same as that * of folio_migrate_mapping(). */ int migrate_huge_page_move_mapping(struct address_space *mapping, struct folio *dst, struct folio *src) { XA_STATE(xas, &mapping->i_pages, folio_index(src)); int rc, expected_count = folio_expected_ref_count(src) + 1; if (folio_ref_count(src) != expected_count) return -EAGAIN; rc = folio_mc_copy(dst, src); if (unlikely(rc)) return rc; xas_lock_irq(&xas); if (!folio_ref_freeze(src, expected_count)) { xas_unlock_irq(&xas); return -EAGAIN; } dst->index = src->index; dst->mapping = src->mapping; folio_ref_add(dst, folio_nr_pages(dst)); xas_store(&xas, dst); folio_ref_unfreeze(src, expected_count - folio_nr_pages(src)); xas_unlock_irq(&xas); return MIGRATEPAGE_SUCCESS; } /* * Copy the flags and some other ancillary information */ void folio_migrate_flags(struct folio *newfolio, struct folio *folio) { int cpupid; if (folio_test_referenced(folio)) folio_set_referenced(newfolio); if (folio_test_uptodate(folio)) folio_mark_uptodate(newfolio); if (folio_test_clear_active(folio)) { VM_BUG_ON_FOLIO(folio_test_unevictable(folio), folio); folio_set_active(newfolio); } else if (folio_test_clear_unevictable(folio)) folio_set_unevictable(newfolio); if (folio_test_workingset(folio)) folio_set_workingset(newfolio); if (folio_test_checked(folio)) folio_set_checked(newfolio); /* * PG_anon_exclusive (-> PG_mappedtodisk) is always migrated via * migration entries. We can still have PG_anon_exclusive set on an * effectively unmapped and unreferenced first sub-pages of an * anonymous THP: we can simply copy it here via PG_mappedtodisk. 
*/ if (folio_test_mappedtodisk(folio)) folio_set_mappedtodisk(newfolio); /* Move dirty on pages not done by folio_migrate_mapping() */ if (folio_test_dirty(folio)) folio_set_dirty(newfolio); if (folio_test_young(folio)) folio_set_young(newfolio); if (folio_test_idle(folio)) folio_set_idle(newfolio); folio_migrate_refs(newfolio, folio); /* * Copy NUMA information to the new page, to prevent over-eager * future migrations of this same page. */ cpupid = folio_xchg_last_cpupid(folio, -1); /* * For memory tiering mode, when migrate between slow and fast * memory node, reset cpupid, because that is used to record * page access time in slow memory node. */ if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) { bool f_toptier = node_is_toptier(folio_nid(folio)); bool t_toptier = node_is_toptier(folio_nid(newfolio)); if (f_toptier != t_toptier) cpupid = -1; } folio_xchg_last_cpupid(newfolio, cpupid); folio_migrate_ksm(newfolio, folio); /* * Please do not reorder this without considering how mm/ksm.c's * ksm_get_folio() depends upon ksm_migrate_page() and the * swapcache flag. */ if (folio_test_swapcache(folio)) folio_clear_swapcache(folio); folio_clear_private(folio); /* page->private contains hugetlb specific flags */ if (!folio_test_hugetlb(folio)) folio->private = NULL; /* * If any waiters have accumulated on the new page then * wake them up. */ if (folio_test_writeback(newfolio)) folio_end_writeback(newfolio); /* * PG_readahead shares the same bit with PG_reclaim. The above * end_page_writeback() may clear PG_readahead mistakenly, so set the * bit after that. */ if (folio_test_readahead(folio)) folio_set_readahead(newfolio); folio_copy_owner(newfolio, folio); pgalloc_tag_swap(newfolio, folio); mem_cgroup_migrate(folio, newfolio); } EXPORT_SYMBOL(folio_migrate_flags); /************************************************************ * Migration functions ***********************************************************/ static int __migrate_folio(struct address_space *mapping, struct folio *dst, struct folio *src, void *src_private, enum migrate_mode mode) { int rc, expected_count = folio_expected_ref_count(src) + 1; /* Check whether src does not have extra refs before we do more work */ if (folio_ref_count(src) != expected_count) return -EAGAIN; rc = folio_mc_copy(dst, src); if (unlikely(rc)) return rc; rc = __folio_migrate_mapping(mapping, dst, src, expected_count); if (rc != MIGRATEPAGE_SUCCESS) return rc; if (src_private) folio_attach_private(dst, folio_detach_private(src)); folio_migrate_flags(dst, src); return MIGRATEPAGE_SUCCESS; } /** * migrate_folio() - Simple folio migration. * @mapping: The address_space containing the folio. * @dst: The folio to migrate the data to. * @src: The folio containing the current data. * @mode: How to migrate the page. * * Common logic to directly migrate a single LRU folio suitable for * folios that do not have private data. * * Folios are locked upon entry and exit. 
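 *
 * Filesystems whose folios never carry private data can typically wire this
 * helper up directly in their address_space_operations, e.g. (illustrative):
 *
 *	.migrate_folio	= migrate_folio,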
*/ int migrate_folio(struct address_space *mapping, struct folio *dst, struct folio *src, enum migrate_mode mode) { BUG_ON(folio_test_writeback(src)); /* Writeback must be complete */ return __migrate_folio(mapping, dst, src, NULL, mode); } EXPORT_SYMBOL(migrate_folio); #ifdef CONFIG_BUFFER_HEAD /* Returns true if all buffers are successfully locked */ static bool buffer_migrate_lock_buffers(struct buffer_head *head, enum migrate_mode mode) { struct buffer_head *bh = head; struct buffer_head *failed_bh; do { if (!trylock_buffer(bh)) { if (mode == MIGRATE_ASYNC) goto unlock; if (mode == MIGRATE_SYNC_LIGHT && !buffer_uptodate(bh)) goto unlock; lock_buffer(bh); } bh = bh->b_this_page; } while (bh != head); return true; unlock: /* We failed to lock the buffer and cannot stall. */ failed_bh = bh; bh = head; while (bh != failed_bh) { unlock_buffer(bh); bh = bh->b_this_page; } return false; } static int __buffer_migrate_folio(struct address_space *mapping, struct folio *dst, struct folio *src, enum migrate_mode mode, bool check_refs) { struct buffer_head *bh, *head; int rc; int expected_count; head = folio_buffers(src); if (!head) return migrate_folio(mapping, dst, src, mode); /* Check whether page does not have extra refs before we do more work */ expected_count = folio_expected_ref_count(src) + 1; if (folio_ref_count(src) != expected_count) return -EAGAIN; if (!buffer_migrate_lock_buffers(head, mode)) return -EAGAIN; if (check_refs) { bool busy, migrating; bool invalidated = false; migrating = test_and_set_bit_lock(BH_Migrate, &head->b_state); VM_WARN_ON_ONCE(migrating); recheck_buffers: busy = false; spin_lock(&mapping->i_private_lock); bh = head; do { if (atomic_read(&bh->b_count)) { busy = true; break; } bh = bh->b_this_page; } while (bh != head); spin_unlock(&mapping->i_private_lock); if (busy) { if (invalidated) { rc = -EAGAIN; goto unlock_buffers; } invalidate_bh_lrus(); invalidated = true; goto recheck_buffers; } } rc = filemap_migrate_folio(mapping, dst, src, mode); if (rc != MIGRATEPAGE_SUCCESS) goto unlock_buffers; bh = head; do { folio_set_bh(bh, dst, bh_offset(bh)); bh = bh->b_this_page; } while (bh != head); unlock_buffers: if (check_refs) clear_bit_unlock(BH_Migrate, &head->b_state); bh = head; do { unlock_buffer(bh); bh = bh->b_this_page; } while (bh != head); return rc; } /** * buffer_migrate_folio() - Migration function for folios with buffers. * @mapping: The address space containing @src. * @dst: The folio to migrate to. * @src: The folio to migrate from. * @mode: How to migrate the folio. * * This function can only be used if the underlying filesystem guarantees * that no other references to @src exist. For example attached buffer * heads are accessed only under the folio lock. If your filesystem cannot * provide this guarantee, buffer_migrate_folio_norefs() may be more * appropriate. * * Return: 0 on success or a negative errno on failure. */ int buffer_migrate_folio(struct address_space *mapping, struct folio *dst, struct folio *src, enum migrate_mode mode) { return __buffer_migrate_folio(mapping, dst, src, mode, false); } EXPORT_SYMBOL(buffer_migrate_folio); /** * buffer_migrate_folio_norefs() - Migration function for folios with buffers. * @mapping: The address space containing @src. * @dst: The folio to migrate to. * @src: The folio to migrate from. * @mode: How to migrate the folio. * * Like buffer_migrate_folio() except that this variant is more careful * and checks that there are also no buffer head references. 
This function * is the right one for mappings where buffer heads are directly looked * up and referenced (such as block device mappings). * * Return: 0 on success or a negative errno on failure. */ int buffer_migrate_folio_norefs(struct address_space *mapping, struct folio *dst, struct folio *src, enum migrate_mode mode) { return __buffer_migrate_folio(mapping, dst, src, mode, true); } EXPORT_SYMBOL_GPL(buffer_migrate_folio_norefs); #endif /* CONFIG_BUFFER_HEAD */ int filemap_migrate_folio(struct address_space *mapping, struct folio *dst, struct folio *src, enum migrate_mode mode) { return __migrate_folio(mapping, dst, src, folio_get_private(src), mode); } EXPORT_SYMBOL_GPL(filemap_migrate_folio); /* * Default handling if a filesystem does not provide a migration function. */ static int fallback_migrate_folio(struct address_space *mapping, struct folio *dst, struct folio *src, enum migrate_mode mode) { WARN_ONCE(mapping->a_ops->writepages, "%ps does not implement migrate_folio\n", mapping->a_ops); if (folio_test_dirty(src)) return -EBUSY; /* * Filesystem may have private data at folio->private that we * can't migrate automatically. */ if (!filemap_release_folio(src, GFP_KERNEL)) return mode == MIGRATE_SYNC ? -EAGAIN : -EBUSY; return migrate_folio(mapping, dst, src, mode); } /* * Move a page to a newly allocated page * The page is locked and all ptes have been successfully removed. * * The new page will have replaced the old page if this function * is successful. * * Return value: * < 0 - error code * MIGRATEPAGE_SUCCESS - success */ static int move_to_new_folio(struct folio *dst, struct folio *src, enum migrate_mode mode) { int rc = -EAGAIN; bool is_lru = !__folio_test_movable(src); VM_BUG_ON_FOLIO(!folio_test_locked(src), src); VM_BUG_ON_FOLIO(!folio_test_locked(dst), dst); if (likely(is_lru)) { struct address_space *mapping = folio_mapping(src); if (!mapping) rc = migrate_folio(mapping, dst, src, mode); else if (mapping_inaccessible(mapping)) rc = -EOPNOTSUPP; else if (mapping->a_ops->migrate_folio) /* * Most folios have a mapping and most filesystems * provide a migrate_folio callback. Anonymous folios * are part of swap space which also has its own * migrate_folio callback. This is the most common path * for page migration. */ rc = mapping->a_ops->migrate_folio(mapping, dst, src, mode); else rc = fallback_migrate_folio(mapping, dst, src, mode); } else { const struct movable_operations *mops; /* * In case of non-lru page, it could be released after * isolation step. In that case, we shouldn't try migration. */ VM_BUG_ON_FOLIO(!folio_test_isolated(src), src); if (!folio_test_movable(src)) { rc = MIGRATEPAGE_SUCCESS; folio_clear_isolated(src); goto out; } mops = folio_movable_ops(src); rc = mops->migrate_page(&dst->page, &src->page, mode); WARN_ON_ONCE(rc == MIGRATEPAGE_SUCCESS && !folio_test_isolated(src)); } /* * When successful, old pagecache src->mapping must be cleared before * src is freed; but stats require that PageAnon be left as PageAnon. */ if (rc == MIGRATEPAGE_SUCCESS) { if (__folio_test_movable(src)) { VM_BUG_ON_FOLIO(!folio_test_isolated(src), src); /* * We clear PG_movable under page_lock so any compactor * cannot try to migrate this page. */ folio_clear_isolated(src); } /* * Anonymous and movable src->mapping will be cleared by * free_pages_prepare so don't reset it here for keeping * the type to work PageAnon, for example. 
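 * (folio_mapping_flags() is true for anon, movable and KSM folios, whose
 * mapping field is therefore left for free_pages_prepare() to clear.)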
*/ if (!folio_mapping_flags(src)) src->mapping = NULL; if (likely(!folio_is_zone_device(dst))) flush_dcache_folio(dst); } out: return rc; } /* * To record some information during migration, we use unused private * field of struct folio of the newly allocated destination folio. * This is safe because nobody is using it except us. */ enum { PAGE_WAS_MAPPED = BIT(0), PAGE_WAS_MLOCKED = BIT(1), PAGE_OLD_STATES = PAGE_WAS_MAPPED | PAGE_WAS_MLOCKED, }; static void __migrate_folio_record(struct folio *dst, int old_page_state, struct anon_vma *anon_vma) { dst->private = (void *)anon_vma + old_page_state; } static void __migrate_folio_extract(struct folio *dst, int *old_page_state, struct anon_vma **anon_vmap) { unsigned long private = (unsigned long)dst->private; *anon_vmap = (struct anon_vma *)(private & ~PAGE_OLD_STATES); *old_page_state = private & PAGE_OLD_STATES; dst->private = NULL; } /* Restore the source folio to the original state upon failure */ static void migrate_folio_undo_src(struct folio *src, int page_was_mapped, struct anon_vma *anon_vma, bool locked, struct list_head *ret) { if (page_was_mapped) remove_migration_ptes(src, src, 0); /* Drop an anon_vma reference if we took one */ if (anon_vma) put_anon_vma(anon_vma); if (locked) folio_unlock(src); if (ret) list_move_tail(&src->lru, ret); } /* Restore the destination folio to the original state upon failure */ static void migrate_folio_undo_dst(struct folio *dst, bool locked, free_folio_t put_new_folio, unsigned long private) { if (locked) folio_unlock(dst); if (put_new_folio) put_new_folio(dst, private); else folio_put(dst); } /* Cleanup src folio upon migration success */ static void migrate_folio_done(struct folio *src, enum migrate_reason reason) { /* * Compaction can migrate also non-LRU pages which are * not accounted to NR_ISOLATED_*. They can be recognized * as __folio_test_movable */ if (likely(!__folio_test_movable(src)) && reason != MR_DEMOTION) mod_node_page_state(folio_pgdat(src), NR_ISOLATED_ANON + folio_is_file_lru(src), -folio_nr_pages(src)); if (reason != MR_MEMORY_FAILURE) /* We release the page in page_handle_poison. */ folio_put(src); } /* Obtain the lock on page, remove all ptes. */ static int migrate_folio_unmap(new_folio_t get_new_folio, free_folio_t put_new_folio, unsigned long private, struct folio *src, struct folio **dstp, enum migrate_mode mode, enum migrate_reason reason, struct list_head *ret) { struct folio *dst; int rc = -EAGAIN; int old_page_state = 0; struct anon_vma *anon_vma = NULL; bool is_lru = data_race(!__folio_test_movable(src)); bool locked = false; bool dst_locked = false; if (folio_ref_count(src) == 1) { /* Folio was freed from under us. So we are done. */ folio_clear_active(src); folio_clear_unevictable(src); /* free_pages_prepare() will clear PG_isolated. */ list_del(&src->lru); migrate_folio_done(src, reason); return MIGRATEPAGE_SUCCESS; } dst = get_new_folio(src, private); if (!dst) return -ENOMEM; *dstp = dst; dst->private = NULL; if (!folio_trylock(src)) { if (mode == MIGRATE_ASYNC) goto out; /* * It's not safe for direct compaction to call lock_page. * For example, during page readahead pages are added locked * to the LRU. Later, when the IO completes the pages are * marked uptodate and unlocked. However, the queueing * could be merging multiple pages for one bio (e.g. * mpage_readahead). If an allocation happens for the * second or third page, the process can end up locking * the same page twice and deadlocking. 
Rather than * trying to be clever about what pages can be locked, * avoid the use of lock_page for direct compaction * altogether. */ if (current->flags & PF_MEMALLOC) goto out; /* * In "light" mode, we can wait for transient locks (eg * inserting a page into the page table), but it's not * worth waiting for I/O. */ if (mode == MIGRATE_SYNC_LIGHT && !folio_test_uptodate(src)) goto out; folio_lock(src); } locked = true; if (folio_test_mlocked(src)) old_page_state |= PAGE_WAS_MLOCKED; if (folio_test_writeback(src)) { /* * Only in the case of a full synchronous migration is it * necessary to wait for PageWriteback. In the async case, * the retry loop is too short and in the sync-light case, * the overhead of stalling is too much */ switch (mode) { case MIGRATE_SYNC: break; default: rc = -EBUSY; goto out; } folio_wait_writeback(src); } /* * By try_to_migrate(), src->mapcount goes down to 0 here. In this case, * we cannot notice that anon_vma is freed while we migrate a page. * This get_anon_vma() delays freeing anon_vma pointer until the end * of migration. File cache pages are no problem because of page_lock() * File Caches may use write_page() or lock_page() in migration, then, * just care Anon page here. * * Only folio_get_anon_vma() understands the subtleties of * getting a hold on an anon_vma from outside one of its mms. * But if we cannot get anon_vma, then we won't need it anyway, * because that implies that the anon page is no longer mapped * (and cannot be remapped so long as we hold the page lock). */ if (folio_test_anon(src) && !folio_test_ksm(src)) anon_vma = folio_get_anon_vma(src); /* * Block others from accessing the new page when we get around to * establishing additional references. We are usually the only one * holding a reference to dst at this point. We used to have a BUG * here if folio_trylock(dst) fails, but would like to allow for * cases where there might be a race with the previous use of dst. * This is much like races on refcount of oldpage: just don't BUG(). */ if (unlikely(!folio_trylock(dst))) goto out; dst_locked = true; if (unlikely(!is_lru)) { __migrate_folio_record(dst, old_page_state, anon_vma); return MIGRATEPAGE_UNMAP; } /* * Corner case handling: * 1. When a new swap-cache page is read into, it is added to the LRU * and treated as swapcache but it has no rmap yet. * Calling try_to_unmap() against a src->mapping==NULL page will * trigger a BUG. So handle it here. * 2. An orphaned page (see truncate_cleanup_page) might have * fs-private metadata. The page can be picked up due to memory * offlining. Everywhere else except page reclaim, the page is * invisible to the vm, so the page can not be migrated. So try to * free the metadata, so the page can be freed. */ if (!src->mapping) { if (folio_test_private(src)) { try_to_free_buffers(src); goto out; } } else if (folio_mapped(src)) { /* Establish migration ptes */ VM_BUG_ON_FOLIO(folio_test_anon(src) && !folio_test_ksm(src) && !anon_vma, src); try_to_migrate(src, mode == MIGRATE_ASYNC ? TTU_BATCH_FLUSH : 0); old_page_state |= PAGE_WAS_MAPPED; } if (!folio_mapped(src)) { __migrate_folio_record(dst, old_page_state, anon_vma); return MIGRATEPAGE_UNMAP; } out: /* * A folio that has not been unmapped will be restored to * right list unless we want to retry. */ if (rc == -EAGAIN) ret = NULL; migrate_folio_undo_src(src, old_page_state & PAGE_WAS_MAPPED, anon_vma, locked, ret); migrate_folio_undo_dst(dst, dst_locked, put_new_folio, private); return rc; } /* Migrate the folio to the newly allocated folio in dst. 
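 *
 * This is the second half of the unmap/move split: migrate_folio_unmap() has
 * already returned MIGRATEPAGE_UNMAP for this src/dst pair, and the state it
 * stashed in dst->private (old page state plus anon_vma) is retrieved below
 * via __migrate_folio_extract().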
*/ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private, struct folio *src, struct folio *dst, enum migrate_mode mode, enum migrate_reason reason, struct list_head *ret) { int rc; int old_page_state = 0; struct anon_vma *anon_vma = NULL; bool is_lru = !__folio_test_movable(src); struct list_head *prev; __migrate_folio_extract(dst, &old_page_state, &anon_vma); prev = dst->lru.prev; list_del(&dst->lru); rc = move_to_new_folio(dst, src, mode); if (rc) goto out; if (unlikely(!is_lru)) goto out_unlock_both; /* * When successful, push dst to LRU immediately: so that if it * turns out to be an mlocked page, remove_migration_ptes() will * automatically build up the correct dst->mlock_count for it. * * We would like to do something similar for the old page, when * unsuccessful, and other cases when a page has been temporarily * isolated from the unevictable LRU: but this case is the easiest. */ folio_add_lru(dst); if (old_page_state & PAGE_WAS_MLOCKED) lru_add_drain(); if (old_page_state & PAGE_WAS_MAPPED) remove_migration_ptes(src, dst, 0); out_unlock_both: folio_unlock(dst); set_page_owner_migrate_reason(&dst->page, reason); /* * If migration is successful, decrease refcount of dst, * which will not free the page because new page owner increased * refcounter. */ folio_put(dst); /* * A folio that has been migrated has all references removed * and will be freed. */ list_del(&src->lru); /* Drop an anon_vma reference if we took one */ if (anon_vma) put_anon_vma(anon_vma); folio_unlock(src); migrate_folio_done(src, reason); return rc; out: /* * A folio that has not been migrated will be restored to * right list unless we want to retry. */ if (rc == -EAGAIN) { list_add(&dst->lru, prev); __migrate_folio_record(dst, old_page_state, anon_vma); return rc; } migrate_folio_undo_src(src, old_page_state & PAGE_WAS_MAPPED, anon_vma, true, ret); migrate_folio_undo_dst(dst, true, put_new_folio, private); return rc; } /* * Counterpart of unmap_and_move_page() for hugepage migration. * * This function doesn't wait the completion of hugepage I/O * because there is no race between I/O and migration for hugepage. * Note that currently hugepage I/O occurs only in direct I/O * where no lock is held and PG_writeback is irrelevant, * and writeback status of all subpages are counted in the reference * count of the head page (i.e. if all subpages of a 2MB hugepage are * under direct I/O, the reference of the head page is 512 and a bit more.) * This means that when we try to migrate hugepage whose subpages are * doing direct I/O, some references remain after try_to_unmap() and * hugepage migration fails without data corruption. * * There is also no race when direct I/O is issued on the page under migration, * because then pte is replaced with migration swap entry and direct I/O code * will wait in the page fault for migration to complete. */ static int unmap_and_move_huge_page(new_folio_t get_new_folio, free_folio_t put_new_folio, unsigned long private, struct folio *src, int force, enum migrate_mode mode, int reason, struct list_head *ret) { struct folio *dst; int rc = -EAGAIN; int page_was_mapped = 0; struct anon_vma *anon_vma = NULL; struct address_space *mapping = NULL; if (folio_ref_count(src) == 1) { /* page was freed from under us. So we are done. 
*/ folio_putback_hugetlb(src); return MIGRATEPAGE_SUCCESS; } dst = get_new_folio(src, private); if (!dst) return -ENOMEM; if (!folio_trylock(src)) { if (!force) goto out; switch (mode) { case MIGRATE_SYNC: break; default: goto out; } folio_lock(src); } /* * Check for pages which are in the process of being freed. Without * folio_mapping() set, hugetlbfs specific move page routine will not * be called and we could leak usage counts for subpools. */ if (hugetlb_folio_subpool(src) && !folio_mapping(src)) { rc = -EBUSY; goto out_unlock; } if (folio_test_anon(src)) anon_vma = folio_get_anon_vma(src); if (unlikely(!folio_trylock(dst))) goto put_anon; if (folio_mapped(src)) { enum ttu_flags ttu = 0; if (!folio_test_anon(src)) { /* * In shared mappings, try_to_unmap could potentially * call huge_pmd_unshare. Because of this, take * semaphore in write mode here and set TTU_RMAP_LOCKED * to let lower levels know we have taken the lock. */ mapping = hugetlb_folio_mapping_lock_write(src); if (unlikely(!mapping)) goto unlock_put_anon; ttu = TTU_RMAP_LOCKED; } try_to_migrate(src, ttu); page_was_mapped = 1; if (ttu & TTU_RMAP_LOCKED) i_mmap_unlock_write(mapping); } if (!folio_mapped(src)) rc = move_to_new_folio(dst, src, mode); if (page_was_mapped) remove_migration_ptes(src, rc == MIGRATEPAGE_SUCCESS ? dst : src, 0); unlock_put_anon: folio_unlock(dst); put_anon: if (anon_vma) put_anon_vma(anon_vma); if (rc == MIGRATEPAGE_SUCCESS) { move_hugetlb_state(src, dst, reason); put_new_folio = NULL; } out_unlock: folio_unlock(src); out: if (rc == MIGRATEPAGE_SUCCESS) folio_putback_hugetlb(src); else if (rc != -EAGAIN) list_move_tail(&src->lru, ret); /* * If migration was not successful and there's a freeing callback, * return the folio to that special allocator. Otherwise, simply drop * our additional reference. */ if (put_new_folio) put_new_folio(dst, private); else folio_put(dst); return rc; } static inline int try_split_folio(struct folio *folio, struct list_head *split_folios, enum migrate_mode mode) { int rc; if (mode == MIGRATE_ASYNC) { if (!folio_trylock(folio)) return -EAGAIN; } else { folio_lock(folio); } rc = split_folio_to_list(folio, split_folios); folio_unlock(folio); if (!rc) list_move_tail(&folio->lru, split_folios); return rc; } #ifdef CONFIG_TRANSPARENT_HUGEPAGE #define NR_MAX_BATCHED_MIGRATION HPAGE_PMD_NR #else #define NR_MAX_BATCHED_MIGRATION 512 #endif #define NR_MAX_MIGRATE_PAGES_RETRY 10 #define NR_MAX_MIGRATE_ASYNC_RETRY 3 #define NR_MAX_MIGRATE_SYNC_RETRY \ (NR_MAX_MIGRATE_PAGES_RETRY - NR_MAX_MIGRATE_ASYNC_RETRY) struct migrate_pages_stats { int nr_succeeded; /* Normal and large folios migrated successfully, in units of base pages */ int nr_failed_pages; /* Normal and large folios failed to be migrated, in units of base pages. Untried folios aren't counted */ int nr_thp_succeeded; /* THP migrated successfully */ int nr_thp_failed; /* THP failed to be migrated */ int nr_thp_split; /* THP split before migrating */ int nr_split; /* Large folio (include THP) split before migrating */ }; /* * Returns the number of hugetlb folios that were not migrated, or an error code * after NR_MAX_MIGRATE_PAGES_RETRY attempts or if no hugetlb folios are movable * any more because the list has become empty or no retryable hugetlb folios * exist any more. It is caller's responsibility to call putback_movable_pages() * only if ret != 0. 
*/ static int migrate_hugetlbs(struct list_head *from, new_folio_t get_new_folio, free_folio_t put_new_folio, unsigned long private, enum migrate_mode mode, int reason, struct migrate_pages_stats *stats, struct list_head *ret_folios) { int retry = 1; int nr_failed = 0; int nr_retry_pages = 0; int pass = 0; struct folio *folio, *folio2; int rc, nr_pages; for (pass = 0; pass < NR_MAX_MIGRATE_PAGES_RETRY && retry; pass++) { retry = 0; nr_retry_pages = 0; list_for_each_entry_safe(folio, folio2, from, lru) { if (!folio_test_hugetlb(folio)) continue; nr_pages = folio_nr_pages(folio); cond_resched(); /* * Migratability of hugepages depends on architectures and * their size. This check is necessary because some callers * of hugepage migration like soft offline and memory * hotremove don't walk through page tables or check whether * the hugepage is pmd-based or not before kicking migration. */ if (!hugepage_migration_supported(folio_hstate(folio))) { nr_failed++; stats->nr_failed_pages += nr_pages; list_move_tail(&folio->lru, ret_folios); continue; } rc = unmap_and_move_huge_page(get_new_folio, put_new_folio, private, folio, pass > 2, mode, reason, ret_folios); /* * The rules are: * Success: hugetlb folio will be put back * -EAGAIN: stay on the from list * -ENOMEM: stay on the from list * Other errno: put on ret_folios list */ switch(rc) { case -ENOMEM: /* * When memory is low, don't bother to try to migrate * other folios, just exit. */ stats->nr_failed_pages += nr_pages + nr_retry_pages; return -ENOMEM; case -EAGAIN: retry++; nr_retry_pages += nr_pages; break; case MIGRATEPAGE_SUCCESS: stats->nr_succeeded += nr_pages; break; default: /* * Permanent failure (-EBUSY, etc.): * unlike -EAGAIN case, the failed folio is * removed from migration folio list and not * retried in the next outer loop. */ nr_failed++; stats->nr_failed_pages += nr_pages; break; } } } /* * nr_failed is number of hugetlb folios failed to be migrated. After * NR_MAX_MIGRATE_PAGES_RETRY attempts, give up and count retried hugetlb * folios as failed. 
*/ nr_failed += retry; stats->nr_failed_pages += nr_retry_pages; return nr_failed; } static void migrate_folios_move(struct list_head *src_folios, struct list_head *dst_folios, free_folio_t put_new_folio, unsigned long private, enum migrate_mode mode, int reason, struct list_head *ret_folios, struct migrate_pages_stats *stats, int *retry, int *thp_retry, int *nr_failed, int *nr_retry_pages) { struct folio *folio, *folio2, *dst, *dst2; bool is_thp; int nr_pages; int rc; dst = list_first_entry(dst_folios, struct folio, lru); dst2 = list_next_entry(dst, lru); list_for_each_entry_safe(folio, folio2, src_folios, lru) { is_thp = folio_test_large(folio) && folio_test_pmd_mappable(folio); nr_pages = folio_nr_pages(folio); cond_resched(); rc = migrate_folio_move(put_new_folio, private, folio, dst, mode, reason, ret_folios); /* * The rules are: * Success: folio will be freed * -EAGAIN: stay on the unmap_folios list * Other errno: put on ret_folios list */ switch (rc) { case -EAGAIN: *retry += 1; *thp_retry += is_thp; *nr_retry_pages += nr_pages; break; case MIGRATEPAGE_SUCCESS: stats->nr_succeeded += nr_pages; stats->nr_thp_succeeded += is_thp; break; default: *nr_failed += 1; stats->nr_thp_failed += is_thp; stats->nr_failed_pages += nr_pages; break; } dst = dst2; dst2 = list_next_entry(dst, lru); } } static void migrate_folios_undo(struct list_head *src_folios, struct list_head *dst_folios, free_folio_t put_new_folio, unsigned long private, struct list_head *ret_folios) { struct folio *folio, *folio2, *dst, *dst2; dst = list_first_entry(dst_folios, struct folio, lru); dst2 = list_next_entry(dst, lru); list_for_each_entry_safe(folio, folio2, src_folios, lru) { int old_page_state = 0; struct anon_vma *anon_vma = NULL; __migrate_folio_extract(dst, &old_page_state, &anon_vma); migrate_folio_undo_src(folio, old_page_state & PAGE_WAS_MAPPED, anon_vma, true, ret_folios); list_del(&dst->lru); migrate_folio_undo_dst(dst, true, put_new_folio, private); dst = dst2; dst2 = list_next_entry(dst, lru); } } /* * migrate_pages_batch() first unmaps folios in the from list as many as * possible, then move the unmapped folios. * * We only batch migration if mode == MIGRATE_ASYNC to avoid to wait a * lock or bit when we have locked more than one folio. Which may cause * deadlock (e.g., for loop device). So, if mode != MIGRATE_ASYNC, the * length of the from list must be <= 1. */ static int migrate_pages_batch(struct list_head *from, new_folio_t get_new_folio, free_folio_t put_new_folio, unsigned long private, enum migrate_mode mode, int reason, struct list_head *ret_folios, struct list_head *split_folios, struct migrate_pages_stats *stats, int nr_pass) { int retry = 1; int thp_retry = 1; int nr_failed = 0; int nr_retry_pages = 0; int pass = 0; bool is_thp = false; bool is_large = false; struct folio *folio, *folio2, *dst = NULL; int rc, rc_saved = 0, nr_pages; LIST_HEAD(unmap_folios); LIST_HEAD(dst_folios); bool nosplit = (reason == MR_NUMA_MISPLACED); VM_WARN_ON_ONCE(mode != MIGRATE_ASYNC && !list_empty(from) && !list_is_singular(from)); for (pass = 0; pass < nr_pass && retry; pass++) { retry = 0; thp_retry = 0; nr_retry_pages = 0; list_for_each_entry_safe(folio, folio2, from, lru) { is_large = folio_test_large(folio); is_thp = folio_test_pmd_mappable(folio); nr_pages = folio_nr_pages(folio); cond_resched(); /* * The rare folio on the deferred split list should * be split now. 
It should not count as a failure: * but increment nr_failed because, without doing so, * migrate_pages() may report success with (split but * unmigrated) pages still on its fromlist; whereas it * always reports success when its fromlist is empty. * stats->nr_thp_failed should be increased too, * otherwise stats inconsistency will happen when * migrate_pages_batch is called via migrate_pages() * with MIGRATE_SYNC and MIGRATE_ASYNC. * * Only check it without removing it from the list. * Since the folio can be on deferred_split_scan() * local list and removing it can cause the local list * corruption. Folio split process below can handle it * with the help of folio_ref_freeze(). * * nr_pages > 2 is needed to avoid checking order-1 * page cache folios. They exist, in contrast to * non-existent order-1 anonymous folios, and do not * use _deferred_list. */ if (nr_pages > 2 && !list_empty(&folio->_deferred_list) && folio_test_partially_mapped(folio)) { if (!try_split_folio(folio, split_folios, mode)) { nr_failed++; stats->nr_thp_failed += is_thp; stats->nr_thp_split += is_thp; stats->nr_split++; continue; } } /* * Large folio migration might be unsupported or * the allocation might be failed so we should retry * on the same folio with the large folio split * to normal folios. * * Split folios are put in split_folios, and * we will migrate them after the rest of the * list is processed. */ if (!thp_migration_supported() && is_thp) { nr_failed++; stats->nr_thp_failed++; if (!try_split_folio(folio, split_folios, mode)) { stats->nr_thp_split++; stats->nr_split++; continue; } stats->nr_failed_pages += nr_pages; list_move_tail(&folio->lru, ret_folios); continue; } rc = migrate_folio_unmap(get_new_folio, put_new_folio, private, folio, &dst, mode, reason, ret_folios); /* * The rules are: * Success: folio will be freed * Unmap: folio will be put on unmap_folios list, * dst folio put on dst_folios list * -EAGAIN: stay on the from list * -ENOMEM: stay on the from list * Other errno: put on ret_folios list */ switch(rc) { case -ENOMEM: /* * When memory is low, don't bother to try to migrate * other folios, move unmapped folios, then exit. */ nr_failed++; stats->nr_thp_failed += is_thp; /* Large folio NUMA faulting doesn't split to retry. */ if (is_large && !nosplit) { int ret = try_split_folio(folio, split_folios, mode); if (!ret) { stats->nr_thp_split += is_thp; stats->nr_split++; break; } else if (reason == MR_LONGTERM_PIN && ret == -EAGAIN) { /* * Try again to split large folio to * mitigate the failure of longterm pinning. */ retry++; thp_retry += is_thp; nr_retry_pages += nr_pages; /* Undo duplicated failure counting. */ nr_failed--; stats->nr_thp_failed -= is_thp; break; } } stats->nr_failed_pages += nr_pages + nr_retry_pages; /* nr_failed isn't updated for not used */ stats->nr_thp_failed += thp_retry; rc_saved = rc; if (list_empty(&unmap_folios)) goto out; else goto move; case -EAGAIN: retry++; thp_retry += is_thp; nr_retry_pages += nr_pages; break; case MIGRATEPAGE_SUCCESS: stats->nr_succeeded += nr_pages; stats->nr_thp_succeeded += is_thp; break; case MIGRATEPAGE_UNMAP: list_move_tail(&folio->lru, &unmap_folios); list_add_tail(&dst->lru, &dst_folios); break; default: /* * Permanent failure (-EBUSY, etc.): * unlike -EAGAIN case, the failed folio is * removed from migration folio list and not * retried in the next outer loop. 
*/ nr_failed++; stats->nr_thp_failed += is_thp; stats->nr_failed_pages += nr_pages; break; } } } nr_failed += retry; stats->nr_thp_failed += thp_retry; stats->nr_failed_pages += nr_retry_pages; move: /* Flush TLBs for all unmapped folios */ try_to_unmap_flush(); retry = 1; for (pass = 0; pass < nr_pass && retry; pass++) { retry = 0; thp_retry = 0; nr_retry_pages = 0; /* Move the unmapped folios */ migrate_folios_move(&unmap_folios, &dst_folios, put_new_folio, private, mode, reason, ret_folios, stats, &retry, &thp_retry, &nr_failed, &nr_retry_pages); } nr_failed += retry; stats->nr_thp_failed += thp_retry; stats->nr_failed_pages += nr_retry_pages; rc = rc_saved ? : nr_failed; out: /* Cleanup remaining folios */ migrate_folios_undo(&unmap_folios, &dst_folios, put_new_folio, private, ret_folios); return rc; } static int migrate_pages_sync(struct list_head *from, new_folio_t get_new_folio, free_folio_t put_new_folio, unsigned long private, enum migrate_mode mode, int reason, struct list_head *ret_folios, struct list_head *split_folios, struct migrate_pages_stats *stats) { int rc, nr_failed = 0; LIST_HEAD(folios); struct migrate_pages_stats astats; memset(&astats, 0, sizeof(astats)); /* Try to migrate in batch with MIGRATE_ASYNC mode firstly */ rc = migrate_pages_batch(from, get_new_folio, put_new_folio, private, MIGRATE_ASYNC, reason, &folios, split_folios, &astats, NR_MAX_MIGRATE_ASYNC_RETRY); stats->nr_succeeded += astats.nr_succeeded; stats->nr_thp_succeeded += astats.nr_thp_succeeded; stats->nr_thp_split += astats.nr_thp_split; stats->nr_split += astats.nr_split; if (rc < 0) { stats->nr_failed_pages += astats.nr_failed_pages; stats->nr_thp_failed += astats.nr_thp_failed; list_splice_tail(&folios, ret_folios); return rc; } stats->nr_thp_failed += astats.nr_thp_split; /* * Do not count rc, as pages will be retried below. * Count nr_split only, since it includes nr_thp_split. */ nr_failed += astats.nr_split; /* * Fall back to migrate all failed folios one by one synchronously. All * failed folios except split THPs will be retried, so their failure * isn't counted */ list_splice_tail_init(&folios, from); while (!list_empty(from)) { list_move(from->next, &folios); rc = migrate_pages_batch(&folios, get_new_folio, put_new_folio, private, mode, reason, ret_folios, split_folios, stats, NR_MAX_MIGRATE_SYNC_RETRY); list_splice_tail_init(&folios, ret_folios); if (rc < 0) return rc; nr_failed += rc; } return nr_failed; } /* * migrate_pages - migrate the folios specified in a list, to the free folios * supplied as the target for the page migration * * @from: The list of folios to be migrated. * @get_new_folio: The function used to allocate free folios to be used * as the target of the folio migration. * @put_new_folio: The function used to free target folios if migration * fails, or NULL if no special handling is necessary. * @private: Private data to be passed on to get_new_folio() * @mode: The migration mode that specifies the constraints for * folio migration, if any. * @reason: The reason for folio migration. * @ret_succeeded: Set to the number of folios migrated successfully if * the caller passes a non-NULL pointer. * * The function returns after NR_MAX_MIGRATE_PAGES_RETRY attempts or if no folios * are movable any more because the list has become empty or no retryable folios * exist any more. It is caller's responsibility to call putback_movable_pages() * only if ret != 0. * * Returns the number of {normal folio, large folio, hugetlb} that were not * migrated, or an error code. 
The number of large folio splits will be * considered as the number of non-migrated large folio, no matter how many * split folios of the large folio are migrated successfully. */ int migrate_pages(struct list_head *from, new_folio_t get_new_folio, free_folio_t put_new_folio, unsigned long private, enum migrate_mode mode, int reason, unsigned int *ret_succeeded) { int rc, rc_gather; int nr_pages; struct folio *folio, *folio2; LIST_HEAD(folios); LIST_HEAD(ret_folios); LIST_HEAD(split_folios); struct migrate_pages_stats stats; trace_mm_migrate_pages_start(mode, reason); memset(&stats, 0, sizeof(stats)); rc_gather = migrate_hugetlbs(from, get_new_folio, put_new_folio, private, mode, reason, &stats, &ret_folios); if (rc_gather < 0) goto out; again: nr_pages = 0; list_for_each_entry_safe(folio, folio2, from, lru) { /* Retried hugetlb folios will be kept in list */ if (folio_test_hugetlb(folio)) { list_move_tail(&folio->lru, &ret_folios); continue; } nr_pages += folio_nr_pages(folio); if (nr_pages >= NR_MAX_BATCHED_MIGRATION) break; } if (nr_pages >= NR_MAX_BATCHED_MIGRATION) list_cut_before(&folios, from, &folio2->lru); else list_splice_init(from, &folios); if (mode == MIGRATE_ASYNC) rc = migrate_pages_batch(&folios, get_new_folio, put_new_folio, private, mode, reason, &ret_folios, &split_folios, &stats, NR_MAX_MIGRATE_PAGES_RETRY); else rc = migrate_pages_sync(&folios, get_new_folio, put_new_folio, private, mode, reason, &ret_folios, &split_folios, &stats); list_splice_tail_init(&folios, &ret_folios); if (rc < 0) { rc_gather = rc; list_splice_tail(&split_folios, &ret_folios); goto out; } if (!list_empty(&split_folios)) { /* * Failure isn't counted since all split folios of a large folio * is counted as 1 failure already. And, we only try to migrate * with minimal effort, force MIGRATE_ASYNC mode and retry once. */ migrate_pages_batch(&split_folios, get_new_folio, put_new_folio, private, MIGRATE_ASYNC, reason, &ret_folios, NULL, &stats, 1); list_splice_tail_init(&split_folios, &ret_folios); } rc_gather += rc; if (!list_empty(from)) goto again; out: /* * Put the permanent failure folio back to migration list, they * will be put back to the right list by the caller. */ list_splice(&ret_folios, from); /* * Return 0 in case all split folios of fail-to-migrate large folios * are migrated successfully. 
*/ if (list_empty(from)) rc_gather = 0; count_vm_events(PGMIGRATE_SUCCESS, stats.nr_succeeded); count_vm_events(PGMIGRATE_FAIL, stats.nr_failed_pages); count_vm_events(THP_MIGRATION_SUCCESS, stats.nr_thp_succeeded); count_vm_events(THP_MIGRATION_FAIL, stats.nr_thp_failed); count_vm_events(THP_MIGRATION_SPLIT, stats.nr_thp_split); trace_mm_migrate_pages(stats.nr_succeeded, stats.nr_failed_pages, stats.nr_thp_succeeded, stats.nr_thp_failed, stats.nr_thp_split, stats.nr_split, mode, reason); if (ret_succeeded) *ret_succeeded = stats.nr_succeeded; return rc_gather; } struct folio *alloc_migration_target(struct folio *src, unsigned long private) { struct migration_target_control *mtc; gfp_t gfp_mask; unsigned int order = 0; int nid; int zidx; mtc = (struct migration_target_control *)private; gfp_mask = mtc->gfp_mask; nid = mtc->nid; if (nid == NUMA_NO_NODE) nid = folio_nid(src); if (folio_test_hugetlb(src)) { struct hstate *h = folio_hstate(src); gfp_mask = htlb_modify_alloc_mask(h, gfp_mask); return alloc_hugetlb_folio_nodemask(h, nid, mtc->nmask, gfp_mask, htlb_allow_alloc_fallback(mtc->reason)); } if (folio_test_large(src)) { /* * clear __GFP_RECLAIM to make the migration callback * consistent with regular THP allocations. */ gfp_mask &= ~__GFP_RECLAIM; gfp_mask |= GFP_TRANSHUGE; order = folio_order(src); } zidx = zone_idx(folio_zone(src)); if (is_highmem_idx(zidx) || zidx == ZONE_MOVABLE) gfp_mask |= __GFP_HIGHMEM; return __folio_alloc(gfp_mask, order, nid, mtc->nmask); } #ifdef CONFIG_NUMA static int store_status(int __user *status, int start, int value, int nr) { while (nr-- > 0) { if (put_user(value, status + start)) return -EFAULT; start++; } return 0; } static int do_move_pages_to_node(struct list_head *pagelist, int node) { int err; struct migration_target_control mtc = { .nid = node, .gfp_mask = GFP_HIGHUSER_MOVABLE | __GFP_THISNODE, .reason = MR_SYSCALL, }; err = migrate_pages(pagelist, alloc_migration_target, NULL, (unsigned long)&mtc, MIGRATE_SYNC, MR_SYSCALL, NULL); if (err) putback_movable_pages(pagelist); return err; } static int __add_folio_for_migration(struct folio *folio, int node, struct list_head *pagelist, bool migrate_all) { if (is_zero_folio(folio) || is_huge_zero_folio(folio)) return -EFAULT; if (folio_is_zone_device(folio)) return -ENOENT; if (folio_nid(folio) == node) return 0; if (folio_maybe_mapped_shared(folio) && !migrate_all) return -EACCES; if (folio_test_hugetlb(folio)) { if (folio_isolate_hugetlb(folio, pagelist)) return 1; } else if (folio_isolate_lru(folio)) { list_add_tail(&folio->lru, pagelist); node_stat_mod_folio(folio, NR_ISOLATED_ANON + folio_is_file_lru(folio), folio_nr_pages(folio)); return 1; } return -EBUSY; } /* * Resolves the given address to a struct folio, isolates it from the LRU and * puts it to the given pagelist. 
* Returns: * errno - if the folio cannot be found/isolated * 0 - when it doesn't have to be migrated because it is already on the * target node * 1 - when it has been queued */ static int add_folio_for_migration(struct mm_struct *mm, const void __user *p, int node, struct list_head *pagelist, bool migrate_all) { struct vm_area_struct *vma; struct folio_walk fw; struct folio *folio; unsigned long addr; int err = -EFAULT; mmap_read_lock(mm); addr = (unsigned long)untagged_addr_remote(mm, p); vma = vma_lookup(mm, addr); if (vma && vma_migratable(vma)) { folio = folio_walk_start(&fw, vma, addr, FW_ZEROPAGE); if (folio) { err = __add_folio_for_migration(folio, node, pagelist, migrate_all); folio_walk_end(&fw, vma); } else { err = -ENOENT; } } mmap_read_unlock(mm); return err; } static int move_pages_and_store_status(int node, struct list_head *pagelist, int __user *status, int start, int i, unsigned long nr_pages) { int err; if (list_empty(pagelist)) return 0; err = do_move_pages_to_node(pagelist, node); if (err) { /* * Positive err means the number of failed * pages to migrate. Since we are going to * abort and return the number of non-migrated * pages, so need to include the rest of the * nr_pages that have not been attempted as * well. */ if (err > 0) err += nr_pages - i; return err; } return store_status(status, start, node, i - start); } /* * Migrate an array of page address onto an array of nodes and fill * the corresponding array of status. */ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes, unsigned long nr_pages, const void __user * __user *pages, const int __user *nodes, int __user *status, int flags) { compat_uptr_t __user *compat_pages = (void __user *)pages; int current_node = NUMA_NO_NODE; LIST_HEAD(pagelist); int start, i; int err = 0, err1; lru_cache_disable(); for (i = start = 0; i < nr_pages; i++) { const void __user *p; int node; err = -EFAULT; if (in_compat_syscall()) { compat_uptr_t cp; if (get_user(cp, compat_pages + i)) goto out_flush; p = compat_ptr(cp); } else { if (get_user(p, pages + i)) goto out_flush; } if (get_user(node, nodes + i)) goto out_flush; err = -ENODEV; if (node < 0 || node >= MAX_NUMNODES) goto out_flush; if (!node_state(node, N_MEMORY)) goto out_flush; err = -EACCES; if (!node_isset(node, task_nodes)) goto out_flush; if (current_node == NUMA_NO_NODE) { current_node = node; start = i; } else if (node != current_node) { err = move_pages_and_store_status(current_node, &pagelist, status, start, i, nr_pages); if (err) goto out; start = i; current_node = node; } /* * Errors in the page lookup or isolation are not fatal and we simply * report them via status */ err = add_folio_for_migration(mm, p, current_node, &pagelist, flags & MPOL_MF_MOVE_ALL); if (err > 0) { /* The page is successfully queued for migration */ continue; } /* * The move_pages() man page does not have an -EEXIST choice, so * use -EFAULT instead. */ if (err == -EEXIST) err = -EFAULT; /* * If the page is already on the target node (!err), store the * node, otherwise, store the err. */ err = store_status(status, i, err ? 
: current_node, 1); if (err) goto out_flush; err = move_pages_and_store_status(current_node, &pagelist, status, start, i, nr_pages); if (err) { /* We have accounted for page i */ if (err > 0) err--; goto out; } current_node = NUMA_NO_NODE; } out_flush: /* Make sure we do not overwrite the existing error */ err1 = move_pages_and_store_status(current_node, &pagelist, status, start, i, nr_pages); if (err >= 0) err = err1; out: lru_cache_enable(); return err; } /* * Determine the nodes of an array of pages and store it in an array of status. */ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages, const void __user **pages, int *status) { unsigned long i; mmap_read_lock(mm); for (i = 0; i < nr_pages; i++) { unsigned long addr = (unsigned long)(*pages); struct vm_area_struct *vma; struct folio_walk fw; struct folio *folio; int err = -EFAULT; vma = vma_lookup(mm, addr); if (!vma) goto set_status; folio = folio_walk_start(&fw, vma, addr, FW_ZEROPAGE); if (folio) { if (is_zero_folio(folio) || is_huge_zero_folio(folio)) err = -EFAULT; else if (folio_is_zone_device(folio)) err = -ENOENT; else err = folio_nid(folio); folio_walk_end(&fw, vma); } else { err = -ENOENT; } set_status: *status = err; pages++; status++; } mmap_read_unlock(mm); } static int get_compat_pages_array(const void __user *chunk_pages[], const void __user * __user *pages, unsigned long chunk_nr) { compat_uptr_t __user *pages32 = (compat_uptr_t __user *)pages; compat_uptr_t p; int i; for (i = 0; i < chunk_nr; i++) { if (get_user(p, pages32 + i)) return -EFAULT; chunk_pages[i] = compat_ptr(p); } return 0; } /* * Determine the nodes of a user array of pages and store it in * a user array of status. */ static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages, const void __user * __user *pages, int __user *status) { #define DO_PAGES_STAT_CHUNK_NR 16UL const void __user *chunk_pages[DO_PAGES_STAT_CHUNK_NR]; int chunk_status[DO_PAGES_STAT_CHUNK_NR]; while (nr_pages) { unsigned long chunk_nr = min(nr_pages, DO_PAGES_STAT_CHUNK_NR); if (in_compat_syscall()) { if (get_compat_pages_array(chunk_pages, pages, chunk_nr)) break; } else { if (copy_from_user(chunk_pages, pages, chunk_nr * sizeof(*chunk_pages))) break; } do_pages_stat_array(mm, chunk_nr, chunk_pages, chunk_status); if (copy_to_user(status, chunk_status, chunk_nr * sizeof(*status))) break; pages += chunk_nr; status += chunk_nr; nr_pages -= chunk_nr; } return nr_pages ? -EFAULT : 0; } static struct mm_struct *find_mm_struct(pid_t pid, nodemask_t *mem_nodes) { struct task_struct *task; struct mm_struct *mm; /* * There is no need to check if current process has the right to modify * the specified process when they are same. */ if (!pid) { mmget(current->mm); *mem_nodes = cpuset_mems_allowed(current); return current->mm; } task = find_get_task_by_vpid(pid); if (!task) { return ERR_PTR(-ESRCH); } /* * Check if this process has the right to modify the specified * process. Use the regular "ptrace_may_access()" checks. */ if (!ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS)) { mm = ERR_PTR(-EPERM); goto out; } mm = ERR_PTR(security_task_movememory(task)); if (IS_ERR(mm)) goto out; *mem_nodes = cpuset_mems_allowed(task); mm = get_task_mm(task); out: put_task_struct(task); if (!mm) mm = ERR_PTR(-EINVAL); return mm; } /* * Move a list of pages in the address space of the currently executing * process. 
*/ static int kernel_move_pages(pid_t pid, unsigned long nr_pages, const void __user * __user *pages, const int __user *nodes, int __user *status, int flags) { struct mm_struct *mm; int err; nodemask_t task_nodes; /* Check flags */ if (flags & ~(MPOL_MF_MOVE|MPOL_MF_MOVE_ALL)) return -EINVAL; if ((flags & MPOL_MF_MOVE_ALL) && !capable(CAP_SYS_NICE)) return -EPERM; mm = find_mm_struct(pid, &task_nodes); if (IS_ERR(mm)) return PTR_ERR(mm); if (nodes) err = do_pages_move(mm, task_nodes, nr_pages, pages, nodes, status, flags); else err = do_pages_stat(mm, nr_pages, pages, status); mmput(mm); return err; } SYSCALL_DEFINE6(move_pages, pid_t, pid, unsigned long, nr_pages, const void __user * __user *, pages, const int __user *, nodes, int __user *, status, int, flags) { return kernel_move_pages(pid, nr_pages, pages, nodes, status, flags); } #ifdef CONFIG_NUMA_BALANCING /* * Returns true if this is a safe migration target node for misplaced NUMA * pages. Currently it only checks the watermarks which is crude. */ static bool migrate_balanced_pgdat(struct pglist_data *pgdat, unsigned long nr_migrate_pages) { int z; for (z = pgdat->nr_zones - 1; z >= 0; z--) { struct zone *zone = pgdat->node_zones + z; if (!managed_zone(zone)) continue; /* Avoid waking kswapd by allocating pages_to_migrate pages. */ if (!zone_watermark_ok(zone, 0, high_wmark_pages(zone) + nr_migrate_pages, ZONE_MOVABLE, ALLOC_CMA)) continue; return true; } return false; } static struct folio *alloc_misplaced_dst_folio(struct folio *src, unsigned long data) { int nid = (int) data; int order = folio_order(src); gfp_t gfp = __GFP_THISNODE; if (order > 0) gfp |= GFP_TRANSHUGE_LIGHT; else { gfp |= GFP_HIGHUSER_MOVABLE | __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN; gfp &= ~__GFP_RECLAIM; } return __folio_alloc_node(gfp, order, nid); } /* * Prepare for calling migrate_misplaced_folio() by isolating the folio if * permitted. Must be called with the PTL still held. */ int migrate_misplaced_folio_prepare(struct folio *folio, struct vm_area_struct *vma, int node) { int nr_pages = folio_nr_pages(folio); pg_data_t *pgdat = NODE_DATA(node); if (folio_is_file_lru(folio)) { /* * Do not migrate file folios that are mapped in multiple * processes with execute permissions as they are probably * shared libraries. * * See folio_maybe_mapped_shared() on possible imprecision * when we cannot easily detect if a folio is shared. */ if ((vma->vm_flags & VM_EXEC) && folio_maybe_mapped_shared(folio)) return -EACCES; /* * Do not migrate dirty folios as not all filesystems can move * dirty folios in MIGRATE_ASYNC mode which is a waste of * cycles. */ if (folio_test_dirty(folio)) return -EAGAIN; } /* Avoid migrating to a node that is nearly full */ if (!migrate_balanced_pgdat(pgdat, nr_pages)) { int z; if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING)) return -EAGAIN; for (z = pgdat->nr_zones - 1; z >= 0; z--) { if (managed_zone(pgdat->node_zones + z)) break; } /* * If there are no managed zones, it should not proceed * further. */ if (z < 0) return -EAGAIN; wakeup_kswapd(pgdat->node_zones + z, 0, folio_order(folio), ZONE_MOVABLE); return -EAGAIN; } if (!folio_isolate_lru(folio)) return -EAGAIN; node_stat_mod_folio(folio, NR_ISOLATED_ANON + folio_is_file_lru(folio), nr_pages); return 0; } /* * Attempt to migrate a misplaced folio to the specified destination * node. Caller is expected to have isolated the folio by calling * migrate_misplaced_folio_prepare(), which will result in an * elevated reference count on the folio. 
This function will un-isolate the * folio, dereferencing the folio before returning. */ int migrate_misplaced_folio(struct folio *folio, int node) { pg_data_t *pgdat = NODE_DATA(node); int nr_remaining; unsigned int nr_succeeded; LIST_HEAD(migratepages); struct mem_cgroup *memcg = get_mem_cgroup_from_folio(folio); struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat); list_add(&folio->lru, &migratepages); nr_remaining = migrate_pages(&migratepages, alloc_misplaced_dst_folio, NULL, node, MIGRATE_ASYNC, MR_NUMA_MISPLACED, &nr_succeeded); if (nr_remaining && !list_empty(&migratepages)) putback_movable_pages(&migratepages); if (nr_succeeded) { count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded); count_memcg_events(memcg, NUMA_PAGE_MIGRATE, nr_succeeded); if ((sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) && !node_is_toptier(folio_nid(folio)) && node_is_toptier(node)) mod_lruvec_state(lruvec, PGPROMOTE_SUCCESS, nr_succeeded); } mem_cgroup_put(memcg); BUG_ON(!list_empty(&migratepages)); return nr_remaining ? -EAGAIN : 0; } #endif /* CONFIG_NUMA_BALANCING */ #endif /* CONFIG_NUMA */ |
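/*
 * Illustrative userspace sketch, not part of the kernel sources above: the
 * do_pages_move()/do_pages_stat() paths just shown back the move_pages(2)
 * syscall.  A minimal caller could look like the code below; the target node
 * number, the single-page buffer, and the use of the raw syscall(2) wrapper
 * (plus libnuma's <numaif.h> for MPOL_MF_MOVE) are assumptions made only for
 * this example.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <numaif.h>		/* MPOL_MF_MOVE */

int move_one_page_example(void)
{
	long page_size = sysconf(_SC_PAGESIZE);
	void *pages[1];
	int nodes[1] = { 1 };	/* assumed destination node */
	int status[1] = { -1 };
	void *buf;
	long ret;

	if (posix_memalign(&buf, page_size, page_size))
		return -1;
	((char *)buf)[0] = 1;	/* fault the page in so it can be isolated */
	pages[0] = buf;

	/* nodes != NULL: migrate; status[i] becomes the node or a -errno */
	ret = syscall(SYS_move_pages, 0 /* current process */, 1UL, pages,
		      nodes, status, MPOL_MF_MOVE);
	printf("move: ret=%ld status[0]=%d\n", ret, status[0]);

	/* nodes == NULL: only report which node each page currently sits on */
	ret = syscall(SYS_move_pages, 0, 1UL, pages, NULL, status, 0);
	printf("query: ret=%ld node=%d\n", ret, status[0]);

	free(buf);
	return 0;
}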
843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 | // SPDX-License-Identifier: GPL-2.0 #include <linux/kernel.h> #include <linux/errno.h> #include <linux/fs.h> #include <linux/file.h> #include <linux/mm.h> #include <linux/slab.h> #include <linux/poll.h> #include <linux/hashtable.h> #include <linux/io_uring.h> #include <trace/events/io_uring.h> #include <uapi/linux/io_uring.h> #include "io_uring.h" #include "alloc_cache.h" #include "refs.h" #include "napi.h" #include "opdef.h" #include "kbuf.h" #include "poll.h" #include "cancel.h" struct io_poll_update { struct file *file; u64 old_user_data; u64 new_user_data; __poll_t events; bool update_events; bool update_user_data; }; struct io_poll_table { struct poll_table_struct pt; struct io_kiocb *req; int nr_entries; int error; bool owning; /* output value, set only if arm poll returns >0 */ __poll_t result_mask; }; #define IO_POLL_CANCEL_FLAG BIT(31) #define IO_POLL_RETRY_FLAG BIT(30) #define IO_POLL_REF_MASK GENMASK(29, 0) /* * We usually have 1-2 refs taken, 128 is more than enough and we want to * maximise the margin between this amount and the moment when it overflows. */ #define IO_POLL_REF_BIAS 128 #define IO_WQE_F_DOUBLE 1 static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync, void *key); static inline struct io_kiocb *wqe_to_req(struct wait_queue_entry *wqe) { unsigned long priv = (unsigned long)wqe->private; return (struct io_kiocb *)(priv & ~IO_WQE_F_DOUBLE); } static inline bool wqe_is_double(struct wait_queue_entry *wqe) { unsigned long priv = (unsigned long)wqe->private; return priv & IO_WQE_F_DOUBLE; } static bool io_poll_get_ownership_slowpath(struct io_kiocb *req) { int v; /* * poll_refs are already elevated and we don't have much hope for * grabbing the ownership. Instead of incrementing set a retry flag * to notify the loop that there might have been some change. */ v = atomic_fetch_or(IO_POLL_RETRY_FLAG, &req->poll_refs); if (v & IO_POLL_REF_MASK) return false; return !(atomic_fetch_inc(&req->poll_refs) & IO_POLL_REF_MASK); } /* * If refs part of ->poll_refs (see IO_POLL_REF_MASK) is 0, it's free. We can * bump it and acquire ownership. It's disallowed to modify requests while not * owning it, that prevents from races for enqueueing task_work's and b/w * arming poll and wakeups. 
*/ static inline bool io_poll_get_ownership(struct io_kiocb *req) { if (unlikely(atomic_read(&req->poll_refs) >= IO_POLL_REF_BIAS)) return io_poll_get_ownership_slowpath(req); return !(atomic_fetch_inc(&req->poll_refs) & IO_POLL_REF_MASK); } static void io_poll_mark_cancelled(struct io_kiocb *req) { atomic_or(IO_POLL_CANCEL_FLAG, &req->poll_refs); } static struct io_poll *io_poll_get_double(struct io_kiocb *req) { /* pure poll stashes this in ->async_data, poll driven retry elsewhere */ if (req->opcode == IORING_OP_POLL_ADD) return req->async_data; return req->apoll->double_poll; } static struct io_poll *io_poll_get_single(struct io_kiocb *req) { if (req->opcode == IORING_OP_POLL_ADD) return io_kiocb_to_cmd(req, struct io_poll); return &req->apoll->poll; } static void io_poll_req_insert(struct io_kiocb *req) { struct io_hash_table *table = &req->ctx->cancel_table; u32 index = hash_long(req->cqe.user_data, table->hash_bits); lockdep_assert_held(&req->ctx->uring_lock); hlist_add_head(&req->hash_node, &table->hbs[index].list); } static void io_init_poll_iocb(struct io_poll *poll, __poll_t events) { poll->head = NULL; #define IO_POLL_UNMASK (EPOLLERR|EPOLLHUP|EPOLLNVAL|EPOLLRDHUP) /* mask in events that we always want/need */ poll->events = events | IO_POLL_UNMASK; INIT_LIST_HEAD(&poll->wait.entry); init_waitqueue_func_entry(&poll->wait, io_poll_wake); } static inline void io_poll_remove_entry(struct io_poll *poll) { struct wait_queue_head *head = smp_load_acquire(&poll->head); if (head) { spin_lock_irq(&head->lock); list_del_init(&poll->wait.entry); poll->head = NULL; spin_unlock_irq(&head->lock); } } static void io_poll_remove_entries(struct io_kiocb *req) { /* * Nothing to do if neither of those flags are set. Avoid dipping * into the poll/apoll/double cachelines if we can. */ if (!(req->flags & (REQ_F_SINGLE_POLL | REQ_F_DOUBLE_POLL))) return; /* * While we hold the waitqueue lock and the waitqueue is nonempty, * wake_up_pollfree() will wait for us. However, taking the waitqueue * lock in the first place can race with the waitqueue being freed. * * We solve this as eventpoll does: by taking advantage of the fact that * all users of wake_up_pollfree() will RCU-delay the actual free. If * we enter rcu_read_lock() and see that the pointer to the queue is * non-NULL, we can then lock it without the memory being freed out from * under us. * * Keep holding rcu_read_lock() as long as we hold the queue lock, in * case the caller deletes the entry from the queue, leaving it empty. * In that case, only RCU prevents the queue memory from being freed. */ rcu_read_lock(); if (req->flags & REQ_F_SINGLE_POLL) io_poll_remove_entry(io_poll_get_single(req)); if (req->flags & REQ_F_DOUBLE_POLL) io_poll_remove_entry(io_poll_get_double(req)); rcu_read_unlock(); } enum { IOU_POLL_DONE = 0, IOU_POLL_NO_ACTION = 1, IOU_POLL_REMOVE_POLL_USE_RES = 2, IOU_POLL_REISSUE = 3, IOU_POLL_REQUEUE = 4, }; static void __io_poll_execute(struct io_kiocb *req, int mask) { unsigned flags = 0; io_req_set_res(req, mask, 0); req->io_task_work.func = io_poll_task_func; trace_io_uring_task_add(req, mask); if (!(req->flags & REQ_F_POLL_NO_LAZY)) flags = IOU_F_TWQ_LAZY_WAKE; __io_req_task_work_add(req, flags); } static inline void io_poll_execute(struct io_kiocb *req, int res) { if (io_poll_get_ownership(req)) __io_poll_execute(req, res); } /* * All poll tw should go through this. Checks for poll events, manages * references, does rewait, etc. * * Returns a negative error on failure. 
IOU_POLL_NO_ACTION when no action * require, which is either spurious wakeup or multishot CQE is served. * IOU_POLL_DONE when it's done with the request, then the mask is stored in * req->cqe.res. IOU_POLL_REMOVE_POLL_USE_RES indicates to remove multishot * poll and that the result is stored in req->cqe. */ static int io_poll_check_events(struct io_kiocb *req, io_tw_token_t tw) { int v; if (unlikely(io_should_terminate_tw())) return -ECANCELED; do { v = atomic_read(&req->poll_refs); if (unlikely(v != 1)) { /* tw should be the owner and so have some refs */ if (WARN_ON_ONCE(!(v & IO_POLL_REF_MASK))) return IOU_POLL_NO_ACTION; if (v & IO_POLL_CANCEL_FLAG) return -ECANCELED; /* * cqe.res contains only events of the first wake up * and all others are to be lost. Redo vfs_poll() to get * up to date state. */ if ((v & IO_POLL_REF_MASK) != 1) req->cqe.res = 0; if (v & IO_POLL_RETRY_FLAG) { req->cqe.res = 0; /* * We won't find new events that came in between * vfs_poll and the ref put unless we clear the * flag in advance. */ atomic_andnot(IO_POLL_RETRY_FLAG, &req->poll_refs); v &= ~IO_POLL_RETRY_FLAG; } } /* the mask was stashed in __io_poll_execute */ if (!req->cqe.res) { struct poll_table_struct pt = { ._key = req->apoll_events }; req->cqe.res = vfs_poll(req->file, &pt) & req->apoll_events; /* * We got woken with a mask, but someone else got to * it first. The above vfs_poll() doesn't add us back * to the waitqueue, so if we get nothing back, we * should be safe and attempt a reissue. */ if (unlikely(!req->cqe.res)) { /* Multishot armed need not reissue */ if (!(req->apoll_events & EPOLLONESHOT)) continue; return IOU_POLL_REISSUE; } } if (unlikely(req->cqe.res & EPOLLERR)) req_set_fail(req); if (req->apoll_events & EPOLLONESHOT) return IOU_POLL_DONE; /* multishot, just fill a CQE and proceed */ if (!(req->flags & REQ_F_APOLL_MULTISHOT)) { __poll_t mask = mangle_poll(req->cqe.res & req->apoll_events); if (!io_req_post_cqe(req, mask, IORING_CQE_F_MORE)) { io_req_set_res(req, mask, 0); return IOU_POLL_REMOVE_POLL_USE_RES; } } else { int ret = io_poll_issue(req, tw); if (ret == IOU_COMPLETE) return IOU_POLL_REMOVE_POLL_USE_RES; else if (ret == IOU_REQUEUE) return IOU_POLL_REQUEUE; if (ret != IOU_RETRY && ret < 0) return ret; } /* force the next iteration to vfs_poll() */ req->cqe.res = 0; /* * Release all references, retry if someone tried to restart * task_work while we were executing it. 
*/ v &= IO_POLL_REF_MASK; } while (atomic_sub_return(v, &req->poll_refs) & IO_POLL_REF_MASK); io_napi_add(req); return IOU_POLL_NO_ACTION; } void io_poll_task_func(struct io_kiocb *req, io_tw_token_t tw) { int ret; ret = io_poll_check_events(req, tw); if (ret == IOU_POLL_NO_ACTION) { io_kbuf_recycle(req, 0); return; } else if (ret == IOU_POLL_REQUEUE) { io_kbuf_recycle(req, 0); __io_poll_execute(req, 0); return; } io_poll_remove_entries(req); /* task_work always has ->uring_lock held */ hash_del(&req->hash_node); if (req->opcode == IORING_OP_POLL_ADD) { if (ret == IOU_POLL_DONE) { struct io_poll *poll; poll = io_kiocb_to_cmd(req, struct io_poll); req->cqe.res = mangle_poll(req->cqe.res & poll->events); } else if (ret == IOU_POLL_REISSUE) { io_req_task_submit(req, tw); return; } else if (ret != IOU_POLL_REMOVE_POLL_USE_RES) { req->cqe.res = ret; req_set_fail(req); } io_req_set_res(req, req->cqe.res, 0); io_req_task_complete(req, tw); } else { io_tw_lock(req->ctx, tw); if (ret == IOU_POLL_REMOVE_POLL_USE_RES) io_req_task_complete(req, tw); else if (ret == IOU_POLL_DONE || ret == IOU_POLL_REISSUE) io_req_task_submit(req, tw); else io_req_defer_failed(req, ret); } } static void io_poll_cancel_req(struct io_kiocb *req) { io_poll_mark_cancelled(req); /* kick tw, which should complete the request */ io_poll_execute(req, 0); } #define IO_ASYNC_POLL_COMMON (EPOLLONESHOT | EPOLLPRI) static __cold int io_pollfree_wake(struct io_kiocb *req, struct io_poll *poll) { io_poll_mark_cancelled(req); /* we have to kick tw in case it's not already */ io_poll_execute(req, 0); /* * If the waitqueue is being freed early but someone is already * holds ownership over it, we have to tear down the request as * best we can. That means immediately removing the request from * its waitqueue and preventing all further accesses to the * waitqueue via the request. */ list_del_init(&poll->wait.entry); /* * Careful: this *must* be the last step, since as soon * as req->head is NULL'ed out, the request can be * completed and freed, since aio_poll_complete_work() * will no longer need to take the waitqueue lock. */ smp_store_release(&poll->head, NULL); return 1; } static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync, void *key) { struct io_kiocb *req = wqe_to_req(wait); struct io_poll *poll = container_of(wait, struct io_poll, wait); __poll_t mask = key_to_poll(key); if (unlikely(mask & POLLFREE)) return io_pollfree_wake(req, poll); /* for instances that support it check for an event match first */ if (mask && !(mask & (poll->events & ~IO_ASYNC_POLL_COMMON))) return 0; if (io_poll_get_ownership(req)) { /* * If we trigger a multishot poll off our own wakeup path, * disable multishot as there is a circular dependency between * CQ posting and triggering the event. 
*/ if (mask & EPOLL_URING_WAKE) poll->events |= EPOLLONESHOT; /* optional, saves extra locking for removal in tw handler */ if (mask && poll->events & EPOLLONESHOT) { list_del_init(&poll->wait.entry); poll->head = NULL; if (wqe_is_double(wait)) req->flags &= ~REQ_F_DOUBLE_POLL; else req->flags &= ~REQ_F_SINGLE_POLL; } __io_poll_execute(req, mask); } return 1; } /* fails only when polling is already completing by the first entry */ static bool io_poll_double_prepare(struct io_kiocb *req) { struct wait_queue_head *head; struct io_poll *poll = io_poll_get_single(req); /* head is RCU protected, see io_poll_remove_entries() comments */ rcu_read_lock(); head = smp_load_acquire(&poll->head); /* * poll arm might not hold ownership and so race for req->flags with * io_poll_wake(). There is only one poll entry queued, serialise with * it by taking its head lock. As we're still arming the tw hanlder * is not going to be run, so there are no races with it. */ if (head) { spin_lock_irq(&head->lock); req->flags |= REQ_F_DOUBLE_POLL; if (req->opcode == IORING_OP_POLL_ADD) req->flags |= REQ_F_ASYNC_DATA; spin_unlock_irq(&head->lock); } rcu_read_unlock(); return !!head; } static void __io_queue_proc(struct io_poll *poll, struct io_poll_table *pt, struct wait_queue_head *head, struct io_poll **poll_ptr) { struct io_kiocb *req = pt->req; unsigned long wqe_private = (unsigned long) req; /* * The file being polled uses multiple waitqueues for poll handling * (e.g. one for read, one for write). Setup a separate io_poll * if this happens. */ if (unlikely(pt->nr_entries)) { struct io_poll *first = poll; /* double add on the same waitqueue head, ignore */ if (first->head == head) return; /* already have a 2nd entry, fail a third attempt */ if (*poll_ptr) { if ((*poll_ptr)->head == head) return; pt->error = -EINVAL; return; } poll = kmalloc(sizeof(*poll), GFP_ATOMIC); if (!poll) { pt->error = -ENOMEM; return; } /* mark as double wq entry */ wqe_private |= IO_WQE_F_DOUBLE; io_init_poll_iocb(poll, first->events); if (!io_poll_double_prepare(req)) { /* the request is completing, just back off */ kfree(poll); return; } *poll_ptr = poll; } else { /* fine to modify, there is no poll queued to race with us */ req->flags |= REQ_F_SINGLE_POLL; } pt->nr_entries++; poll->head = head; poll->wait.private = (void *) wqe_private; if (poll->events & EPOLLEXCLUSIVE) { add_wait_queue_exclusive(head, &poll->wait); } else { add_wait_queue(head, &poll->wait); } } static void io_poll_queue_proc(struct file *file, struct wait_queue_head *head, struct poll_table_struct *p) { struct io_poll_table *pt = container_of(p, struct io_poll_table, pt); struct io_poll *poll = io_kiocb_to_cmd(pt->req, struct io_poll); __io_queue_proc(poll, pt, head, (struct io_poll **) &pt->req->async_data); } static bool io_poll_can_finish_inline(struct io_kiocb *req, struct io_poll_table *pt) { return pt->owning || io_poll_get_ownership(req); } static void io_poll_add_hash(struct io_kiocb *req, unsigned int issue_flags) { struct io_ring_ctx *ctx = req->ctx; io_ring_submit_lock(ctx, issue_flags); io_poll_req_insert(req); io_ring_submit_unlock(ctx, issue_flags); } /* * Returns 0 when it's handed over for polling. The caller owns the requests if * it returns non-zero, but otherwise should not touch it. Negative values * contain an error code. When the result is >0, the polling has completed * inline and ipt.result_mask is set to the mask. 
*/ static int __io_arm_poll_handler(struct io_kiocb *req, struct io_poll *poll, struct io_poll_table *ipt, __poll_t mask, unsigned issue_flags) { INIT_HLIST_NODE(&req->hash_node); io_init_poll_iocb(poll, mask); poll->file = req->file; req->apoll_events = poll->events; ipt->pt._key = mask; ipt->req = req; ipt->error = 0; ipt->nr_entries = 0; /* * Polling is either completed here or via task_work, so if we're in the * task context we're naturally serialised with tw by merit of running * the same task. When it's io-wq, take the ownership to prevent tw * from running. However, when we're in the task context, skip taking * it as an optimisation. * * Note: even though the request won't be completed/freed, without * ownership we still can race with io_poll_wake(). * io_poll_can_finish_inline() tries to deal with that. */ ipt->owning = issue_flags & IO_URING_F_UNLOCKED; atomic_set(&req->poll_refs, (int)ipt->owning); /* * Exclusive waits may only wake a limited amount of entries * rather than all of them, this may interfere with lazy * wake if someone does wait(events > 1). Ensure we don't do * lazy wake for those, as we need to process each one as they * come in. */ if (poll->events & EPOLLEXCLUSIVE) req->flags |= REQ_F_POLL_NO_LAZY; mask = vfs_poll(req->file, &ipt->pt) & poll->events; if (unlikely(ipt->error || !ipt->nr_entries)) { io_poll_remove_entries(req); if (!io_poll_can_finish_inline(req, ipt)) { io_poll_mark_cancelled(req); return 0; } else if (mask && (poll->events & EPOLLET)) { ipt->result_mask = mask; return 1; } return ipt->error ?: -EINVAL; } if (mask && ((poll->events & (EPOLLET|EPOLLONESHOT)) == (EPOLLET|EPOLLONESHOT))) { if (!io_poll_can_finish_inline(req, ipt)) { io_poll_add_hash(req, issue_flags); return 0; } io_poll_remove_entries(req); ipt->result_mask = mask; /* no one else has access to the req, forget about the ref */ return 1; } io_poll_add_hash(req, issue_flags); if (mask && (poll->events & EPOLLET) && io_poll_can_finish_inline(req, ipt)) { __io_poll_execute(req, mask); return 0; } io_napi_add(req); if (ipt->owning) { /* * Try to release ownership. If we see a change of state, e.g. * poll was waken up, queue up a tw, it'll deal with it. */ if (atomic_cmpxchg(&req->poll_refs, 1, 0) != 1) __io_poll_execute(req, 0); } return 0; } static void io_async_queue_proc(struct file *file, struct wait_queue_head *head, struct poll_table_struct *p) { struct io_poll_table *pt = container_of(p, struct io_poll_table, pt); struct async_poll *apoll = pt->req->apoll; __io_queue_proc(&apoll->poll, pt, head, &apoll->double_poll); } /* * We can't reliably detect loops in repeated poll triggers and issue * subsequently failing. But rather than fail these immediately, allow a * certain amount of retries before we give up. Given that this condition * should _rarely_ trigger even once, we should be fine with a larger value. 
*/ #define APOLL_MAX_RETRY 128 static struct async_poll *io_req_alloc_apoll(struct io_kiocb *req, unsigned issue_flags) { struct io_ring_ctx *ctx = req->ctx; struct async_poll *apoll; if (req->flags & REQ_F_POLLED) { apoll = req->apoll; kfree(apoll->double_poll); } else { if (!(issue_flags & IO_URING_F_UNLOCKED)) apoll = io_cache_alloc(&ctx->apoll_cache, GFP_ATOMIC); else apoll = kmalloc(sizeof(*apoll), GFP_ATOMIC); if (!apoll) return NULL; apoll->poll.retries = APOLL_MAX_RETRY; } apoll->double_poll = NULL; req->apoll = apoll; if (unlikely(!--apoll->poll.retries)) return NULL; return apoll; } int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags) { const struct io_issue_def *def = &io_issue_defs[req->opcode]; struct async_poll *apoll; struct io_poll_table ipt; __poll_t mask = POLLPRI | POLLERR | EPOLLET; int ret; if (!def->pollin && !def->pollout) return IO_APOLL_ABORTED; if (!io_file_can_poll(req)) return IO_APOLL_ABORTED; if (!(req->flags & REQ_F_APOLL_MULTISHOT)) mask |= EPOLLONESHOT; if (def->pollin) { mask |= EPOLLIN | EPOLLRDNORM; /* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */ if (req->flags & REQ_F_CLEAR_POLLIN) mask &= ~EPOLLIN; } else { mask |= EPOLLOUT | EPOLLWRNORM; } if (def->poll_exclusive) mask |= EPOLLEXCLUSIVE; apoll = io_req_alloc_apoll(req, issue_flags); if (!apoll) return IO_APOLL_ABORTED; req->flags &= ~(REQ_F_SINGLE_POLL | REQ_F_DOUBLE_POLL); req->flags |= REQ_F_POLLED; ipt.pt._qproc = io_async_queue_proc; io_kbuf_recycle(req, issue_flags); ret = __io_arm_poll_handler(req, &apoll->poll, &ipt, mask, issue_flags); if (ret) return ret > 0 ? IO_APOLL_READY : IO_APOLL_ABORTED; trace_io_uring_poll_arm(req, mask, apoll->poll.events); return IO_APOLL_OK; } /* * Returns true if we found and killed one or more poll requests */ __cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct io_uring_task *tctx, bool cancel_all) { unsigned nr_buckets = 1U << ctx->cancel_table.hash_bits; struct hlist_node *tmp; struct io_kiocb *req; bool found = false; int i; lockdep_assert_held(&ctx->uring_lock); for (i = 0; i < nr_buckets; i++) { struct io_hash_bucket *hb = &ctx->cancel_table.hbs[i]; hlist_for_each_entry_safe(req, tmp, &hb->list, hash_node) { if (io_match_task_safe(req, tctx, cancel_all)) { hlist_del_init(&req->hash_node); io_poll_cancel_req(req); found = true; } } } return found; } static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only, struct io_cancel_data *cd) { struct io_kiocb *req; u32 index = hash_long(cd->data, ctx->cancel_table.hash_bits); struct io_hash_bucket *hb = &ctx->cancel_table.hbs[index]; hlist_for_each_entry(req, &hb->list, hash_node) { if (cd->data != req->cqe.user_data) continue; if (poll_only && req->opcode != IORING_OP_POLL_ADD) continue; if (cd->flags & IORING_ASYNC_CANCEL_ALL) { if (io_cancel_match_sequence(req, cd->seq)) continue; } return req; } return NULL; } static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx, struct io_cancel_data *cd) { unsigned nr_buckets = 1U << ctx->cancel_table.hash_bits; struct io_kiocb *req; int i; for (i = 0; i < nr_buckets; i++) { struct io_hash_bucket *hb = &ctx->cancel_table.hbs[i]; hlist_for_each_entry(req, &hb->list, hash_node) { if (io_cancel_req_match(req, cd)) return req; } } return NULL; } static int io_poll_disarm(struct io_kiocb *req) { if (!req) return -ENOENT; if (!io_poll_get_ownership(req)) return -EALREADY; io_poll_remove_entries(req); hash_del(&req->hash_node); return 0; } static int __io_poll_cancel(struct io_ring_ctx *ctx, struct 
io_cancel_data *cd) { struct io_kiocb *req; if (cd->flags & (IORING_ASYNC_CANCEL_FD | IORING_ASYNC_CANCEL_OP | IORING_ASYNC_CANCEL_ANY)) req = io_poll_file_find(ctx, cd); else req = io_poll_find(ctx, false, cd); if (req) { io_poll_cancel_req(req); return 0; } return -ENOENT; } int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd, unsigned issue_flags) { int ret; io_ring_submit_lock(ctx, issue_flags); ret = __io_poll_cancel(ctx, cd); io_ring_submit_unlock(ctx, issue_flags); return ret; } static __poll_t io_poll_parse_events(const struct io_uring_sqe *sqe, unsigned int flags) { u32 events; events = READ_ONCE(sqe->poll32_events); #ifdef __BIG_ENDIAN events = swahw32(events); #endif if (!(flags & IORING_POLL_ADD_MULTI)) events |= EPOLLONESHOT; if (!(flags & IORING_POLL_ADD_LEVEL)) events |= EPOLLET; return demangle_poll(events) | (events & (EPOLLEXCLUSIVE|EPOLLONESHOT|EPOLLET)); } int io_poll_remove_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) { struct io_poll_update *upd = io_kiocb_to_cmd(req, struct io_poll_update); u32 flags; if (sqe->buf_index || sqe->splice_fd_in) return -EINVAL; flags = READ_ONCE(sqe->len); if (flags & ~(IORING_POLL_UPDATE_EVENTS | IORING_POLL_UPDATE_USER_DATA | IORING_POLL_ADD_MULTI)) return -EINVAL; /* meaningless without update */ if (flags == IORING_POLL_ADD_MULTI) return -EINVAL; upd->old_user_data = READ_ONCE(sqe->addr); upd->update_events = flags & IORING_POLL_UPDATE_EVENTS; upd->update_user_data = flags & IORING_POLL_UPDATE_USER_DATA; upd->new_user_data = READ_ONCE(sqe->off); if (!upd->update_user_data && upd->new_user_data) return -EINVAL; if (upd->update_events) upd->events = io_poll_parse_events(sqe, flags); else if (sqe->poll32_events) return -EINVAL; return 0; } int io_poll_add_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) { struct io_poll *poll = io_kiocb_to_cmd(req, struct io_poll); u32 flags; if (sqe->buf_index || sqe->off || sqe->addr) return -EINVAL; flags = READ_ONCE(sqe->len); if (flags & ~IORING_POLL_ADD_MULTI) return -EINVAL; if ((flags & IORING_POLL_ADD_MULTI) && (req->flags & REQ_F_CQE_SKIP)) return -EINVAL; poll->events = io_poll_parse_events(sqe, flags); return 0; } int io_poll_add(struct io_kiocb *req, unsigned int issue_flags) { struct io_poll *poll = io_kiocb_to_cmd(req, struct io_poll); struct io_poll_table ipt; int ret; ipt.pt._qproc = io_poll_queue_proc; ret = __io_arm_poll_handler(req, poll, &ipt, poll->events, issue_flags); if (ret > 0) { io_req_set_res(req, ipt.result_mask, 0); return IOU_COMPLETE; } return ret ?: IOU_ISSUE_SKIP_COMPLETE; } int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags) { struct io_poll_update *poll_update = io_kiocb_to_cmd(req, struct io_poll_update); struct io_ring_ctx *ctx = req->ctx; struct io_cancel_data cd = { .ctx = ctx, .data = poll_update->old_user_data, }; struct io_kiocb *preq; int ret2, ret = 0; io_ring_submit_lock(ctx, issue_flags); preq = io_poll_find(ctx, true, &cd); ret2 = io_poll_disarm(preq); if (ret2) { ret = ret2; goto out; } if (WARN_ON_ONCE(preq->opcode != IORING_OP_POLL_ADD)) { ret = -EFAULT; goto out; } if (poll_update->update_events || poll_update->update_user_data) { /* only mask one event flags, keep behavior flags */ if (poll_update->update_events) { struct io_poll *poll = io_kiocb_to_cmd(preq, struct io_poll); poll->events &= ~0xffff; poll->events |= poll_update->events & 0xffff; poll->events |= IO_POLL_UNMASK; } if (poll_update->update_user_data) preq->cqe.user_data = poll_update->new_user_data; ret2 = io_poll_add(preq, 
issue_flags & ~IO_URING_F_UNLOCKED); /* successfully updated, don't complete poll request */ if (!ret2 || ret2 == -EIOCBQUEUED) goto out; } req_set_fail(preq); io_req_set_res(preq, -ECANCELED, 0); preq->io_task_work.func = io_req_task_complete; io_req_task_work_add(preq); out: io_ring_submit_unlock(ctx, issue_flags); if (ret < 0) { req_set_fail(req); return ret; } /* complete update request, we're done with it */ io_req_set_res(req, ret, 0); return IOU_COMPLETE; } |
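/*
 * Illustrative userspace sketch, assuming liburing is available; it is not
 * part of the kernel file above.  It arms a single-shot IORING_OP_POLL_ADD
 * (the request type prepared by io_poll_add_prep()/io_poll_add() above) on a
 * pipe read end and waits for the completion; the user_data value and queue
 * depth are arbitrary.
 */
#include <poll.h>
#include <stdio.h>
#include <unistd.h>
#include <liburing.h>

int poll_pipe_example(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	int fds[2];

	if (pipe(fds) || io_uring_queue_init(8, &ring, 0))
		return -1;

	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_poll_add(sqe, fds[0], POLLIN);	/* single-shot poll */
	sqe->user_data = 0x1234;			/* arbitrary tag */
	io_uring_submit(&ring);

	write(fds[1], "x", 1);				/* make the fd readable */

	if (io_uring_wait_cqe(&ring, &cqe) == 0) {
		/* cqe->res carries the signalled poll mask or a -errno */
		printf("poll cqe: user_data=%llu res=0x%x\n",
		       (unsigned long long)cqe->user_data, cqe->res);
		io_uring_cqe_seen(&ring, cqe);
	}

	io_uring_queue_exit(&ring);
	close(fds[0]);
	close(fds[1]);
	return 0;
}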
| 1 1 2 1 2 1 2 2 2 2 2 2 2 1 1 2 2 2 1 2 2 2 2 2 3 2 2 2 2 2 2 3 2 5 4 3 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 | // SPDX-License-Identifier: GPL-2.0 /* XDP sockets monitoring support * * Copyright(c) 2019 Intel Corporation. * * Author: Björn Töpel <bjorn.topel@intel.com> */ #include <linux/module.h> #include <net/xdp_sock.h> #include <linux/xdp_diag.h> #include <linux/sock_diag.h> #include "xsk_queue.h" #include "xsk.h" static int xsk_diag_put_info(const struct xdp_sock *xs, struct sk_buff *nlskb) { struct xdp_diag_info di = {}; di.ifindex = xs->dev ? xs->dev->ifindex : 0; di.queue_id = xs->queue_id; return nla_put(nlskb, XDP_DIAG_INFO, sizeof(di), &di); } static int xsk_diag_put_ring(const struct xsk_queue *queue, int nl_type, struct sk_buff *nlskb) { struct xdp_diag_ring dr = {}; dr.entries = queue->nentries; return nla_put(nlskb, nl_type, sizeof(dr), &dr); } static int xsk_diag_put_rings_cfg(const struct xdp_sock *xs, struct sk_buff *nlskb) { int err = 0; if (xs->rx) err = xsk_diag_put_ring(xs->rx, XDP_DIAG_RX_RING, nlskb); if (!err && xs->tx) err = xsk_diag_put_ring(xs->tx, XDP_DIAG_TX_RING, nlskb); return err; } static int xsk_diag_put_umem(const struct xdp_sock *xs, struct sk_buff *nlskb) { struct xsk_buff_pool *pool = xs->pool; struct xdp_umem *umem = xs->umem; struct xdp_diag_umem du = {}; int err; if (!umem) return 0; du.id = umem->id; du.size = umem->size; du.num_pages = umem->npgs; du.chunk_size = umem->chunk_size; du.headroom = umem->headroom; du.ifindex = (pool && pool->netdev) ? pool->netdev->ifindex : 0; du.queue_id = pool ? pool->queue_id : 0; du.flags = 0; if (umem->zc) du.flags |= XDP_DU_F_ZEROCOPY; du.refs = refcount_read(&umem->users); err = nla_put(nlskb, XDP_DIAG_UMEM, sizeof(du), &du); if (!err && pool && pool->fq) err = xsk_diag_put_ring(pool->fq, XDP_DIAG_UMEM_FILL_RING, nlskb); if (!err && pool && pool->cq) err = xsk_diag_put_ring(pool->cq, XDP_DIAG_UMEM_COMPLETION_RING, nlskb); return err; } static int xsk_diag_put_stats(const struct xdp_sock *xs, struct sk_buff *nlskb) { struct xdp_diag_stats du = {}; du.n_rx_dropped = xs->rx_dropped; du.n_rx_invalid = xskq_nb_invalid_descs(xs->rx); du.n_rx_full = xs->rx_queue_full; du.n_fill_ring_empty = xs->pool ? 
xskq_nb_queue_empty_descs(xs->pool->fq) : 0; du.n_tx_invalid = xskq_nb_invalid_descs(xs->tx); du.n_tx_ring_empty = xskq_nb_queue_empty_descs(xs->tx); return nla_put(nlskb, XDP_DIAG_STATS, sizeof(du), &du); } static int xsk_diag_fill(struct sock *sk, struct sk_buff *nlskb, struct xdp_diag_req *req, struct user_namespace *user_ns, u32 portid, u32 seq, u32 flags, int sk_ino) { struct xdp_sock *xs = xdp_sk(sk); struct xdp_diag_msg *msg; struct nlmsghdr *nlh; nlh = nlmsg_put(nlskb, portid, seq, SOCK_DIAG_BY_FAMILY, sizeof(*msg), flags); if (!nlh) return -EMSGSIZE; msg = nlmsg_data(nlh); memset(msg, 0, sizeof(*msg)); msg->xdiag_family = AF_XDP; msg->xdiag_type = sk->sk_type; msg->xdiag_ino = sk_ino; sock_diag_save_cookie(sk, msg->xdiag_cookie); mutex_lock(&xs->mutex); if (READ_ONCE(xs->state) == XSK_UNBOUND) goto out_nlmsg_trim; if ((req->xdiag_show & XDP_SHOW_INFO) && xsk_diag_put_info(xs, nlskb)) goto out_nlmsg_trim; if ((req->xdiag_show & XDP_SHOW_INFO) && nla_put_u32(nlskb, XDP_DIAG_UID, from_kuid_munged(user_ns, sock_i_uid(sk)))) goto out_nlmsg_trim; if ((req->xdiag_show & XDP_SHOW_RING_CFG) && xsk_diag_put_rings_cfg(xs, nlskb)) goto out_nlmsg_trim; if ((req->xdiag_show & XDP_SHOW_UMEM) && xsk_diag_put_umem(xs, nlskb)) goto out_nlmsg_trim; if ((req->xdiag_show & XDP_SHOW_MEMINFO) && sock_diag_put_meminfo(sk, nlskb, XDP_DIAG_MEMINFO)) goto out_nlmsg_trim; if ((req->xdiag_show & XDP_SHOW_STATS) && xsk_diag_put_stats(xs, nlskb)) goto out_nlmsg_trim; mutex_unlock(&xs->mutex); nlmsg_end(nlskb, nlh); return 0; out_nlmsg_trim: mutex_unlock(&xs->mutex); nlmsg_cancel(nlskb, nlh); return -EMSGSIZE; } static int xsk_diag_dump(struct sk_buff *nlskb, struct netlink_callback *cb) { struct xdp_diag_req *req = nlmsg_data(cb->nlh); struct net *net = sock_net(nlskb->sk); int num = 0, s_num = cb->args[0]; struct sock *sk; mutex_lock(&net->xdp.lock); sk_for_each(sk, &net->xdp.list) { if (!net_eq(sock_net(sk), net)) continue; if (num++ < s_num) continue; if (xsk_diag_fill(sk, nlskb, req, sk_user_ns(NETLINK_CB(cb->skb).sk), NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, NLM_F_MULTI, sock_i_ino(sk)) < 0) { num--; break; } } mutex_unlock(&net->xdp.lock); cb->args[0] = num; return nlskb->len; } static int xsk_diag_handler_dump(struct sk_buff *nlskb, struct nlmsghdr *hdr) { struct netlink_dump_control c = { .dump = xsk_diag_dump }; int hdrlen = sizeof(struct xdp_diag_req); struct net *net = sock_net(nlskb->sk); if (nlmsg_len(hdr) < hdrlen) return -EINVAL; if (!(hdr->nlmsg_flags & NLM_F_DUMP)) return -EOPNOTSUPP; return netlink_dump_start(net->diag_nlsk, nlskb, hdr, &c); } static const struct sock_diag_handler xsk_diag_handler = { .owner = THIS_MODULE, .family = AF_XDP, .dump = xsk_diag_handler_dump, }; static int __init xsk_diag_init(void) { return sock_diag_register(&xsk_diag_handler); } static void __exit xsk_diag_exit(void) { sock_diag_unregister(&xsk_diag_handler); } module_init(xsk_diag_init); module_exit(xsk_diag_exit); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("XDP socket monitoring via SOCK_DIAG"); MODULE_ALIAS_NET_PF_PROTO_TYPE(PF_NETLINK, NETLINK_SOCK_DIAG, AF_XDP); |
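/*
 * Illustrative userspace sketch, not part of the module above: issue the
 * SOCK_DIAG_BY_FAMILY dump that xsk_diag_handler_dump() services for AF_XDP
 * sockets.  Parsing of the xdp_diag_msg replies is omitted; the request
 * layout follows <linux/xdp_diag.h>, and the AF_XDP fallback define is only
 * for older libc headers.
 */
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/sock_diag.h>
#include <linux/xdp_diag.h>

#ifndef AF_XDP
#define AF_XDP 44
#endif

int dump_xdp_sockets_example(void)
{
	struct sockaddr_nl nladdr = { .nl_family = AF_NETLINK };
	struct {
		struct nlmsghdr nlh;
		struct xdp_diag_req req;
	} msg = {
		.nlh = {
			.nlmsg_len = sizeof(msg),
			.nlmsg_type = SOCK_DIAG_BY_FAMILY,
			.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP,
		},
		.req = {
			.sdiag_family = AF_XDP,
			.xdiag_show = XDP_SHOW_INFO | XDP_SHOW_RING_CFG |
				      XDP_SHOW_UMEM | XDP_SHOW_STATS,
		},
	};
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_SOCK_DIAG);

	if (fd < 0)
		return -1;
	if (sendto(fd, &msg, sizeof(msg), 0,
		   (struct sockaddr *)&nladdr, sizeof(nladdr)) < 0) {
		close(fd);
		return -1;
	}
	/* recv() the NLM_F_MULTI replies here and walk the xdp_diag_msg records */
	close(fd);
	return 0;
}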
| 2 2 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 | /* SPDX-License-Identifier: GPL-2.0 */ /* * fs-verity: read-only file-based authenticity protection * * This header declares the interface between the fs/verity/ support layer and * filesystems that support fs-verity. * * Copyright 2019 Google LLC */ #ifndef _LINUX_FSVERITY_H #define _LINUX_FSVERITY_H #include <linux/fs.h> #include <linux/mm.h> #include <crypto/hash_info.h> #include <crypto/sha2.h> #include <uapi/linux/fsverity.h> /* * Largest digest size among all hash algorithms supported by fs-verity. * Currently assumed to be <= size of fsverity_descriptor::root_hash. */ #define FS_VERITY_MAX_DIGEST_SIZE SHA512_DIGEST_SIZE /* Arbitrary limit to bound the kmalloc() size. Can be changed. */ #define FS_VERITY_MAX_DESCRIPTOR_SIZE 16384 /* Verity operations for filesystems */ struct fsverity_operations { /** * Begin enabling verity on the given file. * * @filp: a readonly file descriptor for the file * * The filesystem must do any needed filesystem-specific preparations * for enabling verity, e.g. evicting inline data. It also must return * -EBUSY if verity is already being enabled on the given file. * * i_rwsem is held for write. * * Return: 0 on success, -errno on failure */ int (*begin_enable_verity)(struct file *filp); /** * End enabling verity on the given file. * * @filp: a readonly file descriptor for the file * @desc: the verity descriptor to write, or NULL on failure * @desc_size: size of verity descriptor, or 0 on failure * @merkle_tree_size: total bytes the Merkle tree took up * * If desc == NULL, then enabling verity failed and the filesystem only * must do any necessary cleanups. Else, it must also store the given * verity descriptor to a fs-specific location associated with the inode * and do any fs-specific actions needed to mark the inode as a verity * inode, e.g. setting a bit in the on-disk inode. The filesystem is * also responsible for setting the S_VERITY flag in the VFS inode. * * i_rwsem is held for write, but it may have been dropped between * ->begin_enable_verity() and ->end_enable_verity(). * * Return: 0 on success, -errno on failure */ int (*end_enable_verity)(struct file *filp, const void *desc, size_t desc_size, u64 merkle_tree_size); /** * Get the verity descriptor of the given inode. 
* * @inode: an inode with the S_VERITY flag set * @buf: buffer in which to place the verity descriptor * @bufsize: size of @buf, or 0 to retrieve the size only * * If bufsize == 0, then the size of the verity descriptor is returned. * Otherwise the verity descriptor is written to 'buf' and its actual * size is returned; -ERANGE is returned if it's too large. This may be * called by multiple processes concurrently on the same inode. * * Return: the size on success, -errno on failure */ int (*get_verity_descriptor)(struct inode *inode, void *buf, size_t bufsize); /** * Read a Merkle tree page of the given inode. * * @inode: the inode * @index: 0-based index of the page within the Merkle tree * @num_ra_pages: The number of Merkle tree pages that should be * prefetched starting at @index if the page at @index * isn't already cached. Implementations may ignore this * argument; it's only a performance optimization. * * This can be called at any time on an open verity file. It may be * called by multiple processes concurrently, even with the same page. * * Note that this must retrieve a *page*, not necessarily a *block*. * * Return: the page on success, ERR_PTR() on failure */ struct page *(*read_merkle_tree_page)(struct inode *inode, pgoff_t index, unsigned long num_ra_pages); /** * Write a Merkle tree block to the given inode. * * @inode: the inode for which the Merkle tree is being built * @buf: the Merkle tree block to write * @pos: the position of the block in the Merkle tree (in bytes) * @size: the Merkle tree block size (in bytes) * * This is only called between ->begin_enable_verity() and * ->end_enable_verity(). * * Return: 0 on success, -errno on failure */ int (*write_merkle_tree_block)(struct inode *inode, const void *buf, u64 pos, unsigned int size); }; #ifdef CONFIG_FS_VERITY static inline struct fsverity_info *fsverity_get_info(const struct inode *inode) { /* * Pairs with the cmpxchg_release() in fsverity_set_info(). * I.e., another task may publish ->i_verity_info concurrently, * executing a RELEASE barrier. We need to use smp_load_acquire() here * to safely ACQUIRE the memory the other task published. */ return smp_load_acquire(&inode->i_verity_info); } /* enable.c */ int fsverity_ioctl_enable(struct file *filp, const void __user *arg); /* measure.c */ int fsverity_ioctl_measure(struct file *filp, void __user *arg); int fsverity_get_digest(struct inode *inode, u8 raw_digest[FS_VERITY_MAX_DIGEST_SIZE], u8 *alg, enum hash_algo *halg); /* open.c */ int __fsverity_file_open(struct inode *inode, struct file *filp); int __fsverity_prepare_setattr(struct dentry *dentry, struct iattr *attr); void __fsverity_cleanup_inode(struct inode *inode); /** * fsverity_cleanup_inode() - free the inode's verity info, if present * @inode: an inode being evicted * * Filesystems must call this on inode eviction to free ->i_verity_info. 
*/ static inline void fsverity_cleanup_inode(struct inode *inode) { if (inode->i_verity_info) __fsverity_cleanup_inode(inode); } /* read_metadata.c */ int fsverity_ioctl_read_metadata(struct file *filp, const void __user *uarg); /* verify.c */ bool fsverity_verify_blocks(struct folio *folio, size_t len, size_t offset); void fsverity_verify_bio(struct bio *bio); void fsverity_enqueue_verify_work(struct work_struct *work); #else /* !CONFIG_FS_VERITY */ static inline struct fsverity_info *fsverity_get_info(const struct inode *inode) { return NULL; } /* enable.c */ static inline int fsverity_ioctl_enable(struct file *filp, const void __user *arg) { return -EOPNOTSUPP; } /* measure.c */ static inline int fsverity_ioctl_measure(struct file *filp, void __user *arg) { return -EOPNOTSUPP; } static inline int fsverity_get_digest(struct inode *inode, u8 raw_digest[FS_VERITY_MAX_DIGEST_SIZE], u8 *alg, enum hash_algo *halg) { /* * fsverity is not enabled in the kernel configuration, so always report * that the file doesn't have fsverity enabled (digest size 0). */ return 0; } /* open.c */ static inline int __fsverity_file_open(struct inode *inode, struct file *filp) { return -EOPNOTSUPP; } static inline int __fsverity_prepare_setattr(struct dentry *dentry, struct iattr *attr) { return -EOPNOTSUPP; } static inline void fsverity_cleanup_inode(struct inode *inode) { } /* read_metadata.c */ static inline int fsverity_ioctl_read_metadata(struct file *filp, const void __user *uarg) { return -EOPNOTSUPP; } /* verify.c */ static inline bool fsverity_verify_blocks(struct folio *folio, size_t len, size_t offset) { WARN_ON_ONCE(1); return false; } static inline void fsverity_verify_bio(struct bio *bio) { WARN_ON_ONCE(1); } static inline void fsverity_enqueue_verify_work(struct work_struct *work) { WARN_ON_ONCE(1); } #endif /* !CONFIG_FS_VERITY */ static inline bool fsverity_verify_folio(struct folio *folio) { return fsverity_verify_blocks(folio, folio_size(folio), 0); } static inline bool fsverity_verify_page(struct page *page) { return fsverity_verify_blocks(page_folio(page), PAGE_SIZE, 0); } /** * fsverity_active() - do reads from the inode need to go through fs-verity? * @inode: inode to check * * This checks whether ->i_verity_info has been set. * * Filesystems call this from ->readahead() to check whether the pages need to * be verified or not. Don't use IS_VERITY() for this purpose; it's subject to * a race condition where the file is being read concurrently with * FS_IOC_ENABLE_VERITY completing. (S_VERITY is set before ->i_verity_info.) * * Return: true if reads need to go through fs-verity, otherwise false */ static inline bool fsverity_active(const struct inode *inode) { return fsverity_get_info(inode) != NULL; } /** * fsverity_file_open() - prepare to open a verity file * @inode: the inode being opened * @filp: the struct file being set up * * When opening a verity file, deny the open if it is for writing. Otherwise, * set up the inode's ->i_verity_info if not already done. * * When combined with fscrypt, this must be called after fscrypt_file_open(). * Otherwise, we won't have the key set up to decrypt the verity metadata. 
* * Return: 0 on success, -errno on failure */ static inline int fsverity_file_open(struct inode *inode, struct file *filp) { if (IS_VERITY(inode)) return __fsverity_file_open(inode, filp); return 0; } /** * fsverity_prepare_setattr() - prepare to change a verity inode's attributes * @dentry: dentry through which the inode is being changed * @attr: attributes to change * * Verity files are immutable, so deny truncates. This isn't covered by the * open-time check because sys_truncate() takes a path, not a file descriptor. * * Return: 0 on success, -errno on failure */ static inline int fsverity_prepare_setattr(struct dentry *dentry, struct iattr *attr) { if (IS_VERITY(d_inode(dentry))) return __fsverity_prepare_setattr(dentry, attr); return 0; } #endif /* _LINUX_FSVERITY_H */ |
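/*
 * Illustrative userspace sketch, not part of the header above: enable
 * fs-verity on an already-written, read-only file descriptor with
 * FS_IOC_ENABLE_VERITY, the ioctl handled by fsverity_ioctl_enable()
 * declared above.  The hash algorithm and block size are example choices,
 * and the filesystem is assumed to have verity support enabled.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fsverity.h>

int enable_verity_example(const char *path)
{
	struct fsverity_enable_arg arg;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;

	memset(&arg, 0, sizeof(arg));
	arg.version = 1;
	arg.hash_algorithm = FS_VERITY_HASH_ALG_SHA256;
	arg.block_size = 4096;

	if (ioctl(fd, FS_IOC_ENABLE_VERITY, &arg) != 0)
		perror("FS_IOC_ENABLE_VERITY");

	close(fd);
	return 0;
}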
2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 2533 2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569 2570 2571 2572 2573 2574 2575 2576 2577 2578 2579 2580 2581 2582 2583 2584 2585 2586 2587 2588 2589 2590 2591 2592 2593 2594 2595 2596 2597 2598 2599 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 2701 2702 2703 2704 2705 2706 2707 2708 2709 2710 2711 2712 2713 2714 2715 2716 2717 2718 2719 2720 2721 2722 2723 2724 2725 2726 2727 2728 2729 2730 2731 2732 2733 2734 2735 2736 2737 2738 2739 2740 2741 2742 2743 2744 2745 2746 2747 2748 2749 2750 2751 2752 2753 2754 2755 2756 2757 2758 2759 2760 2761 2762 2763 2764 2765 2766 2767 2768 2769 2770 2771 2772 2773 2774 2775 2776 2777 2778 2779 2780 2781 2782 2783 2784 2785 2786 2787 2788 2789 2790 2791 2792 2793 2794 2795 2796 2797 2798 2799 2800 2801 2802 2803 2804 2805 2806 2807 2808 2809 2810 2811 2812 2813 2814 2815 2816 2817 2818 2819 2820 2821 2822 2823 2824 2825 2826 2827 2828 2829 2830 2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843 2844 2845 2846 2847 2848 2849 2850 2851 2852 2853 2854 2855 2856 2857 2858 2859 2860 2861 2862 2863 2864 2865 2866 2867 2868 2869 2870 2871 2872 2873 2874 2875 2876 2877 2878 2879 2880 2881 2882 2883 2884 2885 2886 2887 2888 2889 2890 2891 2892 2893 2894 2895 2896 2897 2898 2899 2900 2901 2902 2903 2904 2905 2906 2907 2908 2909 2910 2911 2912 2913 2914 2915 2916 2917 2918 2919 2920 2921 2922 2923 2924 2925 2926 2927 2928 2929 2930 2931 2932 2933 2934 2935 2936 2937 2938 2939 2940 2941 2942 2943 2944 2945 2946 2947 2948 2949 2950 2951 2952 2953 2954 2955 2956 2957 2958 2959 2960 2961 2962 2963 2964 2965 2966 2967 2968 2969 2970 2971 2972 2973 2974 2975 2976 2977 2978 2979 2980 2981 2982 2983 2984 2985 2986 2987 2988 2989 2990 2991 2992 2993 2994 2995 2996 2997 2998 2999 3000 3001 3002 3003 3004 3005 3006 3007 3008 3009 3010 3011 3012 3013 3014 3015 3016 3017 3018 3019 3020 3021 3022 3023 3024 3025 3026 3027 3028 3029 3030 3031 3032 3033 3034 3035 3036 3037 3038 3039 3040 3041 3042 3043 3044 3045 3046 3047 3048 3049 3050 3051 3052 3053 3054 3055 3056 3057 3058 3059 3060 
3061 3062 3063 3064 3065 3066 3067 3068 3069 3070 3071 3072 3073 3074 3075 3076 3077 3078 3079 3080 3081 3082 3083 3084 3085 3086 3087 3088 3089 3090 3091 3092 3093 3094 3095 3096 3097 3098 3099 3100 3101 3102 3103 3104 3105 3106 3107 3108 3109 3110 3111 3112 3113 3114 3115 3116 3117 3118 3119 3120 3121 3122 3123 3124 3125 3126 3127 3128 3129 3130 3131 3132 3133 3134 3135 3136 3137 3138 3139 3140 3141 3142 3143 3144 3145 3146 3147 3148 3149 3150 3151 3152 3153 3154 3155 3156 3157 3158 3159 3160 3161 3162 3163 3164 3165 3166 3167 3168 3169 3170 3171 3172 3173 3174 3175 3176 3177 3178 3179 3180 3181 3182 3183 3184 3185 3186 3187 3188 3189 3190 3191 3192 3193 3194 3195 3196 3197 3198 3199 3200 3201 3202 3203 3204 3205 3206 3207 3208 3209 3210 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220 3221 3222 3223 3224 3225 3226 3227 3228 3229 3230 3231 3232 3233 3234 3235 3236 3237 3238 3239 3240 3241 3242 3243 3244 3245 3246 3247 3248 3249 3250 3251 3252 3253 3254 3255 3256 3257 3258 3259 3260 3261 3262 3263 3264 3265 3266 3267 3268 3269 3270 3271 3272 3273 3274 3275 3276 3277 3278 3279 3280 3281 3282 3283 3284 3285 3286 3287 3288 3289 3290 3291 3292 3293 3294 3295 3296 3297 3298 3299 3300 3301 3302 3303 3304 3305 3306 3307 3308 3309 3310 3311 3312 3313 3314 3315 3316 3317 3318 3319 3320 3321 3322 3323 3324 3325 3326 3327 3328 3329 3330 3331 3332 3333 3334 3335 3336 3337 3338 3339 3340 3341 3342 3343 3344 3345 3346 3347 3348 3349 3350 3351 3352 3353 3354 3355 3356 3357 3358 3359 3360 3361 3362 3363 3364 3365 3366 3367 3368 3369 3370 3371 3372 3373 3374 3375 3376 3377 3378 3379 3380 3381 3382 3383 3384 3385 3386 3387 3388 3389 3390 3391 3392 3393 3394 3395 3396 3397 3398 3399 3400 3401 3402 3403 3404 3405 3406 3407 3408 3409 3410 3411 3412 3413 3414 3415 3416 3417 3418 3419 3420 3421 3422 3423 3424 3425 3426 3427 3428 3429 3430 3431 3432 3433 3434 3435 3436 3437 3438 3439 3440 3441 3442 3443 3444 3445 3446 3447 3448 3449 3450 3451 3452 3453 3454 3455 3456 3457 3458 3459 3460 3461 3462 3463 3464 3465 3466 3467 3468 3469 3470 3471 3472 3473 3474 3475 3476 3477 3478 3479 3480 3481 3482 3483 3484 3485 3486 3487 3488 3489 3490 3491 3492 3493 3494 3495 3496 3497 3498 3499 3500 3501 3502 3503 3504 3505 3506 3507 3508 3509 3510 3511 3512 3513 3514 3515 3516 3517 3518 3519 3520 3521 3522 3523 3524 3525 3526 3527 3528 3529 3530 3531 3532 3533 3534 3535 3536 3537 3538 3539 3540 3541 3542 3543 3544 3545 3546 3547 3548 3549 3550 3551 3552 3553 3554 3555 3556 3557 3558 3559 3560 3561 3562 3563 3564 3565 3566 3567 3568 3569 3570 3571 3572 3573 3574 3575 3576 3577 3578 3579 3580 3581 3582 3583 3584 3585 3586 3587 3588 3589 3590 3591 3592 3593 3594 3595 3596 3597 3598 3599 3600 3601 3602 3603 3604 3605 3606 3607 3608 3609 3610 3611 3612 3613 3614 3615 3616 3617 3618 3619 3620 3621 3622 3623 3624 3625 3626 3627 3628 3629 3630 3631 3632 3633 3634 3635 3636 3637 3638 3639 3640 3641 3642 3643 3644 3645 3646 3647 3648 3649 3650 3651 3652 3653 3654 3655 3656 3657 3658 3659 3660 3661 3662 3663 3664 3665 3666 3667 3668 3669 3670 3671 3672 3673 3674 3675 3676 3677 3678 3679 3680 3681 3682 3683 3684 3685 3686 3687 3688 3689 3690 3691 3692 3693 3694 3695 3696 3697 3698 3699 3700 3701 3702 3703 3704 3705 3706 3707 3708 3709 3710 3711 3712 3713 3714 3715 3716 3717 3718 3719 3720 3721 3722 3723 3724 3725 3726 3727 3728 3729 3730 3731 3732 3733 3734 3735 3736 3737 3738 3739 3740 3741 3742 3743 3744 3745 3746 3747 3748 3749 3750 3751 3752 3753 3754 3755 3756 3757 3758 3759 3760 3761 3762 3763 3764 3765 3766 3767 3768 3769 3770 3771 
3772 3773 3774 3775 3776 3777 3778 3779 3780 3781 3782 3783 3784 3785 3786 3787 3788 3789 3790 3791 3792 3793 3794 3795 3796 3797 3798 3799 3800 3801 3802 3803 3804 3805 3806 3807 3808 3809 3810 3811 3812 3813 3814 3815 3816 3817 3818 3819 3820 3821 3822 3823 3824 3825 3826 3827 3828 3829 3830 3831 3832 3833 3834 3835 3836 3837 3838 3839 3840 3841 3842 3843 3844 3845 3846 3847 3848 3849 3850 3851 3852 3853 3854 3855 3856 3857 3858 3859 3860 3861 3862 3863 3864 3865 3866 3867 3868 3869 3870 3871 3872 3873 3874 3875 3876 3877 3878 3879 3880 3881 3882 3883 3884 3885 3886 3887 3888 3889 3890 3891 3892 3893 3894 3895 3896 3897 3898 3899 3900 3901 3902 3903 3904 3905 3906 3907 3908 3909 3910 3911 3912 3913 3914 3915 3916 3917 3918 3919 3920 3921 3922 3923 3924 3925 3926 3927 3928 3929 3930 3931 3932 3933 3934 3935 3936 3937 3938 3939 3940 3941 3942 3943 3944 3945 3946 3947 3948 3949 3950 3951 3952 3953 3954 3955 3956 3957 3958 3959 3960 3961 3962 3963 3964 3965 3966 3967 3968 3969 3970 3971 3972 3973 3974 3975 3976 3977 3978 3979 3980 3981 3982 3983 3984 3985 3986 3987 3988 3989 3990 3991 3992 3993 3994 3995 3996 3997 3998 3999 4000 4001 4002 4003 4004 4005 4006 4007 4008 4009 4010 4011 4012 4013 4014 4015 4016 4017 4018 4019 4020 4021 4022 4023 4024 4025 4026 4027 4028 4029 4030 4031 4032 4033 4034 4035 4036 4037 4038 4039 4040 4041 4042 4043 4044 4045 4046 4047 4048 4049 4050 4051 4052 4053 4054 4055 4056 4057 4058 4059 4060 4061 4062 4063 4064 4065 4066 4067 4068 4069 4070 4071 4072 4073 4074 4075 4076 4077 4078 4079 4080 4081 4082 4083 4084 4085 4086 4087 4088 4089 4090 4091 4092 4093 4094 4095 4096 4097 4098 4099 4100 4101 4102 4103 4104 4105 4106 4107 4108 4109 4110 4111 4112 4113 4114 4115 4116 4117 4118 4119 4120 4121 4122 4123 4124 4125 4126 4127 4128 4129 4130 4131 4132 4133 4134 4135 4136 4137 4138 4139 4140 4141 4142 4143 4144 4145 4146 4147 4148 4149 4150 4151 4152 4153 4154 4155 4156 4157 4158 4159 4160 4161 4162 4163 4164 4165 4166 4167 4168 4169 4170 4171 4172 4173 4174 4175 4176 4177 4178 4179 4180 4181 4182 4183 4184 4185 4186 4187 4188 4189 4190 4191 4192 4193 4194 4195 4196 4197 4198 4199 4200 4201 4202 4203 4204 4205 4206 4207 4208 4209 4210 4211 4212 4213 4214 4215 4216 4217 4218 4219 4220 4221 4222 4223 4224 4225 4226 4227 4228 4229 4230 4231 4232 4233 4234 4235 4236 4237 4238 4239 4240 4241 4242 4243 4244 4245 4246 4247 4248 4249 4250 4251 4252 4253 4254 4255 4256 4257 4258 4259 4260 4261 4262 4263 4264 4265 4266 4267 4268 4269 4270 4271 4272 4273 4274 4275 4276 4277 4278 4279 4280 4281 4282 4283 4284 4285 4286 4287 4288 4289 4290 4291 4292 4293 4294 4295 4296 4297 4298 4299 4300 4301 4302 4303 4304 4305 4306 4307 4308 4309 4310 4311 4312 4313 4314 4315 4316 4317 4318 4319 4320 4321 4322 4323 4324 4325 4326 4327 4328 4329 4330 4331 4332 4333 4334 4335 4336 4337 4338 4339 4340 4341 4342 4343 4344 4345 4346 4347 4348 4349 4350 4351 4352 4353 4354 4355 4356 4357 4358 4359 4360 4361 4362 4363 4364 4365 4366 4367 4368 4369 4370 4371 4372 4373 4374 4375 4376 4377 4378 4379 4380 4381 4382 4383 4384 4385 4386 4387 4388 4389 4390 4391 4392 4393 4394 4395 4396 4397 4398 4399 4400 4401 4402 4403 4404 4405 4406 4407 4408 4409 4410 4411 4412 4413 4414 4415 4416 4417 4418 4419 4420 4421 4422 4423 4424 4425 4426 4427 4428 4429 4430 4431 4432 4433 4434 4435 4436 4437 4438 4439 4440 4441 4442 4443 4444 4445 4446 4447 4448 4449 4450 4451 4452 4453 4454 4455 4456 4457 4458 4459 4460 4461 4462 4463 4464 4465 4466 4467 4468 4469 4470 4471 4472 4473 4474 4475 4476 4477 4478 4479 4480 4481 4482 
// SPDX-License-Identifier: GPL-2.0-only
/*
 *  HIDPP protocol for Logitech receivers
 *
 *  Copyright (c) 2011 Logitech (c)
 *  Copyright (c) 2012-2013 Google (c)
 *  Copyright (c) 2013-2014 Red Hat Inc.
 */

#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/device.h>
#include <linux/input.h>
#include <linux/usb.h>
#include <linux/hid.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/sched.h>
#include <linux/sched/clock.h>
#include <linux/kfifo.h>
#include <linux/input/mt.h>
#include <linux/workqueue.h>
#include <linux/atomic.h>
#include <linux/fixp-arith.h>
#include <linux/unaligned.h>
#include "usbhid/usbhid.h"
#include "hid-ids.h"

MODULE_DESCRIPTION("Support for Logitech devices relying on the HID++ specification");
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Benjamin Tissoires <benjamin.tissoires@gmail.com>");
MODULE_AUTHOR("Nestor Lopez Casado <nlopezcasad@logitech.com>");
MODULE_AUTHOR("Bastien Nocera <hadess@hadess.net>");

static bool disable_tap_to_click;
module_param(disable_tap_to_click, bool, 0644);
MODULE_PARM_DESC(disable_tap_to_click,
	"Disable Tap-To-Click mode reporting for touchpads (only on the K400 currently).");

/* Define a non-zero software ID to identify our own requests */
#define LINUX_KERNEL_SW_ID			0x01

#define REPORT_ID_HIDPP_SHORT			0x10
#define REPORT_ID_HIDPP_LONG			0x11
#define REPORT_ID_HIDPP_VERY_LONG		0x12

#define HIDPP_REPORT_SHORT_LENGTH		7
#define HIDPP_REPORT_LONG_LENGTH		20
#define HIDPP_REPORT_VERY_LONG_MAX_LENGTH	64

#define HIDPP_REPORT_SHORT_SUPPORTED		BIT(0)
#define HIDPP_REPORT_LONG_SUPPORTED		BIT(1)
#define HIDPP_REPORT_VERY_LONG_SUPPORTED	BIT(2)

#define HIDPP_SUB_ID_CONSUMER_VENDOR_KEYS	0x03
#define HIDPP_SUB_ID_ROLLER			0x05
#define HIDPP_SUB_ID_MOUSE_EXTRA_BTNS		0x06
#define HIDPP_SUB_ID_USER_IFACE_EVENT		0x08

#define HIDPP_USER_IFACE_EVENT_ENCRYPTION_KEY_LOST	BIT(5)

#define HIDPP_QUIRK_CLASS_WTP			BIT(0)
#define HIDPP_QUIRK_CLASS_M560			BIT(1)
#define HIDPP_QUIRK_CLASS_K400			BIT(2)
#define HIDPP_QUIRK_CLASS_G920			BIT(3)
#define HIDPP_QUIRK_CLASS_K750			BIT(4)

/* bits 2..20 are reserved for classes */
/* #define HIDPP_QUIRK_CONNECT_EVENTS		BIT(21) disabled */
#define HIDPP_QUIRK_WTP_PHYSICAL_BUTTONS	BIT(22)
#define HIDPP_QUIRK_DELAYED_INIT		BIT(23)
#define HIDPP_QUIRK_FORCE_OUTPUT_REPORTS	BIT(24)
#define HIDPP_QUIRK_HIDPP_WHEELS		BIT(25)
#define HIDPP_QUIRK_HIDPP_EXTRA_MOUSE_BTNS	BIT(26)
#define HIDPP_QUIRK_HIDPP_CONSUMER_VENDOR_KEYS	BIT(27)
#define HIDPP_QUIRK_HI_RES_SCROLL_1P0		BIT(28)
#define HIDPP_QUIRK_WIRELESS_STATUS		BIT(29)

/* These are just aliases for now */
#define HIDPP_QUIRK_KBD_SCROLL_WHEEL		HIDPP_QUIRK_HIDPP_WHEELS
#define HIDPP_QUIRK_KBD_ZOOM_WHEEL		HIDPP_QUIRK_HIDPP_WHEELS
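/*
 * Illustrative sketch (not part of the original driver): the report ID of a
 * HID++ message determines its total length on the wire. The same mapping is
 * open-coded in __hidpp_send_report() further down; this hypothetical helper
 * only restates it in one place.
 */
static inline int hidpp_report_length_sketch(u8 report_id, int very_long_len)
{
	switch (report_id) {
	case REPORT_ID_HIDPP_SHORT:
		return HIDPP_REPORT_SHORT_LENGTH;	/* 7 bytes */
	case REPORT_ID_HIDPP_LONG:
		return HIDPP_REPORT_LONG_LENGTH;	/* 20 bytes */
	case REPORT_ID_HIDPP_VERY_LONG:
		return very_long_len;	/* device specific, at most 64 bytes */
	default:
		return 0;	/* unknown report ID */
	}
}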
/* Convenience constant to check for any high-res support. */
#define HIDPP_CAPABILITY_HI_RES_SCROLL	(HIDPP_CAPABILITY_HIDPP10_FAST_SCROLL | \
					 HIDPP_CAPABILITY_HIDPP20_HI_RES_SCROLL | \
					 HIDPP_CAPABILITY_HIDPP20_HI_RES_WHEEL)

#define HIDPP_CAPABILITY_HIDPP10_BATTERY	BIT(0)
#define HIDPP_CAPABILITY_HIDPP20_BATTERY	BIT(1)
#define HIDPP_CAPABILITY_BATTERY_MILEAGE	BIT(2)
#define HIDPP_CAPABILITY_BATTERY_LEVEL_STATUS	BIT(3)
#define HIDPP_CAPABILITY_BATTERY_VOLTAGE	BIT(4)
#define HIDPP_CAPABILITY_BATTERY_PERCENTAGE	BIT(5)
#define HIDPP_CAPABILITY_UNIFIED_BATTERY	BIT(6)
#define HIDPP_CAPABILITY_HIDPP20_HI_RES_WHEEL	BIT(7)
#define HIDPP_CAPABILITY_HIDPP20_HI_RES_SCROLL	BIT(8)
#define HIDPP_CAPABILITY_HIDPP10_FAST_SCROLL	BIT(9)
#define HIDPP_CAPABILITY_ADC_MEASUREMENT	BIT(10)

#define lg_map_key_clear(c)  hid_map_usage_clear(hi, usage, bit, max, EV_KEY, (c))

/*
 * There are two hidpp protocols in use: the first version, hidpp10, is known
 * as the register access protocol or RAP; the second version, hidpp20, is
 * known as the feature access protocol or FAP.
 *
 * Most older devices (including the Unifying usb receiver) use the RAP
 * protocol, whereas most newer devices use the FAP protocol. Both protocols
 * are compatible with the underlying transport, which could be usb, Unifying,
 * or bluetooth. The message lengths are defined by the hid vendor specific
 * report descriptor for the HIDPP_SHORT report type (total message length
 * 7 bytes) and the HIDPP_LONG report type (total message length 20 bytes).
 *
 * The RAP protocol uses both report types, whereas the FAP only uses HIDPP_LONG
 * messages. The Unifying receiver itself responds to RAP messages (device index
 * is 0xFF for the receiver), and all messages (short or long) with a device
 * index between 1 and 6 are passed untouched to the corresponding paired
 * Unifying device.
 *
 * The paired device can be RAP or FAP; it will receive the message untouched
 * from the Unifying receiver.
 */

struct fap {
	u8 feature_index;
	u8 funcindex_clientid;
	u8 params[HIDPP_REPORT_VERY_LONG_MAX_LENGTH - 4U];
};

struct rap {
	u8 sub_id;
	u8 reg_address;
	u8 params[HIDPP_REPORT_VERY_LONG_MAX_LENGTH - 4U];
};

struct hidpp_report {
	u8 report_id;
	u8 device_index;
	union {
		struct fap fap;
		struct rap rap;
		u8 rawbytes[sizeof(struct fap)];
	};
} __packed;

struct hidpp_battery {
	u8 feature_index;
	u8 solar_feature_index;
	u8 voltage_feature_index;
	u8 adc_measurement_feature_index;
	struct power_supply_desc desc;
	struct power_supply *ps;
	char name[64];
	int status;
	int capacity;
	int level;
	int voltage;
	int charge_type;
	bool online;
	u8 supported_levels_1004;
};

/**
 * struct hidpp_scroll_counter - Utility class for processing high-resolution
 *                               scroll events.
 * @dev: the input device for which events should be reported.
 * @wheel_multiplier: the scalar multiplier to be applied to each wheel event
 * @remainder: counts the number of high-resolution units moved since the last
 *             low-resolution event (REL_WHEEL or REL_HWHEEL) was sent. Should
 *             only be used by class methods.
 * @direction: direction of last movement (1 or -1)
 * @last_time: last event time, used to reset remainder after inactivity
 */
struct hidpp_scroll_counter {
	int wheel_multiplier;
	int remainder;
	int direction;
	unsigned long long last_time;
};

struct hidpp_device {
	struct hid_device *hid_dev;
	struct input_dev *input;
	struct mutex send_mutex;
	void *send_receive_buf;
	char *name;		/* will never be NULL and should not be freed */
	wait_queue_head_t wait;
	int very_long_report_length;
	bool answer_available;
	u8 protocol_major;
	u8 protocol_minor;

	void *private_data;

	struct work_struct work;
	struct kfifo delayed_work_fifo;
	struct input_dev *delayed_input;

	unsigned long quirks;
	unsigned long capabilities;
	u8 supported_reports;

	struct hidpp_battery battery;
	struct hidpp_scroll_counter vertical_wheel_counter;

	u8 wireless_feature_index;

	bool connected_once;
};

/* HID++ 1.0 error codes */
#define HIDPP_ERROR				0x8f
#define HIDPP_ERROR_SUCCESS			0x00
#define HIDPP_ERROR_INVALID_SUBID		0x01
#define HIDPP_ERROR_INVALID_ADRESS		0x02
#define HIDPP_ERROR_INVALID_VALUE		0x03
#define HIDPP_ERROR_CONNECT_FAIL		0x04
#define HIDPP_ERROR_TOO_MANY_DEVICES		0x05
#define HIDPP_ERROR_ALREADY_EXISTS		0x06
#define HIDPP_ERROR_BUSY			0x07
#define HIDPP_ERROR_UNKNOWN_DEVICE		0x08
#define HIDPP_ERROR_RESOURCE_ERROR		0x09
#define HIDPP_ERROR_REQUEST_UNAVAILABLE		0x0a
#define HIDPP_ERROR_INVALID_PARAM_VALUE		0x0b
#define HIDPP_ERROR_WRONG_PIN_CODE		0x0c

/* HID++ 2.0 error codes */
#define HIDPP20_ERROR_NO_ERROR			0x00
#define HIDPP20_ERROR_UNKNOWN			0x01
#define HIDPP20_ERROR_INVALID_ARGS		0x02
#define HIDPP20_ERROR_OUT_OF_RANGE		0x03
#define HIDPP20_ERROR_HW_ERROR			0x04
#define HIDPP20_ERROR_NOT_ALLOWED		0x05
#define HIDPP20_ERROR_INVALID_FEATURE_INDEX	0x06
#define HIDPP20_ERROR_INVALID_FUNCTION_ID	0x07
#define HIDPP20_ERROR_BUSY			0x08
#define HIDPP20_ERROR_UNSUPPORTED		0x09
#define HIDPP20_ERROR				0xff

static int __hidpp_send_report(struct hid_device *hdev,
			       struct hidpp_report *hidpp_report)
{
	struct hidpp_device *hidpp = hid_get_drvdata(hdev);
	int fields_count, ret;

	switch (hidpp_report->report_id) {
	case REPORT_ID_HIDPP_SHORT:
		fields_count = HIDPP_REPORT_SHORT_LENGTH;
		break;
	case REPORT_ID_HIDPP_LONG:
		fields_count = HIDPP_REPORT_LONG_LENGTH;
		break;
	case REPORT_ID_HIDPP_VERY_LONG:
		fields_count = hidpp->very_long_report_length;
		break;
	default:
		return -ENODEV;
	}

	/*
	 * set the device_index as the receiver, it will be overwritten by
	 * hid_hw_request if needed
	 */
	hidpp_report->device_index = 0xff;

	if (hidpp->quirks & HIDPP_QUIRK_FORCE_OUTPUT_REPORTS) {
		ret = hid_hw_output_report(hdev, (u8 *)hidpp_report, fields_count);
	} else {
		ret = hid_hw_raw_request(hdev, hidpp_report->report_id,
			(u8 *)hidpp_report, fields_count, HID_OUTPUT_REPORT,
			HID_REQ_SET_REPORT);
	}

	return ret == fields_count ? 0 : -1;
}

/*
 * Effectively send the message to the device, waiting for its answer.
 *
 * Must be called with hidpp->send_mutex locked
 *
 * Same return protocol as hidpp_send_message_sync():
 * - success on 0
 * - negative error means transport error
 * - positive value means protocol error
 */
static int __do_hidpp_send_message_sync(struct hidpp_device *hidpp,
	struct hidpp_report *message,
	struct hidpp_report *response)
{
	int ret;

	__must_hold(&hidpp->send_mutex);

	hidpp->send_receive_buf = response;
	hidpp->answer_available = false;

	/*
	 * So that we can later validate the answer when it arrives
	 * in hidpp_raw_event
	 */
	*response = *message;

	ret = __hidpp_send_report(hidpp->hid_dev, message);
	if (ret) {
		dbg_hid("__hidpp_send_report returned err: %d\n", ret);
		memset(response, 0, sizeof(struct hidpp_report));
		return ret;
	}

	if (!wait_event_timeout(hidpp->wait, hidpp->answer_available,
				5*HZ)) {
		dbg_hid("%s:timeout waiting for response\n", __func__);
		memset(response, 0, sizeof(struct hidpp_report));
		return -ETIMEDOUT;
	}

	if (response->report_id == REPORT_ID_HIDPP_SHORT &&
	    response->rap.sub_id == HIDPP_ERROR) {
		ret = response->rap.params[1];
		dbg_hid("%s:got hidpp error %02X\n", __func__, ret);
		return ret;
	}

	if ((response->report_id == REPORT_ID_HIDPP_LONG ||
	     response->report_id == REPORT_ID_HIDPP_VERY_LONG) &&
	    response->fap.feature_index == HIDPP20_ERROR) {
		ret = response->fap.params[1];
		dbg_hid("%s:got hidpp 2.0 error %02X\n", __func__, ret);
		return ret;
	}

	return 0;
}

/*
 * hidpp_send_message_sync() returns 0 in case of success, and something else
 * in case of a failure.
 *
 * See __do_hidpp_send_message_sync() for a detailed explanation of the
 * returned value.
 */
static int hidpp_send_message_sync(struct hidpp_device *hidpp,
	struct hidpp_report *message,
	struct hidpp_report *response)
{
	int ret;
	int max_retries = 3;

	mutex_lock(&hidpp->send_mutex);

	do {
		ret = __do_hidpp_send_message_sync(hidpp, message, response);
		if (ret != HIDPP20_ERROR_BUSY)
			break;

		dbg_hid("%s:got busy hidpp 2.0 error %02X, retrying\n", __func__, ret);
	} while (--max_retries);

	mutex_unlock(&hidpp->send_mutex);
	return ret;
}

/*
 * hidpp_send_fap_command_sync() returns 0 in case of success, and something else
 * in case of a failure.
 *
 * See __do_hidpp_send_message_sync() for a detailed explanation of the
 * returned value.
 */
static int hidpp_send_fap_command_sync(struct hidpp_device *hidpp,
	u8 feat_index, u8 funcindex_clientid, u8 *params, int param_count,
	struct hidpp_report *response)
{
	struct hidpp_report *message;
	int ret;

	if (param_count > sizeof(message->fap.params)) {
		hid_dbg(hidpp->hid_dev,
			"Invalid number of parameters passed to command (%d != %llu)\n",
			param_count,
			(unsigned long long) sizeof(message->fap.params));
		return -EINVAL;
	}

	message = kzalloc(sizeof(struct hidpp_report), GFP_KERNEL);
	if (!message)
		return -ENOMEM;

	if (param_count > (HIDPP_REPORT_LONG_LENGTH - 4))
		message->report_id = REPORT_ID_HIDPP_VERY_LONG;
	else
		message->report_id = REPORT_ID_HIDPP_LONG;
	message->fap.feature_index = feat_index;
	message->fap.funcindex_clientid = funcindex_clientid | LINUX_KERNEL_SW_ID;
	memcpy(&message->fap.params, params, param_count);

	ret = hidpp_send_message_sync(hidpp, message, response);
	kfree(message);
	return ret;
}
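/*
 * Illustrative sketch (not part of the original driver): a typical HID++ 2.0
 * feature read through hidpp_send_fap_command_sync(). The feature index is
 * normally obtained from the 0x0000 Root feature first (see
 * hidpp_root_get_feature() further down); the reply parameters land in
 * response.fap.params[].
 */
static inline int hidpp20_read_first_param_sketch(struct hidpp_device *hidpp,
						  u8 feature_index, u8 *out)
{
	struct hidpp_report response;
	int ret;

	/* function index 0x00 of the feature, no request parameters */
	ret = hidpp_send_fap_command_sync(hidpp, feature_index, 0x00,
					  NULL, 0, &response);
	if (ret)
		return ret;	/* < 0: transport error, > 0: HID++ error code */

	*out = response.fap.params[0];
	return 0;
}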
/*
 * hidpp_send_rap_command_sync() returns 0 in case of success, and something else
 * in case of a failure.
 *
 * See __do_hidpp_send_message_sync() for a detailed explanation of the
 * returned value.
 */
static int hidpp_send_rap_command_sync(struct hidpp_device *hidpp_dev,
	u8 report_id, u8 sub_id, u8 reg_address,
	u8 *params, int param_count,
	struct hidpp_report *response)
{
	struct hidpp_report *message;
	int ret, max_count;

	/* Send as long report if short reports are not supported. */
	if (report_id == REPORT_ID_HIDPP_SHORT &&
	    !(hidpp_dev->supported_reports & HIDPP_REPORT_SHORT_SUPPORTED))
		report_id = REPORT_ID_HIDPP_LONG;

	switch (report_id) {
	case REPORT_ID_HIDPP_SHORT:
		max_count = HIDPP_REPORT_SHORT_LENGTH - 4;
		break;
	case REPORT_ID_HIDPP_LONG:
		max_count = HIDPP_REPORT_LONG_LENGTH - 4;
		break;
	case REPORT_ID_HIDPP_VERY_LONG:
		max_count = hidpp_dev->very_long_report_length - 4;
		break;
	default:
		return -EINVAL;
	}

	if (param_count > max_count)
		return -EINVAL;

	message = kzalloc(sizeof(struct hidpp_report), GFP_KERNEL);
	if (!message)
		return -ENOMEM;
	message->report_id = report_id;
	message->rap.sub_id = sub_id;
	message->rap.reg_address = reg_address;
	memcpy(&message->rap.params, params, param_count);
	ret = hidpp_send_message_sync(hidpp_dev, message, response);
	kfree(message);
	return ret;
}

static inline bool hidpp_match_answer(struct hidpp_report *question,
		struct hidpp_report *answer)
{
	return (answer->fap.feature_index == question->fap.feature_index) &&
	       (answer->fap.funcindex_clientid == question->fap.funcindex_clientid);
}

static inline bool hidpp_match_error(struct hidpp_report *question,
		struct hidpp_report *answer)
{
	return ((answer->rap.sub_id == HIDPP_ERROR) ||
		(answer->fap.feature_index == HIDPP20_ERROR)) &&
	       (answer->fap.funcindex_clientid == question->fap.feature_index) &&
	       (answer->fap.params[0] == question->fap.funcindex_clientid);
}

static inline bool hidpp_report_is_connect_event(struct hidpp_device *hidpp,
		struct hidpp_report *report)
{
	return (hidpp->wireless_feature_index &&
		(report->fap.feature_index == hidpp->wireless_feature_index)) ||
	       ((report->report_id == REPORT_ID_HIDPP_SHORT) &&
		(report->rap.sub_id == 0x41));
}

/*
 * hidpp_prefix_name() prefixes the current given name with "Logitech ".
 */
static void hidpp_prefix_name(char **name, int name_length)
{
#define PREFIX_LENGTH 9 /* "Logitech " */

	int new_length;
	char *new_name;

	if (name_length > PREFIX_LENGTH &&
	    strncmp(*name, "Logitech ", PREFIX_LENGTH) == 0)
		/* The prefix is already in the name */
		return;

	new_length = PREFIX_LENGTH + name_length;
	new_name = kzalloc(new_length, GFP_KERNEL);
	if (!new_name)
		return;

	snprintf(new_name, new_length, "Logitech %s", *name);

	kfree(*name);

	*name = new_name;
}

/*
 * Updates the USB wireless_status based on whether the headset
 * is turned on and reachable.
 */
static void hidpp_update_usb_wireless_status(struct hidpp_device *hidpp)
{
	struct hid_device *hdev = hidpp->hid_dev;
	struct usb_interface *intf;

	if (!(hidpp->quirks & HIDPP_QUIRK_WIRELESS_STATUS))
		return;
	if (!hid_is_usb(hdev))
		return;

	intf = to_usb_interface(hdev->dev.parent);
	usb_set_wireless_status(intf, hidpp->battery.online ?
				USB_WIRELESS_STATUS_CONNECTED :
				USB_WIRELESS_STATUS_DISCONNECTED);
}

/**
 * hidpp_scroll_counter_handle_scroll() - Send high- and low-resolution scroll
 *                                        events given a high-resolution wheel
 *                                        movement.
 * @input_dev: Pointer to the input device
 * @counter: a hid_scroll_counter struct describing the wheel.
 * @hi_res_value: the movement of the wheel, in the mouse's high-resolution
 *                units.
 *
 * Given a high-resolution movement, this function converts the movement into
 * fractions of 120 and emits high-resolution scroll events for the input
 * device.
It also uses the multiplier from &struct hid_scroll_counter to * emit low-resolution scroll events when appropriate for * backwards-compatibility with userspace input libraries. */ static void hidpp_scroll_counter_handle_scroll(struct input_dev *input_dev, struct hidpp_scroll_counter *counter, int hi_res_value) { int low_res_value, remainder, direction; unsigned long long now, previous; hi_res_value = hi_res_value * 120/counter->wheel_multiplier; input_report_rel(input_dev, REL_WHEEL_HI_RES, hi_res_value); remainder = counter->remainder; direction = hi_res_value > 0 ? 1 : -1; now = sched_clock(); previous = counter->last_time; counter->last_time = now; /* * Reset the remainder after a period of inactivity or when the * direction changes. This prevents the REL_WHEEL emulation point * from sliding for devices that don't always provide the same * number of movements per detent. */ if (now - previous > 1000000000 || direction != counter->direction) remainder = 0; counter->direction = direction; remainder += hi_res_value; /* Some wheels will rest 7/8ths of a detent from the previous detent * after slow movement, so we want the threshold for low-res events to * be in the middle between two detents (e.g. after 4/8ths) as * opposed to on the detents themselves (8/8ths). */ if (abs(remainder) >= 60) { /* Add (or subtract) 1 because we want to trigger when the wheel * is half-way to the next detent (i.e. scroll 1 detent after a * 1/2 detent movement, 2 detents after a 1 1/2 detent movement, * etc.). */ low_res_value = remainder / 120; if (low_res_value == 0) low_res_value = (hi_res_value > 0 ? 1 : -1); input_report_rel(input_dev, REL_WHEEL, low_res_value); remainder -= low_res_value * 120; } counter->remainder = remainder; } /* -------------------------------------------------------------------------- */ /* HIDP++ 1.0 commands */ /* -------------------------------------------------------------------------- */ #define HIDPP_SET_REGISTER 0x80 #define HIDPP_GET_REGISTER 0x81 #define HIDPP_SET_LONG_REGISTER 0x82 #define HIDPP_GET_LONG_REGISTER 0x83 /** * hidpp10_set_register - Modify a HID++ 1.0 register. * @hidpp_dev: the device to set the register on. * @register_address: the address of the register to modify. * @byte: the byte of the register to modify. Should be less than 3. * @mask: mask of the bits to modify * @value: new values for the bits in mask * Return: 0 if successful, otherwise a negative error code. 
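 *
 * Illustrative use (this mirrors hidpp10_enable_battery_reporting() below):
 * calling hidpp10_set_register(hidpp_dev, HIDPP_REG_ENABLE_REPORTS, 0,
 * HIDPP_ENABLE_BAT_REPORT, HIDPP_ENABLE_BAT_REPORT) reads the register,
 * clears the masked bits, sets the requested value and writes the register
 * back, enabling battery status reports without disturbing the other
 * enable bits.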
*/ static int hidpp10_set_register(struct hidpp_device *hidpp_dev, u8 register_address, u8 byte, u8 mask, u8 value) { struct hidpp_report response; int ret; u8 params[3] = { 0 }; ret = hidpp_send_rap_command_sync(hidpp_dev, REPORT_ID_HIDPP_SHORT, HIDPP_GET_REGISTER, register_address, NULL, 0, &response); if (ret) return ret; memcpy(params, response.rap.params, 3); params[byte] &= ~mask; params[byte] |= value & mask; return hidpp_send_rap_command_sync(hidpp_dev, REPORT_ID_HIDPP_SHORT, HIDPP_SET_REGISTER, register_address, params, 3, &response); } #define HIDPP_REG_ENABLE_REPORTS 0x00 #define HIDPP_ENABLE_CONSUMER_REPORT BIT(0) #define HIDPP_ENABLE_WHEEL_REPORT BIT(2) #define HIDPP_ENABLE_MOUSE_EXTRA_BTN_REPORT BIT(3) #define HIDPP_ENABLE_BAT_REPORT BIT(4) #define HIDPP_ENABLE_HWHEEL_REPORT BIT(5) static int hidpp10_enable_battery_reporting(struct hidpp_device *hidpp_dev) { return hidpp10_set_register(hidpp_dev, HIDPP_REG_ENABLE_REPORTS, 0, HIDPP_ENABLE_BAT_REPORT, HIDPP_ENABLE_BAT_REPORT); } #define HIDPP_REG_FEATURES 0x01 #define HIDPP_ENABLE_SPECIAL_BUTTON_FUNC BIT(1) #define HIDPP_ENABLE_FAST_SCROLL BIT(6) /* On HID++ 1.0 devices, high-res scroll was called "scrolling acceleration". */ static int hidpp10_enable_scrolling_acceleration(struct hidpp_device *hidpp_dev) { return hidpp10_set_register(hidpp_dev, HIDPP_REG_FEATURES, 0, HIDPP_ENABLE_FAST_SCROLL, HIDPP_ENABLE_FAST_SCROLL); } #define HIDPP_REG_BATTERY_STATUS 0x07 static int hidpp10_battery_status_map_level(u8 param) { int level; switch (param) { case 1 ... 2: level = POWER_SUPPLY_CAPACITY_LEVEL_CRITICAL; break; case 3 ... 4: level = POWER_SUPPLY_CAPACITY_LEVEL_LOW; break; case 5 ... 6: level = POWER_SUPPLY_CAPACITY_LEVEL_NORMAL; break; case 7: level = POWER_SUPPLY_CAPACITY_LEVEL_HIGH; break; default: level = POWER_SUPPLY_CAPACITY_LEVEL_UNKNOWN; } return level; } static int hidpp10_battery_status_map_status(u8 param) { int status; switch (param) { case 0x00: /* discharging (in use) */ status = POWER_SUPPLY_STATUS_DISCHARGING; break; case 0x21: /* (standard) charging */ case 0x24: /* fast charging */ case 0x25: /* slow charging */ status = POWER_SUPPLY_STATUS_CHARGING; break; case 0x26: /* topping charge */ case 0x22: /* charge complete */ status = POWER_SUPPLY_STATUS_FULL; break; case 0x20: /* unknown */ status = POWER_SUPPLY_STATUS_UNKNOWN; break; /* * 0x01...0x1F = reserved (not charging) * 0x23 = charging error * 0x27..0xff = reserved */ default: status = POWER_SUPPLY_STATUS_NOT_CHARGING; break; } return status; } static int hidpp10_query_battery_status(struct hidpp_device *hidpp) { struct hidpp_report response; int ret, status; ret = hidpp_send_rap_command_sync(hidpp, REPORT_ID_HIDPP_SHORT, HIDPP_GET_REGISTER, HIDPP_REG_BATTERY_STATUS, NULL, 0, &response); if (ret) return ret; hidpp->battery.level = hidpp10_battery_status_map_level(response.rap.params[0]); status = hidpp10_battery_status_map_status(response.rap.params[1]); hidpp->battery.status = status; /* the capacity is only available when discharging or full */ hidpp->battery.online = status == POWER_SUPPLY_STATUS_DISCHARGING || status == POWER_SUPPLY_STATUS_FULL; return 0; } #define HIDPP_REG_BATTERY_MILEAGE 0x0D static int hidpp10_battery_mileage_map_status(u8 param) { int status; switch (param >> 6) { case 0x00: /* discharging (in use) */ status = POWER_SUPPLY_STATUS_DISCHARGING; break; case 0x01: /* charging */ status = POWER_SUPPLY_STATUS_CHARGING; break; case 0x02: /* charge complete */ status = POWER_SUPPLY_STATUS_FULL; break; /* * 0x03 = charging error */ default: 
status = POWER_SUPPLY_STATUS_NOT_CHARGING; break; } return status; } static int hidpp10_query_battery_mileage(struct hidpp_device *hidpp) { struct hidpp_report response; int ret, status; ret = hidpp_send_rap_command_sync(hidpp, REPORT_ID_HIDPP_SHORT, HIDPP_GET_REGISTER, HIDPP_REG_BATTERY_MILEAGE, NULL, 0, &response); if (ret) return ret; hidpp->battery.capacity = response.rap.params[0]; status = hidpp10_battery_mileage_map_status(response.rap.params[2]); hidpp->battery.status = status; /* the capacity is only available when discharging or full */ hidpp->battery.online = status == POWER_SUPPLY_STATUS_DISCHARGING || status == POWER_SUPPLY_STATUS_FULL; return 0; } static int hidpp10_battery_event(struct hidpp_device *hidpp, u8 *data, int size) { struct hidpp_report *report = (struct hidpp_report *)data; int status, capacity, level; bool changed; if (report->report_id != REPORT_ID_HIDPP_SHORT) return 0; switch (report->rap.sub_id) { case HIDPP_REG_BATTERY_STATUS: capacity = hidpp->battery.capacity; level = hidpp10_battery_status_map_level(report->rawbytes[1]); status = hidpp10_battery_status_map_status(report->rawbytes[2]); break; case HIDPP_REG_BATTERY_MILEAGE: capacity = report->rap.params[0]; level = hidpp->battery.level; status = hidpp10_battery_mileage_map_status(report->rawbytes[3]); break; default: return 0; } changed = capacity != hidpp->battery.capacity || level != hidpp->battery.level || status != hidpp->battery.status; /* the capacity is only available when discharging or full */ hidpp->battery.online = status == POWER_SUPPLY_STATUS_DISCHARGING || status == POWER_SUPPLY_STATUS_FULL; if (changed) { hidpp->battery.level = level; hidpp->battery.status = status; if (hidpp->battery.ps) power_supply_changed(hidpp->battery.ps); } return 0; } #define HIDPP_REG_PAIRING_INFORMATION 0xB5 #define HIDPP_EXTENDED_PAIRING 0x30 #define HIDPP_DEVICE_NAME 0x40 static char *hidpp_unifying_get_name(struct hidpp_device *hidpp_dev) { struct hidpp_report response; int ret; u8 params[1] = { HIDPP_DEVICE_NAME }; char *name; int len; ret = hidpp_send_rap_command_sync(hidpp_dev, REPORT_ID_HIDPP_SHORT, HIDPP_GET_LONG_REGISTER, HIDPP_REG_PAIRING_INFORMATION, params, 1, &response); if (ret) return NULL; len = response.rap.params[1]; if (2 + len > sizeof(response.rap.params)) return NULL; if (len < 4) /* logitech devices are usually at least Xddd */ return NULL; name = kzalloc(len + 1, GFP_KERNEL); if (!name) return NULL; memcpy(name, &response.rap.params[2], len); /* include the terminating '\0' */ hidpp_prefix_name(&name, len + 1); return name; } static int hidpp_unifying_get_serial(struct hidpp_device *hidpp, u32 *serial) { struct hidpp_report response; int ret; u8 params[1] = { HIDPP_EXTENDED_PAIRING }; ret = hidpp_send_rap_command_sync(hidpp, REPORT_ID_HIDPP_SHORT, HIDPP_GET_LONG_REGISTER, HIDPP_REG_PAIRING_INFORMATION, params, 1, &response); if (ret) return ret; /* * We don't care about LE or BE, we will output it as a string * with %4phD, so we need to keep the order. 
*/ *serial = *((u32 *)&response.rap.params[1]); return 0; } static int hidpp_unifying_init(struct hidpp_device *hidpp) { struct hid_device *hdev = hidpp->hid_dev; const char *name; u32 serial; int ret; ret = hidpp_unifying_get_serial(hidpp, &serial); if (ret) return ret; snprintf(hdev->uniq, sizeof(hdev->uniq), "%4phD", &serial); dbg_hid("HID++ Unifying: Got serial: %s\n", hdev->uniq); name = hidpp_unifying_get_name(hidpp); if (!name) return -EIO; snprintf(hdev->name, sizeof(hdev->name), "%s", name); dbg_hid("HID++ Unifying: Got name: %s\n", name); kfree(name); return 0; } /* -------------------------------------------------------------------------- */ /* 0x0000: Root */ /* -------------------------------------------------------------------------- */ #define HIDPP_PAGE_ROOT 0x0000 #define HIDPP_PAGE_ROOT_IDX 0x00 #define CMD_ROOT_GET_FEATURE 0x00 #define CMD_ROOT_GET_PROTOCOL_VERSION 0x10 static int hidpp_root_get_feature(struct hidpp_device *hidpp, u16 feature, u8 *feature_index) { struct hidpp_report response; int ret; u8 params[2] = { feature >> 8, feature & 0x00FF }; ret = hidpp_send_fap_command_sync(hidpp, HIDPP_PAGE_ROOT_IDX, CMD_ROOT_GET_FEATURE, params, 2, &response); if (ret) return ret; if (response.fap.params[0] == 0) return -ENOENT; *feature_index = response.fap.params[0]; return ret; } static int hidpp_root_get_protocol_version(struct hidpp_device *hidpp) { const u8 ping_byte = 0x5a; u8 ping_data[3] = { 0, 0, ping_byte }; struct hidpp_report response; int ret; ret = hidpp_send_rap_command_sync(hidpp, REPORT_ID_HIDPP_SHORT, HIDPP_PAGE_ROOT_IDX, CMD_ROOT_GET_PROTOCOL_VERSION | LINUX_KERNEL_SW_ID, ping_data, sizeof(ping_data), &response); if (ret == HIDPP_ERROR_INVALID_SUBID) { hidpp->protocol_major = 1; hidpp->protocol_minor = 0; goto print_version; } /* the device might not be connected */ if (ret == HIDPP_ERROR_RESOURCE_ERROR) return -EIO; if (ret > 0) { hid_err(hidpp->hid_dev, "%s: received protocol error 0x%02x\n", __func__, ret); return -EPROTO; } if (ret) return ret; if (response.rap.params[2] != ping_byte) { hid_err(hidpp->hid_dev, "%s: ping mismatch 0x%02x != 0x%02x\n", __func__, response.rap.params[2], ping_byte); return -EPROTO; } hidpp->protocol_major = response.rap.params[0]; hidpp->protocol_minor = response.rap.params[1]; print_version: if (!hidpp->connected_once) { hid_info(hidpp->hid_dev, "HID++ %u.%u device connected.\n", hidpp->protocol_major, hidpp->protocol_minor); hidpp->connected_once = true; } else hid_dbg(hidpp->hid_dev, "HID++ %u.%u device connected.\n", hidpp->protocol_major, hidpp->protocol_minor); return 0; } /* -------------------------------------------------------------------------- */ /* 0x0003: Device Information */ /* -------------------------------------------------------------------------- */ #define HIDPP_PAGE_DEVICE_INFORMATION 0x0003 #define CMD_GET_DEVICE_INFO 0x00 static int hidpp_get_serial(struct hidpp_device *hidpp, u32 *serial) { struct hidpp_report response; u8 feature_index; int ret; ret = hidpp_root_get_feature(hidpp, HIDPP_PAGE_DEVICE_INFORMATION, &feature_index); if (ret) return ret; ret = hidpp_send_fap_command_sync(hidpp, feature_index, CMD_GET_DEVICE_INFO, NULL, 0, &response); if (ret) return ret; /* See hidpp_unifying_get_serial() */ *serial = *((u32 *)&response.rap.params[1]); return 0; } static int hidpp_serial_init(struct hidpp_device *hidpp) { struct hid_device *hdev = hidpp->hid_dev; u32 serial; int ret; ret = hidpp_get_serial(hidpp, &serial); if (ret) return ret; snprintf(hdev->uniq, sizeof(hdev->uniq), "%4phD", 
&serial); dbg_hid("HID++ DeviceInformation: Got serial: %s\n", hdev->uniq); return 0; } /* -------------------------------------------------------------------------- */ /* 0x0005: GetDeviceNameType */ /* -------------------------------------------------------------------------- */ #define HIDPP_PAGE_GET_DEVICE_NAME_TYPE 0x0005 #define CMD_GET_DEVICE_NAME_TYPE_GET_COUNT 0x00 #define CMD_GET_DEVICE_NAME_TYPE_GET_DEVICE_NAME 0x10 #define CMD_GET_DEVICE_NAME_TYPE_GET_TYPE 0x20 static int hidpp_devicenametype_get_count(struct hidpp_device *hidpp, u8 feature_index, u8 *nameLength) { struct hidpp_report response; int ret; ret = hidpp_send_fap_command_sync(hidpp, feature_index, CMD_GET_DEVICE_NAME_TYPE_GET_COUNT, NULL, 0, &response); if (ret > 0) { hid_err(hidpp->hid_dev, "%s: received protocol error 0x%02x\n", __func__, ret); return -EPROTO; } if (ret) return ret; *nameLength = response.fap.params[0]; return ret; } static int hidpp_devicenametype_get_device_name(struct hidpp_device *hidpp, u8 feature_index, u8 char_index, char *device_name, int len_buf) { struct hidpp_report response; int ret, i; int count; ret = hidpp_send_fap_command_sync(hidpp, feature_index, CMD_GET_DEVICE_NAME_TYPE_GET_DEVICE_NAME, &char_index, 1, &response); if (ret > 0) { hid_err(hidpp->hid_dev, "%s: received protocol error 0x%02x\n", __func__, ret); return -EPROTO; } if (ret) return ret; switch (response.report_id) { case REPORT_ID_HIDPP_VERY_LONG: count = hidpp->very_long_report_length - 4; break; case REPORT_ID_HIDPP_LONG: count = HIDPP_REPORT_LONG_LENGTH - 4; break; case REPORT_ID_HIDPP_SHORT: count = HIDPP_REPORT_SHORT_LENGTH - 4; break; default: return -EPROTO; } if (len_buf < count) count = len_buf; for (i = 0; i < count; i++) device_name[i] = response.fap.params[i]; return count; } static char *hidpp_get_device_name(struct hidpp_device *hidpp) { u8 feature_index; u8 __name_length; char *name; unsigned index = 0; int ret; ret = hidpp_root_get_feature(hidpp, HIDPP_PAGE_GET_DEVICE_NAME_TYPE, &feature_index); if (ret) return NULL; ret = hidpp_devicenametype_get_count(hidpp, feature_index, &__name_length); if (ret) return NULL; name = kzalloc(__name_length + 1, GFP_KERNEL); if (!name) return NULL; while (index < __name_length) { ret = hidpp_devicenametype_get_device_name(hidpp, feature_index, index, name + index, __name_length - index); if (ret <= 0) { kfree(name); return NULL; } index += ret; } /* include the terminating '\0' */ hidpp_prefix_name(&name, __name_length + 1); return name; } /* -------------------------------------------------------------------------- */ /* 0x1000: Battery level status */ /* -------------------------------------------------------------------------- */ #define HIDPP_PAGE_BATTERY_LEVEL_STATUS 0x1000 #define CMD_BATTERY_LEVEL_STATUS_GET_BATTERY_LEVEL_STATUS 0x00 #define CMD_BATTERY_LEVEL_STATUS_GET_BATTERY_CAPABILITY 0x10 #define EVENT_BATTERY_LEVEL_STATUS_BROADCAST 0x00 #define FLAG_BATTERY_LEVEL_DISABLE_OSD BIT(0) #define FLAG_BATTERY_LEVEL_MILEAGE BIT(1) #define FLAG_BATTERY_LEVEL_RECHARGEABLE BIT(2) static int hidpp_map_battery_level(int capacity) { if (capacity < 11) return POWER_SUPPLY_CAPACITY_LEVEL_CRITICAL; /* * The spec says this should be < 31 but some devices report 30 * with brand new batteries and Windows reports 30 as "Good". 
*/ else if (capacity < 30) return POWER_SUPPLY_CAPACITY_LEVEL_LOW; else if (capacity < 81) return POWER_SUPPLY_CAPACITY_LEVEL_NORMAL; return POWER_SUPPLY_CAPACITY_LEVEL_FULL; } static int hidpp20_batterylevel_map_status_capacity(u8 data[3], int *capacity, int *next_capacity, int *level) { int status; *capacity = data[0]; *next_capacity = data[1]; *level = POWER_SUPPLY_CAPACITY_LEVEL_UNKNOWN; /* When discharging, we can rely on the device reported capacity. * For all other states the device reports 0 (unknown). */ switch (data[2]) { case 0: /* discharging (in use) */ status = POWER_SUPPLY_STATUS_DISCHARGING; *level = hidpp_map_battery_level(*capacity); break; case 1: /* recharging */ status = POWER_SUPPLY_STATUS_CHARGING; break; case 2: /* charge in final stage */ status = POWER_SUPPLY_STATUS_CHARGING; break; case 3: /* charge complete */ status = POWER_SUPPLY_STATUS_FULL; *level = POWER_SUPPLY_CAPACITY_LEVEL_FULL; *capacity = 100; break; case 4: /* recharging below optimal speed */ status = POWER_SUPPLY_STATUS_CHARGING; break; /* 5 = invalid battery type 6 = thermal error 7 = other charging error */ default: status = POWER_SUPPLY_STATUS_NOT_CHARGING; break; } return status; } static int hidpp20_batterylevel_get_battery_capacity(struct hidpp_device *hidpp, u8 feature_index, int *status, int *capacity, int *next_capacity, int *level) { struct hidpp_report response; int ret; u8 *params = (u8 *)response.fap.params; ret = hidpp_send_fap_command_sync(hidpp, feature_index, CMD_BATTERY_LEVEL_STATUS_GET_BATTERY_LEVEL_STATUS, NULL, 0, &response); /* Ignore these intermittent errors */ if (ret == HIDPP_ERROR_RESOURCE_ERROR) return -EIO; if (ret > 0) { hid_err(hidpp->hid_dev, "%s: received protocol error 0x%02x\n", __func__, ret); return -EPROTO; } if (ret) return ret; *status = hidpp20_batterylevel_map_status_capacity(params, capacity, next_capacity, level); return 0; } static int hidpp20_batterylevel_get_battery_info(struct hidpp_device *hidpp, u8 feature_index) { struct hidpp_report response; int ret; u8 *params = (u8 *)response.fap.params; unsigned int level_count, flags; ret = hidpp_send_fap_command_sync(hidpp, feature_index, CMD_BATTERY_LEVEL_STATUS_GET_BATTERY_CAPABILITY, NULL, 0, &response); if (ret > 0) { hid_err(hidpp->hid_dev, "%s: received protocol error 0x%02x\n", __func__, ret); return -EPROTO; } if (ret) return ret; level_count = params[0]; flags = params[1]; if (level_count < 10 || !(flags & FLAG_BATTERY_LEVEL_MILEAGE)) hidpp->capabilities |= HIDPP_CAPABILITY_BATTERY_LEVEL_STATUS; else hidpp->capabilities |= HIDPP_CAPABILITY_BATTERY_MILEAGE; return 0; } static int hidpp20_query_battery_info_1000(struct hidpp_device *hidpp) { int ret; int status, capacity, next_capacity, level; if (hidpp->battery.feature_index == 0xff) { ret = hidpp_root_get_feature(hidpp, HIDPP_PAGE_BATTERY_LEVEL_STATUS, &hidpp->battery.feature_index); if (ret) return ret; } ret = hidpp20_batterylevel_get_battery_capacity(hidpp, hidpp->battery.feature_index, &status, &capacity, &next_capacity, &level); if (ret) return ret; ret = hidpp20_batterylevel_get_battery_info(hidpp, hidpp->battery.feature_index); if (ret) return ret; hidpp->battery.status = status; hidpp->battery.capacity = capacity; hidpp->battery.level = level; /* the capacity is only available when discharging or full */ hidpp->battery.online = status == POWER_SUPPLY_STATUS_DISCHARGING || status == POWER_SUPPLY_STATUS_FULL; return 0; } static int hidpp20_battery_event_1000(struct hidpp_device *hidpp, u8 *data, int size) { struct hidpp_report *report = (struct 
hidpp_report *)data; int status, capacity, next_capacity, level; bool changed; if (report->fap.feature_index != hidpp->battery.feature_index || report->fap.funcindex_clientid != EVENT_BATTERY_LEVEL_STATUS_BROADCAST) return 0; status = hidpp20_batterylevel_map_status_capacity(report->fap.params, &capacity, &next_capacity, &level); /* the capacity is only available when discharging or full */ hidpp->battery.online = status == POWER_SUPPLY_STATUS_DISCHARGING || status == POWER_SUPPLY_STATUS_FULL; changed = capacity != hidpp->battery.capacity || level != hidpp->battery.level || status != hidpp->battery.status; if (changed) { hidpp->battery.level = level; hidpp->battery.capacity = capacity; hidpp->battery.status = status; if (hidpp->battery.ps) power_supply_changed(hidpp->battery.ps); } return 0; } /* -------------------------------------------------------------------------- */ /* 0x1001: Battery voltage */ /* -------------------------------------------------------------------------- */ #define HIDPP_PAGE_BATTERY_VOLTAGE 0x1001 #define CMD_BATTERY_VOLTAGE_GET_BATTERY_VOLTAGE 0x00 #define EVENT_BATTERY_VOLTAGE_STATUS_BROADCAST 0x00 static int hidpp20_battery_map_status_voltage(u8 data[3], int *voltage, int *level, int *charge_type) { int status; long flags = (long) data[2]; *level = POWER_SUPPLY_CAPACITY_LEVEL_UNKNOWN; if (flags & 0x80) switch (flags & 0x07) { case 0: status = POWER_SUPPLY_STATUS_CHARGING; break; case 1: status = POWER_SUPPLY_STATUS_FULL; *level = POWER_SUPPLY_CAPACITY_LEVEL_FULL; break; case 2: status = POWER_SUPPLY_STATUS_NOT_CHARGING; break; default: status = POWER_SUPPLY_STATUS_UNKNOWN; break; } else status = POWER_SUPPLY_STATUS_DISCHARGING; *charge_type = POWER_SUPPLY_CHARGE_TYPE_STANDARD; if (test_bit(3, &flags)) { *charge_type = POWER_SUPPLY_CHARGE_TYPE_FAST; } if (test_bit(4, &flags)) { *charge_type = POWER_SUPPLY_CHARGE_TYPE_TRICKLE; } if (test_bit(5, &flags)) { *level = POWER_SUPPLY_CAPACITY_LEVEL_CRITICAL; } *voltage = get_unaligned_be16(data); return status; } static int hidpp20_battery_get_battery_voltage(struct hidpp_device *hidpp, u8 feature_index, int *status, int *voltage, int *level, int *charge_type) { struct hidpp_report response; int ret; u8 *params = (u8 *)response.fap.params; ret = hidpp_send_fap_command_sync(hidpp, feature_index, CMD_BATTERY_VOLTAGE_GET_BATTERY_VOLTAGE, NULL, 0, &response); if (ret > 0) { hid_err(hidpp->hid_dev, "%s: received protocol error 0x%02x\n", __func__, ret); return -EPROTO; } if (ret) return ret; hidpp->capabilities |= HIDPP_CAPABILITY_BATTERY_VOLTAGE; *status = hidpp20_battery_map_status_voltage(params, voltage, level, charge_type); return 0; } static int hidpp20_map_battery_capacity(struct hid_device *hid_dev, int voltage) { /* NB: This voltage curve doesn't necessarily map perfectly to all * devices that implement the BATTERY_VOLTAGE feature. This is because * there are a few devices that use different battery technology. 
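 *
 * The lookup below scans the table from the 100% entry downwards and
 * returns (100 - index) for the first entry the measured voltage still
 * reaches, so a reading at or above voltages[0] maps to 100% and a reading
 * below the last entry maps to 0%.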
*/ static const int voltages[100] = { 4186, 4156, 4143, 4133, 4122, 4113, 4103, 4094, 4086, 4075, 4067, 4059, 4051, 4043, 4035, 4027, 4019, 4011, 4003, 3997, 3989, 3983, 3976, 3969, 3961, 3955, 3949, 3942, 3935, 3929, 3922, 3916, 3909, 3902, 3896, 3890, 3883, 3877, 3870, 3865, 3859, 3853, 3848, 3842, 3837, 3833, 3828, 3824, 3819, 3815, 3811, 3808, 3804, 3800, 3797, 3793, 3790, 3787, 3784, 3781, 3778, 3775, 3772, 3770, 3767, 3764, 3762, 3759, 3757, 3754, 3751, 3748, 3744, 3741, 3737, 3734, 3730, 3726, 3724, 3720, 3717, 3714, 3710, 3706, 3702, 3697, 3693, 3688, 3683, 3677, 3671, 3666, 3662, 3658, 3654, 3646, 3633, 3612, 3579, 3537 }; int i; if (unlikely(voltage < 3500 || voltage >= 5000)) hid_warn_once(hid_dev, "%s: possibly using the wrong voltage curve\n", __func__); for (i = 0; i < ARRAY_SIZE(voltages); i++) { if (voltage >= voltages[i]) return ARRAY_SIZE(voltages) - i; } return 0; } static int hidpp20_query_battery_voltage_info(struct hidpp_device *hidpp) { int ret; int status, voltage, level, charge_type; if (hidpp->battery.voltage_feature_index == 0xff) { ret = hidpp_root_get_feature(hidpp, HIDPP_PAGE_BATTERY_VOLTAGE, &hidpp->battery.voltage_feature_index); if (ret) return ret; } ret = hidpp20_battery_get_battery_voltage(hidpp, hidpp->battery.voltage_feature_index, &status, &voltage, &level, &charge_type); if (ret) return ret; hidpp->battery.status = status; hidpp->battery.voltage = voltage; hidpp->battery.capacity = hidpp20_map_battery_capacity(hidpp->hid_dev, voltage); hidpp->battery.level = level; hidpp->battery.charge_type = charge_type; hidpp->battery.online = status != POWER_SUPPLY_STATUS_NOT_CHARGING; return 0; } static int hidpp20_battery_voltage_event(struct hidpp_device *hidpp, u8 *data, int size) { struct hidpp_report *report = (struct hidpp_report *)data; int status, voltage, level, charge_type; if (report->fap.feature_index != hidpp->battery.voltage_feature_index || report->fap.funcindex_clientid != EVENT_BATTERY_VOLTAGE_STATUS_BROADCAST) return 0; status = hidpp20_battery_map_status_voltage(report->fap.params, &voltage, &level, &charge_type); hidpp->battery.online = status != POWER_SUPPLY_STATUS_NOT_CHARGING; if (voltage != hidpp->battery.voltage || status != hidpp->battery.status) { hidpp->battery.voltage = voltage; hidpp->battery.capacity = hidpp20_map_battery_capacity(hidpp->hid_dev, voltage); hidpp->battery.status = status; hidpp->battery.level = level; hidpp->battery.charge_type = charge_type; if (hidpp->battery.ps) power_supply_changed(hidpp->battery.ps); } return 0; } /* -------------------------------------------------------------------------- */ /* 0x1004: Unified battery */ /* -------------------------------------------------------------------------- */ #define HIDPP_PAGE_UNIFIED_BATTERY 0x1004 #define CMD_UNIFIED_BATTERY_GET_CAPABILITIES 0x00 #define CMD_UNIFIED_BATTERY_GET_STATUS 0x10 #define EVENT_UNIFIED_BATTERY_STATUS_EVENT 0x00 #define FLAG_UNIFIED_BATTERY_LEVEL_CRITICAL BIT(0) #define FLAG_UNIFIED_BATTERY_LEVEL_LOW BIT(1) #define FLAG_UNIFIED_BATTERY_LEVEL_GOOD BIT(2) #define FLAG_UNIFIED_BATTERY_LEVEL_FULL BIT(3) #define FLAG_UNIFIED_BATTERY_FLAGS_RECHARGEABLE BIT(0) #define FLAG_UNIFIED_BATTERY_FLAGS_STATE_OF_CHARGE BIT(1) static int hidpp20_unifiedbattery_get_capabilities(struct hidpp_device *hidpp, u8 feature_index) { struct hidpp_report response; int ret; u8 *params = (u8 *)response.fap.params; if (hidpp->capabilities & HIDPP_CAPABILITY_BATTERY_LEVEL_STATUS || hidpp->capabilities & HIDPP_CAPABILITY_BATTERY_PERCENTAGE) { /* we have already set the 
device capabilities, so let's skip */
		return 0;
	}

	ret = hidpp_send_fap_command_sync(hidpp, feature_index,
					  CMD_UNIFIED_BATTERY_GET_CAPABILITIES,
					  NULL, 0, &response);
	/* Ignore these intermittent errors */
	if (ret == HIDPP_ERROR_RESOURCE_ERROR)
		return -EIO;
	if (ret > 0) {
		hid_err(hidpp->hid_dev, "%s: received protocol error 0x%02x\n",
			__func__, ret);
		return -EPROTO;
	}
	if (ret)
		return ret;

	/*
	 * If the device supports state of charge (battery percentage) we won't
	 * export the battery level information. There are 4 possible battery
	 * levels and they are all optional; this means that the device might
	 * not support any of them, so we are just better off with the battery
	 * percentage.
	 */
	if (params[1] & FLAG_UNIFIED_BATTERY_FLAGS_STATE_OF_CHARGE) {
		hidpp->capabilities |= HIDPP_CAPABILITY_BATTERY_PERCENTAGE;
		hidpp->battery.supported_levels_1004 = 0;
	} else {
		hidpp->capabilities |= HIDPP_CAPABILITY_BATTERY_LEVEL_STATUS;
		hidpp->battery.supported_levels_1004 = params[0];
	}

	return 0;
}

static int hidpp20_unifiedbattery_map_status(struct hidpp_device *hidpp,
					     u8 charging_status,
					     u8 external_power_status)
{
	int status;

	switch (charging_status) {
	case 0: /* discharging */
		status = POWER_SUPPLY_STATUS_DISCHARGING;
		break;
	case 1: /* charging */
	case 2: /* charging slow */
		status = POWER_SUPPLY_STATUS_CHARGING;
		break;
	case 3: /* complete */
		status = POWER_SUPPLY_STATUS_FULL;
		break;
	case 4: /* error */
		status = POWER_SUPPLY_STATUS_NOT_CHARGING;
		hid_info(hidpp->hid_dev, "%s: charging error",
			 hidpp->name);
		break;
	default:
		status = POWER_SUPPLY_STATUS_NOT_CHARGING;
		break;
	}

	return status;
}

static int hidpp20_unifiedbattery_map_level(struct hidpp_device *hidpp,
					    u8 battery_level)
{
	/* clear unsupported level bits */
	battery_level &= hidpp->battery.supported_levels_1004;

	if (battery_level & FLAG_UNIFIED_BATTERY_LEVEL_FULL)
		return POWER_SUPPLY_CAPACITY_LEVEL_FULL;
	else if (battery_level & FLAG_UNIFIED_BATTERY_LEVEL_GOOD)
		return POWER_SUPPLY_CAPACITY_LEVEL_NORMAL;
	else if (battery_level & FLAG_UNIFIED_BATTERY_LEVEL_LOW)
		return POWER_SUPPLY_CAPACITY_LEVEL_LOW;
	else if (battery_level & FLAG_UNIFIED_BATTERY_LEVEL_CRITICAL)
		return POWER_SUPPLY_CAPACITY_LEVEL_CRITICAL;

	return POWER_SUPPLY_CAPACITY_LEVEL_UNKNOWN;
}

static int hidpp20_unifiedbattery_get_status(struct hidpp_device *hidpp,
					     u8 feature_index,
					     u8 *state_of_charge,
					     int *status,
					     int *level)
{
	struct hidpp_report response;
	int ret;
	u8 *params = (u8 *)response.fap.params;

	ret = hidpp_send_fap_command_sync(hidpp, feature_index,
					  CMD_UNIFIED_BATTERY_GET_STATUS,
					  NULL, 0, &response);
	/* Ignore these intermittent errors */
	if (ret == HIDPP_ERROR_RESOURCE_ERROR)
		return -EIO;
	if (ret > 0) {
		hid_err(hidpp->hid_dev, "%s: received protocol error 0x%02x\n",
			__func__, ret);
		return -EPROTO;
	}
	if (ret)
		return ret;

	*state_of_charge = params[0];
	*status = hidpp20_unifiedbattery_map_status(hidpp, params[2], params[3]);
	*level = hidpp20_unifiedbattery_map_level(hidpp, params[1]);

	return 0;
}

static int hidpp20_query_battery_info_1004(struct hidpp_device *hidpp)
{
	int ret;
	u8 state_of_charge;
	int status, level;

	if (hidpp->battery.feature_index == 0xff) {
		ret = hidpp_root_get_feature(hidpp,
					     HIDPP_PAGE_UNIFIED_BATTERY,
					     &hidpp->battery.feature_index);
		if (ret)
			return ret;
	}

	ret = hidpp20_unifiedbattery_get_capabilities(hidpp,
						      hidpp->battery.feature_index);
	if (ret)
		return ret;

	ret = hidpp20_unifiedbattery_get_status(hidpp,
						hidpp->battery.feature_index,
						&state_of_charge,
						&status,
						&level);
	if (ret)
		return ret;

	hidpp->capabilities |= HIDPP_CAPABILITY_UNIFIED_BATTERY;
	hidpp->battery.capacity =
state_of_charge; hidpp->battery.status = status; hidpp->battery.level = level; hidpp->battery.online = true; return 0; } static int hidpp20_battery_event_1004(struct hidpp_device *hidpp, u8 *data, int size) { struct hidpp_report *report = (struct hidpp_report *)data; u8 *params = (u8 *)report->fap.params; int state_of_charge, status, level; bool changed; if (report->fap.feature_index != hidpp->battery.feature_index || report->fap.funcindex_clientid != EVENT_UNIFIED_BATTERY_STATUS_EVENT) return 0; state_of_charge = params[0]; status = hidpp20_unifiedbattery_map_status(hidpp, params[2], params[3]); level = hidpp20_unifiedbattery_map_level(hidpp, params[1]); changed = status != hidpp->battery.status || (state_of_charge != hidpp->battery.capacity && hidpp->capabilities & HIDPP_CAPABILITY_BATTERY_PERCENTAGE) || (level != hidpp->battery.level && hidpp->capabilities & HIDPP_CAPABILITY_BATTERY_LEVEL_STATUS); if (changed) { hidpp->battery.capacity = state_of_charge; hidpp->battery.status = status; hidpp->battery.level = level; if (hidpp->battery.ps) power_supply_changed(hidpp->battery.ps); } return 0; } /* -------------------------------------------------------------------------- */ /* Battery feature helpers */ /* -------------------------------------------------------------------------- */ static enum power_supply_property hidpp_battery_props[] = { POWER_SUPPLY_PROP_ONLINE, POWER_SUPPLY_PROP_STATUS, POWER_SUPPLY_PROP_SCOPE, POWER_SUPPLY_PROP_MODEL_NAME, POWER_SUPPLY_PROP_MANUFACTURER, POWER_SUPPLY_PROP_SERIAL_NUMBER, 0, /* placeholder for POWER_SUPPLY_PROP_CAPACITY, */ 0, /* placeholder for POWER_SUPPLY_PROP_CAPACITY_LEVEL, */ 0, /* placeholder for POWER_SUPPLY_PROP_VOLTAGE_NOW, */ }; static int hidpp_battery_get_property(struct power_supply *psy, enum power_supply_property psp, union power_supply_propval *val) { struct hidpp_device *hidpp = power_supply_get_drvdata(psy); int ret = 0; switch(psp) { case POWER_SUPPLY_PROP_STATUS: val->intval = hidpp->battery.status; break; case POWER_SUPPLY_PROP_CAPACITY: val->intval = hidpp->battery.capacity; break; case POWER_SUPPLY_PROP_CAPACITY_LEVEL: val->intval = hidpp->battery.level; break; case POWER_SUPPLY_PROP_SCOPE: val->intval = POWER_SUPPLY_SCOPE_DEVICE; break; case POWER_SUPPLY_PROP_ONLINE: val->intval = hidpp->battery.online; break; case POWER_SUPPLY_PROP_MODEL_NAME: if (!strncmp(hidpp->name, "Logitech ", 9)) val->strval = hidpp->name + 9; else val->strval = hidpp->name; break; case POWER_SUPPLY_PROP_MANUFACTURER: val->strval = "Logitech"; break; case POWER_SUPPLY_PROP_SERIAL_NUMBER: val->strval = hidpp->hid_dev->uniq; break; case POWER_SUPPLY_PROP_VOLTAGE_NOW: /* hardware reports voltage in mV. 
sysfs expects uV */ val->intval = hidpp->battery.voltage * 1000; break; case POWER_SUPPLY_PROP_CHARGE_TYPE: val->intval = hidpp->battery.charge_type; break; default: ret = -EINVAL; break; } return ret; } /* -------------------------------------------------------------------------- */ /* 0x1d4b: Wireless device status */ /* -------------------------------------------------------------------------- */ #define HIDPP_PAGE_WIRELESS_DEVICE_STATUS 0x1d4b static int hidpp_get_wireless_feature_index(struct hidpp_device *hidpp, u8 *feature_index) { return hidpp_root_get_feature(hidpp, HIDPP_PAGE_WIRELESS_DEVICE_STATUS, feature_index); } /* -------------------------------------------------------------------------- */ /* 0x1f20: ADC measurement */ /* -------------------------------------------------------------------------- */ #define HIDPP_PAGE_ADC_MEASUREMENT 0x1f20 #define CMD_ADC_MEASUREMENT_GET_ADC_MEASUREMENT 0x00 #define EVENT_ADC_MEASUREMENT_STATUS_BROADCAST 0x00 static int hidpp20_map_adc_measurement_1f20_capacity(struct hid_device *hid_dev, int voltage) { /* NB: This voltage curve doesn't necessarily map perfectly to all * devices that implement the ADC_MEASUREMENT feature. This is because * there are a few devices that use different battery technology. * * Adapted from: * https://github.com/Sapd/HeadsetControl/blob/acd972be0468e039b93aae81221f20a54d2d60f7/src/devices/logitech_g633_g933_935.c#L44-L52 */ static const int voltages[100] = { 4030, 4024, 4018, 4011, 4003, 3994, 3985, 3975, 3963, 3951, 3937, 3922, 3907, 3893, 3880, 3868, 3857, 3846, 3837, 3828, 3820, 3812, 3805, 3798, 3791, 3785, 3779, 3773, 3768, 3762, 3757, 3752, 3747, 3742, 3738, 3733, 3729, 3724, 3720, 3716, 3712, 3708, 3704, 3700, 3696, 3692, 3688, 3685, 3681, 3677, 3674, 3670, 3667, 3663, 3660, 3657, 3653, 3650, 3646, 3643, 3640, 3637, 3633, 3630, 3627, 3624, 3620, 3617, 3614, 3611, 3608, 3604, 3601, 3598, 3595, 3592, 3589, 3585, 3582, 3579, 3576, 3573, 3569, 3566, 3563, 3560, 3556, 3553, 3550, 3546, 3543, 3539, 3536, 3532, 3529, 3525, 3499, 3466, 3433, 3399, }; int i; if (voltage == 0) return 0; if (unlikely(voltage < 3400 || voltage >= 5000)) hid_warn_once(hid_dev, "%s: possibly using the wrong voltage curve\n", __func__); for (i = 0; i < ARRAY_SIZE(voltages); i++) { if (voltage >= voltages[i]) return ARRAY_SIZE(voltages) - i; } return 0; } static int hidpp20_map_adc_measurement_1f20(u8 data[3], int *voltage) { int status; u8 flags; flags = data[2]; switch (flags) { case 0x01: status = POWER_SUPPLY_STATUS_DISCHARGING; break; case 0x03: status = POWER_SUPPLY_STATUS_CHARGING; break; case 0x07: status = POWER_SUPPLY_STATUS_FULL; break; case 0x0F: default: status = POWER_SUPPLY_STATUS_UNKNOWN; break; } *voltage = get_unaligned_be16(data); dbg_hid("Parsed 1f20 data as flag 0x%02x voltage %dmV\n", flags, *voltage); return status; } /* Return value is whether the device is online */ static bool hidpp20_get_adc_measurement_1f20(struct hidpp_device *hidpp, u8 feature_index, int *status, int *voltage) { struct hidpp_report response; int ret; u8 *params = (u8 *)response.fap.params; *status = POWER_SUPPLY_STATUS_UNKNOWN; *voltage = 0; ret = hidpp_send_fap_command_sync(hidpp, feature_index, CMD_ADC_MEASUREMENT_GET_ADC_MEASUREMENT, NULL, 0, &response); if (ret > 0) { hid_dbg(hidpp->hid_dev, "%s: received protocol error 0x%02x\n", __func__, ret); return false; } *status = hidpp20_map_adc_measurement_1f20(params, voltage); return true; } static int hidpp20_query_adc_measurement_info_1f20(struct hidpp_device *hidpp) { if 
(hidpp->battery.adc_measurement_feature_index == 0xff) { int ret; ret = hidpp_root_get_feature(hidpp, HIDPP_PAGE_ADC_MEASUREMENT, &hidpp->battery.adc_measurement_feature_index); if (ret) return ret; hidpp->capabilities |= HIDPP_CAPABILITY_ADC_MEASUREMENT; } hidpp->battery.online = hidpp20_get_adc_measurement_1f20(hidpp, hidpp->battery.adc_measurement_feature_index, &hidpp->battery.status, &hidpp->battery.voltage); hidpp->battery.capacity = hidpp20_map_adc_measurement_1f20_capacity(hidpp->hid_dev, hidpp->battery.voltage); hidpp_update_usb_wireless_status(hidpp); return 0; } static int hidpp20_adc_measurement_event_1f20(struct hidpp_device *hidpp, u8 *data, int size) { struct hidpp_report *report = (struct hidpp_report *)data; int status, voltage; if (report->fap.feature_index != hidpp->battery.adc_measurement_feature_index || report->fap.funcindex_clientid != EVENT_ADC_MEASUREMENT_STATUS_BROADCAST) return 0; status = hidpp20_map_adc_measurement_1f20(report->fap.params, &voltage); hidpp->battery.online = status != POWER_SUPPLY_STATUS_UNKNOWN; if (voltage != hidpp->battery.voltage || status != hidpp->battery.status) { hidpp->battery.status = status; hidpp->battery.voltage = voltage; hidpp->battery.capacity = hidpp20_map_adc_measurement_1f20_capacity(hidpp->hid_dev, voltage); if (hidpp->battery.ps) power_supply_changed(hidpp->battery.ps); hidpp_update_usb_wireless_status(hidpp); } return 0; } /* -------------------------------------------------------------------------- */ /* 0x2120: Hi-resolution scrolling */ /* -------------------------------------------------------------------------- */ #define HIDPP_PAGE_HI_RESOLUTION_SCROLLING 0x2120 #define CMD_HI_RESOLUTION_SCROLLING_SET_HIGHRES_SCROLLING_MODE 0x10 static int hidpp_hrs_set_highres_scrolling_mode(struct hidpp_device *hidpp, bool enabled, u8 *multiplier) { u8 feature_index; int ret; u8 params[1]; struct hidpp_report response; ret = hidpp_root_get_feature(hidpp, HIDPP_PAGE_HI_RESOLUTION_SCROLLING, &feature_index); if (ret) return ret; params[0] = enabled ? BIT(0) : 0; ret = hidpp_send_fap_command_sync(hidpp, feature_index, CMD_HI_RESOLUTION_SCROLLING_SET_HIGHRES_SCROLLING_MODE, params, sizeof(params), &response); if (ret) return ret; *multiplier = response.fap.params[1]; return 0; } /* -------------------------------------------------------------------------- */ /* 0x2121: HiRes Wheel */ /* -------------------------------------------------------------------------- */ #define HIDPP_PAGE_HIRES_WHEEL 0x2121 #define CMD_HIRES_WHEEL_GET_WHEEL_CAPABILITY 0x00 #define CMD_HIRES_WHEEL_SET_WHEEL_MODE 0x20 static int hidpp_hrw_get_wheel_capability(struct hidpp_device *hidpp, u8 *multiplier) { u8 feature_index; int ret; struct hidpp_report response; ret = hidpp_root_get_feature(hidpp, HIDPP_PAGE_HIRES_WHEEL, &feature_index); if (ret) goto return_default; ret = hidpp_send_fap_command_sync(hidpp, feature_index, CMD_HIRES_WHEEL_GET_WHEEL_CAPABILITY, NULL, 0, &response); if (ret) goto return_default; *multiplier = response.fap.params[0]; return 0; return_default: hid_warn(hidpp->hid_dev, "Couldn't get wheel multiplier (error %d)\n", ret); return ret; } static int hidpp_hrw_set_wheel_mode(struct hidpp_device *hidpp, bool invert, bool high_resolution, bool use_hidpp) { u8 feature_index; int ret; u8 params[1]; struct hidpp_report response; ret = hidpp_root_get_feature(hidpp, HIDPP_PAGE_HIRES_WHEEL, &feature_index); if (ret) return ret; params[0] = (invert ? BIT(2) : 0) | (high_resolution ? BIT(1) : 0) | (use_hidpp ? 
BIT(0) : 0); return hidpp_send_fap_command_sync(hidpp, feature_index, CMD_HIRES_WHEEL_SET_WHEEL_MODE, params, sizeof(params), &response); } /* -------------------------------------------------------------------------- */ /* 0x4301: Solar Keyboard */ /* -------------------------------------------------------------------------- */ #define HIDPP_PAGE_SOLAR_KEYBOARD 0x4301 #define CMD_SOLAR_SET_LIGHT_MEASURE 0x00 #define EVENT_SOLAR_BATTERY_BROADCAST 0x00 #define EVENT_SOLAR_BATTERY_LIGHT_MEASURE 0x10 #define EVENT_SOLAR_CHECK_LIGHT_BUTTON 0x20 static int hidpp_solar_request_battery_event(struct hidpp_device *hidpp) { struct hidpp_report response; u8 params[2] = { 1, 1 }; int ret; if (hidpp->battery.feature_index == 0xff) { ret = hidpp_root_get_feature(hidpp, HIDPP_PAGE_SOLAR_KEYBOARD, &hidpp->battery.solar_feature_index); if (ret) return ret; } ret = hidpp_send_fap_command_sync(hidpp, hidpp->battery.solar_feature_index, CMD_SOLAR_SET_LIGHT_MEASURE, params, 2, &response); if (ret > 0) { hid_err(hidpp->hid_dev, "%s: received protocol error 0x%02x\n", __func__, ret); return -EPROTO; } if (ret) return ret; hidpp->capabilities |= HIDPP_CAPABILITY_BATTERY_MILEAGE; return 0; } static int hidpp_solar_battery_event(struct hidpp_device *hidpp, u8 *data, int size) { struct hidpp_report *report = (struct hidpp_report *)data; int capacity, lux, status; u8 function; function = report->fap.funcindex_clientid; if (report->fap.feature_index != hidpp->battery.solar_feature_index || !(function == EVENT_SOLAR_BATTERY_BROADCAST || function == EVENT_SOLAR_BATTERY_LIGHT_MEASURE || function == EVENT_SOLAR_CHECK_LIGHT_BUTTON)) return 0; capacity = report->fap.params[0]; switch (function) { case EVENT_SOLAR_BATTERY_LIGHT_MEASURE: lux = (report->fap.params[1] << 8) | report->fap.params[2]; if (lux > 200) status = POWER_SUPPLY_STATUS_CHARGING; else status = POWER_SUPPLY_STATUS_DISCHARGING; break; case EVENT_SOLAR_CHECK_LIGHT_BUTTON: default: if (capacity < hidpp->battery.capacity) status = POWER_SUPPLY_STATUS_DISCHARGING; else status = POWER_SUPPLY_STATUS_CHARGING; } if (capacity == 100) status = POWER_SUPPLY_STATUS_FULL; hidpp->battery.online = true; if (capacity != hidpp->battery.capacity || status != hidpp->battery.status) { hidpp->battery.capacity = capacity; hidpp->battery.status = status; if (hidpp->battery.ps) power_supply_changed(hidpp->battery.ps); } return 0; } /* -------------------------------------------------------------------------- */ /* 0x6010: Touchpad FW items */ /* -------------------------------------------------------------------------- */ #define HIDPP_PAGE_TOUCHPAD_FW_ITEMS 0x6010 #define CMD_TOUCHPAD_FW_ITEMS_SET 0x10 struct hidpp_touchpad_fw_items { uint8_t presence; uint8_t desired_state; uint8_t state; uint8_t persistent; }; /* * send a set state command to the device by reading the current items->state * field. items is then filled with the current state. 
*/ static int hidpp_touchpad_fw_items_set(struct hidpp_device *hidpp, u8 feature_index, struct hidpp_touchpad_fw_items *items) { struct hidpp_report response; int ret; u8 *params = (u8 *)response.fap.params; ret = hidpp_send_fap_command_sync(hidpp, feature_index, CMD_TOUCHPAD_FW_ITEMS_SET, &items->state, 1, &response); if (ret > 0) { hid_err(hidpp->hid_dev, "%s: received protocol error 0x%02x\n", __func__, ret); return -EPROTO; } if (ret) return ret; items->presence = params[0]; items->desired_state = params[1]; items->state = params[2]; items->persistent = params[3]; return 0; } /* -------------------------------------------------------------------------- */ /* 0x6100: TouchPadRawXY */ /* -------------------------------------------------------------------------- */ #define HIDPP_PAGE_TOUCHPAD_RAW_XY 0x6100 #define CMD_TOUCHPAD_GET_RAW_INFO 0x00 #define CMD_TOUCHPAD_SET_RAW_REPORT_STATE 0x20 #define EVENT_TOUCHPAD_RAW_XY 0x00 #define TOUCHPAD_RAW_XY_ORIGIN_LOWER_LEFT 0x01 #define TOUCHPAD_RAW_XY_ORIGIN_UPPER_LEFT 0x03 struct hidpp_touchpad_raw_info { u16 x_size; u16 y_size; u8 z_range; u8 area_range; u8 timestamp_unit; u8 maxcontacts; u8 origin; u16 res; }; struct hidpp_touchpad_raw_xy_finger { u8 contact_type; u8 contact_status; u16 x; u16 y; u8 z; u8 area; u8 finger_id; }; struct hidpp_touchpad_raw_xy { u16 timestamp; struct hidpp_touchpad_raw_xy_finger fingers[2]; u8 spurious_flag; u8 end_of_frame; u8 finger_count; u8 button; }; static int hidpp_touchpad_get_raw_info(struct hidpp_device *hidpp, u8 feature_index, struct hidpp_touchpad_raw_info *raw_info) { struct hidpp_report response; int ret; u8 *params = (u8 *)response.fap.params; ret = hidpp_send_fap_command_sync(hidpp, feature_index, CMD_TOUCHPAD_GET_RAW_INFO, NULL, 0, &response); if (ret > 0) { hid_err(hidpp->hid_dev, "%s: received protocol error 0x%02x\n", __func__, ret); return -EPROTO; } if (ret) return ret; raw_info->x_size = get_unaligned_be16(&params[0]); raw_info->y_size = get_unaligned_be16(&params[2]); raw_info->z_range = params[4]; raw_info->area_range = params[5]; raw_info->maxcontacts = params[7]; raw_info->origin = params[8]; /* res is given in unit per inch */ raw_info->res = get_unaligned_be16(&params[13]) * 2 / 51; return ret; } static int hidpp_touchpad_set_raw_report_state(struct hidpp_device *hidpp_dev, u8 feature_index, bool send_raw_reports, bool sensor_enhanced_settings) { struct hidpp_report response; /* * Params: * bit 0 - enable raw * bit 1 - 16bit Z, no area * bit 2 - enhanced sensitivity * bit 3 - width, height (4 bits each) instead of area * bit 4 - send raw + gestures (degrades smoothness) * remaining bits - reserved */ u8 params = send_raw_reports | (sensor_enhanced_settings << 2); return hidpp_send_fap_command_sync(hidpp_dev, feature_index, CMD_TOUCHPAD_SET_RAW_REPORT_STATE, &params, 1, &response); } static void hidpp_touchpad_touch_event(u8 *data, struct hidpp_touchpad_raw_xy_finger *finger) { u8 x_m = data[0] << 2; u8 y_m = data[2] << 2; finger->x = x_m << 6 | data[1]; finger->y = y_m << 6 | data[3]; finger->contact_type = data[0] >> 6; finger->contact_status = data[2] >> 6; finger->z = data[4]; finger->area = data[5]; finger->finger_id = data[6] >> 4; } static void hidpp_touchpad_raw_xy_event(struct hidpp_device *hidpp_dev, u8 *data, struct hidpp_touchpad_raw_xy *raw_xy) { memset(raw_xy, 0, sizeof(struct hidpp_touchpad_raw_xy)); raw_xy->end_of_frame = data[8] & 0x01; raw_xy->spurious_flag = (data[8] >> 1) & 0x01; raw_xy->finger_count = data[15] & 0x0f; raw_xy->button = (data[8] >> 2) & 0x01; if 
(raw_xy->finger_count) { hidpp_touchpad_touch_event(&data[2], &raw_xy->fingers[0]); hidpp_touchpad_touch_event(&data[9], &raw_xy->fingers[1]); } } /* -------------------------------------------------------------------------- */ /* 0x8123: Force feedback support */ /* -------------------------------------------------------------------------- */ #define HIDPP_FF_GET_INFO 0x01 #define HIDPP_FF_RESET_ALL 0x11 #define HIDPP_FF_DOWNLOAD_EFFECT 0x21 #define HIDPP_FF_SET_EFFECT_STATE 0x31 #define HIDPP_FF_DESTROY_EFFECT 0x41 #define HIDPP_FF_GET_APERTURE 0x51 #define HIDPP_FF_SET_APERTURE 0x61 #define HIDPP_FF_GET_GLOBAL_GAINS 0x71 #define HIDPP_FF_SET_GLOBAL_GAINS 0x81 #define HIDPP_FF_EFFECT_STATE_GET 0x00 #define HIDPP_FF_EFFECT_STATE_STOP 0x01 #define HIDPP_FF_EFFECT_STATE_PLAY 0x02 #define HIDPP_FF_EFFECT_STATE_PAUSE 0x03 #define HIDPP_FF_EFFECT_CONSTANT 0x00 #define HIDPP_FF_EFFECT_PERIODIC_SINE 0x01 #define HIDPP_FF_EFFECT_PERIODIC_SQUARE 0x02 #define HIDPP_FF_EFFECT_PERIODIC_TRIANGLE 0x03 #define HIDPP_FF_EFFECT_PERIODIC_SAWTOOTHUP 0x04 #define HIDPP_FF_EFFECT_PERIODIC_SAWTOOTHDOWN 0x05 #define HIDPP_FF_EFFECT_SPRING 0x06 #define HIDPP_FF_EFFECT_DAMPER 0x07 #define HIDPP_FF_EFFECT_FRICTION 0x08 #define HIDPP_FF_EFFECT_INERTIA 0x09 #define HIDPP_FF_EFFECT_RAMP 0x0A #define HIDPP_FF_EFFECT_AUTOSTART 0x80 #define HIDPP_FF_EFFECTID_NONE -1 #define HIDPP_FF_EFFECTID_AUTOCENTER -2 #define HIDPP_AUTOCENTER_PARAMS_LENGTH 18 #define HIDPP_FF_MAX_PARAMS 20 #define HIDPP_FF_RESERVED_SLOTS 1 struct hidpp_ff_private_data { struct hidpp_device *hidpp; u8 feature_index; u8 version; u16 gain; s16 range; u8 slot_autocenter; u8 num_effects; int *effect_ids; struct workqueue_struct *wq; atomic_t workqueue_size; }; struct hidpp_ff_work_data { struct work_struct work; struct hidpp_ff_private_data *data; int effect_id; u8 command; u8 params[HIDPP_FF_MAX_PARAMS]; u8 size; }; static const signed short hidpp_ff_effects[] = { FF_CONSTANT, FF_PERIODIC, FF_SINE, FF_SQUARE, FF_SAW_UP, FF_SAW_DOWN, FF_TRIANGLE, FF_SPRING, FF_DAMPER, FF_AUTOCENTER, FF_GAIN, -1 }; static const signed short hidpp_ff_effects_v2[] = { FF_RAMP, FF_FRICTION, FF_INERTIA, -1 }; static const u8 HIDPP_FF_CONDITION_CMDS[] = { HIDPP_FF_EFFECT_SPRING, HIDPP_FF_EFFECT_FRICTION, HIDPP_FF_EFFECT_DAMPER, HIDPP_FF_EFFECT_INERTIA }; static const char *HIDPP_FF_CONDITION_NAMES[] = { "spring", "friction", "damper", "inertia" }; static u8 hidpp_ff_find_effect(struct hidpp_ff_private_data *data, int effect_id) { int i; for (i = 0; i < data->num_effects; i++) if (data->effect_ids[i] == effect_id) return i+1; return 0; } static void hidpp_ff_work_handler(struct work_struct *w) { struct hidpp_ff_work_data *wd = container_of(w, struct hidpp_ff_work_data, work); struct hidpp_ff_private_data *data = wd->data; struct hidpp_report response; u8 slot; int ret; /* add slot number if needed */ switch (wd->effect_id) { case HIDPP_FF_EFFECTID_AUTOCENTER: wd->params[0] = data->slot_autocenter; break; case HIDPP_FF_EFFECTID_NONE: /* leave slot as zero */ break; default: /* find current slot for effect */ wd->params[0] = hidpp_ff_find_effect(data, wd->effect_id); break; } /* send command and wait for reply */ ret = hidpp_send_fap_command_sync(data->hidpp, data->feature_index, wd->command, wd->params, wd->size, &response); if (ret) { hid_err(data->hidpp->hid_dev, "Failed to send command to device!\n"); goto out; } /* parse return data */ switch (wd->command) { case HIDPP_FF_DOWNLOAD_EFFECT: slot = response.fap.params[0]; if (slot > 0 && slot <= data->num_effects) { if 
(wd->effect_id >= 0) /* regular effect uploaded */ data->effect_ids[slot-1] = wd->effect_id; else if (wd->effect_id >= HIDPP_FF_EFFECTID_AUTOCENTER) /* autocenter spring uploaded */ data->slot_autocenter = slot; } break; case HIDPP_FF_DESTROY_EFFECT: if (wd->effect_id >= 0) /* regular effect destroyed */ data->effect_ids[wd->params[0]-1] = -1; else if (wd->effect_id >= HIDPP_FF_EFFECTID_AUTOCENTER) /* autocenter spring destroyed */ data->slot_autocenter = 0; break; case HIDPP_FF_SET_GLOBAL_GAINS: data->gain = (wd->params[0] << 8) + wd->params[1]; break; case HIDPP_FF_SET_APERTURE: data->range = (wd->params[0] << 8) + wd->params[1]; break; default: /* no action needed */ break; } out: atomic_dec(&data->workqueue_size); kfree(wd); } static int hidpp_ff_queue_work(struct hidpp_ff_private_data *data, int effect_id, u8 command, u8 *params, u8 size) { struct hidpp_ff_work_data *wd = kzalloc(sizeof(*wd), GFP_KERNEL); int s; if (!wd) return -ENOMEM; INIT_WORK(&wd->work, hidpp_ff_work_handler); wd->data = data; wd->effect_id = effect_id; wd->command = command; wd->size = size; memcpy(wd->params, params, size); s = atomic_inc_return(&data->workqueue_size); queue_work(data->wq, &wd->work); /* warn about excessive queue size */ if (s >= 20 && s % 20 == 0) hid_warn(data->hidpp->hid_dev, "Force feedback command queue contains %d commands, causing substantial delays!", s); return 0; } static int hidpp_ff_upload_effect(struct input_dev *dev, struct ff_effect *effect, struct ff_effect *old) { struct hidpp_ff_private_data *data = dev->ff->private; u8 params[20]; u8 size; int force; /* set common parameters */ params[2] = effect->replay.length >> 8; params[3] = effect->replay.length & 255; params[4] = effect->replay.delay >> 8; params[5] = effect->replay.delay & 255; switch (effect->type) { case FF_CONSTANT: force = (effect->u.constant.level * fixp_sin16((effect->direction * 360) >> 16)) >> 15; params[1] = HIDPP_FF_EFFECT_CONSTANT; params[6] = force >> 8; params[7] = force & 255; params[8] = effect->u.constant.envelope.attack_level >> 7; params[9] = effect->u.constant.envelope.attack_length >> 8; params[10] = effect->u.constant.envelope.attack_length & 255; params[11] = effect->u.constant.envelope.fade_level >> 7; params[12] = effect->u.constant.envelope.fade_length >> 8; params[13] = effect->u.constant.envelope.fade_length & 255; size = 14; dbg_hid("Uploading constant force level=%d in dir %d = %d\n", effect->u.constant.level, effect->direction, force); dbg_hid(" envelope attack=(%d, %d ms) fade=(%d, %d ms)\n", effect->u.constant.envelope.attack_level, effect->u.constant.envelope.attack_length, effect->u.constant.envelope.fade_level, effect->u.constant.envelope.fade_length); break; case FF_PERIODIC: { switch (effect->u.periodic.waveform) { case FF_SINE: params[1] = HIDPP_FF_EFFECT_PERIODIC_SINE; break; case FF_SQUARE: params[1] = HIDPP_FF_EFFECT_PERIODIC_SQUARE; break; case FF_SAW_UP: params[1] = HIDPP_FF_EFFECT_PERIODIC_SAWTOOTHUP; break; case FF_SAW_DOWN: params[1] = HIDPP_FF_EFFECT_PERIODIC_SAWTOOTHDOWN; break; case FF_TRIANGLE: params[1] = HIDPP_FF_EFFECT_PERIODIC_TRIANGLE; break; default: hid_err(data->hidpp->hid_dev, "Unexpected periodic waveform type %i!\n", effect->u.periodic.waveform); return -EINVAL; } force = (effect->u.periodic.magnitude * fixp_sin16((effect->direction * 360) >> 16)) >> 15; params[6] = effect->u.periodic.magnitude >> 8; params[7] = effect->u.periodic.magnitude & 255; params[8] = effect->u.periodic.offset >> 8; params[9] = effect->u.periodic.offset & 255; params[10] = 
effect->u.periodic.period >> 8; params[11] = effect->u.periodic.period & 255; params[12] = effect->u.periodic.phase >> 8; params[13] = effect->u.periodic.phase & 255; params[14] = effect->u.periodic.envelope.attack_level >> 7; params[15] = effect->u.periodic.envelope.attack_length >> 8; params[16] = effect->u.periodic.envelope.attack_length & 255; params[17] = effect->u.periodic.envelope.fade_level >> 7; params[18] = effect->u.periodic.envelope.fade_length >> 8; params[19] = effect->u.periodic.envelope.fade_length & 255; size = 20; dbg_hid("Uploading periodic force mag=%d/dir=%d, offset=%d, period=%d ms, phase=%d\n", effect->u.periodic.magnitude, effect->direction, effect->u.periodic.offset, effect->u.periodic.period, effect->u.periodic.phase); dbg_hid(" envelope attack=(%d, %d ms) fade=(%d, %d ms)\n", effect->u.periodic.envelope.attack_level, effect->u.periodic.envelope.attack_length, effect->u.periodic.envelope.fade_level, effect->u.periodic.envelope.fade_length); break; } case FF_RAMP: params[1] = HIDPP_FF_EFFECT_RAMP; force = (effect->u.ramp.start_level * fixp_sin16((effect->direction * 360) >> 16)) >> 15; params[6] = force >> 8; params[7] = force & 255; force = (effect->u.ramp.end_level * fixp_sin16((effect->direction * 360) >> 16)) >> 15; params[8] = force >> 8; params[9] = force & 255; params[10] = effect->u.ramp.envelope.attack_level >> 7; params[11] = effect->u.ramp.envelope.attack_length >> 8; params[12] = effect->u.ramp.envelope.attack_length & 255; params[13] = effect->u.ramp.envelope.fade_level >> 7; params[14] = effect->u.ramp.envelope.fade_length >> 8; params[15] = effect->u.ramp.envelope.fade_length & 255; size = 16; dbg_hid("Uploading ramp force level=%d -> %d in dir %d = %d\n", effect->u.ramp.start_level, effect->u.ramp.end_level, effect->direction, force); dbg_hid(" envelope attack=(%d, %d ms) fade=(%d, %d ms)\n", effect->u.ramp.envelope.attack_level, effect->u.ramp.envelope.attack_length, effect->u.ramp.envelope.fade_level, effect->u.ramp.envelope.fade_length); break; case FF_FRICTION: case FF_INERTIA: case FF_SPRING: case FF_DAMPER: params[1] = HIDPP_FF_CONDITION_CMDS[effect->type - FF_SPRING]; params[6] = effect->u.condition[0].left_saturation >> 9; params[7] = (effect->u.condition[0].left_saturation >> 1) & 255; params[8] = effect->u.condition[0].left_coeff >> 8; params[9] = effect->u.condition[0].left_coeff & 255; params[10] = effect->u.condition[0].deadband >> 9; params[11] = (effect->u.condition[0].deadband >> 1) & 255; params[12] = effect->u.condition[0].center >> 8; params[13] = effect->u.condition[0].center & 255; params[14] = effect->u.condition[0].right_coeff >> 8; params[15] = effect->u.condition[0].right_coeff & 255; params[16] = effect->u.condition[0].right_saturation >> 9; params[17] = (effect->u.condition[0].right_saturation >> 1) & 255; size = 18; dbg_hid("Uploading %s force left coeff=%d, left sat=%d, right coeff=%d, right sat=%d\n", HIDPP_FF_CONDITION_NAMES[effect->type - FF_SPRING], effect->u.condition[0].left_coeff, effect->u.condition[0].left_saturation, effect->u.condition[0].right_coeff, effect->u.condition[0].right_saturation); dbg_hid(" deadband=%d, center=%d\n", effect->u.condition[0].deadband, effect->u.condition[0].center); break; default: hid_err(data->hidpp->hid_dev, "Unexpected force type %i!\n", effect->type); return -EINVAL; } return hidpp_ff_queue_work(data, effect->id, HIDPP_FF_DOWNLOAD_EFFECT, params, size); } static int hidpp_ff_playback(struct input_dev *dev, int effect_id, int value) { struct hidpp_ff_private_data *data = 
dev->ff->private; u8 params[2]; params[1] = value ? HIDPP_FF_EFFECT_STATE_PLAY : HIDPP_FF_EFFECT_STATE_STOP; dbg_hid("St%sing playback of effect %d.\n", value?"art":"opp", effect_id); return hidpp_ff_queue_work(data, effect_id, HIDPP_FF_SET_EFFECT_STATE, params, ARRAY_SIZE(params)); } static int hidpp_ff_erase_effect(struct input_dev *dev, int effect_id) { struct hidpp_ff_private_data *data = dev->ff->private; u8 slot = 0; dbg_hid("Erasing effect %d.\n", effect_id); return hidpp_ff_queue_work(data, effect_id, HIDPP_FF_DESTROY_EFFECT, &slot, 1); } static void hidpp_ff_set_autocenter(struct input_dev *dev, u16 magnitude) { struct hidpp_ff_private_data *data = dev->ff->private; u8 params[HIDPP_AUTOCENTER_PARAMS_LENGTH]; dbg_hid("Setting autocenter to %d.\n", magnitude); /* start a standard spring effect */ params[1] = HIDPP_FF_EFFECT_SPRING | HIDPP_FF_EFFECT_AUTOSTART; /* zero delay and duration */ params[2] = params[3] = params[4] = params[5] = 0; /* set coeff to 25% of saturation */ params[8] = params[14] = magnitude >> 11; params[9] = params[15] = (magnitude >> 3) & 255; params[6] = params[16] = magnitude >> 9; params[7] = params[17] = (magnitude >> 1) & 255; /* zero deadband and center */ params[10] = params[11] = params[12] = params[13] = 0; hidpp_ff_queue_work(data, HIDPP_FF_EFFECTID_AUTOCENTER, HIDPP_FF_DOWNLOAD_EFFECT, params, ARRAY_SIZE(params)); } static void hidpp_ff_set_gain(struct input_dev *dev, u16 gain) { struct hidpp_ff_private_data *data = dev->ff->private; u8 params[4]; dbg_hid("Setting gain to %d.\n", gain); params[0] = gain >> 8; params[1] = gain & 255; params[2] = 0; /* no boost */ params[3] = 0; hidpp_ff_queue_work(data, HIDPP_FF_EFFECTID_NONE, HIDPP_FF_SET_GLOBAL_GAINS, params, ARRAY_SIZE(params)); } static ssize_t hidpp_ff_range_show(struct device *dev, struct device_attribute *attr, char *buf) { struct hid_device *hid = to_hid_device(dev); struct hid_input *hidinput = list_entry(hid->inputs.next, struct hid_input, list); struct input_dev *idev = hidinput->input; struct hidpp_ff_private_data *data = idev->ff->private; return scnprintf(buf, PAGE_SIZE, "%u\n", data->range); } static ssize_t hidpp_ff_range_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { struct hid_device *hid = to_hid_device(dev); struct hid_input *hidinput = list_entry(hid->inputs.next, struct hid_input, list); struct input_dev *idev = hidinput->input; struct hidpp_ff_private_data *data = idev->ff->private; u8 params[2]; int range = simple_strtoul(buf, NULL, 10); range = clamp(range, 180, 900); params[0] = range >> 8; params[1] = range & 0x00FF; hidpp_ff_queue_work(data, -1, HIDPP_FF_SET_APERTURE, params, ARRAY_SIZE(params)); return count; } static DEVICE_ATTR(range, S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH, hidpp_ff_range_show, hidpp_ff_range_store); static void hidpp_ff_destroy(struct ff_device *ff) { struct hidpp_ff_private_data *data = ff->private; struct hid_device *hid = data->hidpp->hid_dev; hid_info(hid, "Unloading HID++ force feedback.\n"); device_remove_file(&hid->dev, &dev_attr_range); destroy_workqueue(data->wq); kfree(data->effect_ids); } static int hidpp_ff_init(struct hidpp_device *hidpp, struct hidpp_ff_private_data *data) { struct hid_device *hid = hidpp->hid_dev; struct hid_input *hidinput; struct input_dev *dev; struct usb_device_descriptor *udesc; u16 bcdDevice; struct ff_device *ff; int error, j, num_slots = data->num_effects; u8 version; if (!hid_is_usb(hid)) { hid_err(hid, "device is not USB\n"); return -ENODEV; } if 
(list_empty(&hid->inputs)) { hid_err(hid, "no inputs found\n"); return -ENODEV; } hidinput = list_entry(hid->inputs.next, struct hid_input, list); dev = hidinput->input; if (!dev) { hid_err(hid, "Struct input_dev not set!\n"); return -EINVAL; } /* Get firmware release */ udesc = &(hid_to_usb_dev(hid)->descriptor); bcdDevice = le16_to_cpu(udesc->bcdDevice); version = bcdDevice & 255; /* Set supported force feedback capabilities */ for (j = 0; hidpp_ff_effects[j] >= 0; j++) set_bit(hidpp_ff_effects[j], dev->ffbit); if (version > 1) for (j = 0; hidpp_ff_effects_v2[j] >= 0; j++) set_bit(hidpp_ff_effects_v2[j], dev->ffbit); error = input_ff_create(dev, num_slots); if (error) { hid_err(dev, "Failed to create FF device!\n"); return error; } /* * Create a copy of passed data, so we can transfer memory * ownership to FF core */ data = kmemdup(data, sizeof(*data), GFP_KERNEL); if (!data) return -ENOMEM; data->effect_ids = kcalloc(num_slots, sizeof(int), GFP_KERNEL); if (!data->effect_ids) { kfree(data); return -ENOMEM; } data->wq = create_singlethread_workqueue("hidpp-ff-sendqueue"); if (!data->wq) { kfree(data->effect_ids); kfree(data); return -ENOMEM; } data->hidpp = hidpp; data->version = version; for (j = 0; j < num_slots; j++) data->effect_ids[j] = -1; ff = dev->ff; ff->private = data; ff->upload = hidpp_ff_upload_effect; ff->erase = hidpp_ff_erase_effect; ff->playback = hidpp_ff_playback; ff->set_gain = hidpp_ff_set_gain; ff->set_autocenter = hidpp_ff_set_autocenter; ff->destroy = hidpp_ff_destroy; /* Create sysfs interface */ error = device_create_file(&(hidpp->hid_dev->dev), &dev_attr_range); if (error) hid_warn(hidpp->hid_dev, "Unable to create sysfs interface for \"range\", errno %d!\n", error); /* init the hardware command queue */ atomic_set(&data->workqueue_size, 0); hid_info(hid, "Force feedback support loaded (firmware release %d).\n", version); return 0; } /* ************************************************************************** */ /* */ /* Device Support */ /* */ /* ************************************************************************** */ /* -------------------------------------------------------------------------- */ /* Touchpad HID++ devices */ /* -------------------------------------------------------------------------- */ #define WTP_MANUAL_RESOLUTION 39 struct wtp_data { u16 x_size, y_size; u8 finger_count; u8 mt_feature_index; u8 button_feature_index; u8 maxcontacts; bool flip_y; unsigned int resolution; }; static int wtp_input_mapping(struct hid_device *hdev, struct hid_input *hi, struct hid_field *field, struct hid_usage *usage, unsigned long **bit, int *max) { return -1; } static void wtp_populate_input(struct hidpp_device *hidpp, struct input_dev *input_dev) { struct wtp_data *wd = hidpp->private_data; __set_bit(EV_ABS, input_dev->evbit); __set_bit(EV_KEY, input_dev->evbit); __clear_bit(EV_REL, input_dev->evbit); __clear_bit(EV_LED, input_dev->evbit); input_set_abs_params(input_dev, ABS_MT_POSITION_X, 0, wd->x_size, 0, 0); input_abs_set_res(input_dev, ABS_MT_POSITION_X, wd->resolution); input_set_abs_params(input_dev, ABS_MT_POSITION_Y, 0, wd->y_size, 0, 0); input_abs_set_res(input_dev, ABS_MT_POSITION_Y, wd->resolution); /* Max pressure is not given by the devices, pick one */ input_set_abs_params(input_dev, ABS_MT_PRESSURE, 0, 50, 0, 0); input_set_capability(input_dev, EV_KEY, BTN_LEFT); if (hidpp->quirks & HIDPP_QUIRK_WTP_PHYSICAL_BUTTONS) input_set_capability(input_dev, EV_KEY, BTN_RIGHT); else __set_bit(INPUT_PROP_BUTTONPAD, input_dev->propbit); 
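/* input_mt_init_slots() below registers the MT slots: INPUT_MT_POINTER marks this as a pointer-type (touchpad) device, and INPUT_MT_DROP_UNUSED lets input_mt_sync_frame() release contacts that were not updated in the current frame. */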
input_mt_init_slots(input_dev, wd->maxcontacts, INPUT_MT_POINTER | INPUT_MT_DROP_UNUSED); } static void wtp_touch_event(struct hidpp_device *hidpp, struct hidpp_touchpad_raw_xy_finger *touch_report) { struct wtp_data *wd = hidpp->private_data; int slot; if (!touch_report->finger_id || touch_report->contact_type) /* no actual data */ return; slot = input_mt_get_slot_by_key(hidpp->input, touch_report->finger_id); input_mt_slot(hidpp->input, slot); input_mt_report_slot_state(hidpp->input, MT_TOOL_FINGER, touch_report->contact_status); if (touch_report->contact_status) { input_event(hidpp->input, EV_ABS, ABS_MT_POSITION_X, touch_report->x); input_event(hidpp->input, EV_ABS, ABS_MT_POSITION_Y, wd->flip_y ? wd->y_size - touch_report->y : touch_report->y); input_event(hidpp->input, EV_ABS, ABS_MT_PRESSURE, touch_report->area); } } static void wtp_send_raw_xy_event(struct hidpp_device *hidpp, struct hidpp_touchpad_raw_xy *raw) { int i; for (i = 0; i < 2; i++) wtp_touch_event(hidpp, &(raw->fingers[i])); if (raw->end_of_frame && !(hidpp->quirks & HIDPP_QUIRK_WTP_PHYSICAL_BUTTONS)) input_event(hidpp->input, EV_KEY, BTN_LEFT, raw->button); if (raw->end_of_frame || raw->finger_count <= 2) { input_mt_sync_frame(hidpp->input); input_sync(hidpp->input); } } static int wtp_mouse_raw_xy_event(struct hidpp_device *hidpp, u8 *data) { struct wtp_data *wd = hidpp->private_data; u8 c1_area = ((data[7] & 0xf) * (data[7] & 0xf) + (data[7] >> 4) * (data[7] >> 4)) / 2; u8 c2_area = ((data[13] & 0xf) * (data[13] & 0xf) + (data[13] >> 4) * (data[13] >> 4)) / 2; struct hidpp_touchpad_raw_xy raw = { .timestamp = data[1], .fingers = { { .contact_type = 0, .contact_status = !!data[7], .x = get_unaligned_le16(&data[3]), .y = get_unaligned_le16(&data[5]), .z = c1_area, .area = c1_area, .finger_id = data[2], }, { .contact_type = 0, .contact_status = !!data[13], .x = get_unaligned_le16(&data[9]), .y = get_unaligned_le16(&data[11]), .z = c2_area, .area = c2_area, .finger_id = data[8], } }, .finger_count = wd->maxcontacts, .spurious_flag = 0, .end_of_frame = (data[0] >> 7) == 0, .button = data[0] & 0x01, }; wtp_send_raw_xy_event(hidpp, &raw); return 1; } static int wtp_raw_event(struct hid_device *hdev, u8 *data, int size) { struct hidpp_device *hidpp = hid_get_drvdata(hdev); struct wtp_data *wd = hidpp->private_data; struct hidpp_report *report = (struct hidpp_report *)data; struct hidpp_touchpad_raw_xy raw; if (!wd || !hidpp->input) return 1; switch (data[0]) { case 0x02: if (size < 2) { hid_err(hdev, "Received HID report of bad size (%d)", size); return 1; } if (hidpp->quirks & HIDPP_QUIRK_WTP_PHYSICAL_BUTTONS) { input_event(hidpp->input, EV_KEY, BTN_LEFT, !!(data[1] & 0x01)); input_event(hidpp->input, EV_KEY, BTN_RIGHT, !!(data[1] & 0x02)); input_sync(hidpp->input); return 0; } else { if (size < 21) return 1; return wtp_mouse_raw_xy_event(hidpp, &data[7]); } case REPORT_ID_HIDPP_LONG: /* size is already checked in hidpp_raw_event. 
*/ if ((report->fap.feature_index != wd->mt_feature_index) || (report->fap.funcindex_clientid != EVENT_TOUCHPAD_RAW_XY)) return 1; hidpp_touchpad_raw_xy_event(hidpp, data + 4, &raw); wtp_send_raw_xy_event(hidpp, &raw); return 0; } return 0; } static int wtp_get_config(struct hidpp_device *hidpp) { struct wtp_data *wd = hidpp->private_data; struct hidpp_touchpad_raw_info raw_info = {0}; int ret; ret = hidpp_root_get_feature(hidpp, HIDPP_PAGE_TOUCHPAD_RAW_XY, &wd->mt_feature_index); if (ret) /* means that the device is not powered up */ return ret; ret = hidpp_touchpad_get_raw_info(hidpp, wd->mt_feature_index, &raw_info); if (ret) return ret; wd->x_size = raw_info.x_size; wd->y_size = raw_info.y_size; wd->maxcontacts = raw_info.maxcontacts; wd->flip_y = raw_info.origin == TOUCHPAD_RAW_XY_ORIGIN_LOWER_LEFT; wd->resolution = raw_info.res; if (!wd->resolution) wd->resolution = WTP_MANUAL_RESOLUTION; return 0; } static int wtp_allocate(struct hid_device *hdev, const struct hid_device_id *id) { struct hidpp_device *hidpp = hid_get_drvdata(hdev); struct wtp_data *wd; wd = devm_kzalloc(&hdev->dev, sizeof(struct wtp_data), GFP_KERNEL); if (!wd) return -ENOMEM; hidpp->private_data = wd; return 0; }; static int wtp_connect(struct hid_device *hdev) { struct hidpp_device *hidpp = hid_get_drvdata(hdev); struct wtp_data *wd = hidpp->private_data; int ret; if (!wd->x_size) { ret = wtp_get_config(hidpp); if (ret) { hid_err(hdev, "Can not get wtp config: %d\n", ret); return ret; } } return hidpp_touchpad_set_raw_report_state(hidpp, wd->mt_feature_index, true, true); } /* ------------------------------------------------------------------------- */ /* Logitech M560 devices */ /* ------------------------------------------------------------------------- */ /* * Logitech M560 protocol overview * * The Logitech M560 mouse is designed for Windows 8. When the middle and/or * the side buttons are pressed, it sends some keyboard key events * instead of button ones. * To complicate things further, the middle button key sequence * differs between the odd press and the even press. * * forward button -> Super_R * backward button -> Super_L+'d' (press only) * middle button -> 1st time: Alt_L+SuperL+XF86TouchpadOff (press only) * 2nd time: left-click (press only) * NB: press-only means that when the button is pressed, the * KeyPress/ButtonPress and KeyRelease/ButtonRelease events are generated * together sequentially; when the button is released, no event is * generated. * * With the command * 10<xx>0a 3500af03 (where <xx> is the mouse id), * the mouse reacts differently: * - it never sends a keyboard key event * - for the three mouse buttons it sends: * middle button press 11<xx>0a 3500af00... * side 1 button (forward) press 11<xx>0a 3500b000... * side 2 button (backward) press 11<xx>0a 3500ae00... * middle/side1/side2 button release 11<xx>0a 35000000... 
*/ static const u8 m560_config_parameter[] = {0x00, 0xaf, 0x03}; /* how buttons are mapped in the report */ #define M560_MOUSE_BTN_LEFT 0x01 #define M560_MOUSE_BTN_RIGHT 0x02 #define M560_MOUSE_BTN_WHEEL_LEFT 0x08 #define M560_MOUSE_BTN_WHEEL_RIGHT 0x10 #define M560_SUB_ID 0x0a #define M560_BUTTON_MODE_REGISTER 0x35 static int m560_send_config_command(struct hid_device *hdev) { struct hidpp_report response; struct hidpp_device *hidpp_dev; hidpp_dev = hid_get_drvdata(hdev); return hidpp_send_rap_command_sync( hidpp_dev, REPORT_ID_HIDPP_SHORT, M560_SUB_ID, M560_BUTTON_MODE_REGISTER, (u8 *)m560_config_parameter, sizeof(m560_config_parameter), &response ); } static int m560_raw_event(struct hid_device *hdev, u8 *data, int size) { struct hidpp_device *hidpp = hid_get_drvdata(hdev); /* sanity check */ if (!hidpp->input) { hid_err(hdev, "error in parameter\n"); return -EINVAL; } if (size < 7) { hid_err(hdev, "error in report\n"); return 0; } if (data[0] == REPORT_ID_HIDPP_LONG && data[2] == M560_SUB_ID && data[6] == 0x00) { /* * m560 mouse report for middle, forward and backward button * * data[0] = 0x11 * data[1] = device-id * data[2] = 0x0a * data[5] = 0xaf -> middle * 0xb0 -> forward * 0xae -> backward * 0x00 -> release all * data[6] = 0x00 */ switch (data[5]) { case 0xaf: input_report_key(hidpp->input, BTN_MIDDLE, 1); break; case 0xb0: input_report_key(hidpp->input, BTN_FORWARD, 1); break; case 0xae: input_report_key(hidpp->input, BTN_BACK, 1); break; case 0x00: input_report_key(hidpp->input, BTN_BACK, 0); input_report_key(hidpp->input, BTN_FORWARD, 0); input_report_key(hidpp->input, BTN_MIDDLE, 0); break; default: hid_err(hdev, "error in report\n"); return 0; } input_sync(hidpp->input); } else if (data[0] == 0x02) { /* * Logitech M560 mouse report * * data[0] = type (0x02) * data[1..2] = buttons * data[3..5] = xy * data[6] = wheel */ int v; input_report_key(hidpp->input, BTN_LEFT, !!(data[1] & M560_MOUSE_BTN_LEFT)); input_report_key(hidpp->input, BTN_RIGHT, !!(data[1] & M560_MOUSE_BTN_RIGHT)); if (data[1] & M560_MOUSE_BTN_WHEEL_LEFT) { input_report_rel(hidpp->input, REL_HWHEEL, -1); input_report_rel(hidpp->input, REL_HWHEEL_HI_RES, -120); } else if (data[1] & M560_MOUSE_BTN_WHEEL_RIGHT) { input_report_rel(hidpp->input, REL_HWHEEL, 1); input_report_rel(hidpp->input, REL_HWHEEL_HI_RES, 120); } v = sign_extend32(hid_field_extract(hdev, data + 3, 0, 12), 11); input_report_rel(hidpp->input, REL_X, v); v = sign_extend32(hid_field_extract(hdev, data + 3, 12, 12), 11); input_report_rel(hidpp->input, REL_Y, v); v = sign_extend32(data[6], 7); if (v != 0) hidpp_scroll_counter_handle_scroll(hidpp->input, &hidpp->vertical_wheel_counter, v); input_sync(hidpp->input); } return 1; } static void m560_populate_input(struct hidpp_device *hidpp, struct input_dev *input_dev) { __set_bit(EV_KEY, input_dev->evbit); __set_bit(BTN_MIDDLE, input_dev->keybit); __set_bit(BTN_RIGHT, input_dev->keybit); __set_bit(BTN_LEFT, input_dev->keybit); __set_bit(BTN_BACK, input_dev->keybit); __set_bit(BTN_FORWARD, input_dev->keybit); __set_bit(EV_REL, input_dev->evbit); __set_bit(REL_X, input_dev->relbit); __set_bit(REL_Y, input_dev->relbit); __set_bit(REL_WHEEL, input_dev->relbit); __set_bit(REL_HWHEEL, input_dev->relbit); __set_bit(REL_WHEEL_HI_RES, input_dev->relbit); __set_bit(REL_HWHEEL_HI_RES, input_dev->relbit); } static int m560_input_mapping(struct hid_device *hdev, struct hid_input *hi, struct hid_field *field, struct hid_usage *usage, unsigned long **bit, int *max) { return -1; } /* 
------------------------------------------------------------------------- */ /* Logitech K400 devices */ /* ------------------------------------------------------------------------- */ /* * The Logitech K400 keyboard has an embedded touchpad which is seen * as a mouse from the OS point of view. There is a hardware shortcut to disable * tap-to-click but the setting is not remembered across reset, annoying some * users. * * We can toggle this feature from the host by using the feature 0x6010: * Touchpad FW items */ struct k400_private_data { u8 feature_index; }; static int k400_disable_tap_to_click(struct hidpp_device *hidpp) { struct k400_private_data *k400 = hidpp->private_data; struct hidpp_touchpad_fw_items items = {}; int ret; if (!k400->feature_index) { ret = hidpp_root_get_feature(hidpp, HIDPP_PAGE_TOUCHPAD_FW_ITEMS, &k400->feature_index); if (ret) /* means that the device is not powered up */ return ret; } ret = hidpp_touchpad_fw_items_set(hidpp, k400->feature_index, &items); if (ret) return ret; return 0; } static int k400_allocate(struct hid_device *hdev) { struct hidpp_device *hidpp = hid_get_drvdata(hdev); struct k400_private_data *k400; k400 = devm_kzalloc(&hdev->dev, sizeof(struct k400_private_data), GFP_KERNEL); if (!k400) return -ENOMEM; hidpp->private_data = k400; return 0; }; static int k400_connect(struct hid_device *hdev) { struct hidpp_device *hidpp = hid_get_drvdata(hdev); if (!disable_tap_to_click) return 0; return k400_disable_tap_to_click(hidpp); } /* ------------------------------------------------------------------------- */ /* Logitech G920 Driving Force Racing Wheel for Xbox One */ /* ------------------------------------------------------------------------- */ #define HIDPP_PAGE_G920_FORCE_FEEDBACK 0x8123 static int g920_ff_set_autocenter(struct hidpp_device *hidpp, struct hidpp_ff_private_data *data) { struct hidpp_report response; u8 params[HIDPP_AUTOCENTER_PARAMS_LENGTH] = { [1] = HIDPP_FF_EFFECT_SPRING | HIDPP_FF_EFFECT_AUTOSTART, }; int ret; /* initialize with zero autocenter to get wheel in usable state */ dbg_hid("Setting autocenter to 0.\n"); ret = hidpp_send_fap_command_sync(hidpp, data->feature_index, HIDPP_FF_DOWNLOAD_EFFECT, params, ARRAY_SIZE(params), &response); if (ret) hid_warn(hidpp->hid_dev, "Failed to autocenter device!\n"); else data->slot_autocenter = response.fap.params[0]; return ret; } static int g920_get_config(struct hidpp_device *hidpp, struct hidpp_ff_private_data *data) { struct hidpp_report response; int ret; memset(data, 0, sizeof(*data)); /* Find feature and store for later use */ ret = hidpp_root_get_feature(hidpp, HIDPP_PAGE_G920_FORCE_FEEDBACK, &data->feature_index); if (ret) return ret; /* Read number of slots available in device */ ret = hidpp_send_fap_command_sync(hidpp, data->feature_index, HIDPP_FF_GET_INFO, NULL, 0, &response); if (ret) { if (ret < 0) return ret; hid_err(hidpp->hid_dev, "%s: received protocol error 0x%02x\n", __func__, ret); return -EPROTO; } data->num_effects = response.fap.params[0] - HIDPP_FF_RESERVED_SLOTS; /* reset all forces */ ret = hidpp_send_fap_command_sync(hidpp, data->feature_index, HIDPP_FF_RESET_ALL, NULL, 0, &response); if (ret) hid_warn(hidpp->hid_dev, "Failed to reset all forces!\n"); ret = hidpp_send_fap_command_sync(hidpp, data->feature_index, HIDPP_FF_GET_APERTURE, NULL, 0, &response); if (ret) { hid_warn(hidpp->hid_dev, "Failed to read range from device!\n"); } data->range = ret ? 
900 : get_unaligned_be16(&response.fap.params[0]); /* Read the current gain values */ ret = hidpp_send_fap_command_sync(hidpp, data->feature_index, HIDPP_FF_GET_GLOBAL_GAINS, NULL, 0, &response); if (ret) hid_warn(hidpp->hid_dev, "Failed to read gain values from device!\n"); data->gain = ret ? 0xffff : get_unaligned_be16(&response.fap.params[0]); /* ignore boost value at response.fap.params[2] */ return g920_ff_set_autocenter(hidpp, data); } /* -------------------------------------------------------------------------- */ /* Logitech Dinovo Mini keyboard with builtin touchpad */ /* -------------------------------------------------------------------------- */ #define DINOVO_MINI_PRODUCT_ID 0xb30c static int lg_dinovo_input_mapping(struct hid_device *hdev, struct hid_input *hi, struct hid_field *field, struct hid_usage *usage, unsigned long **bit, int *max) { if ((usage->hid & HID_USAGE_PAGE) != HID_UP_LOGIVENDOR) return 0; switch (usage->hid & HID_USAGE) { case 0x00d: lg_map_key_clear(KEY_MEDIA); break; default: return 0; } return 1; } /* -------------------------------------------------------------------------- */ /* HID++1.0 devices which use HID++ reports for their wheels */ /* -------------------------------------------------------------------------- */ static int hidpp10_wheel_connect(struct hidpp_device *hidpp) { return hidpp10_set_register(hidpp, HIDPP_REG_ENABLE_REPORTS, 0, HIDPP_ENABLE_WHEEL_REPORT | HIDPP_ENABLE_HWHEEL_REPORT, HIDPP_ENABLE_WHEEL_REPORT | HIDPP_ENABLE_HWHEEL_REPORT); } static int hidpp10_wheel_raw_event(struct hidpp_device *hidpp, u8 *data, int size) { s8 value, hvalue; if (!hidpp->input) return -EINVAL; if (size < 7) return 0; if (data[0] != REPORT_ID_HIDPP_SHORT || data[2] != HIDPP_SUB_ID_ROLLER) return 0; value = data[3]; hvalue = data[4]; input_report_rel(hidpp->input, REL_WHEEL, value); input_report_rel(hidpp->input, REL_WHEEL_HI_RES, value * 120); input_report_rel(hidpp->input, REL_HWHEEL, hvalue); input_report_rel(hidpp->input, REL_HWHEEL_HI_RES, hvalue * 120); input_sync(hidpp->input); return 1; } static void hidpp10_wheel_populate_input(struct hidpp_device *hidpp, struct input_dev *input_dev) { __set_bit(EV_REL, input_dev->evbit); __set_bit(REL_WHEEL, input_dev->relbit); __set_bit(REL_WHEEL_HI_RES, input_dev->relbit); __set_bit(REL_HWHEEL, input_dev->relbit); __set_bit(REL_HWHEEL_HI_RES, input_dev->relbit); } /* -------------------------------------------------------------------------- */ /* HID++1.0 mice which use HID++ reports for extra mouse buttons */ /* -------------------------------------------------------------------------- */ static int hidpp10_extra_mouse_buttons_connect(struct hidpp_device *hidpp) { return hidpp10_set_register(hidpp, HIDPP_REG_ENABLE_REPORTS, 0, HIDPP_ENABLE_MOUSE_EXTRA_BTN_REPORT, HIDPP_ENABLE_MOUSE_EXTRA_BTN_REPORT); } static int hidpp10_extra_mouse_buttons_raw_event(struct hidpp_device *hidpp, u8 *data, int size) { int i; if (!hidpp->input) return -EINVAL; if (size < 7) return 0; if (data[0] != REPORT_ID_HIDPP_SHORT || data[2] != HIDPP_SUB_ID_MOUSE_EXTRA_BTNS) return 0; /* * Buttons are either delivered through the regular mouse report *or* * through the extra buttons report. At least for button 6 how it is * delivered differs per receiver firmware version. Even receivers with * the same usb-id show different behavior, so we handle both cases. 
*/ for (i = 0; i < 8; i++) input_report_key(hidpp->input, BTN_MOUSE + i, (data[3] & (1 << i))); /* Some mice report events on button 9+, use BTN_MISC */ for (i = 0; i < 8; i++) input_report_key(hidpp->input, BTN_MISC + i, (data[4] & (1 << i))); input_sync(hidpp->input); return 1; } static void hidpp10_extra_mouse_buttons_populate_input( struct hidpp_device *hidpp, struct input_dev *input_dev) { /* BTN_MOUSE - BTN_MOUSE+7 are set already by the descriptor */ __set_bit(BTN_0, input_dev->keybit); __set_bit(BTN_1, input_dev->keybit); __set_bit(BTN_2, input_dev->keybit); __set_bit(BTN_3, input_dev->keybit); __set_bit(BTN_4, input_dev->keybit); __set_bit(BTN_5, input_dev->keybit); __set_bit(BTN_6, input_dev->keybit); __set_bit(BTN_7, input_dev->keybit); } /* -------------------------------------------------------------------------- */ /* HID++1.0 kbds which only report 0x10xx consumer usages through sub-id 0x03 */ /* -------------------------------------------------------------------------- */ /* Find the consumer-page input report desc and change Maximums to 0x107f */ static u8 *hidpp10_consumer_keys_report_fixup(struct hidpp_device *hidpp, u8 *_rdesc, unsigned int *rsize) { /* Note 0 terminated so we can use strnstr to search for this. */ static const char consumer_rdesc_start[] = { 0x05, 0x0C, /* USAGE_PAGE (Consumer Devices) */ 0x09, 0x01, /* USAGE (Consumer Control) */ 0xA1, 0x01, /* COLLECTION (Application) */ 0x85, 0x03, /* REPORT_ID = 3 */ 0x75, 0x10, /* REPORT_SIZE (16) */ 0x95, 0x02, /* REPORT_COUNT (2) */ 0x15, 0x01, /* LOGICAL_MIN (1) */ 0x26, 0x00 /* LOGICAL_MAX (... */ }; char *consumer_rdesc, *rdesc = (char *)_rdesc; unsigned int size; consumer_rdesc = strnstr(rdesc, consumer_rdesc_start, *rsize); size = *rsize - (consumer_rdesc - rdesc); if (consumer_rdesc && size >= 25) { consumer_rdesc[15] = 0x7f; consumer_rdesc[16] = 0x10; consumer_rdesc[20] = 0x7f; consumer_rdesc[21] = 0x10; } return _rdesc; } static int hidpp10_consumer_keys_connect(struct hidpp_device *hidpp) { return hidpp10_set_register(hidpp, HIDPP_REG_ENABLE_REPORTS, 0, HIDPP_ENABLE_CONSUMER_REPORT, HIDPP_ENABLE_CONSUMER_REPORT); } static int hidpp10_consumer_keys_raw_event(struct hidpp_device *hidpp, u8 *data, int size) { u8 consumer_report[5]; if (size < 7) return 0; if (data[0] != REPORT_ID_HIDPP_SHORT || data[2] != HIDPP_SUB_ID_CONSUMER_VENDOR_KEYS) return 0; /* * Build a normal consumer report (3) out of the data, this detour * is necessary to get some keyboards to report their 0x10xx usages. 
*/ consumer_report[0] = 0x03; memcpy(&consumer_report[1], &data[3], 4); /* We are called from atomic context */ hid_report_raw_event(hidpp->hid_dev, HID_INPUT_REPORT, consumer_report, 5, 1); return 1; } /* -------------------------------------------------------------------------- */ /* High-resolution scroll wheels */ /* -------------------------------------------------------------------------- */ static int hi_res_scroll_enable(struct hidpp_device *hidpp) { int ret; u8 multiplier = 1; if (hidpp->capabilities & HIDPP_CAPABILITY_HIDPP20_HI_RES_WHEEL) { ret = hidpp_hrw_set_wheel_mode(hidpp, false, true, false); if (ret == 0) ret = hidpp_hrw_get_wheel_capability(hidpp, &multiplier); } else if (hidpp->capabilities & HIDPP_CAPABILITY_HIDPP20_HI_RES_SCROLL) { ret = hidpp_hrs_set_highres_scrolling_mode(hidpp, true, &multiplier); } else /* if (hidpp->capabilities & HIDPP_CAPABILITY_HIDPP10_FAST_SCROLL) */ { ret = hidpp10_enable_scrolling_acceleration(hidpp); multiplier = 8; } if (ret) { hid_dbg(hidpp->hid_dev, "Could not enable hi-res scrolling: %d\n", ret); return ret; } if (multiplier == 0) { hid_dbg(hidpp->hid_dev, "Invalid multiplier 0 from device, setting it to 1\n"); multiplier = 1; } hidpp->vertical_wheel_counter.wheel_multiplier = multiplier; hid_dbg(hidpp->hid_dev, "wheel multiplier = %d\n", multiplier); return 0; } static int hidpp_initialize_hires_scroll(struct hidpp_device *hidpp) { int ret; unsigned long capabilities; capabilities = hidpp->capabilities; if (hidpp->protocol_major >= 2) { u8 feature_index; ret = hidpp_root_get_feature(hidpp, HIDPP_PAGE_HIRES_WHEEL, &feature_index); if (!ret) { hidpp->capabilities |= HIDPP_CAPABILITY_HIDPP20_HI_RES_WHEEL; hid_dbg(hidpp->hid_dev, "Detected HID++ 2.0 hi-res scroll wheel\n"); return 0; } ret = hidpp_root_get_feature(hidpp, HIDPP_PAGE_HI_RESOLUTION_SCROLLING, &feature_index); if (!ret) { hidpp->capabilities |= HIDPP_CAPABILITY_HIDPP20_HI_RES_SCROLL; hid_dbg(hidpp->hid_dev, "Detected HID++ 2.0 hi-res scrolling\n"); } } else { /* We cannot detect fast scrolling support on HID++ 1.0 devices */ if (hidpp->quirks & HIDPP_QUIRK_HI_RES_SCROLL_1P0) { hidpp->capabilities |= HIDPP_CAPABILITY_HIDPP10_FAST_SCROLL; hid_dbg(hidpp->hid_dev, "Detected HID++ 1.0 fast scroll\n"); } } if (hidpp->capabilities == capabilities) hid_dbg(hidpp->hid_dev, "Did not detect HID++ hi-res scrolling hardware support\n"); return 0; } /* -------------------------------------------------------------------------- */ /* Generic HID++ devices */ /* -------------------------------------------------------------------------- */ static const u8 *hidpp_report_fixup(struct hid_device *hdev, u8 *rdesc, unsigned int *rsize) { struct hidpp_device *hidpp = hid_get_drvdata(hdev); if (!hidpp) return rdesc; /* For 27 MHz keyboards the quirk gets set after hid_parse. 
*/ if (hdev->group == HID_GROUP_LOGITECH_27MHZ_DEVICE || (hidpp->quirks & HIDPP_QUIRK_HIDPP_CONSUMER_VENDOR_KEYS)) rdesc = hidpp10_consumer_keys_report_fixup(hidpp, rdesc, rsize); return rdesc; } static int hidpp_input_mapping(struct hid_device *hdev, struct hid_input *hi, struct hid_field *field, struct hid_usage *usage, unsigned long **bit, int *max) { struct hidpp_device *hidpp = hid_get_drvdata(hdev); if (!hidpp) return 0; if (hidpp->quirks & HIDPP_QUIRK_CLASS_WTP) return wtp_input_mapping(hdev, hi, field, usage, bit, max); else if (hidpp->quirks & HIDPP_QUIRK_CLASS_M560 && field->application != HID_GD_MOUSE) return m560_input_mapping(hdev, hi, field, usage, bit, max); if (hdev->product == DINOVO_MINI_PRODUCT_ID) return lg_dinovo_input_mapping(hdev, hi, field, usage, bit, max); return 0; } static int hidpp_input_mapped(struct hid_device *hdev, struct hid_input *hi, struct hid_field *field, struct hid_usage *usage, unsigned long **bit, int *max) { struct hidpp_device *hidpp = hid_get_drvdata(hdev); if (!hidpp) return 0; /* Ensure that Logitech G920 is not given a default fuzz/flat value */ if (hidpp->quirks & HIDPP_QUIRK_CLASS_G920) { if (usage->type == EV_ABS && (usage->code == ABS_X || usage->code == ABS_Y || usage->code == ABS_Z || usage->code == ABS_RZ)) { field->application = HID_GD_MULTIAXIS; } } return 0; } static void hidpp_populate_input(struct hidpp_device *hidpp, struct input_dev *input) { hidpp->input = input; if (hidpp->quirks & HIDPP_QUIRK_CLASS_WTP) wtp_populate_input(hidpp, input); else if (hidpp->quirks & HIDPP_QUIRK_CLASS_M560) m560_populate_input(hidpp, input); if (hidpp->quirks & HIDPP_QUIRK_HIDPP_WHEELS) hidpp10_wheel_populate_input(hidpp, input); if (hidpp->quirks & HIDPP_QUIRK_HIDPP_EXTRA_MOUSE_BTNS) hidpp10_extra_mouse_buttons_populate_input(hidpp, input); } static int hidpp_input_configured(struct hid_device *hdev, struct hid_input *hidinput) { struct hidpp_device *hidpp = hid_get_drvdata(hdev); struct input_dev *input = hidinput->input; if (!hidpp) return 0; hidpp_populate_input(hidpp, input); return 0; } static int hidpp_raw_hidpp_event(struct hidpp_device *hidpp, u8 *data, int size) { struct hidpp_report *question = hidpp->send_receive_buf; struct hidpp_report *answer = hidpp->send_receive_buf; struct hidpp_report *report = (struct hidpp_report *)data; int ret; /* * If the mutex is locked then we have a pending answer from a * previously sent command. 
*/ if (unlikely(mutex_is_locked(&hidpp->send_mutex))) { /* * Check for a correct hidpp20 answer or the corresponding * error */ if (hidpp_match_answer(question, report) || hidpp_match_error(question, report)) { *answer = *report; hidpp->answer_available = true; wake_up(&hidpp->wait); /* * This was an answer to a command that this driver sent * We return 1 to hid-core to avoid forwarding the * command upstream as it has been treated by the driver */ return 1; } } if (unlikely(hidpp_report_is_connect_event(hidpp, report))) { if (schedule_work(&hidpp->work) == 0) dbg_hid("%s: connect event already queued\n", __func__); return 1; } if (hidpp->hid_dev->group == HID_GROUP_LOGITECH_27MHZ_DEVICE && data[0] == REPORT_ID_HIDPP_SHORT && data[2] == HIDPP_SUB_ID_USER_IFACE_EVENT && (data[3] & HIDPP_USER_IFACE_EVENT_ENCRYPTION_KEY_LOST)) { dev_err_ratelimited(&hidpp->hid_dev->dev, "Error the keyboard's wireless encryption key has been lost, your keyboard will not work unless you re-configure encryption.\n"); dev_err_ratelimited(&hidpp->hid_dev->dev, "See: https://gitlab.freedesktop.org/jwrdegoede/logitech-27mhz-keyboard-encryption-setup/\n"); } if (hidpp->capabilities & HIDPP_CAPABILITY_HIDPP20_BATTERY) { ret = hidpp20_battery_event_1000(hidpp, data, size); if (ret != 0) return ret; ret = hidpp20_battery_event_1004(hidpp, data, size); if (ret != 0) return ret; ret = hidpp_solar_battery_event(hidpp, data, size); if (ret != 0) return ret; ret = hidpp20_battery_voltage_event(hidpp, data, size); if (ret != 0) return ret; ret = hidpp20_adc_measurement_event_1f20(hidpp, data, size); if (ret != 0) return ret; } if (hidpp->capabilities & HIDPP_CAPABILITY_HIDPP10_BATTERY) { ret = hidpp10_battery_event(hidpp, data, size); if (ret != 0) return ret; } if (hidpp->quirks & HIDPP_QUIRK_HIDPP_WHEELS) { ret = hidpp10_wheel_raw_event(hidpp, data, size); if (ret != 0) return ret; } if (hidpp->quirks & HIDPP_QUIRK_HIDPP_EXTRA_MOUSE_BTNS) { ret = hidpp10_extra_mouse_buttons_raw_event(hidpp, data, size); if (ret != 0) return ret; } if (hidpp->quirks & HIDPP_QUIRK_HIDPP_CONSUMER_VENDOR_KEYS) { ret = hidpp10_consumer_keys_raw_event(hidpp, data, size); if (ret != 0) return ret; } return 0; } static int hidpp_raw_event(struct hid_device *hdev, struct hid_report *report, u8 *data, int size) { struct hidpp_device *hidpp = hid_get_drvdata(hdev); int ret = 0; if (!hidpp) return 0; /* Generic HID++ processing. */ switch (data[0]) { case REPORT_ID_HIDPP_VERY_LONG: if (size != hidpp->very_long_report_length) { hid_err(hdev, "received hid++ report of bad size (%d)", size); return 1; } ret = hidpp_raw_hidpp_event(hidpp, data, size); break; case REPORT_ID_HIDPP_LONG: if (size != HIDPP_REPORT_LONG_LENGTH) { hid_err(hdev, "received hid++ report of bad size (%d)", size); return 1; } ret = hidpp_raw_hidpp_event(hidpp, data, size); break; case REPORT_ID_HIDPP_SHORT: if (size != HIDPP_REPORT_SHORT_LENGTH) { hid_err(hdev, "received hid++ report of bad size (%d)", size); return 1; } ret = hidpp_raw_hidpp_event(hidpp, data, size); break; } /* If no report is available for further processing, skip calling * raw_event of subclasses. 
*/ if (ret != 0) return ret; if (hidpp->quirks & HIDPP_QUIRK_CLASS_WTP) return wtp_raw_event(hdev, data, size); else if (hidpp->quirks & HIDPP_QUIRK_CLASS_M560) return m560_raw_event(hdev, data, size); return 0; } static int hidpp_event(struct hid_device *hdev, struct hid_field *field, struct hid_usage *usage, __s32 value) { /* This function will only be called for scroll events, due to the * restriction imposed in hidpp_usages. */ struct hidpp_device *hidpp = hid_get_drvdata(hdev); struct hidpp_scroll_counter *counter; if (!hidpp) return 0; counter = &hidpp->vertical_wheel_counter; /* A scroll event may occur before the multiplier has been retrieved or * the input device set, or high-res scroll enabling may fail. In such * cases we must return early (falling back to default behaviour) to * avoid a crash in hidpp_scroll_counter_handle_scroll. */ if (!(hidpp->capabilities & HIDPP_CAPABILITY_HI_RES_SCROLL) || value == 0 || hidpp->input == NULL || counter->wheel_multiplier == 0) return 0; hidpp_scroll_counter_handle_scroll(hidpp->input, counter, value); return 1; } static int hidpp_initialize_battery(struct hidpp_device *hidpp) { static atomic_t battery_no = ATOMIC_INIT(0); struct power_supply_config cfg = { .drv_data = hidpp }; struct power_supply_desc *desc = &hidpp->battery.desc; enum power_supply_property *battery_props; struct hidpp_battery *battery; unsigned int num_battery_props; unsigned long n; int ret; if (hidpp->battery.ps) return 0; hidpp->battery.feature_index = 0xff; hidpp->battery.solar_feature_index = 0xff; hidpp->battery.voltage_feature_index = 0xff; hidpp->battery.adc_measurement_feature_index = 0xff; if (hidpp->protocol_major >= 2) { if (hidpp->quirks & HIDPP_QUIRK_CLASS_K750) ret = hidpp_solar_request_battery_event(hidpp); else { /* we only support one battery feature right now, so let's first check the ones that support battery level first and leave voltage for last */ ret = hidpp20_query_battery_info_1000(hidpp); if (ret) ret = hidpp20_query_battery_info_1004(hidpp); if (ret) ret = hidpp20_query_battery_voltage_info(hidpp); if (ret) ret = hidpp20_query_adc_measurement_info_1f20(hidpp); } if (ret) return ret; hidpp->capabilities |= HIDPP_CAPABILITY_HIDPP20_BATTERY; } else { ret = hidpp10_query_battery_status(hidpp); if (ret) { ret = hidpp10_query_battery_mileage(hidpp); if (ret) return -ENOENT; hidpp->capabilities |= HIDPP_CAPABILITY_BATTERY_MILEAGE; } else { hidpp->capabilities |= HIDPP_CAPABILITY_BATTERY_LEVEL_STATUS; } hidpp->capabilities |= HIDPP_CAPABILITY_HIDPP10_BATTERY; } battery_props = devm_kmemdup(&hidpp->hid_dev->dev, hidpp_battery_props, sizeof(hidpp_battery_props), GFP_KERNEL); if (!battery_props) return -ENOMEM; num_battery_props = ARRAY_SIZE(hidpp_battery_props) - 3; if (hidpp->capabilities & HIDPP_CAPABILITY_BATTERY_MILEAGE || hidpp->capabilities & HIDPP_CAPABILITY_BATTERY_PERCENTAGE || hidpp->capabilities & HIDPP_CAPABILITY_BATTERY_VOLTAGE || hidpp->capabilities & HIDPP_CAPABILITY_ADC_MEASUREMENT) battery_props[num_battery_props++] = POWER_SUPPLY_PROP_CAPACITY; if (hidpp->capabilities & HIDPP_CAPABILITY_BATTERY_LEVEL_STATUS) battery_props[num_battery_props++] = POWER_SUPPLY_PROP_CAPACITY_LEVEL; if (hidpp->capabilities & HIDPP_CAPABILITY_BATTERY_VOLTAGE || hidpp->capabilities & HIDPP_CAPABILITY_ADC_MEASUREMENT) battery_props[num_battery_props++] = POWER_SUPPLY_PROP_VOLTAGE_NOW; battery = &hidpp->battery; n = atomic_inc_return(&battery_no) - 1; desc->properties = battery_props; desc->num_properties = num_battery_props; desc->get_property = 
hidpp_battery_get_property; sprintf(battery->name, "hidpp_battery_%ld", n); desc->name = battery->name; desc->type = POWER_SUPPLY_TYPE_BATTERY; desc->use_for_apm = 0; battery->ps = devm_power_supply_register(&hidpp->hid_dev->dev, &battery->desc, &cfg); if (IS_ERR(battery->ps)) return PTR_ERR(battery->ps); power_supply_powers(battery->ps, &hidpp->hid_dev->dev); return ret; } /* Get name + serial for USB and Bluetooth HID++ devices */ static void hidpp_non_unifying_init(struct hidpp_device *hidpp) { struct hid_device *hdev = hidpp->hid_dev; char *name; /* Bluetooth devices already have their serialnr set */ if (hid_is_usb(hdev)) hidpp_serial_init(hidpp); name = hidpp_get_device_name(hidpp); if (name) { dbg_hid("HID++: Got name: %s\n", name); snprintf(hdev->name, sizeof(hdev->name), "%s", name); kfree(name); } } static int hidpp_input_open(struct input_dev *dev) { struct hid_device *hid = input_get_drvdata(dev); return hid_hw_open(hid); } static void hidpp_input_close(struct input_dev *dev) { struct hid_device *hid = input_get_drvdata(dev); hid_hw_close(hid); } static struct input_dev *hidpp_allocate_input(struct hid_device *hdev) { struct input_dev *input_dev = devm_input_allocate_device(&hdev->dev); struct hidpp_device *hidpp = hid_get_drvdata(hdev); if (!input_dev) return NULL; input_set_drvdata(input_dev, hdev); input_dev->open = hidpp_input_open; input_dev->close = hidpp_input_close; input_dev->name = hidpp->name; input_dev->phys = hdev->phys; input_dev->uniq = hdev->uniq; input_dev->id.bustype = hdev->bus; input_dev->id.vendor = hdev->vendor; input_dev->id.product = hdev->product; input_dev->id.version = hdev->version; input_dev->dev.parent = &hdev->dev; return input_dev; } static void hidpp_connect_event(struct work_struct *work) { struct hidpp_device *hidpp = container_of(work, struct hidpp_device, work); struct hid_device *hdev = hidpp->hid_dev; struct input_dev *input; char *name, *devm_name; int ret; /* Get device version to check if it is connected */ ret = hidpp_root_get_protocol_version(hidpp); if (ret) { hid_dbg(hidpp->hid_dev, "Disconnected\n"); if (hidpp->battery.ps) { hidpp->battery.online = false; hidpp->battery.status = POWER_SUPPLY_STATUS_UNKNOWN; hidpp->battery.level = POWER_SUPPLY_CAPACITY_LEVEL_UNKNOWN; power_supply_changed(hidpp->battery.ps); } return; } if (hidpp->quirks & HIDPP_QUIRK_CLASS_WTP) { ret = wtp_connect(hdev); if (ret) return; } else if (hidpp->quirks & HIDPP_QUIRK_CLASS_M560) { ret = m560_send_config_command(hdev); if (ret) return; } else if (hidpp->quirks & HIDPP_QUIRK_CLASS_K400) { ret = k400_connect(hdev); if (ret) return; } if (hidpp->quirks & HIDPP_QUIRK_HIDPP_WHEELS) { ret = hidpp10_wheel_connect(hidpp); if (ret) return; } if (hidpp->quirks & HIDPP_QUIRK_HIDPP_EXTRA_MOUSE_BTNS) { ret = hidpp10_extra_mouse_buttons_connect(hidpp); if (ret) return; } if (hidpp->quirks & HIDPP_QUIRK_HIDPP_CONSUMER_VENDOR_KEYS) { ret = hidpp10_consumer_keys_connect(hidpp); if (ret) return; } if (hidpp->protocol_major >= 2) { u8 feature_index; if (!hidpp_get_wireless_feature_index(hidpp, &feature_index)) hidpp->wireless_feature_index = feature_index; } if (hidpp->name == hdev->name && hidpp->protocol_major >= 2) { name = hidpp_get_device_name(hidpp); if (name) { devm_name = devm_kasprintf(&hdev->dev, GFP_KERNEL, "%s", name); kfree(name); if (!devm_name) return; hidpp->name = devm_name; } } hidpp_initialize_battery(hidpp); if (!hid_is_usb(hidpp->hid_dev)) hidpp_initialize_hires_scroll(hidpp); /* forward current battery state */ if (hidpp->capabilities & 
HIDPP_CAPABILITY_HIDPP10_BATTERY) { hidpp10_enable_battery_reporting(hidpp); if (hidpp->capabilities & HIDPP_CAPABILITY_BATTERY_MILEAGE) hidpp10_query_battery_mileage(hidpp); else hidpp10_query_battery_status(hidpp); } else if (hidpp->capabilities & HIDPP_CAPABILITY_HIDPP20_BATTERY) { if (hidpp->capabilities & HIDPP_CAPABILITY_BATTERY_VOLTAGE) hidpp20_query_battery_voltage_info(hidpp); else if (hidpp->capabilities & HIDPP_CAPABILITY_UNIFIED_BATTERY) hidpp20_query_battery_info_1004(hidpp); else if (hidpp->capabilities & HIDPP_CAPABILITY_ADC_MEASUREMENT) hidpp20_query_adc_measurement_info_1f20(hidpp); else hidpp20_query_battery_info_1000(hidpp); } if (hidpp->battery.ps) power_supply_changed(hidpp->battery.ps); if (hidpp->capabilities & HIDPP_CAPABILITY_HI_RES_SCROLL) hi_res_scroll_enable(hidpp); if (!(hidpp->quirks & HIDPP_QUIRK_DELAYED_INIT) || hidpp->delayed_input) /* if the input nodes are already created, we can stop now */ return; input = hidpp_allocate_input(hdev); if (!input) { hid_err(hdev, "cannot allocate new input device: %d\n", ret); return; } hidpp_populate_input(hidpp, input); ret = input_register_device(input); if (ret) { input_free_device(input); return; } hidpp->delayed_input = input; } static DEVICE_ATTR(builtin_power_supply, 0000, NULL, NULL); static struct attribute *sysfs_attrs[] = { &dev_attr_builtin_power_supply.attr, NULL }; static const struct attribute_group ps_attribute_group = { .attrs = sysfs_attrs }; static int hidpp_get_report_length(struct hid_device *hdev, int id) { struct hid_report_enum *re; struct hid_report *report; re = &(hdev->report_enum[HID_OUTPUT_REPORT]); report = re->report_id_hash[id]; if (!report) return 0; return report->field[0]->report_count + 1; } static u8 hidpp_validate_device(struct hid_device *hdev) { struct hidpp_device *hidpp = hid_get_drvdata(hdev); int id, report_length; u8 supported_reports = 0; id = REPORT_ID_HIDPP_SHORT; report_length = hidpp_get_report_length(hdev, id); if (report_length) { if (report_length < HIDPP_REPORT_SHORT_LENGTH) goto bad_device; supported_reports |= HIDPP_REPORT_SHORT_SUPPORTED; } id = REPORT_ID_HIDPP_LONG; report_length = hidpp_get_report_length(hdev, id); if (report_length) { if (report_length < HIDPP_REPORT_LONG_LENGTH) goto bad_device; supported_reports |= HIDPP_REPORT_LONG_SUPPORTED; } id = REPORT_ID_HIDPP_VERY_LONG; report_length = hidpp_get_report_length(hdev, id); if (report_length) { if (report_length < HIDPP_REPORT_LONG_LENGTH || report_length > HIDPP_REPORT_VERY_LONG_MAX_LENGTH) goto bad_device; supported_reports |= HIDPP_REPORT_VERY_LONG_SUPPORTED; hidpp->very_long_report_length = report_length; } return supported_reports; bad_device: hid_warn(hdev, "not enough values in hidpp report %d\n", id); return false; } static bool hidpp_application_equals(struct hid_device *hdev, unsigned int application) { struct list_head *report_list; struct hid_report *report; report_list = &hdev->report_enum[HID_INPUT_REPORT].report_list; report = list_first_entry_or_null(report_list, struct hid_report, list); return report && report->application == application; } static int hidpp_probe(struct hid_device *hdev, const struct hid_device_id *id) { struct hidpp_device *hidpp; int ret; unsigned int connect_mask = HID_CONNECT_DEFAULT; /* report_fixup needs drvdata to be set before we call hid_parse */ hidpp = devm_kzalloc(&hdev->dev, sizeof(*hidpp), GFP_KERNEL); if (!hidpp) return -ENOMEM; hidpp->hid_dev = hdev; hidpp->name = hdev->name; hidpp->quirks = id->driver_data; hid_set_drvdata(hdev, hidpp); ret = 
hid_parse(hdev); if (ret) { hid_err(hdev, "%s:parse failed\n", __func__); return ret; } /* * Make sure the device is HID++ capable, otherwise treat as generic HID */ hidpp->supported_reports = hidpp_validate_device(hdev); if (!hidpp->supported_reports) { hid_set_drvdata(hdev, NULL); devm_kfree(&hdev->dev, hidpp); return hid_hw_start(hdev, HID_CONNECT_DEFAULT); } if (id->group == HID_GROUP_LOGITECH_27MHZ_DEVICE && hidpp_application_equals(hdev, HID_GD_MOUSE)) hidpp->quirks |= HIDPP_QUIRK_HIDPP_WHEELS | HIDPP_QUIRK_HIDPP_EXTRA_MOUSE_BTNS; if (id->group == HID_GROUP_LOGITECH_27MHZ_DEVICE && hidpp_application_equals(hdev, HID_GD_KEYBOARD)) hidpp->quirks |= HIDPP_QUIRK_HIDPP_CONSUMER_VENDOR_KEYS; if (hidpp->quirks & HIDPP_QUIRK_CLASS_WTP) { ret = wtp_allocate(hdev, id); if (ret) return ret; } else if (hidpp->quirks & HIDPP_QUIRK_CLASS_K400) { ret = k400_allocate(hdev); if (ret) return ret; } INIT_WORK(&hidpp->work, hidpp_connect_event); mutex_init(&hidpp->send_mutex); init_waitqueue_head(&hidpp->wait); /* indicates we are handling the battery properties in the kernel */ ret = sysfs_create_group(&hdev->dev.kobj, &ps_attribute_group); if (ret) hid_warn(hdev, "Cannot allocate sysfs group for %s\n", hdev->name); /* * First call hid_hw_start(hdev, 0) to allow IO without connecting any * hid subdrivers (hid-input, hidraw). This allows retrieving the dev's * name and serial number and store these in hdev->name and hdev->uniq, * before the hid-input and hidraw drivers expose these to userspace. */ ret = hid_hw_start(hdev, 0); if (ret) { hid_err(hdev, "hw start failed\n"); goto hid_hw_start_fail; } ret = hid_hw_open(hdev); if (ret < 0) { dev_err(&hdev->dev, "%s:hid_hw_open returned error:%d\n", __func__, ret); goto hid_hw_open_fail; } /* Allow incoming packets */ hid_device_io_start(hdev); /* Get name + serial, store in hdev->name + hdev->uniq */ if (id->group == HID_GROUP_LOGITECH_DJ_DEVICE) hidpp_unifying_init(hidpp); else hidpp_non_unifying_init(hidpp); if (hidpp->quirks & HIDPP_QUIRK_DELAYED_INIT) connect_mask &= ~HID_CONNECT_HIDINPUT; /* Now export the actual inputs and hidraw nodes to the world */ hid_device_io_stop(hdev); ret = hid_connect(hdev, connect_mask); if (ret) { hid_err(hdev, "%s:hid_connect returned error %d\n", __func__, ret); goto hid_hw_init_fail; } /* Check for connected devices now that incoming packets will not be disabled again */ hid_device_io_start(hdev); schedule_work(&hidpp->work); flush_work(&hidpp->work); if (hidpp->quirks & HIDPP_QUIRK_CLASS_G920) { struct hidpp_ff_private_data data; ret = g920_get_config(hidpp, &data); if (!ret) ret = hidpp_ff_init(hidpp, &data); if (ret) hid_warn(hidpp->hid_dev, "Unable to initialize force feedback support, errno %d\n", ret); } /* * This relies on logi_dj_ll_close() being a no-op so that DJ connection * events will still be received. 
*/ hid_hw_close(hdev); return ret; hid_hw_init_fail: hid_hw_close(hdev); hid_hw_open_fail: hid_hw_stop(hdev); hid_hw_start_fail: sysfs_remove_group(&hdev->dev.kobj, &ps_attribute_group); cancel_work_sync(&hidpp->work); mutex_destroy(&hidpp->send_mutex); return ret; } static void hidpp_remove(struct hid_device *hdev) { struct hidpp_device *hidpp = hid_get_drvdata(hdev); if (!hidpp) return hid_hw_stop(hdev); sysfs_remove_group(&hdev->dev.kobj, &ps_attribute_group); hid_hw_stop(hdev); cancel_work_sync(&hidpp->work); mutex_destroy(&hidpp->send_mutex); } #define LDJ_DEVICE(product) \ HID_DEVICE(BUS_USB, HID_GROUP_LOGITECH_DJ_DEVICE, \ USB_VENDOR_ID_LOGITECH, (product)) #define L27MHZ_DEVICE(product) \ HID_DEVICE(BUS_USB, HID_GROUP_LOGITECH_27MHZ_DEVICE, \ USB_VENDOR_ID_LOGITECH, (product)) static const struct hid_device_id hidpp_devices[] = { { /* wireless touchpad */ LDJ_DEVICE(0x4011), .driver_data = HIDPP_QUIRK_CLASS_WTP | HIDPP_QUIRK_DELAYED_INIT | HIDPP_QUIRK_WTP_PHYSICAL_BUTTONS }, { /* wireless touchpad T650 */ LDJ_DEVICE(0x4101), .driver_data = HIDPP_QUIRK_CLASS_WTP | HIDPP_QUIRK_DELAYED_INIT }, { /* wireless touchpad T651 */ HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_T651), .driver_data = HIDPP_QUIRK_CLASS_WTP | HIDPP_QUIRK_DELAYED_INIT }, { /* Mouse Logitech Anywhere MX */ LDJ_DEVICE(0x1017), .driver_data = HIDPP_QUIRK_HI_RES_SCROLL_1P0 }, { /* Mouse logitech M560 */ LDJ_DEVICE(0x402d), .driver_data = HIDPP_QUIRK_DELAYED_INIT | HIDPP_QUIRK_CLASS_M560 }, { /* Mouse Logitech M705 (firmware RQM17) */ LDJ_DEVICE(0x101b), .driver_data = HIDPP_QUIRK_HI_RES_SCROLL_1P0 }, { /* Mouse Logitech Performance MX */ LDJ_DEVICE(0x101a), .driver_data = HIDPP_QUIRK_HI_RES_SCROLL_1P0 }, { /* Keyboard logitech K400 */ LDJ_DEVICE(0x4024), .driver_data = HIDPP_QUIRK_CLASS_K400 }, { /* Solar Keyboard Logitech K750 */ LDJ_DEVICE(0x4002), .driver_data = HIDPP_QUIRK_CLASS_K750 }, { /* Keyboard MX5000 (Bluetooth-receiver in HID proxy mode) */ LDJ_DEVICE(0xb305), .driver_data = HIDPP_QUIRK_HIDPP_CONSUMER_VENDOR_KEYS }, { /* Dinovo Edge (Bluetooth-receiver in HID proxy mode) */ LDJ_DEVICE(0xb309), .driver_data = HIDPP_QUIRK_HIDPP_CONSUMER_VENDOR_KEYS }, { /* Keyboard MX5500 (Bluetooth-receiver in HID proxy mode) */ LDJ_DEVICE(0xb30b), .driver_data = HIDPP_QUIRK_HIDPP_CONSUMER_VENDOR_KEYS }, { LDJ_DEVICE(HID_ANY_ID) }, { /* Keyboard LX501 (Y-RR53) */ L27MHZ_DEVICE(0x0049), .driver_data = HIDPP_QUIRK_KBD_ZOOM_WHEEL }, { /* Keyboard MX3000 (Y-RAM74) */ L27MHZ_DEVICE(0x0057), .driver_data = HIDPP_QUIRK_KBD_SCROLL_WHEEL }, { /* Keyboard MX3200 (Y-RAV80) */ L27MHZ_DEVICE(0x005c), .driver_data = HIDPP_QUIRK_KBD_ZOOM_WHEEL }, { /* S510 Media Remote */ L27MHZ_DEVICE(0x00fe), .driver_data = HIDPP_QUIRK_KBD_SCROLL_WHEEL }, { L27MHZ_DEVICE(HID_ANY_ID) }, { /* Logitech G403 Wireless Gaming Mouse over USB */ HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, 0xC082) }, { /* Logitech G502 Lightspeed Wireless Gaming Mouse over USB */ HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, 0xC08D) }, { /* Logitech G703 Gaming Mouse over USB */ HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, 0xC087) }, { /* Logitech G703 Hero Gaming Mouse over USB */ HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, 0xC090) }, { /* Logitech G900 Gaming Mouse over USB */ HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, 0xC081) }, { /* Logitech G903 Gaming Mouse over USB */ HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, 0xC086) }, { /* Logitech G Pro Gaming Mouse over USB */ HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, 0xC088) }, { /* MX Vertical over USB */ HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, 
0xC08A) }, { /* Logitech G703 Hero Gaming Mouse over USB */ HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, 0xC090) }, { /* Logitech G903 Hero Gaming Mouse over USB */ HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, 0xC091) }, { /* Logitech G915 TKL Keyboard over USB */ HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, 0xC343) }, { /* Logitech G920 Wheel over USB */ HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_G920_WHEEL), .driver_data = HIDPP_QUIRK_CLASS_G920 | HIDPP_QUIRK_FORCE_OUTPUT_REPORTS}, { /* Logitech G923 Wheel (Xbox version) over USB */ HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_G923_XBOX_WHEEL), .driver_data = HIDPP_QUIRK_CLASS_G920 | HIDPP_QUIRK_FORCE_OUTPUT_REPORTS }, { /* Logitech G Pro X Superlight Gaming Mouse over USB */ HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, 0xC094) }, { /* Logitech G Pro X Superlight 2 Gaming Mouse over USB */ HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, 0xC09b) }, { /* G935 Gaming Headset */ HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, 0x0a87), .driver_data = HIDPP_QUIRK_WIRELESS_STATUS }, { /* MX5000 keyboard over Bluetooth */ HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_LOGITECH, 0xb305), .driver_data = HIDPP_QUIRK_HIDPP_CONSUMER_VENDOR_KEYS }, { /* Dinovo Edge keyboard over Bluetooth */ HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_LOGITECH, 0xb309), .driver_data = HIDPP_QUIRK_HIDPP_CONSUMER_VENDOR_KEYS }, { /* MX5500 keyboard over Bluetooth */ HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_LOGITECH, 0xb30b), .driver_data = HIDPP_QUIRK_HIDPP_CONSUMER_VENDOR_KEYS }, { /* Logitech G915 TKL keyboard over Bluetooth */ HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_LOGITECH, 0xb35f) }, { /* M-RCQ142 V470 Cordless Laser Mouse over Bluetooth */ HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_LOGITECH, 0xb008) }, { /* MX Master mouse over Bluetooth */ HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_LOGITECH, 0xb012) }, { /* M720 Triathlon mouse over Bluetooth */ HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_LOGITECH, 0xb015) }, { /* MX Master 2S mouse over Bluetooth */ HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_LOGITECH, 0xb019) }, { /* MX Ergo trackball over Bluetooth */ HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_LOGITECH, 0xb01d) }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_LOGITECH, 0xb01e) }, { /* MX Vertical mouse over Bluetooth */ HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_LOGITECH, 0xb020) }, { /* Signature M650 over Bluetooth */ HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_LOGITECH, 0xb02a) }, { /* MX Master 3 mouse over Bluetooth */ HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_LOGITECH, 0xb023) }, { /* MX Anywhere 3 mouse over Bluetooth */ HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_LOGITECH, 0xb025) }, { /* MX Master 3S mouse over Bluetooth */ HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_LOGITECH, 0xb034) }, { /* MX Anywhere 3SB mouse over Bluetooth */ HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_LOGITECH, 0xb038) }, {} }; MODULE_DEVICE_TABLE(hid, hidpp_devices); static const struct hid_usage_id hidpp_usages[] = { { HID_GD_WHEEL, EV_REL, REL_WHEEL_HI_RES }, { HID_ANY_ID - 1, HID_ANY_ID - 1, HID_ANY_ID - 1} }; static struct hid_driver hidpp_driver = { .name = "logitech-hidpp-device", .id_table = hidpp_devices, .report_fixup = hidpp_report_fixup, .probe = hidpp_probe, .remove = hidpp_remove, .raw_event = hidpp_raw_event, .usage_table = hidpp_usages, .event = hidpp_event, .input_configured = hidpp_input_configured, .input_mapping = hidpp_input_mapping, .input_mapped = hidpp_input_mapped, }; module_hid_driver(hidpp_driver); |
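/*
 * Editor's illustrative sketch (not part of the driver above): how one more
 * Unifying-receiver device could be declared for hidpp_devices[], reusing the
 * LDJ_DEVICE() macro and quirk flags defined above. The product id 0x4242 is
 * invented, and the entry is deliberately kept outside the real table.
 */
static const struct hid_device_id hidpp_example_entry __maybe_unused = {
	/* hypothetical wireless touchpad behind a Unifying receiver */
	LDJ_DEVICE(0x4242),
	.driver_data = HIDPP_QUIRK_CLASS_WTP | HIDPP_QUIRK_DELAYED_INIT,
};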
// SPDX-License-Identifier: GPL-2.0 /* Copyright 2011-2014 Autronica Fire and Security AS * * Author(s): * 2011-2014 Arvid Brodin, arvid.brodin@alten.se * * Frame router for HSR and PRP.
*/ #include "hsr_forward.h" #include <linux/types.h> #include <linux/skbuff.h> #include <linux/etherdevice.h> #include <linux/if_vlan.h> #include "hsr_main.h" #include "hsr_framereg.h" struct hsr_node; /* The uses I can see for these HSR supervision frames are: * 1) Use the frames that are sent after node initialization ("HSR_TLV.Type = * 22") to reset any sequence_nr counters belonging to that node. Useful if * the other node's counter has been reset for some reason. * -- * Or not - resetting the counter and bridging the frame would create a * loop, unfortunately. * * 2) Use the LifeCheck frames to detect ring breaks. I.e. if no LifeCheck * frame is received from a particular node, we know something is wrong. * We just register these (as with normal frames) and throw them away. * * 3) Allow different MAC addresses for the two slave interfaces, using the * MacAddressA field. */ static bool is_supervision_frame(struct hsr_priv *hsr, struct sk_buff *skb) { struct ethhdr *eth_hdr; struct hsr_sup_tag *hsr_sup_tag; struct hsrv1_ethhdr_sp *hsr_V1_hdr; struct hsr_sup_tlv *hsr_sup_tlv; u16 total_length = 0; WARN_ON_ONCE(!skb_mac_header_was_set(skb)); eth_hdr = (struct ethhdr *)skb_mac_header(skb); /* Correct addr? */ if (!ether_addr_equal(eth_hdr->h_dest, hsr->sup_multicast_addr)) return false; /* Correct ether type?. */ if (!(eth_hdr->h_proto == htons(ETH_P_PRP) || eth_hdr->h_proto == htons(ETH_P_HSR))) return false; /* Get the supervision header from correct location. */ if (eth_hdr->h_proto == htons(ETH_P_HSR)) { /* Okay HSRv1. */ total_length = sizeof(struct hsrv1_ethhdr_sp); if (!pskb_may_pull(skb, total_length)) return false; hsr_V1_hdr = (struct hsrv1_ethhdr_sp *)skb_mac_header(skb); if (hsr_V1_hdr->hsr.encap_proto != htons(ETH_P_PRP)) return false; hsr_sup_tag = &hsr_V1_hdr->hsr_sup; } else { total_length = sizeof(struct hsrv0_ethhdr_sp); if (!pskb_may_pull(skb, total_length)) return false; hsr_sup_tag = &((struct hsrv0_ethhdr_sp *)skb_mac_header(skb))->hsr_sup; } if (hsr_sup_tag->tlv.HSR_TLV_type != HSR_TLV_ANNOUNCE && hsr_sup_tag->tlv.HSR_TLV_type != HSR_TLV_LIFE_CHECK && hsr_sup_tag->tlv.HSR_TLV_type != PRP_TLV_LIFE_CHECK_DD && hsr_sup_tag->tlv.HSR_TLV_type != PRP_TLV_LIFE_CHECK_DA) return false; if (hsr_sup_tag->tlv.HSR_TLV_length != 12 && hsr_sup_tag->tlv.HSR_TLV_length != sizeof(struct hsr_sup_payload)) return false; /* Get next tlv */ total_length += hsr_sup_tag->tlv.HSR_TLV_length; if (!pskb_may_pull(skb, total_length)) return false; skb_pull(skb, total_length); hsr_sup_tlv = (struct hsr_sup_tlv *)skb->data; skb_push(skb, total_length); /* if this is a redbox supervision frame we need to verify * that more data is available */ if (hsr_sup_tlv->HSR_TLV_type == PRP_TLV_REDBOX_MAC) { /* tlv length must be a length of a mac address */ if (hsr_sup_tlv->HSR_TLV_length != sizeof(struct hsr_sup_payload)) return false; /* make sure another tlv follows */ total_length += sizeof(struct hsr_sup_tlv) + hsr_sup_tlv->HSR_TLV_length; if (!pskb_may_pull(skb, total_length)) return false; /* get next tlv */ skb_pull(skb, total_length); hsr_sup_tlv = (struct hsr_sup_tlv *)skb->data; skb_push(skb, total_length); } /* end of tlvs must follow at the end */ if (hsr_sup_tlv->HSR_TLV_type == HSR_TLV_EOT && hsr_sup_tlv->HSR_TLV_length != 0) return false; return true; } static bool is_proxy_supervision_frame(struct hsr_priv *hsr, struct sk_buff *skb) { struct hsr_sup_payload *payload; struct ethhdr *eth_hdr; u16 total_length = 0; eth_hdr = (struct ethhdr *)skb_mac_header(skb); /* Get the HSR protocol 
revision. */ if (eth_hdr->h_proto == htons(ETH_P_HSR)) total_length = sizeof(struct hsrv1_ethhdr_sp); else total_length = sizeof(struct hsrv0_ethhdr_sp); if (!pskb_may_pull(skb, total_length + sizeof(struct hsr_sup_payload))) return false; skb_pull(skb, total_length); payload = (struct hsr_sup_payload *)skb->data; skb_push(skb, total_length); /* For RedBox (HSR-SAN) check if we have received the supervision * frame with MAC addresses from own ProxyNodeTable. */ return hsr_is_node_in_db(&hsr->proxy_node_db, payload->macaddress_A); } static struct sk_buff *create_stripped_skb_hsr(struct sk_buff *skb_in, struct hsr_frame_info *frame) { struct sk_buff *skb; int copylen; unsigned char *dst, *src; skb_pull(skb_in, HSR_HLEN); skb = __pskb_copy(skb_in, skb_headroom(skb_in) - HSR_HLEN, GFP_ATOMIC); skb_push(skb_in, HSR_HLEN); if (!skb) return NULL; skb_reset_mac_header(skb); if (skb->ip_summed == CHECKSUM_PARTIAL) skb->csum_start -= HSR_HLEN; copylen = 2 * ETH_ALEN; if (frame->is_vlan) copylen += VLAN_HLEN; src = skb_mac_header(skb_in); dst = skb_mac_header(skb); memcpy(dst, src, copylen); skb->protocol = eth_hdr(skb)->h_proto; return skb; } struct sk_buff *hsr_get_untagged_frame(struct hsr_frame_info *frame, struct hsr_port *port) { if (!frame->skb_std) { if (frame->skb_hsr) frame->skb_std = create_stripped_skb_hsr(frame->skb_hsr, frame); else netdev_warn_once(port->dev, "Unexpected frame received in hsr_get_untagged_frame()\n"); if (!frame->skb_std) return NULL; } return skb_clone(frame->skb_std, GFP_ATOMIC); } struct sk_buff *prp_get_untagged_frame(struct hsr_frame_info *frame, struct hsr_port *port) { if (!frame->skb_std) { if (frame->skb_prp) { /* trim the skb by len - HSR_HLEN to exclude RCT */ skb_trim(frame->skb_prp, frame->skb_prp->len - HSR_HLEN); frame->skb_std = __pskb_copy(frame->skb_prp, skb_headroom(frame->skb_prp), GFP_ATOMIC); } else { /* Unexpected */ WARN_ONCE(1, "%s:%d: Unexpected frame received (port_src %s)\n", __FILE__, __LINE__, port->dev->name); return NULL; } } return skb_clone(frame->skb_std, GFP_ATOMIC); } static void prp_set_lan_id(struct prp_rct *trailer, struct hsr_port *port) { int lane_id; if (port->type == HSR_PT_SLAVE_A) lane_id = 0; else lane_id = 1; /* Add net_id in the upper 3 bits of lane_id */ lane_id |= port->hsr->net_id; set_prp_lan_id(trailer, lane_id); } /* Tailroom for PRP rct should have been created before calling this */ static struct sk_buff *prp_fill_rct(struct sk_buff *skb, struct hsr_frame_info *frame, struct hsr_port *port) { struct prp_rct *trailer; int min_size = ETH_ZLEN; int lsdu_size; if (!skb) return skb; if (frame->is_vlan) min_size = VLAN_ETH_ZLEN; if (skb_put_padto(skb, min_size)) return NULL; trailer = (struct prp_rct *)skb_put(skb, HSR_HLEN); lsdu_size = skb->len - 14; if (frame->is_vlan) lsdu_size -= 4; prp_set_lan_id(trailer, port); set_prp_LSDU_size(trailer, lsdu_size); trailer->sequence_nr = htons(frame->sequence_nr); trailer->PRP_suffix = htons(ETH_P_PRP); skb->protocol = eth_hdr(skb)->h_proto; return skb; } static void hsr_set_path_id(struct hsr_ethhdr *hsr_ethhdr, struct hsr_port *port) { int path_id; if (port->type == HSR_PT_SLAVE_A) path_id = 0; else path_id = 1; set_hsr_tag_path(&hsr_ethhdr->hsr_tag, path_id); } static struct sk_buff *hsr_fill_tag(struct sk_buff *skb, struct hsr_frame_info *frame, struct hsr_port *port, u8 proto_version) { struct hsr_ethhdr *hsr_ethhdr; unsigned char *pc; int lsdu_size; /* pad to minimum packet size which is 60 + 6 (HSR tag) */ if (skb_put_padto(skb, ETH_ZLEN + HSR_HLEN)) return NULL; 
lsdu_size = skb->len - 14; if (frame->is_vlan) lsdu_size -= 4; pc = skb_mac_header(skb); if (frame->is_vlan) /* This 4-byte shift (size of a vlan tag) does not * mean that the ethhdr starts there. But rather it * provides the proper environment for accessing * the fields, such as hsr_tag etc., just like * when the vlan tag is not there. This is because * the hsr tag is after the vlan tag. */ hsr_ethhdr = (struct hsr_ethhdr *)(pc + VLAN_HLEN); else hsr_ethhdr = (struct hsr_ethhdr *)pc; hsr_set_path_id(hsr_ethhdr, port); set_hsr_tag_LSDU_size(&hsr_ethhdr->hsr_tag, lsdu_size); hsr_ethhdr->hsr_tag.sequence_nr = htons(frame->sequence_nr); hsr_ethhdr->hsr_tag.encap_proto = hsr_ethhdr->ethhdr.h_proto; hsr_ethhdr->ethhdr.h_proto = htons(proto_version ? ETH_P_HSR : ETH_P_PRP); skb->protocol = hsr_ethhdr->ethhdr.h_proto; return skb; } /* If the original frame was an HSR tagged frame, just clone it to be sent * unchanged. Otherwise, create a private frame especially tagged for 'port'. */ struct sk_buff *hsr_create_tagged_frame(struct hsr_frame_info *frame, struct hsr_port *port) { unsigned char *dst, *src; struct sk_buff *skb; int movelen; if (frame->skb_hsr) { struct hsr_ethhdr *hsr_ethhdr = (struct hsr_ethhdr *)skb_mac_header(frame->skb_hsr); /* set the lane id properly */ hsr_set_path_id(hsr_ethhdr, port); return skb_clone(frame->skb_hsr, GFP_ATOMIC); } else if (port->dev->features & NETIF_F_HW_HSR_TAG_INS) { return skb_clone(frame->skb_std, GFP_ATOMIC); } /* Create the new skb with enough headroom to fit the HSR tag */ skb = __pskb_copy(frame->skb_std, skb_headroom(frame->skb_std) + HSR_HLEN, GFP_ATOMIC); if (!skb) return NULL; skb_reset_mac_header(skb); if (skb->ip_summed == CHECKSUM_PARTIAL) skb->csum_start += HSR_HLEN; movelen = ETH_HLEN; if (frame->is_vlan) movelen += VLAN_HLEN; src = skb_mac_header(skb); dst = skb_push(skb, HSR_HLEN); memmove(dst, src, movelen); skb_reset_mac_header(skb); /* skb_put_padto free skb on error and hsr_fill_tag returns NULL in * that case */ return hsr_fill_tag(skb, frame, port, port->hsr->prot_version); } struct sk_buff *prp_create_tagged_frame(struct hsr_frame_info *frame, struct hsr_port *port) { struct sk_buff *skb; if (frame->skb_prp) { struct prp_rct *trailer = skb_get_PRP_rct(frame->skb_prp); if (trailer) { prp_set_lan_id(trailer, port); } else { WARN_ONCE(!trailer, "errored PRP skb"); return NULL; } return skb_clone(frame->skb_prp, GFP_ATOMIC); } else if (port->dev->features & NETIF_F_HW_HSR_TAG_INS) { return skb_clone(frame->skb_std, GFP_ATOMIC); } skb = skb_copy_expand(frame->skb_std, skb_headroom(frame->skb_std), skb_tailroom(frame->skb_std) + HSR_HLEN, GFP_ATOMIC); return prp_fill_rct(skb, frame, port); } static void hsr_deliver_master(struct sk_buff *skb, struct net_device *dev, struct hsr_node *node_src) { bool was_multicast_frame; int res, recv_len; was_multicast_frame = (skb->pkt_type == PACKET_MULTICAST); hsr_addr_subst_source(node_src, skb); skb_pull(skb, ETH_HLEN); recv_len = skb->len; res = netif_rx(skb); if (res == NET_RX_DROP) { dev->stats.rx_dropped++; } else { dev->stats.rx_packets++; dev->stats.rx_bytes += recv_len; if (was_multicast_frame) dev->stats.multicast++; } } static int hsr_xmit(struct sk_buff *skb, struct hsr_port *port, struct hsr_frame_info *frame) { if (frame->port_rcv->type == HSR_PT_MASTER) { hsr_addr_subst_dest(frame->node_src, skb, port); /* Address substitution (IEC62439-3 pp 26, 50): replace mac * address of outgoing frame with that of the outgoing slave's. 
*/ ether_addr_copy(eth_hdr(skb)->h_source, port->dev->dev_addr); } /* When HSR node is used as RedBox - the frame received from HSR ring * requires source MAC address (SA) replacement to one which can be * recognized by SAN devices (otherwise, frames are dropped by switch) */ if (port->type == HSR_PT_INTERLINK) ether_addr_copy(eth_hdr(skb)->h_source, port->hsr->macaddress_redbox); return dev_queue_xmit(skb); } bool prp_drop_frame(struct hsr_frame_info *frame, struct hsr_port *port) { return ((frame->port_rcv->type == HSR_PT_SLAVE_A && port->type == HSR_PT_SLAVE_B) || (frame->port_rcv->type == HSR_PT_SLAVE_B && port->type == HSR_PT_SLAVE_A)); } bool hsr_drop_frame(struct hsr_frame_info *frame, struct hsr_port *port) { struct sk_buff *skb; if (port->dev->features & NETIF_F_HW_HSR_FWD) return prp_drop_frame(frame, port); /* RedBox specific frames dropping policies * * Do not send HSR supervisory frames to SAN devices */ if (frame->is_supervision && port->type == HSR_PT_INTERLINK) return true; /* Do not forward to other HSR port (A or B) unicast frames which * are addressed to interlink port (and are in the ProxyNodeTable). */ skb = frame->skb_hsr; if (skb && prp_drop_frame(frame, port) && is_unicast_ether_addr(eth_hdr(skb)->h_dest) && hsr_is_node_in_db(&port->hsr->proxy_node_db, eth_hdr(skb)->h_dest)) { return true; } /* Do not forward to port C (Interlink) frames from nodes A and B * if DA is in NodeTable. */ if ((frame->port_rcv->type == HSR_PT_SLAVE_A || frame->port_rcv->type == HSR_PT_SLAVE_B) && port->type == HSR_PT_INTERLINK) { skb = frame->skb_hsr; if (skb && is_unicast_ether_addr(eth_hdr(skb)->h_dest) && hsr_is_node_in_db(&port->hsr->node_db, eth_hdr(skb)->h_dest)) { return true; } } /* Do not forward to port A and B unicast frames received on the * interlink port if it is addressed to one of nodes registered in * the ProxyNodeTable. */ if ((port->type == HSR_PT_SLAVE_A || port->type == HSR_PT_SLAVE_B) && frame->port_rcv->type == HSR_PT_INTERLINK) { skb = frame->skb_std; if (skb && is_unicast_ether_addr(eth_hdr(skb)->h_dest) && hsr_is_node_in_db(&port->hsr->proxy_node_db, eth_hdr(skb)->h_dest)) { return true; } } return false; } /* Forward the frame through all devices except: * - Back through the receiving device * - If it's a HSR frame: through a device where it has passed before * - if it's a PRP frame: through another PRP slave device (no bridge) * - To the local HSR master only if the frame is directly addressed to it, or * a non-supervision multicast or broadcast frame. * * HSR slave devices should insert a HSR tag into the frame, or forward the * frame unchanged if it's already tagged. Interlink devices should strip HSR * tags if they're of the non-HSR type (but only after duplicate discard). The * master device always strips HSR tags. */ static void hsr_forward_do(struct hsr_frame_info *frame) { struct hsr_port *port; struct sk_buff *skb; bool sent = false; hsr_for_each_port(frame->port_rcv->hsr, port) { struct hsr_priv *hsr = port->hsr; /* Don't send frame back the way it came */ if (port == frame->port_rcv) continue; /* Don't deliver locally unless we should */ if (port->type == HSR_PT_MASTER && !frame->is_local_dest) continue; /* Deliver frames directly addressed to us to master only */ if (port->type != HSR_PT_MASTER && frame->is_local_exclusive) continue; /* If hardware duplicate generation is enabled, only send out * one port. */ if ((port->dev->features & NETIF_F_HW_HSR_DUP) && sent) continue; /* Don't send frame over port where it has been sent before. 
* Also for SAN, this shouldn't be done. */ if (!frame->is_from_san && hsr->proto_ops->register_frame_out && hsr->proto_ops->register_frame_out(port, frame)) continue; if (frame->is_supervision && port->type == HSR_PT_MASTER && !frame->is_proxy_supervision) { hsr_handle_sup_frame(frame); continue; } /* Check if frame is to be dropped. Eg. for PRP no forward * between ports, or sending HSR supervision to RedBox. */ if (hsr->proto_ops->drop_frame && hsr->proto_ops->drop_frame(frame, port)) continue; if (port->type == HSR_PT_SLAVE_A || port->type == HSR_PT_SLAVE_B) skb = hsr->proto_ops->create_tagged_frame(frame, port); else skb = hsr->proto_ops->get_untagged_frame(frame, port); if (!skb) { frame->port_rcv->dev->stats.rx_dropped++; continue; } skb->dev = port->dev; if (port->type == HSR_PT_MASTER) { hsr_deliver_master(skb, port->dev, frame->node_src); } else { if (!hsr_xmit(skb, port, frame)) if (port->type == HSR_PT_SLAVE_A || port->type == HSR_PT_SLAVE_B) sent = true; } } } static void check_local_dest(struct hsr_priv *hsr, struct sk_buff *skb, struct hsr_frame_info *frame) { if (hsr_addr_is_self(hsr, eth_hdr(skb)->h_dest)) { frame->is_local_exclusive = true; skb->pkt_type = PACKET_HOST; } else { frame->is_local_exclusive = false; } if (skb->pkt_type == PACKET_HOST || skb->pkt_type == PACKET_MULTICAST || skb->pkt_type == PACKET_BROADCAST) { frame->is_local_dest = true; } else { frame->is_local_dest = false; } } static void handle_std_frame(struct sk_buff *skb, struct hsr_frame_info *frame) { struct hsr_port *port = frame->port_rcv; struct hsr_priv *hsr = port->hsr; frame->skb_hsr = NULL; frame->skb_prp = NULL; frame->skb_std = skb; if (port->type != HSR_PT_MASTER) frame->is_from_san = true; if (port->type == HSR_PT_MASTER || port->type == HSR_PT_INTERLINK) { /* Sequence nr for the master/interlink node */ lockdep_assert_held(&hsr->seqnr_lock); frame->sequence_nr = hsr->sequence_nr; hsr->sequence_nr++; } } int hsr_fill_frame_info(__be16 proto, struct sk_buff *skb, struct hsr_frame_info *frame) { struct hsr_port *port = frame->port_rcv; struct hsr_priv *hsr = port->hsr; /* HSRv0 supervisory frames double as a tag so treat them as tagged. 
*/ if ((!hsr->prot_version && proto == htons(ETH_P_PRP)) || proto == htons(ETH_P_HSR)) { /* Check if skb contains hsr_ethhdr */ if (skb->mac_len < sizeof(struct hsr_ethhdr)) return -EINVAL; /* HSR tagged frame :- Data or Supervision */ frame->skb_std = NULL; frame->skb_prp = NULL; frame->skb_hsr = skb; frame->sequence_nr = hsr_get_skb_sequence_nr(skb); return 0; } /* Standard frame or PRP from master port */ handle_std_frame(skb, frame); return 0; } int prp_fill_frame_info(__be16 proto, struct sk_buff *skb, struct hsr_frame_info *frame) { /* Supervision frame */ struct prp_rct *rct = skb_get_PRP_rct(skb); if (rct && prp_check_lsdu_size(skb, rct, frame->is_supervision)) { frame->skb_hsr = NULL; frame->skb_std = NULL; frame->skb_prp = skb; frame->sequence_nr = prp_get_skb_sequence_nr(rct); return 0; } handle_std_frame(skb, frame); return 0; } static int fill_frame_info(struct hsr_frame_info *frame, struct sk_buff *skb, struct hsr_port *port) { struct hsr_priv *hsr = port->hsr; struct hsr_vlan_ethhdr *vlan_hdr; struct list_head *n_db; struct ethhdr *ethhdr; __be16 proto; int ret; /* Check if skb contains ethhdr */ if (skb->mac_len < sizeof(struct ethhdr)) return -EINVAL; memset(frame, 0, sizeof(*frame)); frame->is_supervision = is_supervision_frame(port->hsr, skb); if (frame->is_supervision && hsr->redbox) frame->is_proxy_supervision = is_proxy_supervision_frame(port->hsr, skb); n_db = &hsr->node_db; if (port->type == HSR_PT_INTERLINK) n_db = &hsr->proxy_node_db; frame->node_src = hsr_get_node(port, n_db, skb, frame->is_supervision, port->type); if (!frame->node_src) return -1; /* Unknown node and !is_supervision, or no mem */ ethhdr = (struct ethhdr *)skb_mac_header(skb); frame->is_vlan = false; proto = ethhdr->h_proto; if (proto == htons(ETH_P_8021Q)) frame->is_vlan = true; if (frame->is_vlan) { /* Note: skb->mac_len might be wrong here. */ if (!pskb_may_pull(skb, skb_mac_offset(skb) + offsetofend(struct hsr_vlan_ethhdr, vlanhdr))) return -EINVAL; vlan_hdr = (struct hsr_vlan_ethhdr *)skb_mac_header(skb); proto = vlan_hdr->vlanhdr.h_vlan_encapsulated_proto; } frame->is_from_san = false; frame->port_rcv = port; ret = hsr->proto_ops->fill_frame_info(proto, skb, frame); if (ret) return ret; check_local_dest(port->hsr, skb, frame); return 0; } /* Must be called holding rcu read lock (because of the port parameter) */ void hsr_forward_skb(struct sk_buff *skb, struct hsr_port *port) { struct hsr_frame_info frame; rcu_read_lock(); if (fill_frame_info(&frame, skb, port) < 0) goto out_drop; hsr_register_frame_in(frame.node_src, port, frame.sequence_nr); hsr_forward_do(&frame); rcu_read_unlock(); /* Gets called for ingress frames as well as egress from master port. * So check and increment stats for master port only here. */ if (port->type == HSR_PT_MASTER || port->type == HSR_PT_INTERLINK) { port->dev->stats.tx_packets++; port->dev->stats.tx_bytes += skb->len; } kfree_skb(frame.skb_hsr); kfree_skb(frame.skb_prp); kfree_skb(frame.skb_std); return; out_drop: rcu_read_unlock(); port->dev->stats.tx_dropped++; kfree_skb(skb); } |
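/*
 * Editor's illustrative sketch (not part of the file above): the LSDU size
 * written into the HSR tag and the PRP RCT is the frame length minus the
 * 14-byte Ethernet header, minus 4 more bytes when a VLAN tag is present.
 * This mirrors the inline "skb->len - 14" / "- 4" arithmetic in
 * hsr_fill_tag() and prp_fill_rct(); the helper name is made up.
 */
static inline int example_hsr_lsdu_size(const struct sk_buff *skb, bool is_vlan)
{
	int lsdu_size = skb->len - ETH_HLEN;	/* strip the Ethernet header (14 bytes) */

	if (is_vlan)
		lsdu_size -= VLAN_HLEN;		/* strip the 802.1Q tag (4 bytes) */

	return lsdu_size;
}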
/* * Copyright (C) 2014 Red Hat * Author: Rob Clark <robdclark@gmail.com> * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the "Software"), * to deal in the Software without restriction, including without limitation * the rights to use, copy, modify, merge, publish, distribute, sublicense, * and/or sell copies of the Software, and to permit persons to whom the * Software is furnished to do so, subject to the following conditions: * * The above copyright notice and this permission notice shall be included in * all copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR * OTHER DEALINGS IN THE SOFTWARE. */ #include <drm/drm_atomic.h> #include <drm/drm_crtc.h> #include <drm/drm_device.h> #include <drm/drm_modeset_lock.h> #include <drm/drm_print.h> /** * DOC: kms locking * * As KMS moves toward more fine grained locking, and atomic ioctl where * userspace can indirectly control locking order, it becomes necessary * to use &ww_mutex and acquire-contexts to avoid deadlocks.
But because * the locking is more distributed around the driver code, we want a bit * of extra utility/tracking out of our acquire-ctx. This is provided * by &struct drm_modeset_lock and &struct drm_modeset_acquire_ctx. * * For basic principles of &ww_mutex, see: Documentation/locking/ww-mutex-design.rst * * The basic usage pattern is to:: * * drm_modeset_acquire_init(ctx, DRM_MODESET_ACQUIRE_INTERRUPTIBLE) * retry: * foreach (lock in random_ordered_set_of_locks) { * ret = drm_modeset_lock(lock, ctx) * if (ret == -EDEADLK) { * ret = drm_modeset_backoff(ctx); * if (!ret) * goto retry; * } * if (ret) * goto out; * } * ... do stuff ... * out: * drm_modeset_drop_locks(ctx); * drm_modeset_acquire_fini(ctx); * * For convenience this control flow is implemented in * DRM_MODESET_LOCK_ALL_BEGIN() and DRM_MODESET_LOCK_ALL_END() for the case * where all modeset locks need to be taken through drm_modeset_lock_all_ctx(). * * If all that is needed is a single modeset lock, then the &struct * drm_modeset_acquire_ctx is not needed and the locking can be simplified * by passing a NULL instead of ctx in the drm_modeset_lock() call or * calling drm_modeset_lock_single_interruptible(). To unlock afterwards * call drm_modeset_unlock(). * * On top of these per-object locks using &ww_mutex there's also an overall * &drm_mode_config.mutex, for protecting everything else. Mostly this means * probe state of connectors, and preventing hotplug add/removal of connectors. * * Finally there's a bunch of dedicated locks to protect drm core internal * lists and lookup data structures. */ static DEFINE_WW_CLASS(crtc_ww_class); #if IS_ENABLED(CONFIG_DRM_DEBUG_MODESET_LOCK) static noinline depot_stack_handle_t __drm_stack_depot_save(void) { unsigned long entries[8]; unsigned int n; n = stack_trace_save(entries, ARRAY_SIZE(entries), 1); return stack_depot_save(entries, n, GFP_NOWAIT | __GFP_NOWARN); } static void __drm_stack_depot_print(depot_stack_handle_t stack_depot) { struct drm_printer p = drm_dbg_printer(NULL, DRM_UT_KMS, "drm_modeset_lock"); unsigned long *entries; unsigned int nr_entries; char *buf; buf = kmalloc(PAGE_SIZE, GFP_NOWAIT | __GFP_NOWARN); if (!buf) return; nr_entries = stack_depot_fetch(stack_depot, &entries); stack_trace_snprint(buf, PAGE_SIZE, entries, nr_entries, 2); drm_printf(&p, "attempting to lock a contended lock without backoff:\n%s", buf); kfree(buf); } static void __drm_stack_depot_init(void) { stack_depot_init(); } #else /* CONFIG_DRM_DEBUG_MODESET_LOCK */ static depot_stack_handle_t __drm_stack_depot_save(void) { return 0; } static void __drm_stack_depot_print(depot_stack_handle_t stack_depot) { } static void __drm_stack_depot_init(void) { } #endif /* CONFIG_DRM_DEBUG_MODESET_LOCK */ /** * drm_modeset_lock_all - take all modeset locks * @dev: DRM device * * This function takes all modeset locks, suitable where a more fine-grained * scheme isn't (yet) implemented. Locks must be dropped by calling the * drm_modeset_unlock_all() function. * * This function is deprecated. It allocates a lock acquisition context and * stores it in &drm_device.mode_config. This facilitate conversion of * existing code because it removes the need to manually deal with the * acquisition context, but it is also brittle because the context is global * and care must be taken not to nest calls. New code should use the * drm_modeset_lock_all_ctx() function and pass in the context explicitly. 
*/ void drm_modeset_lock_all(struct drm_device *dev) { struct drm_mode_config *config = &dev->mode_config; struct drm_modeset_acquire_ctx *ctx; int ret; ctx = kzalloc(sizeof(*ctx), GFP_KERNEL | __GFP_NOFAIL); if (WARN_ON(!ctx)) return; mutex_lock(&config->mutex); drm_modeset_acquire_init(ctx, 0); retry: ret = drm_modeset_lock_all_ctx(dev, ctx); if (ret < 0) { if (ret == -EDEADLK) { drm_modeset_backoff(ctx); goto retry; } drm_modeset_acquire_fini(ctx); kfree(ctx); return; } ww_acquire_done(&ctx->ww_ctx); WARN_ON(config->acquire_ctx); /* * We hold the locks now, so it is safe to stash the acquisition * context for drm_modeset_unlock_all(). */ config->acquire_ctx = ctx; drm_warn_on_modeset_not_all_locked(dev); } EXPORT_SYMBOL(drm_modeset_lock_all); /** * drm_modeset_unlock_all - drop all modeset locks * @dev: DRM device * * This function drops all modeset locks taken by a previous call to the * drm_modeset_lock_all() function. * * This function is deprecated. It uses the lock acquisition context stored * in &drm_device.mode_config. This facilitates conversion of existing * code because it removes the need to manually deal with the acquisition * context, but it is also brittle because the context is global and care must * be taken not to nest calls. New code should pass the acquisition context * directly to the drm_modeset_drop_locks() function. */ void drm_modeset_unlock_all(struct drm_device *dev) { struct drm_mode_config *config = &dev->mode_config; struct drm_modeset_acquire_ctx *ctx = config->acquire_ctx; if (WARN_ON(!ctx)) return; config->acquire_ctx = NULL; drm_modeset_drop_locks(ctx); drm_modeset_acquire_fini(ctx); kfree(ctx); mutex_unlock(&dev->mode_config.mutex); } EXPORT_SYMBOL(drm_modeset_unlock_all); /** * drm_warn_on_modeset_not_all_locked - check that all modeset locks are locked * @dev: device * * Useful as a debug assert. */ void drm_warn_on_modeset_not_all_locked(struct drm_device *dev) { struct drm_crtc *crtc; /* Locking is currently fubar in the panic handler. */ if (oops_in_progress) return; drm_for_each_crtc(crtc, dev) WARN_ON(!drm_modeset_is_locked(&crtc->mutex)); WARN_ON(!drm_modeset_is_locked(&dev->mode_config.connection_mutex)); WARN_ON(!mutex_is_locked(&dev->mode_config.mutex)); } EXPORT_SYMBOL(drm_warn_on_modeset_not_all_locked); /** * drm_modeset_acquire_init - initialize acquire context * @ctx: the acquire context * @flags: 0 or %DRM_MODESET_ACQUIRE_INTERRUPTIBLE * * When passing %DRM_MODESET_ACQUIRE_INTERRUPTIBLE to @flags, * all calls to drm_modeset_lock() will perform an interruptible * wait. */ void drm_modeset_acquire_init(struct drm_modeset_acquire_ctx *ctx, uint32_t flags) { memset(ctx, 0, sizeof(*ctx)); ww_acquire_init(&ctx->ww_ctx, &crtc_ww_class); INIT_LIST_HEAD(&ctx->locked); if (flags & DRM_MODESET_ACQUIRE_INTERRUPTIBLE) ctx->interruptible = true; } EXPORT_SYMBOL(drm_modeset_acquire_init); /** * drm_modeset_acquire_fini - cleanup acquire context * @ctx: the acquire context */ void drm_modeset_acquire_fini(struct drm_modeset_acquire_ctx *ctx) { ww_acquire_fini(&ctx->ww_ctx); } EXPORT_SYMBOL(drm_modeset_acquire_fini); /** * drm_modeset_drop_locks - drop all locks * @ctx: the acquire context * * Drop all locks currently held against this acquire context. 
*/ void drm_modeset_drop_locks(struct drm_modeset_acquire_ctx *ctx) { if (WARN_ON(ctx->contended)) __drm_stack_depot_print(ctx->stack_depot); while (!list_empty(&ctx->locked)) { struct drm_modeset_lock *lock; lock = list_first_entry(&ctx->locked, struct drm_modeset_lock, head); drm_modeset_unlock(lock); } } EXPORT_SYMBOL(drm_modeset_drop_locks); static inline int modeset_lock(struct drm_modeset_lock *lock, struct drm_modeset_acquire_ctx *ctx, bool interruptible, bool slow) { int ret; if (WARN_ON(ctx->contended)) __drm_stack_depot_print(ctx->stack_depot); if (ctx->trylock_only) { lockdep_assert_held(&ctx->ww_ctx); if (!ww_mutex_trylock(&lock->mutex, NULL)) return -EBUSY; else return 0; } else if (interruptible && slow) { ret = ww_mutex_lock_slow_interruptible(&lock->mutex, &ctx->ww_ctx); } else if (interruptible) { ret = ww_mutex_lock_interruptible(&lock->mutex, &ctx->ww_ctx); } else if (slow) { ww_mutex_lock_slow(&lock->mutex, &ctx->ww_ctx); ret = 0; } else { ret = ww_mutex_lock(&lock->mutex, &ctx->ww_ctx); } if (!ret) { WARN_ON(!list_empty(&lock->head)); list_add(&lock->head, &ctx->locked); } else if (ret == -EALREADY) { /* we already hold the lock.. this is fine. For atomic * we will need to be able to drm_modeset_lock() things * without having to keep track of what is already locked * or not. */ ret = 0; } else if (ret == -EDEADLK) { ctx->contended = lock; ctx->stack_depot = __drm_stack_depot_save(); } return ret; } /** * drm_modeset_backoff - deadlock avoidance backoff * @ctx: the acquire context * * If deadlock is detected (ie. drm_modeset_lock() returns -EDEADLK), * you must call this function to drop all currently held locks and * block until the contended lock becomes available. * * This function returns 0 on success, or -ERESTARTSYS if this context * is initialized with %DRM_MODESET_ACQUIRE_INTERRUPTIBLE and the * wait has been interrupted. */ int drm_modeset_backoff(struct drm_modeset_acquire_ctx *ctx) { struct drm_modeset_lock *contended = ctx->contended; ctx->contended = NULL; ctx->stack_depot = 0; if (WARN_ON(!contended)) return 0; drm_modeset_drop_locks(ctx); return modeset_lock(contended, ctx, ctx->interruptible, true); } EXPORT_SYMBOL(drm_modeset_backoff); /** * drm_modeset_lock_init - initialize lock * @lock: lock to init */ void drm_modeset_lock_init(struct drm_modeset_lock *lock) { ww_mutex_init(&lock->mutex, &crtc_ww_class); INIT_LIST_HEAD(&lock->head); __drm_stack_depot_init(); } EXPORT_SYMBOL(drm_modeset_lock_init); /** * drm_modeset_lock - take modeset lock * @lock: lock to take * @ctx: acquire ctx * * If @ctx is not NULL, then its ww acquire context is used and the * lock will be tracked by the context and can be released by calling * drm_modeset_drop_locks(). If -EDEADLK is returned, this means a * deadlock scenario has been detected and it is an error to attempt * to take any more locks without first calling drm_modeset_backoff(). * * If the @ctx is not NULL and initialized with * %DRM_MODESET_ACQUIRE_INTERRUPTIBLE, this function will fail with * -ERESTARTSYS when interrupted. * * If @ctx is NULL then the function call behaves like a normal, * uninterruptible non-nesting mutex_lock() call. 
*/ int drm_modeset_lock(struct drm_modeset_lock *lock, struct drm_modeset_acquire_ctx *ctx) { if (ctx) return modeset_lock(lock, ctx, ctx->interruptible, false); ww_mutex_lock(&lock->mutex, NULL); return 0; } EXPORT_SYMBOL(drm_modeset_lock); /** * drm_modeset_lock_single_interruptible - take a single modeset lock * @lock: lock to take * * This function behaves as drm_modeset_lock() with a NULL context, * but performs interruptible waits. * * This function returns 0 on success, or -ERESTARTSYS when interrupted. */ int drm_modeset_lock_single_interruptible(struct drm_modeset_lock *lock) { return ww_mutex_lock_interruptible(&lock->mutex, NULL); } EXPORT_SYMBOL(drm_modeset_lock_single_interruptible); /** * drm_modeset_unlock - drop modeset lock * @lock: lock to release */ void drm_modeset_unlock(struct drm_modeset_lock *lock) { list_del_init(&lock->head); ww_mutex_unlock(&lock->mutex); } EXPORT_SYMBOL(drm_modeset_unlock); /** * drm_modeset_lock_all_ctx - take all modeset locks * @dev: DRM device * @ctx: lock acquisition context * * This function takes all modeset locks, suitable where a more fine-grained * scheme isn't (yet) implemented. * * Unlike drm_modeset_lock_all(), it doesn't take the &drm_mode_config.mutex * since that lock isn't required for modeset state changes. Callers which * need to grab that lock too need to do so outside of the acquire context * @ctx. * * Locks acquired with this function should be released by calling the * drm_modeset_drop_locks() function on @ctx. * * See also: DRM_MODESET_LOCK_ALL_BEGIN() and DRM_MODESET_LOCK_ALL_END() * * Returns: 0 on success or a negative error-code on failure. */ int drm_modeset_lock_all_ctx(struct drm_device *dev, struct drm_modeset_acquire_ctx *ctx) { struct drm_private_obj *privobj; struct drm_crtc *crtc; struct drm_plane *plane; int ret; ret = drm_modeset_lock(&dev->mode_config.connection_mutex, ctx); if (ret) return ret; drm_for_each_crtc(crtc, dev) { ret = drm_modeset_lock(&crtc->mutex, ctx); if (ret) return ret; } drm_for_each_plane(plane, dev) { ret = drm_modeset_lock(&plane->mutex, ctx); if (ret) return ret; } drm_for_each_privobj(privobj, dev) { ret = drm_modeset_lock(&privobj->lock, ctx); if (ret) return ret; } return 0; } EXPORT_SYMBOL(drm_modeset_lock_all_ctx); |
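/*
 * Illustrative sketch (not part of the file above): the locking pattern the
 * kernel-doc recommends for new code, using an explicit acquire context
 * instead of the deprecated drm_modeset_lock_all()/drm_modeset_unlock_all()
 * pair. The wrapper function and its do_update() callback are hypothetical;
 * only the drm_modeset_* calls come from the API documented above.
 */
#include <drm/drm_device.h>
#include <drm/drm_modeset_lock.h>

static int example_update_with_ctx(struct drm_device *dev,
				   int (*do_update)(struct drm_device *dev))
{
	struct drm_modeset_acquire_ctx ctx;
	int ret;

	drm_modeset_acquire_init(&ctx, DRM_MODESET_ACQUIRE_INTERRUPTIBLE);
retry:
	ret = drm_modeset_lock_all_ctx(dev, &ctx);
	if (ret == -EDEADLK) {
		/*
		 * ww-mutex deadlock detected: drop all held locks and block
		 * on the contended one before retrying.
		 */
		ret = drm_modeset_backoff(&ctx);
		if (!ret)
			goto retry;
	}
	if (!ret)
		ret = do_update(dev);

	drm_modeset_drop_locks(&ctx);
	drm_modeset_acquire_fini(&ctx);
	return ret;
}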
// SPDX-License-Identifier: GPL-2.0 /* * linux/ipc/msg.c * Copyright (C) 1992 Krishna Balasubramanian * * Removed all the remaining kerneld mess * Catch the -EFAULT stuff properly * Use GFP_KERNEL for messages as in 1.2 * Fixed up the unchecked user space derefs * Copyright (C) 1998 Alan Cox & Andi Kleen * * /proc/sysvipc/msg support (c) 1999 Dragos Acostachioaie <dragos@iname.com> * * mostly rewritten, threaded and wake-one semantics added * MSGMAX limit removed, sysctl's added * (c) 1999 Manfred Spraul <manfred@colorfullife.com> * * support for audit of ipc object properties and permission changes * Dustin Kirkland <dustin.kirkland@us.ibm.com> * * namespaces support * OpenVZ, SWsoft Inc. 
* Pavel Emelianov <xemul@openvz.org> */ #include <linux/capability.h> #include <linux/msg.h> #include <linux/spinlock.h> #include <linux/init.h> #include <linux/mm.h> #include <linux/proc_fs.h> #include <linux/list.h> #include <linux/security.h> #include <linux/sched/wake_q.h> #include <linux/syscalls.h> #include <linux/audit.h> #include <linux/seq_file.h> #include <linux/rwsem.h> #include <linux/nsproxy.h> #include <linux/ipc_namespace.h> #include <linux/rhashtable.h> #include <linux/percpu_counter.h> #include <asm/current.h> #include <linux/uaccess.h> #include "util.h" /* one msq_queue structure for each present queue on the system */ struct msg_queue { struct kern_ipc_perm q_perm; time64_t q_stime; /* last msgsnd time */ time64_t q_rtime; /* last msgrcv time */ time64_t q_ctime; /* last change time */ unsigned long q_cbytes; /* current number of bytes on queue */ unsigned long q_qnum; /* number of messages in queue */ unsigned long q_qbytes; /* max number of bytes on queue */ struct pid *q_lspid; /* pid of last msgsnd */ struct pid *q_lrpid; /* last receive pid */ struct list_head q_messages; struct list_head q_receivers; struct list_head q_senders; } __randomize_layout; /* * MSG_BARRIER Locking: * * Similar to the optimization used in ipc/mqueue.c, one syscall return path * does not acquire any locks when it sees that a message exists in * msg_receiver.r_msg. Therefore r_msg is set using smp_store_release() * and accessed using READ_ONCE()+smp_acquire__after_ctrl_dep(). In addition, * wake_q_add_safe() is used. See ipc/mqueue.c for more details */ /* one msg_receiver structure for each sleeping receiver */ struct msg_receiver { struct list_head r_list; struct task_struct *r_tsk; int r_mode; long r_msgtype; long r_maxsize; struct msg_msg *r_msg; }; /* one msg_sender for each sleeping sender */ struct msg_sender { struct list_head list; struct task_struct *tsk; size_t msgsz; }; #define SEARCH_ANY 1 #define SEARCH_EQUAL 2 #define SEARCH_NOTEQUAL 3 #define SEARCH_LESSEQUAL 4 #define SEARCH_NUMBER 5 #define msg_ids(ns) ((ns)->ids[IPC_MSG_IDS]) static inline struct msg_queue *msq_obtain_object(struct ipc_namespace *ns, int id) { struct kern_ipc_perm *ipcp = ipc_obtain_object_idr(&msg_ids(ns), id); if (IS_ERR(ipcp)) return ERR_CAST(ipcp); return container_of(ipcp, struct msg_queue, q_perm); } static inline struct msg_queue *msq_obtain_object_check(struct ipc_namespace *ns, int id) { struct kern_ipc_perm *ipcp = ipc_obtain_object_check(&msg_ids(ns), id); if (IS_ERR(ipcp)) return ERR_CAST(ipcp); return container_of(ipcp, struct msg_queue, q_perm); } static inline void msg_rmid(struct ipc_namespace *ns, struct msg_queue *s) { ipc_rmid(&msg_ids(ns), &s->q_perm); } static void msg_rcu_free(struct rcu_head *head) { struct kern_ipc_perm *p = container_of(head, struct kern_ipc_perm, rcu); struct msg_queue *msq = container_of(p, struct msg_queue, q_perm); security_msg_queue_free(&msq->q_perm); kfree(msq); } /** * newque - Create a new msg queue * @ns: namespace * @params: ptr to the structure that contains the key and msgflg * * Called with msg_ids.rwsem held (writer) */ static int newque(struct ipc_namespace *ns, struct ipc_params *params) { struct msg_queue *msq; int retval; key_t key = params->key; int msgflg = params->flg; msq = kmalloc(sizeof(*msq), GFP_KERNEL_ACCOUNT); if (unlikely(!msq)) return -ENOMEM; msq->q_perm.mode = msgflg & S_IRWXUGO; msq->q_perm.key = key; msq->q_perm.security = NULL; retval = security_msg_queue_alloc(&msq->q_perm); if (retval) { kfree(msq); return retval; } 
msq->q_stime = msq->q_rtime = 0; msq->q_ctime = ktime_get_real_seconds(); msq->q_cbytes = msq->q_qnum = 0; msq->q_qbytes = ns->msg_ctlmnb; msq->q_lspid = msq->q_lrpid = NULL; INIT_LIST_HEAD(&msq->q_messages); INIT_LIST_HEAD(&msq->q_receivers); INIT_LIST_HEAD(&msq->q_senders); /* ipc_addid() locks msq upon success. */ retval = ipc_addid(&msg_ids(ns), &msq->q_perm, ns->msg_ctlmni); if (retval < 0) { ipc_rcu_putref(&msq->q_perm, msg_rcu_free); return retval; } ipc_unlock_object(&msq->q_perm); rcu_read_unlock(); return msq->q_perm.id; } static inline bool msg_fits_inqueue(struct msg_queue *msq, size_t msgsz) { return msgsz + msq->q_cbytes <= msq->q_qbytes && 1 + msq->q_qnum <= msq->q_qbytes; } static inline void ss_add(struct msg_queue *msq, struct msg_sender *mss, size_t msgsz) { mss->tsk = current; mss->msgsz = msgsz; /* * No memory barrier required: we did ipc_lock_object(), * and the waker obtains that lock before calling wake_q_add(). */ __set_current_state(TASK_INTERRUPTIBLE); list_add_tail(&mss->list, &msq->q_senders); } static inline void ss_del(struct msg_sender *mss) { if (mss->list.next) list_del(&mss->list); } static void ss_wakeup(struct msg_queue *msq, struct wake_q_head *wake_q, bool kill) { struct msg_sender *mss, *t; struct task_struct *stop_tsk = NULL; struct list_head *h = &msq->q_senders; list_for_each_entry_safe(mss, t, h, list) { if (kill) mss->list.next = NULL; /* * Stop at the first task we don't wakeup, * we've already iterated the original * sender queue. */ else if (stop_tsk == mss->tsk) break; /* * We are not in an EIDRM scenario here, therefore * verify that we really need to wakeup the task. * To maintain current semantics and wakeup order, * move the sender to the tail on behalf of the * blocked task. */ else if (!msg_fits_inqueue(msq, mss->msgsz)) { if (!stop_tsk) stop_tsk = mss->tsk; list_move_tail(&mss->list, &msq->q_senders); continue; } wake_q_add(wake_q, mss->tsk); } } static void expunge_all(struct msg_queue *msq, int res, struct wake_q_head *wake_q) { struct msg_receiver *msr, *t; list_for_each_entry_safe(msr, t, &msq->q_receivers, r_list) { struct task_struct *r_tsk; r_tsk = get_task_struct(msr->r_tsk); /* see MSG_BARRIER for purpose/pairing */ smp_store_release(&msr->r_msg, ERR_PTR(res)); wake_q_add_safe(wake_q, r_tsk); } } /* * freeque() wakes up waiters on the sender and receiver waiting queue, * removes the message queue from message queue ID IDR, and cleans up all the * messages associated with this queue. * * msg_ids.rwsem (writer) and the spinlock for this message queue are held * before freeque() is called. msg_ids.rwsem remains locked on exit. 
*/ static void freeque(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp) __releases(RCU) __releases(&msq->q_perm) { struct msg_msg *msg, *t; struct msg_queue *msq = container_of(ipcp, struct msg_queue, q_perm); DEFINE_WAKE_Q(wake_q); expunge_all(msq, -EIDRM, &wake_q); ss_wakeup(msq, &wake_q, true); msg_rmid(ns, msq); ipc_unlock_object(&msq->q_perm); wake_up_q(&wake_q); rcu_read_unlock(); list_for_each_entry_safe(msg, t, &msq->q_messages, m_list) { percpu_counter_sub_local(&ns->percpu_msg_hdrs, 1); free_msg(msg); } percpu_counter_sub_local(&ns->percpu_msg_bytes, msq->q_cbytes); ipc_update_pid(&msq->q_lspid, NULL); ipc_update_pid(&msq->q_lrpid, NULL); ipc_rcu_putref(&msq->q_perm, msg_rcu_free); } long ksys_msgget(key_t key, int msgflg) { struct ipc_namespace *ns; static const struct ipc_ops msg_ops = { .getnew = newque, .associate = security_msg_queue_associate, }; struct ipc_params msg_params; ns = current->nsproxy->ipc_ns; msg_params.key = key; msg_params.flg = msgflg; return ipcget(ns, &msg_ids(ns), &msg_ops, &msg_params); } SYSCALL_DEFINE2(msgget, key_t, key, int, msgflg) { return ksys_msgget(key, msgflg); } static inline unsigned long copy_msqid_to_user(void __user *buf, struct msqid64_ds *in, int version) { switch (version) { case IPC_64: return copy_to_user(buf, in, sizeof(*in)); case IPC_OLD: { struct msqid_ds out; memset(&out, 0, sizeof(out)); ipc64_perm_to_ipc_perm(&in->msg_perm, &out.msg_perm); out.msg_stime = in->msg_stime; out.msg_rtime = in->msg_rtime; out.msg_ctime = in->msg_ctime; if (in->msg_cbytes > USHRT_MAX) out.msg_cbytes = USHRT_MAX; else out.msg_cbytes = in->msg_cbytes; out.msg_lcbytes = in->msg_cbytes; if (in->msg_qnum > USHRT_MAX) out.msg_qnum = USHRT_MAX; else out.msg_qnum = in->msg_qnum; if (in->msg_qbytes > USHRT_MAX) out.msg_qbytes = USHRT_MAX; else out.msg_qbytes = in->msg_qbytes; out.msg_lqbytes = in->msg_qbytes; out.msg_lspid = in->msg_lspid; out.msg_lrpid = in->msg_lrpid; return copy_to_user(buf, &out, sizeof(out)); } default: return -EINVAL; } } static inline unsigned long copy_msqid_from_user(struct msqid64_ds *out, void __user *buf, int version) { switch (version) { case IPC_64: if (copy_from_user(out, buf, sizeof(*out))) return -EFAULT; return 0; case IPC_OLD: { struct msqid_ds tbuf_old; if (copy_from_user(&tbuf_old, buf, sizeof(tbuf_old))) return -EFAULT; out->msg_perm.uid = tbuf_old.msg_perm.uid; out->msg_perm.gid = tbuf_old.msg_perm.gid; out->msg_perm.mode = tbuf_old.msg_perm.mode; if (tbuf_old.msg_qbytes == 0) out->msg_qbytes = tbuf_old.msg_lqbytes; else out->msg_qbytes = tbuf_old.msg_qbytes; return 0; } default: return -EINVAL; } } /* * This function handles some msgctl commands which require the rwsem * to be held in write mode. * NOTE: no locks must be held, the rwsem is taken inside this function. 
*/ static int msgctl_down(struct ipc_namespace *ns, int msqid, int cmd, struct ipc64_perm *perm, int msg_qbytes) { struct kern_ipc_perm *ipcp; struct msg_queue *msq; int err; down_write(&msg_ids(ns).rwsem); rcu_read_lock(); ipcp = ipcctl_obtain_check(ns, &msg_ids(ns), msqid, cmd, perm, msg_qbytes); if (IS_ERR(ipcp)) { err = PTR_ERR(ipcp); goto out_unlock1; } msq = container_of(ipcp, struct msg_queue, q_perm); err = security_msg_queue_msgctl(&msq->q_perm, cmd); if (err) goto out_unlock1; switch (cmd) { case IPC_RMID: ipc_lock_object(&msq->q_perm); /* freeque unlocks the ipc object and rcu */ freeque(ns, ipcp); goto out_up; case IPC_SET: { DEFINE_WAKE_Q(wake_q); if (msg_qbytes > ns->msg_ctlmnb && !capable(CAP_SYS_RESOURCE)) { err = -EPERM; goto out_unlock1; } ipc_lock_object(&msq->q_perm); err = ipc_update_perm(perm, ipcp); if (err) goto out_unlock0; msq->q_qbytes = msg_qbytes; msq->q_ctime = ktime_get_real_seconds(); /* * Sleeping receivers might be excluded by * stricter permissions. */ expunge_all(msq, -EAGAIN, &wake_q); /* * Sleeping senders might be able to send * due to a larger queue size. */ ss_wakeup(msq, &wake_q, false); ipc_unlock_object(&msq->q_perm); wake_up_q(&wake_q); goto out_unlock1; } default: err = -EINVAL; goto out_unlock1; } out_unlock0: ipc_unlock_object(&msq->q_perm); out_unlock1: rcu_read_unlock(); out_up: up_write(&msg_ids(ns).rwsem); return err; } static int msgctl_info(struct ipc_namespace *ns, int msqid, int cmd, struct msginfo *msginfo) { int err; int max_idx; /* * We must not return kernel stack data. * due to padding, it's not enough * to set all member fields. */ err = security_msg_queue_msgctl(NULL, cmd); if (err) return err; memset(msginfo, 0, sizeof(*msginfo)); msginfo->msgmni = ns->msg_ctlmni; msginfo->msgmax = ns->msg_ctlmax; msginfo->msgmnb = ns->msg_ctlmnb; msginfo->msgssz = MSGSSZ; msginfo->msgseg = MSGSEG; down_read(&msg_ids(ns).rwsem); if (cmd == MSG_INFO) msginfo->msgpool = msg_ids(ns).in_use; max_idx = ipc_get_maxidx(&msg_ids(ns)); up_read(&msg_ids(ns).rwsem); if (cmd == MSG_INFO) { msginfo->msgmap = min_t(int, percpu_counter_sum(&ns->percpu_msg_hdrs), INT_MAX); msginfo->msgtql = min_t(int, percpu_counter_sum(&ns->percpu_msg_bytes), INT_MAX); } else { msginfo->msgmap = MSGMAP; msginfo->msgpool = MSGPOOL; msginfo->msgtql = MSGTQL; } return (max_idx < 0) ? 
0 : max_idx; } static int msgctl_stat(struct ipc_namespace *ns, int msqid, int cmd, struct msqid64_ds *p) { struct msg_queue *msq; int err; memset(p, 0, sizeof(*p)); rcu_read_lock(); if (cmd == MSG_STAT || cmd == MSG_STAT_ANY) { msq = msq_obtain_object(ns, msqid); if (IS_ERR(msq)) { err = PTR_ERR(msq); goto out_unlock; } } else { /* IPC_STAT */ msq = msq_obtain_object_check(ns, msqid); if (IS_ERR(msq)) { err = PTR_ERR(msq); goto out_unlock; } } /* see comment for SHM_STAT_ANY */ if (cmd == MSG_STAT_ANY) audit_ipc_obj(&msq->q_perm); else { err = -EACCES; if (ipcperms(ns, &msq->q_perm, S_IRUGO)) goto out_unlock; } err = security_msg_queue_msgctl(&msq->q_perm, cmd); if (err) goto out_unlock; ipc_lock_object(&msq->q_perm); if (!ipc_valid_object(&msq->q_perm)) { ipc_unlock_object(&msq->q_perm); err = -EIDRM; goto out_unlock; } kernel_to_ipc64_perm(&msq->q_perm, &p->msg_perm); p->msg_stime = msq->q_stime; p->msg_rtime = msq->q_rtime; p->msg_ctime = msq->q_ctime; #ifndef CONFIG_64BIT p->msg_stime_high = msq->q_stime >> 32; p->msg_rtime_high = msq->q_rtime >> 32; p->msg_ctime_high = msq->q_ctime >> 32; #endif p->msg_cbytes = msq->q_cbytes; p->msg_qnum = msq->q_qnum; p->msg_qbytes = msq->q_qbytes; p->msg_lspid = pid_vnr(msq->q_lspid); p->msg_lrpid = pid_vnr(msq->q_lrpid); if (cmd == IPC_STAT) { /* * As defined in SUS: * Return 0 on success */ err = 0; } else { /* * MSG_STAT and MSG_STAT_ANY (both Linux specific) * Return the full id, including the sequence number */ err = msq->q_perm.id; } ipc_unlock_object(&msq->q_perm); out_unlock: rcu_read_unlock(); return err; } static long ksys_msgctl(int msqid, int cmd, struct msqid_ds __user *buf, int version) { struct ipc_namespace *ns; struct msqid64_ds msqid64; int err; if (msqid < 0 || cmd < 0) return -EINVAL; ns = current->nsproxy->ipc_ns; switch (cmd) { case IPC_INFO: case MSG_INFO: { struct msginfo msginfo; err = msgctl_info(ns, msqid, cmd, &msginfo); if (err < 0) return err; if (copy_to_user(buf, &msginfo, sizeof(struct msginfo))) err = -EFAULT; return err; } case MSG_STAT: /* msqid is an index rather than a msg queue id */ case MSG_STAT_ANY: case IPC_STAT: err = msgctl_stat(ns, msqid, cmd, &msqid64); if (err < 0) return err; if (copy_msqid_to_user(buf, &msqid64, version)) err = -EFAULT; return err; case IPC_SET: if (copy_msqid_from_user(&msqid64, buf, version)) return -EFAULT; return msgctl_down(ns, msqid, cmd, &msqid64.msg_perm, msqid64.msg_qbytes); case IPC_RMID: return msgctl_down(ns, msqid, cmd, NULL, 0); default: return -EINVAL; } } SYSCALL_DEFINE3(msgctl, int, msqid, int, cmd, struct msqid_ds __user *, buf) { return ksys_msgctl(msqid, cmd, buf, IPC_64); } #ifdef CONFIG_ARCH_WANT_IPC_PARSE_VERSION long ksys_old_msgctl(int msqid, int cmd, struct msqid_ds __user *buf) { int version = ipc_parse_version(&cmd); return ksys_msgctl(msqid, cmd, buf, version); } SYSCALL_DEFINE3(old_msgctl, int, msqid, int, cmd, struct msqid_ds __user *, buf) { return ksys_old_msgctl(msqid, cmd, buf); } #endif #ifdef CONFIG_COMPAT struct compat_msqid_ds { struct compat_ipc_perm msg_perm; compat_uptr_t msg_first; compat_uptr_t msg_last; old_time32_t msg_stime; old_time32_t msg_rtime; old_time32_t msg_ctime; compat_ulong_t msg_lcbytes; compat_ulong_t msg_lqbytes; unsigned short msg_cbytes; unsigned short msg_qnum; unsigned short msg_qbytes; compat_ipc_pid_t msg_lspid; compat_ipc_pid_t msg_lrpid; }; static int copy_compat_msqid_from_user(struct msqid64_ds *out, void __user *buf, int version) { memset(out, 0, sizeof(*out)); if (version == IPC_64) { struct compat_msqid64_ds 
__user *p = buf; if (get_compat_ipc64_perm(&out->msg_perm, &p->msg_perm)) return -EFAULT; if (get_user(out->msg_qbytes, &p->msg_qbytes)) return -EFAULT; } else { struct compat_msqid_ds __user *p = buf; if (get_compat_ipc_perm(&out->msg_perm, &p->msg_perm)) return -EFAULT; if (get_user(out->msg_qbytes, &p->msg_qbytes)) return -EFAULT; } return 0; } static int copy_compat_msqid_to_user(void __user *buf, struct msqid64_ds *in, int version) { if (version == IPC_64) { struct compat_msqid64_ds v; memset(&v, 0, sizeof(v)); to_compat_ipc64_perm(&v.msg_perm, &in->msg_perm); v.msg_stime = lower_32_bits(in->msg_stime); v.msg_stime_high = upper_32_bits(in->msg_stime); v.msg_rtime = lower_32_bits(in->msg_rtime); v.msg_rtime_high = upper_32_bits(in->msg_rtime); v.msg_ctime = lower_32_bits(in->msg_ctime); v.msg_ctime_high = upper_32_bits(in->msg_ctime); v.msg_cbytes = in->msg_cbytes; v.msg_qnum = in->msg_qnum; v.msg_qbytes = in->msg_qbytes; v.msg_lspid = in->msg_lspid; v.msg_lrpid = in->msg_lrpid; return copy_to_user(buf, &v, sizeof(v)); } else { struct compat_msqid_ds v; memset(&v, 0, sizeof(v)); to_compat_ipc_perm(&v.msg_perm, &in->msg_perm); v.msg_stime = in->msg_stime; v.msg_rtime = in->msg_rtime; v.msg_ctime = in->msg_ctime; v.msg_cbytes = in->msg_cbytes; v.msg_qnum = in->msg_qnum; v.msg_qbytes = in->msg_qbytes; v.msg_lspid = in->msg_lspid; v.msg_lrpid = in->msg_lrpid; return copy_to_user(buf, &v, sizeof(v)); } } static long compat_ksys_msgctl(int msqid, int cmd, void __user *uptr, int version) { struct ipc_namespace *ns; int err; struct msqid64_ds msqid64; ns = current->nsproxy->ipc_ns; if (msqid < 0 || cmd < 0) return -EINVAL; switch (cmd & (~IPC_64)) { case IPC_INFO: case MSG_INFO: { struct msginfo msginfo; err = msgctl_info(ns, msqid, cmd, &msginfo); if (err < 0) return err; if (copy_to_user(uptr, &msginfo, sizeof(struct msginfo))) err = -EFAULT; return err; } case IPC_STAT: case MSG_STAT: case MSG_STAT_ANY: err = msgctl_stat(ns, msqid, cmd, &msqid64); if (err < 0) return err; if (copy_compat_msqid_to_user(uptr, &msqid64, version)) err = -EFAULT; return err; case IPC_SET: if (copy_compat_msqid_from_user(&msqid64, uptr, version)) return -EFAULT; return msgctl_down(ns, msqid, cmd, &msqid64.msg_perm, msqid64.msg_qbytes); case IPC_RMID: return msgctl_down(ns, msqid, cmd, NULL, 0); default: return -EINVAL; } } COMPAT_SYSCALL_DEFINE3(msgctl, int, msqid, int, cmd, void __user *, uptr) { return compat_ksys_msgctl(msqid, cmd, uptr, IPC_64); } #ifdef CONFIG_ARCH_WANT_COMPAT_IPC_PARSE_VERSION long compat_ksys_old_msgctl(int msqid, int cmd, void __user *uptr) { int version = compat_ipc_parse_version(&cmd); return compat_ksys_msgctl(msqid, cmd, uptr, version); } COMPAT_SYSCALL_DEFINE3(old_msgctl, int, msqid, int, cmd, void __user *, uptr) { return compat_ksys_old_msgctl(msqid, cmd, uptr); } #endif #endif static int testmsg(struct msg_msg *msg, long type, int mode) { switch (mode) { case SEARCH_ANY: case SEARCH_NUMBER: return 1; case SEARCH_LESSEQUAL: if (msg->m_type <= type) return 1; break; case SEARCH_EQUAL: if (msg->m_type == type) return 1; break; case SEARCH_NOTEQUAL: if (msg->m_type != type) return 1; break; } return 0; } static inline int pipelined_send(struct msg_queue *msq, struct msg_msg *msg, struct wake_q_head *wake_q) { struct msg_receiver *msr, *t; list_for_each_entry_safe(msr, t, &msq->q_receivers, r_list) { if (testmsg(msg, msr->r_msgtype, msr->r_mode) && !security_msg_queue_msgrcv(&msq->q_perm, msg, msr->r_tsk, msr->r_msgtype, msr->r_mode)) { list_del(&msr->r_list); if (msr->r_maxsize < 
msg->m_ts) { wake_q_add(wake_q, msr->r_tsk); /* See expunge_all regarding memory barrier */ smp_store_release(&msr->r_msg, ERR_PTR(-E2BIG)); } else { ipc_update_pid(&msq->q_lrpid, task_pid(msr->r_tsk)); msq->q_rtime = ktime_get_real_seconds(); wake_q_add(wake_q, msr->r_tsk); /* See expunge_all regarding memory barrier */ smp_store_release(&msr->r_msg, msg); return 1; } } } return 0; } static long do_msgsnd(int msqid, long mtype, void __user *mtext, size_t msgsz, int msgflg) { struct msg_queue *msq; struct msg_msg *msg; int err; struct ipc_namespace *ns; DEFINE_WAKE_Q(wake_q); ns = current->nsproxy->ipc_ns; if (msgsz > ns->msg_ctlmax || (long) msgsz < 0 || msqid < 0) return -EINVAL; if (mtype < 1) return -EINVAL; msg = load_msg(mtext, msgsz); if (IS_ERR(msg)) return PTR_ERR(msg); msg->m_type = mtype; msg->m_ts = msgsz; rcu_read_lock(); msq = msq_obtain_object_check(ns, msqid); if (IS_ERR(msq)) { err = PTR_ERR(msq); goto out_unlock1; } ipc_lock_object(&msq->q_perm); for (;;) { struct msg_sender s; err = -EACCES; if (ipcperms(ns, &msq->q_perm, S_IWUGO)) goto out_unlock0; /* raced with RMID? */ if (!ipc_valid_object(&msq->q_perm)) { err = -EIDRM; goto out_unlock0; } err = security_msg_queue_msgsnd(&msq->q_perm, msg, msgflg); if (err) goto out_unlock0; if (msg_fits_inqueue(msq, msgsz)) break; /* queue full, wait: */ if (msgflg & IPC_NOWAIT) { err = -EAGAIN; goto out_unlock0; } /* enqueue the sender and prepare to block */ ss_add(msq, &s, msgsz); if (!ipc_rcu_getref(&msq->q_perm)) { err = -EIDRM; goto out_unlock0; } ipc_unlock_object(&msq->q_perm); rcu_read_unlock(); schedule(); rcu_read_lock(); ipc_lock_object(&msq->q_perm); ipc_rcu_putref(&msq->q_perm, msg_rcu_free); /* raced with RMID? */ if (!ipc_valid_object(&msq->q_perm)) { err = -EIDRM; goto out_unlock0; } ss_del(&s); if (signal_pending(current)) { err = -ERESTARTNOHAND; goto out_unlock0; } } ipc_update_pid(&msq->q_lspid, task_tgid(current)); msq->q_stime = ktime_get_real_seconds(); if (!pipelined_send(msq, msg, &wake_q)) { /* no one is waiting for this message, enqueue it */ list_add_tail(&msg->m_list, &msq->q_messages); msq->q_cbytes += msgsz; msq->q_qnum++; percpu_counter_add_local(&ns->percpu_msg_bytes, msgsz); percpu_counter_add_local(&ns->percpu_msg_hdrs, 1); } err = 0; msg = NULL; out_unlock0: ipc_unlock_object(&msq->q_perm); wake_up_q(&wake_q); out_unlock1: rcu_read_unlock(); if (msg != NULL) free_msg(msg); return err; } long ksys_msgsnd(int msqid, struct msgbuf __user *msgp, size_t msgsz, int msgflg) { long mtype; if (get_user(mtype, &msgp->mtype)) return -EFAULT; return do_msgsnd(msqid, mtype, msgp->mtext, msgsz, msgflg); } SYSCALL_DEFINE4(msgsnd, int, msqid, struct msgbuf __user *, msgp, size_t, msgsz, int, msgflg) { return ksys_msgsnd(msqid, msgp, msgsz, msgflg); } #ifdef CONFIG_COMPAT struct compat_msgbuf { compat_long_t mtype; char mtext[]; }; long compat_ksys_msgsnd(int msqid, compat_uptr_t msgp, compat_ssize_t msgsz, int msgflg) { struct compat_msgbuf __user *up = compat_ptr(msgp); compat_long_t mtype; if (get_user(mtype, &up->mtype)) return -EFAULT; return do_msgsnd(msqid, mtype, up->mtext, (ssize_t)msgsz, msgflg); } COMPAT_SYSCALL_DEFINE4(msgsnd, int, msqid, compat_uptr_t, msgp, compat_ssize_t, msgsz, int, msgflg) { return compat_ksys_msgsnd(msqid, msgp, msgsz, msgflg); } #endif static inline int convert_mode(long *msgtyp, int msgflg) { if (msgflg & MSG_COPY) return SEARCH_NUMBER; /* * find message of correct type. * msgtyp = 0 => get first. * msgtyp > 0 => get first message of matching type. 
* msgtyp < 0 => get message with least type must be < abs(msgtype). */ if (*msgtyp == 0) return SEARCH_ANY; if (*msgtyp < 0) { if (*msgtyp == LONG_MIN) /* -LONG_MIN is undefined */ *msgtyp = LONG_MAX; else *msgtyp = -*msgtyp; return SEARCH_LESSEQUAL; } if (msgflg & MSG_EXCEPT) return SEARCH_NOTEQUAL; return SEARCH_EQUAL; } static long do_msg_fill(void __user *dest, struct msg_msg *msg, size_t bufsz) { struct msgbuf __user *msgp = dest; size_t msgsz; if (put_user(msg->m_type, &msgp->mtype)) return -EFAULT; msgsz = (bufsz > msg->m_ts) ? msg->m_ts : bufsz; if (store_msg(msgp->mtext, msg, msgsz)) return -EFAULT; return msgsz; } #ifdef CONFIG_CHECKPOINT_RESTORE /* * This function creates new kernel message structure, large enough to store * bufsz message bytes. */ static inline struct msg_msg *prepare_copy(void __user *buf, size_t bufsz) { struct msg_msg *copy; /* * Create dummy message to copy real message to. */ copy = load_msg(buf, bufsz); if (!IS_ERR(copy)) copy->m_ts = bufsz; return copy; } static inline void free_copy(struct msg_msg *copy) { if (copy) free_msg(copy); } #else static inline struct msg_msg *prepare_copy(void __user *buf, size_t bufsz) { return ERR_PTR(-ENOSYS); } static inline void free_copy(struct msg_msg *copy) { } #endif static struct msg_msg *find_msg(struct msg_queue *msq, long *msgtyp, int mode) { struct msg_msg *msg, *found = NULL; long count = 0; list_for_each_entry(msg, &msq->q_messages, m_list) { if (testmsg(msg, *msgtyp, mode) && !security_msg_queue_msgrcv(&msq->q_perm, msg, current, *msgtyp, mode)) { if (mode == SEARCH_LESSEQUAL && msg->m_type != 1) { *msgtyp = msg->m_type - 1; found = msg; } else if (mode == SEARCH_NUMBER) { if (*msgtyp == count) return msg; } else return msg; count++; } } return found ?: ERR_PTR(-EAGAIN); } static long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, int msgflg, long (*msg_handler)(void __user *, struct msg_msg *, size_t)) { int mode; struct msg_queue *msq; struct ipc_namespace *ns; struct msg_msg *msg, *copy = NULL; DEFINE_WAKE_Q(wake_q); ns = current->nsproxy->ipc_ns; if (msqid < 0 || (long) bufsz < 0) return -EINVAL; if (msgflg & MSG_COPY) { if ((msgflg & MSG_EXCEPT) || !(msgflg & IPC_NOWAIT)) return -EINVAL; copy = prepare_copy(buf, min_t(size_t, bufsz, ns->msg_ctlmax)); if (IS_ERR(copy)) return PTR_ERR(copy); } mode = convert_mode(&msgtyp, msgflg); rcu_read_lock(); msq = msq_obtain_object_check(ns, msqid); if (IS_ERR(msq)) { rcu_read_unlock(); free_copy(copy); return PTR_ERR(msq); } for (;;) { struct msg_receiver msr_d; msg = ERR_PTR(-EACCES); if (ipcperms(ns, &msq->q_perm, S_IRUGO)) goto out_unlock1; ipc_lock_object(&msq->q_perm); /* raced with RMID? */ if (!ipc_valid_object(&msq->q_perm)) { msg = ERR_PTR(-EIDRM); goto out_unlock0; } msg = find_msg(msq, &msgtyp, mode); if (!IS_ERR(msg)) { /* * Found a suitable message. * Unlink it from the queue. */ if ((bufsz < msg->m_ts) && !(msgflg & MSG_NOERROR)) { msg = ERR_PTR(-E2BIG); goto out_unlock0; } /* * If we are copying, then do not unlink message and do * not update queue parameters. */ if (msgflg & MSG_COPY) { msg = copy_msg(msg, copy); goto out_unlock0; } list_del(&msg->m_list); msq->q_qnum--; msq->q_rtime = ktime_get_real_seconds(); ipc_update_pid(&msq->q_lrpid, task_tgid(current)); msq->q_cbytes -= msg->m_ts; percpu_counter_sub_local(&ns->percpu_msg_bytes, msg->m_ts); percpu_counter_sub_local(&ns->percpu_msg_hdrs, 1); ss_wakeup(msq, &wake_q, false); goto out_unlock0; } /* No message waiting. 
Wait for a message */ if (msgflg & IPC_NOWAIT) { msg = ERR_PTR(-ENOMSG); goto out_unlock0; } list_add_tail(&msr_d.r_list, &msq->q_receivers); msr_d.r_tsk = current; msr_d.r_msgtype = msgtyp; msr_d.r_mode = mode; if (msgflg & MSG_NOERROR) msr_d.r_maxsize = INT_MAX; else msr_d.r_maxsize = bufsz; /* memory barrier not require due to ipc_lock_object() */ WRITE_ONCE(msr_d.r_msg, ERR_PTR(-EAGAIN)); /* memory barrier not required, we own ipc_lock_object() */ __set_current_state(TASK_INTERRUPTIBLE); ipc_unlock_object(&msq->q_perm); rcu_read_unlock(); schedule(); /* * Lockless receive, part 1: * We don't hold a reference to the queue and getting a * reference would defeat the idea of a lockless operation, * thus the code relies on rcu to guarantee the existence of * msq: * Prior to destruction, expunge_all(-EIRDM) changes r_msg. * Thus if r_msg is -EAGAIN, then the queue not yet destroyed. */ rcu_read_lock(); /* * Lockless receive, part 2: * The work in pipelined_send() and expunge_all(): * - Set pointer to message * - Queue the receiver task for later wakeup * - Wake up the process after the lock is dropped. * * Should the process wake up before this wakeup (due to a * signal) it will either see the message and continue ... */ msg = READ_ONCE(msr_d.r_msg); if (msg != ERR_PTR(-EAGAIN)) { /* see MSG_BARRIER for purpose/pairing */ smp_acquire__after_ctrl_dep(); goto out_unlock1; } /* * ... or see -EAGAIN, acquire the lock to check the message * again. */ ipc_lock_object(&msq->q_perm); msg = READ_ONCE(msr_d.r_msg); if (msg != ERR_PTR(-EAGAIN)) goto out_unlock0; list_del(&msr_d.r_list); if (signal_pending(current)) { msg = ERR_PTR(-ERESTARTNOHAND); goto out_unlock0; } ipc_unlock_object(&msq->q_perm); } out_unlock0: ipc_unlock_object(&msq->q_perm); wake_up_q(&wake_q); out_unlock1: rcu_read_unlock(); if (IS_ERR(msg)) { free_copy(copy); return PTR_ERR(msg); } bufsz = msg_handler(buf, msg, bufsz); free_msg(msg); return bufsz; } long ksys_msgrcv(int msqid, struct msgbuf __user *msgp, size_t msgsz, long msgtyp, int msgflg) { return do_msgrcv(msqid, msgp, msgsz, msgtyp, msgflg, do_msg_fill); } SYSCALL_DEFINE5(msgrcv, int, msqid, struct msgbuf __user *, msgp, size_t, msgsz, long, msgtyp, int, msgflg) { return ksys_msgrcv(msqid, msgp, msgsz, msgtyp, msgflg); } #ifdef CONFIG_COMPAT static long compat_do_msg_fill(void __user *dest, struct msg_msg *msg, size_t bufsz) { struct compat_msgbuf __user *msgp = dest; size_t msgsz; if (put_user(msg->m_type, &msgp->mtype)) return -EFAULT; msgsz = (bufsz > msg->m_ts) ? 
msg->m_ts : bufsz; if (store_msg(msgp->mtext, msg, msgsz)) return -EFAULT; return msgsz; } long compat_ksys_msgrcv(int msqid, compat_uptr_t msgp, compat_ssize_t msgsz, compat_long_t msgtyp, int msgflg) { return do_msgrcv(msqid, compat_ptr(msgp), (ssize_t)msgsz, (long)msgtyp, msgflg, compat_do_msg_fill); } COMPAT_SYSCALL_DEFINE5(msgrcv, int, msqid, compat_uptr_t, msgp, compat_ssize_t, msgsz, compat_long_t, msgtyp, int, msgflg) { return compat_ksys_msgrcv(msqid, msgp, msgsz, msgtyp, msgflg); } #endif int msg_init_ns(struct ipc_namespace *ns) { int ret; ns->msg_ctlmax = MSGMAX; ns->msg_ctlmnb = MSGMNB; ns->msg_ctlmni = MSGMNI; ret = percpu_counter_init(&ns->percpu_msg_bytes, 0, GFP_KERNEL); if (ret) goto fail_msg_bytes; ret = percpu_counter_init(&ns->percpu_msg_hdrs, 0, GFP_KERNEL); if (ret) goto fail_msg_hdrs; ipc_init_ids(&ns->ids[IPC_MSG_IDS]); return 0; fail_msg_hdrs: percpu_counter_destroy(&ns->percpu_msg_bytes); fail_msg_bytes: return ret; } #ifdef CONFIG_IPC_NS void msg_exit_ns(struct ipc_namespace *ns) { free_ipcs(ns, &msg_ids(ns), freeque); idr_destroy(&ns->ids[IPC_MSG_IDS].ipcs_idr); rhashtable_destroy(&ns->ids[IPC_MSG_IDS].key_ht); percpu_counter_destroy(&ns->percpu_msg_bytes); percpu_counter_destroy(&ns->percpu_msg_hdrs); } #endif #ifdef CONFIG_PROC_FS static int sysvipc_msg_proc_show(struct seq_file *s, void *it) { struct pid_namespace *pid_ns = ipc_seq_pid_ns(s); struct user_namespace *user_ns = seq_user_ns(s); struct kern_ipc_perm *ipcp = it; struct msg_queue *msq = container_of(ipcp, struct msg_queue, q_perm); seq_printf(s, "%10d %10d %4o %10lu %10lu %5u %5u %5u %5u %5u %5u %10llu %10llu %10llu\n", msq->q_perm.key, msq->q_perm.id, msq->q_perm.mode, msq->q_cbytes, msq->q_qnum, pid_nr_ns(msq->q_lspid, pid_ns), pid_nr_ns(msq->q_lrpid, pid_ns), from_kuid_munged(user_ns, msq->q_perm.uid), from_kgid_munged(user_ns, msq->q_perm.gid), from_kuid_munged(user_ns, msq->q_perm.cuid), from_kgid_munged(user_ns, msq->q_perm.cgid), msq->q_stime, msq->q_rtime, msq->q_ctime); return 0; } #endif void __init msg_init(void) { msg_init_ns(&init_ipc_ns); ipc_init_proc_interface("sysvipc/msg", " key msqid perms cbytes qnum lspid lrpid uid gid cuid cgid stime rtime ctime\n", IPC_MSG_IDS, sysvipc_msg_proc_show); } |
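/*
 * Illustrative userspace sketch (not part of the kernel file above): the
 * msgget()/msgsnd()/msgrcv()/msgctl() calls that reach ksys_msgget(),
 * do_msgsnd(), do_msgrcv() and msgctl_down(). Error handling is minimal and
 * the queue is private to this process.
 */
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct example_msg {
	long mtype;		/* must be >= 1, enforced in do_msgsnd() */
	char mtext[64];
};

int main(void)
{
	struct example_msg msg = { .mtype = 1 };
	int msqid;

	msqid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
	if (msqid < 0) {
		perror("msgget");
		return 1;
	}

	strcpy(msg.mtext, "hello");
	if (msgsnd(msqid, &msg, sizeof(msg.mtext), 0) < 0)
		perror("msgsnd");

	/* msgtyp == 0 selects the first queued message (SEARCH_ANY above) */
	if (msgrcv(msqid, &msg, sizeof(msg.mtext), 0, 0) < 0)
		perror("msgrcv");
	else
		printf("received: %s\n", msg.mtext);

	/* IPC_RMID goes through msgctl_down(), which calls freeque() */
	msgctl(msqid, IPC_RMID, NULL);
	return 0;
}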
// SPDX-License-Identifier: GPL-2.0 /* * linux/drivers/char/mem.c * * Copyright (C) 1991, 1992 Linus Torvalds * * Added devfs support. * Jan-11-1998, C. 
Scott Ananian <cananian@alumni.princeton.edu> * Shared /dev/zero mmapping support, Feb 2000, Kanoj Sarcar <kanoj@sgi.com> */ #include <linux/mm.h> #include <linux/miscdevice.h> #include <linux/slab.h> #include <linux/vmalloc.h> #include <linux/mman.h> #include <linux/random.h> #include <linux/init.h> #include <linux/tty.h> #include <linux/capability.h> #include <linux/ptrace.h> #include <linux/device.h> #include <linux/highmem.h> #include <linux/backing-dev.h> #include <linux/shmem_fs.h> #include <linux/splice.h> #include <linux/pfn.h> #include <linux/export.h> #include <linux/io.h> #include <linux/uio.h> #include <linux/uaccess.h> #include <linux/security.h> #define DEVMEM_MINOR 1 #define DEVPORT_MINOR 4 static inline unsigned long size_inside_page(unsigned long start, unsigned long size) { unsigned long sz; sz = PAGE_SIZE - (start & (PAGE_SIZE - 1)); return min(sz, size); } #ifndef ARCH_HAS_VALID_PHYS_ADDR_RANGE static inline int valid_phys_addr_range(phys_addr_t addr, size_t count) { return addr + count <= __pa(high_memory); } static inline int valid_mmap_phys_addr_range(unsigned long pfn, size_t size) { return 1; } #endif #ifdef CONFIG_STRICT_DEVMEM static inline int page_is_allowed(unsigned long pfn) { return devmem_is_allowed(pfn); } #else static inline int page_is_allowed(unsigned long pfn) { return 1; } #endif static inline bool should_stop_iteration(void) { if (need_resched()) cond_resched(); return signal_pending(current); } /* * This funcion reads the *physical* memory. The f_pos points directly to the * memory location. */ static ssize_t read_mem(struct file *file, char __user *buf, size_t count, loff_t *ppos) { phys_addr_t p = *ppos; ssize_t read, sz; void *ptr; char *bounce; int err; if (p != *ppos) return 0; if (!valid_phys_addr_range(p, count)) return -EFAULT; read = 0; #ifdef __ARCH_HAS_NO_PAGE_ZERO_MAPPED /* we don't have page 0 mapped on sparc and m68k.. */ if (p < PAGE_SIZE) { sz = size_inside_page(p, count); if (sz > 0) { if (clear_user(buf, sz)) return -EFAULT; buf += sz; p += sz; count -= sz; read += sz; } } #endif bounce = kmalloc(PAGE_SIZE, GFP_KERNEL); if (!bounce) return -ENOMEM; while (count > 0) { unsigned long remaining; int allowed, probe; sz = size_inside_page(p, count); err = -EPERM; allowed = page_is_allowed(p >> PAGE_SHIFT); if (!allowed) goto failed; err = -EFAULT; if (allowed == 2) { /* Show zeros for restricted memory. */ remaining = clear_user(buf, sz); } else { /* * On ia64 if a page has been mapped somewhere as * uncached, then it must also be accessed uncached * by the kernel or data corruption may occur. */ ptr = xlate_dev_mem_ptr(p); if (!ptr) goto failed; probe = copy_from_kernel_nofault(bounce, ptr, sz); unxlate_dev_mem_ptr(p, ptr); if (probe) goto failed; remaining = copy_to_user(buf, bounce, sz); } if (remaining) goto failed; buf += sz; p += sz; count -= sz; read += sz; if (should_stop_iteration()) break; } kfree(bounce); *ppos += read; return read; failed: kfree(bounce); return err; } static ssize_t write_mem(struct file *file, const char __user *buf, size_t count, loff_t *ppos) { phys_addr_t p = *ppos; ssize_t written, sz; unsigned long copied; void *ptr; if (p != *ppos) return -EFBIG; if (!valid_phys_addr_range(p, count)) return -EFAULT; written = 0; #ifdef __ARCH_HAS_NO_PAGE_ZERO_MAPPED /* we don't have page 0 mapped on sparc and m68k.. */ if (p < PAGE_SIZE) { sz = size_inside_page(p, count); /* Hmm. Do something? 
*/ buf += sz; p += sz; count -= sz; written += sz; } #endif while (count > 0) { int allowed; sz = size_inside_page(p, count); allowed = page_is_allowed(p >> PAGE_SHIFT); if (!allowed) return -EPERM; /* Skip actual writing when a page is marked as restricted. */ if (allowed == 1) { /* * On ia64 if a page has been mapped somewhere as * uncached, then it must also be accessed uncached * by the kernel or data corruption may occur. */ ptr = xlate_dev_mem_ptr(p); if (!ptr) { if (written) break; return -EFAULT; } copied = copy_from_user(ptr, buf, sz); unxlate_dev_mem_ptr(p, ptr); if (copied) { written += sz - copied; if (written) break; return -EFAULT; } } buf += sz; p += sz; count -= sz; written += sz; if (should_stop_iteration()) break; } *ppos += written; return written; } int __weak phys_mem_access_prot_allowed(struct file *file, unsigned long pfn, unsigned long size, pgprot_t *vma_prot) { return 1; } #ifndef __HAVE_PHYS_MEM_ACCESS_PROT /* * Architectures vary in how they handle caching for addresses * outside of main memory. * */ #ifdef pgprot_noncached static int uncached_access(struct file *file, phys_addr_t addr) { /* * Accessing memory above the top the kernel knows about or through a * file pointer * that was marked O_DSYNC will be done non-cached. */ if (file->f_flags & O_DSYNC) return 1; return addr >= __pa(high_memory); } #endif static pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn, unsigned long size, pgprot_t vma_prot) { #ifdef pgprot_noncached phys_addr_t offset = pfn << PAGE_SHIFT; if (uncached_access(file, offset)) return pgprot_noncached(vma_prot); #endif return vma_prot; } #endif #ifndef CONFIG_MMU static unsigned long get_unmapped_area_mem(struct file *file, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags) { if (!valid_mmap_phys_addr_range(pgoff, len)) return (unsigned long) -EINVAL; return pgoff << PAGE_SHIFT; } /* permit direct mmap, for read, write or exec */ static unsigned memory_mmap_capabilities(struct file *file) { return NOMMU_MAP_DIRECT | NOMMU_MAP_READ | NOMMU_MAP_WRITE | NOMMU_MAP_EXEC; } static unsigned zero_mmap_capabilities(struct file *file) { return NOMMU_MAP_COPY; } /* can't do an in-place private mapping if there's no MMU */ static inline int private_mapping_ok(struct vm_area_struct *vma) { return is_nommu_shared_mapping(vma->vm_flags); } #else static inline int private_mapping_ok(struct vm_area_struct *vma) { return 1; } #endif static const struct vm_operations_struct mmap_mem_ops = { #ifdef CONFIG_HAVE_IOREMAP_PROT .access = generic_access_phys #endif }; static int mmap_mem(struct file *file, struct vm_area_struct *vma) { size_t size = vma->vm_end - vma->vm_start; phys_addr_t offset = (phys_addr_t)vma->vm_pgoff << PAGE_SHIFT; /* Does it even fit in phys_addr_t? */ if (offset >> PAGE_SHIFT != vma->vm_pgoff) return -EINVAL; /* It's illegal to wrap around the end of the physical address space. 
*/ if (offset + (phys_addr_t)size - 1 < offset) return -EINVAL; if (!valid_mmap_phys_addr_range(vma->vm_pgoff, size)) return -EINVAL; if (!private_mapping_ok(vma)) return -ENOSYS; if (!range_is_allowed(vma->vm_pgoff, size)) return -EPERM; if (!phys_mem_access_prot_allowed(file, vma->vm_pgoff, size, &vma->vm_page_prot)) return -EINVAL; vma->vm_page_prot = phys_mem_access_prot(file, vma->vm_pgoff, size, vma->vm_page_prot); vma->vm_ops = &mmap_mem_ops; /* Remap-pfn-range will mark the range VM_IO */ if (remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff, size, vma->vm_page_prot)) { return -EAGAIN; } return 0; } #ifdef CONFIG_DEVPORT static ssize_t read_port(struct file *file, char __user *buf, size_t count, loff_t *ppos) { unsigned long i = *ppos; char __user *tmp = buf; if (!access_ok(buf, count)) return -EFAULT; while (count-- > 0 && i < 65536) { if (__put_user(inb(i), tmp) < 0) return -EFAULT; i++; tmp++; } *ppos = i; return tmp-buf; } static ssize_t write_port(struct file *file, const char __user *buf, size_t count, loff_t *ppos) { unsigned long i = *ppos; const char __user *tmp = buf; if (!access_ok(buf, count)) return -EFAULT; while (count-- > 0 && i < 65536) { char c; if (__get_user(c, tmp)) { if (tmp > buf) break; return -EFAULT; } outb(c, i); i++; tmp++; } *ppos = i; return tmp-buf; } #endif static ssize_t read_null(struct file *file, char __user *buf, size_t count, loff_t *ppos) { return 0; } static ssize_t write_null(struct file *file, const char __user *buf, size_t count, loff_t *ppos) { return count; } static ssize_t read_iter_null(struct kiocb *iocb, struct iov_iter *to) { return 0; } static ssize_t write_iter_null(struct kiocb *iocb, struct iov_iter *from) { size_t count = iov_iter_count(from); iov_iter_advance(from, count); return count; } static int pipe_to_null(struct pipe_inode_info *info, struct pipe_buffer *buf, struct splice_desc *sd) { return sd->len; } static ssize_t splice_write_null(struct pipe_inode_info *pipe, struct file *out, loff_t *ppos, size_t len, unsigned int flags) { return splice_from_pipe(pipe, out, ppos, len, flags, pipe_to_null); } static int uring_cmd_null(struct io_uring_cmd *ioucmd, unsigned int issue_flags) { return 0; } static ssize_t read_iter_zero(struct kiocb *iocb, struct iov_iter *iter) { size_t written = 0; while (iov_iter_count(iter)) { size_t chunk = iov_iter_count(iter), n; if (chunk > PAGE_SIZE) chunk = PAGE_SIZE; /* Just for latency reasons */ n = iov_iter_zero(chunk, iter); if (!n && iov_iter_count(iter)) return written ? written : -EFAULT; written += n; if (signal_pending(current)) return written ? written : -ERESTARTSYS; if (!need_resched()) continue; if (iocb->ki_flags & IOCB_NOWAIT) return written ? 
written : -EAGAIN; cond_resched(); } return written; } static ssize_t read_zero(struct file *file, char __user *buf, size_t count, loff_t *ppos) { size_t cleared = 0; while (count) { size_t chunk = min_t(size_t, count, PAGE_SIZE); size_t left; left = clear_user(buf + cleared, chunk); if (unlikely(left)) { cleared += (chunk - left); if (!cleared) return -EFAULT; break; } cleared += chunk; count -= chunk; if (signal_pending(current)) break; cond_resched(); } return cleared; } static int mmap_zero(struct file *file, struct vm_area_struct *vma) { #ifndef CONFIG_MMU return -ENOSYS; #endif if (vma->vm_flags & VM_SHARED) return shmem_zero_setup(vma); vma_set_anonymous(vma); return 0; } static unsigned long get_unmapped_area_zero(struct file *file, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags) { #ifdef CONFIG_MMU if (flags & MAP_SHARED) { /* * mmap_zero() will call shmem_zero_setup() to create a file, * so use shmem's get_unmapped_area in case it can be huge; * and pass NULL for file as in mmap.c's get_unmapped_area(), * so as not to confuse shmem with our handle on "/dev/zero". */ return shmem_get_unmapped_area(NULL, addr, len, pgoff, flags); } /* Otherwise flags & MAP_PRIVATE: with no shmem object beneath it */ return mm_get_unmapped_area(current->mm, file, addr, len, pgoff, flags); #else return -ENOSYS; #endif } static ssize_t write_full(struct file *file, const char __user *buf, size_t count, loff_t *ppos) { return -ENOSPC; } /* * Special lseek() function for /dev/null and /dev/zero. Most notably, you * can fopen() both devices with "a" now. This was previously impossible. * -- SRB. */ static loff_t null_lseek(struct file *file, loff_t offset, int orig) { return file->f_pos = 0; } /* * The memory devices use the full 32/64 bits of the offset, and so we cannot * check against negative addresses: they are ok. The return value is weird, * though, in that case (0). * * also note that seeking relative to the "end of file" isn't supported: * it has no meaning, so it returns -EINVAL. */ static loff_t memory_lseek(struct file *file, loff_t offset, int orig) { loff_t ret; inode_lock(file_inode(file)); switch (orig) { case SEEK_CUR: offset += file->f_pos; fallthrough; case SEEK_SET: /* to avoid userland mistaking f_pos=-9 as -EBADF=-9 */ if ((unsigned long long)offset >= -MAX_ERRNO) { ret = -EOVERFLOW; break; } file->f_pos = offset; ret = file->f_pos; force_successful_syscall_return(); break; default: ret = -EINVAL; } inode_unlock(file_inode(file)); return ret; } static int open_port(struct inode *inode, struct file *filp) { int rc; if (!capable(CAP_SYS_RAWIO)) return -EPERM; rc = security_locked_down(LOCKDOWN_DEV_MEM); if (rc) return rc; if (iminor(inode) != DEVMEM_MINOR) return 0; /* * Use a unified address space to have a single point to manage * revocations when drivers want to take over a /dev/mem mapped * range. 
*/ filp->f_mapping = iomem_get_mapping(); return 0; } #define zero_lseek null_lseek #define full_lseek null_lseek #define write_zero write_null #define write_iter_zero write_iter_null #define splice_write_zero splice_write_null #define open_mem open_port static const struct file_operations __maybe_unused mem_fops = { .llseek = memory_lseek, .read = read_mem, .write = write_mem, .mmap = mmap_mem, .open = open_mem, #ifndef CONFIG_MMU .get_unmapped_area = get_unmapped_area_mem, .mmap_capabilities = memory_mmap_capabilities, #endif .fop_flags = FOP_UNSIGNED_OFFSET, }; static const struct file_operations null_fops = { .llseek = null_lseek, .read = read_null, .write = write_null, .read_iter = read_iter_null, .write_iter = write_iter_null, .splice_write = splice_write_null, .uring_cmd = uring_cmd_null, }; #ifdef CONFIG_DEVPORT static const struct file_operations port_fops = { .llseek = memory_lseek, .read = read_port, .write = write_port, .open = open_port, }; #endif static const struct file_operations zero_fops = { .llseek = zero_lseek, .write = write_zero, .read_iter = read_iter_zero, .read = read_zero, .write_iter = write_iter_zero, .splice_read = copy_splice_read, .splice_write = splice_write_zero, .mmap = mmap_zero, .get_unmapped_area = get_unmapped_area_zero, #ifndef CONFIG_MMU .mmap_capabilities = zero_mmap_capabilities, #endif }; static const struct file_operations full_fops = { .llseek = full_lseek, .read_iter = read_iter_zero, .write = write_full, .splice_read = copy_splice_read, }; static const struct memdev { const char *name; const struct file_operations *fops; fmode_t fmode; umode_t mode; } devlist[] = { #ifdef CONFIG_DEVMEM [DEVMEM_MINOR] = { "mem", &mem_fops, 0, 0 }, #endif [3] = { "null", &null_fops, FMODE_NOWAIT, 0666 }, #ifdef CONFIG_DEVPORT [4] = { "port", &port_fops, 0, 0 }, #endif [5] = { "zero", &zero_fops, FMODE_NOWAIT, 0666 }, [7] = { "full", &full_fops, 0, 0666 }, [8] = { "random", &random_fops, FMODE_NOWAIT, 0666 }, [9] = { "urandom", &urandom_fops, FMODE_NOWAIT, 0666 }, #ifdef CONFIG_PRINTK [11] = { "kmsg", &kmsg_fops, 0, 0644 }, #endif }; static int memory_open(struct inode *inode, struct file *filp) { int minor; const struct memdev *dev; minor = iminor(inode); if (minor >= ARRAY_SIZE(devlist)) return -ENXIO; dev = &devlist[minor]; if (!dev->fops) return -ENXIO; filp->f_op = dev->fops; filp->f_mode |= dev->fmode; if (dev->fops->open) return dev->fops->open(inode, filp); return 0; } static const struct file_operations memory_fops = { .open = memory_open, .llseek = noop_llseek, }; static char *mem_devnode(const struct device *dev, umode_t *mode) { if (mode && devlist[MINOR(dev->devt)].mode) *mode = devlist[MINOR(dev->devt)].mode; return NULL; } static const struct class mem_class = { .name = "mem", .devnode = mem_devnode, }; static int __init chr_dev_init(void) { int retval; int minor; if (register_chrdev(MEM_MAJOR, "mem", &memory_fops)) printk("unable to get major %d for memory devs\n", MEM_MAJOR); retval = class_register(&mem_class); if (retval) return retval; for (minor = 1; minor < ARRAY_SIZE(devlist); minor++) { if (!devlist[minor].name) continue; /* * Create /dev/port? */ if ((minor == DEVPORT_MINOR) && !arch_has_dev_port()) continue; device_create(&mem_class, NULL, MKDEV(MEM_MAJOR, minor), NULL, devlist[minor].name); } return tty_init(); } fs_initcall(chr_dev_init); |
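/*
 * Illustrative userspace sketch (not part of the driver above): touching two
 * of the character devices registered by chr_dev_init(). Reading /dev/zero
 * exercises the read_zero() path, and writing to /dev/full hits write_full(),
 * which always fails with -ENOSPC. The standard device nodes are assumed to
 * exist.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char buf[16];
	int fd;

	fd = open("/dev/zero", O_RDONLY);
	if (fd < 0)
		return 1;
	if (read(fd, buf, sizeof(buf)) == (ssize_t)sizeof(buf))
		printf("first byte from /dev/zero: %d\n", buf[0]);	/* prints 0 */
	close(fd);

	fd = open("/dev/full", O_WRONLY);
	if (fd >= 0) {
		if (write(fd, "x", 1) < 0 && errno == ENOSPC)
			printf("/dev/full rejected the write with ENOSPC\n");
		close(fd);
	}
	return 0;
}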
// SPDX-License-Identifier: GPL-2.0 /* * IPv6 Address Label subsystem * for the IPv6 "Default" Source Address Selection * * Copyright (C)2007 USAGI/WIDE Project */ /* * Author: * YOSHIFUJI Hideaki @ USAGI/WIDE Project <yoshfuji@linux-ipv6.org> */ #include <linux/kernel.h> #include <linux/list.h> #include <linux/rcupdate.h> #include <linux/in6.h> #include <linux/slab.h> #include <net/addrconf.h> #include <linux/if_addrlabel.h> #include <linux/netlink.h> #include <linux/rtnetlink.h> #if 0 #define ADDRLABEL(x...) printk(x) #else #define ADDRLABEL(x...)
do { ; } while (0) #endif /* * Policy Table */ struct ip6addrlbl_entry { struct in6_addr prefix; int prefixlen; int ifindex; int addrtype; u32 label; struct hlist_node list; struct rcu_head rcu; }; /* * Default policy table (RFC6724 + extensions) * * prefix addr_type label * ------------------------------------------------------------------------- * ::1/128 LOOPBACK 0 * ::/0 N/A 1 * 2002::/16 N/A 2 * ::/96 COMPATv4 3 * ::ffff:0:0/96 V4MAPPED 4 * fc00::/7 N/A 5 ULA (RFC 4193) * 2001::/32 N/A 6 Teredo (RFC 4380) * 2001:10::/28 N/A 7 ORCHID (RFC 4843) * fec0::/10 N/A 11 Site-local * (deprecated by RFC3879) * 3ffe::/16 N/A 12 6bone * * Note: 0xffffffff is used if we do not have any policies. * Note: Labels for ULA and 6to4 are different from labels listed in RFC6724. */ #define IPV6_ADDR_LABEL_DEFAULT 0xffffffffUL static const __net_initconst struct ip6addrlbl_init_table { const struct in6_addr *prefix; int prefixlen; u32 label; } ip6addrlbl_init_table[] = { { /* ::/0 */ .prefix = &in6addr_any, .label = 1, }, { /* fc00::/7 */ .prefix = &(struct in6_addr){ { { 0xfc } } } , .prefixlen = 7, .label = 5, }, { /* fec0::/10 */ .prefix = &(struct in6_addr){ { { 0xfe, 0xc0 } } }, .prefixlen = 10, .label = 11, }, { /* 2002::/16 */ .prefix = &(struct in6_addr){ { { 0x20, 0x02 } } }, .prefixlen = 16, .label = 2, }, { /* 3ffe::/16 */ .prefix = &(struct in6_addr){ { { 0x3f, 0xfe } } }, .prefixlen = 16, .label = 12, }, { /* 2001::/32 */ .prefix = &(struct in6_addr){ { { 0x20, 0x01 } } }, .prefixlen = 32, .label = 6, }, { /* 2001:10::/28 */ .prefix = &(struct in6_addr){ { { 0x20, 0x01, 0x00, 0x10 } } }, .prefixlen = 28, .label = 7, }, { /* ::ffff:0:0 */ .prefix = &(struct in6_addr){ { { [10] = 0xff, [11] = 0xff } } }, .prefixlen = 96, .label = 4, }, { /* ::/96 */ .prefix = &in6addr_any, .prefixlen = 96, .label = 3, }, { /* ::1/128 */ .prefix = &in6addr_loopback, .prefixlen = 128, .label = 0, } }; /* Find label */ static bool __ip6addrlbl_match(const struct ip6addrlbl_entry *p, const struct in6_addr *addr, int addrtype, int ifindex) { if (p->ifindex && p->ifindex != ifindex) return false; if (p->addrtype && p->addrtype != addrtype) return false; if (!ipv6_prefix_equal(addr, &p->prefix, p->prefixlen)) return false; return true; } static struct ip6addrlbl_entry *__ipv6_addr_label(struct net *net, const struct in6_addr *addr, int type, int ifindex) { struct ip6addrlbl_entry *p; hlist_for_each_entry_rcu(p, &net->ipv6.ip6addrlbl_table.head, list) { if (__ip6addrlbl_match(p, addr, type, ifindex)) return p; } return NULL; } u32 ipv6_addr_label(struct net *net, const struct in6_addr *addr, int type, int ifindex) { u32 label; struct ip6addrlbl_entry *p; type &= IPV6_ADDR_MAPPED | IPV6_ADDR_COMPATv4 | IPV6_ADDR_LOOPBACK; rcu_read_lock(); p = __ipv6_addr_label(net, addr, type, ifindex); label = p ? 
p->label : IPV6_ADDR_LABEL_DEFAULT; rcu_read_unlock(); ADDRLABEL(KERN_DEBUG "%s(addr=%pI6, type=%d, ifindex=%d) => %08x\n", __func__, addr, type, ifindex, label); return label; } /* allocate one entry */ static struct ip6addrlbl_entry *ip6addrlbl_alloc(const struct in6_addr *prefix, int prefixlen, int ifindex, u32 label) { struct ip6addrlbl_entry *newp; int addrtype; ADDRLABEL(KERN_DEBUG "%s(prefix=%pI6, prefixlen=%d, ifindex=%d, label=%u)\n", __func__, prefix, prefixlen, ifindex, (unsigned int)label); addrtype = ipv6_addr_type(prefix) & (IPV6_ADDR_MAPPED | IPV6_ADDR_COMPATv4 | IPV6_ADDR_LOOPBACK); switch (addrtype) { case IPV6_ADDR_MAPPED: if (prefixlen > 96) return ERR_PTR(-EINVAL); if (prefixlen < 96) addrtype = 0; break; case IPV6_ADDR_COMPATv4: if (prefixlen != 96) addrtype = 0; break; case IPV6_ADDR_LOOPBACK: if (prefixlen != 128) addrtype = 0; break; } newp = kmalloc(sizeof(*newp), GFP_KERNEL); if (!newp) return ERR_PTR(-ENOMEM); ipv6_addr_prefix(&newp->prefix, prefix, prefixlen); newp->prefixlen = prefixlen; newp->ifindex = ifindex; newp->addrtype = addrtype; newp->label = label; INIT_HLIST_NODE(&newp->list); return newp; } /* add a label */ static int __ip6addrlbl_add(struct net *net, struct ip6addrlbl_entry *newp, int replace) { struct ip6addrlbl_entry *last = NULL, *p = NULL; struct hlist_node *n; int ret = 0; ADDRLABEL(KERN_DEBUG "%s(newp=%p, replace=%d)\n", __func__, newp, replace); hlist_for_each_entry_safe(p, n, &net->ipv6.ip6addrlbl_table.head, list) { if (p->prefixlen == newp->prefixlen && p->ifindex == newp->ifindex && ipv6_addr_equal(&p->prefix, &newp->prefix)) { if (!replace) { ret = -EEXIST; goto out; } hlist_replace_rcu(&p->list, &newp->list); kfree_rcu(p, rcu); goto out; } else if ((p->prefixlen == newp->prefixlen && !p->ifindex) || (p->prefixlen < newp->prefixlen)) { hlist_add_before_rcu(&newp->list, &p->list); goto out; } last = p; } if (last) hlist_add_behind_rcu(&newp->list, &last->list); else hlist_add_head_rcu(&newp->list, &net->ipv6.ip6addrlbl_table.head); out: if (!ret) WRITE_ONCE(net->ipv6.ip6addrlbl_table.seq, net->ipv6.ip6addrlbl_table.seq + 1); return ret; } /* add a label */ static int ip6addrlbl_add(struct net *net, const struct in6_addr *prefix, int prefixlen, int ifindex, u32 label, int replace) { struct ip6addrlbl_entry *newp; int ret = 0; ADDRLABEL(KERN_DEBUG "%s(prefix=%pI6, prefixlen=%d, ifindex=%d, label=%u, replace=%d)\n", __func__, prefix, prefixlen, ifindex, (unsigned int)label, replace); newp = ip6addrlbl_alloc(prefix, prefixlen, ifindex, label); if (IS_ERR(newp)) return PTR_ERR(newp); spin_lock(&net->ipv6.ip6addrlbl_table.lock); ret = __ip6addrlbl_add(net, newp, replace); spin_unlock(&net->ipv6.ip6addrlbl_table.lock); if (ret) kfree(newp); return ret; } /* remove a label */ static int __ip6addrlbl_del(struct net *net, const struct in6_addr *prefix, int prefixlen, int ifindex) { struct ip6addrlbl_entry *p = NULL; struct hlist_node *n; int ret = -ESRCH; ADDRLABEL(KERN_DEBUG "%s(prefix=%pI6, prefixlen=%d, ifindex=%d)\n", __func__, prefix, prefixlen, ifindex); hlist_for_each_entry_safe(p, n, &net->ipv6.ip6addrlbl_table.head, list) { if (p->prefixlen == prefixlen && p->ifindex == ifindex && ipv6_addr_equal(&p->prefix, prefix)) { hlist_del_rcu(&p->list); kfree_rcu(p, rcu); ret = 0; break; } } return ret; } static int ip6addrlbl_del(struct net *net, const struct in6_addr *prefix, int prefixlen, int ifindex) { struct in6_addr prefix_buf; int ret; ADDRLABEL(KERN_DEBUG "%s(prefix=%pI6, prefixlen=%d, ifindex=%d)\n", __func__, prefix, prefixlen, 
ifindex); ipv6_addr_prefix(&prefix_buf, prefix, prefixlen); spin_lock(&net->ipv6.ip6addrlbl_table.lock); ret = __ip6addrlbl_del(net, &prefix_buf, prefixlen, ifindex); spin_unlock(&net->ipv6.ip6addrlbl_table.lock); return ret; } /* add default label */ static int __net_init ip6addrlbl_net_init(struct net *net) { struct ip6addrlbl_entry *p = NULL; struct hlist_node *n; int err; int i; ADDRLABEL(KERN_DEBUG "%s\n", __func__); spin_lock_init(&net->ipv6.ip6addrlbl_table.lock); INIT_HLIST_HEAD(&net->ipv6.ip6addrlbl_table.head); for (i = 0; i < ARRAY_SIZE(ip6addrlbl_init_table); i++) { err = ip6addrlbl_add(net, ip6addrlbl_init_table[i].prefix, ip6addrlbl_init_table[i].prefixlen, 0, ip6addrlbl_init_table[i].label, 0); if (err) goto err_ip6addrlbl_add; } return 0; err_ip6addrlbl_add: hlist_for_each_entry_safe(p, n, &net->ipv6.ip6addrlbl_table.head, list) { hlist_del_rcu(&p->list); kfree_rcu(p, rcu); } return err; } static void __net_exit ip6addrlbl_net_exit(struct net *net) { struct ip6addrlbl_entry *p = NULL; struct hlist_node *n; /* Remove all labels belonging to the exiting net */ spin_lock(&net->ipv6.ip6addrlbl_table.lock); hlist_for_each_entry_safe(p, n, &net->ipv6.ip6addrlbl_table.head, list) { hlist_del_rcu(&p->list); kfree_rcu(p, rcu); } spin_unlock(&net->ipv6.ip6addrlbl_table.lock); } static struct pernet_operations ipv6_addr_label_ops = { .init = ip6addrlbl_net_init, .exit = ip6addrlbl_net_exit, }; int __init ipv6_addr_label_init(void) { return register_pernet_subsys(&ipv6_addr_label_ops); } void ipv6_addr_label_cleanup(void) { unregister_pernet_subsys(&ipv6_addr_label_ops); } static const struct nla_policy ifal_policy[IFAL_MAX+1] = { [IFAL_ADDRESS] = { .len = sizeof(struct in6_addr), }, [IFAL_LABEL] = { .len = sizeof(u32), }, }; static bool addrlbl_ifindex_exists(struct net *net, int ifindex) { struct net_device *dev; rcu_read_lock(); dev = dev_get_by_index_rcu(net, ifindex); rcu_read_unlock(); return dev != NULL; } static int ip6addrlbl_newdel(struct sk_buff *skb, struct nlmsghdr *nlh, struct netlink_ext_ack *extack) { struct net *net = sock_net(skb->sk); struct ifaddrlblmsg *ifal; struct nlattr *tb[IFAL_MAX+1]; struct in6_addr *pfx; u32 label; int err = 0; err = nlmsg_parse_deprecated(nlh, sizeof(*ifal), tb, IFAL_MAX, ifal_policy, extack); if (err < 0) return err; ifal = nlmsg_data(nlh); if (ifal->ifal_family != AF_INET6 || ifal->ifal_prefixlen > 128) return -EINVAL; if (!tb[IFAL_ADDRESS]) return -EINVAL; pfx = nla_data(tb[IFAL_ADDRESS]); if (!tb[IFAL_LABEL]) return -EINVAL; label = nla_get_u32(tb[IFAL_LABEL]); if (label == IPV6_ADDR_LABEL_DEFAULT) return -EINVAL; switch (nlh->nlmsg_type) { case RTM_NEWADDRLABEL: if (ifal->ifal_index && !addrlbl_ifindex_exists(net, ifal->ifal_index)) return -EINVAL; err = ip6addrlbl_add(net, pfx, ifal->ifal_prefixlen, ifal->ifal_index, label, nlh->nlmsg_flags & NLM_F_REPLACE); break; case RTM_DELADDRLABEL: err = ip6addrlbl_del(net, pfx, ifal->ifal_prefixlen, ifal->ifal_index); break; default: err = -EOPNOTSUPP; } return err; } static void ip6addrlbl_putmsg(struct nlmsghdr *nlh, int prefixlen, int ifindex, u32 lseq) { struct ifaddrlblmsg *ifal = nlmsg_data(nlh); ifal->ifal_family = AF_INET6; ifal->__ifal_reserved = 0; ifal->ifal_prefixlen = prefixlen; ifal->ifal_flags = 0; ifal->ifal_index = ifindex; ifal->ifal_seq = lseq; }; static int ip6addrlbl_fill(struct sk_buff *skb, const struct ip6addrlbl_entry *p, u32 lseq, u32 portid, u32 seq, int event, unsigned int flags) { struct nlmsghdr *nlh = nlmsg_put(skb, portid, seq, event, sizeof(struct ifaddrlblmsg), 
flags); if (!nlh) return -EMSGSIZE; ip6addrlbl_putmsg(nlh, p->prefixlen, p->ifindex, lseq); if (nla_put_in6_addr(skb, IFAL_ADDRESS, &p->prefix) < 0 || nla_put_u32(skb, IFAL_LABEL, p->label) < 0) { nlmsg_cancel(skb, nlh); return -EMSGSIZE; } nlmsg_end(skb, nlh); return 0; } static int ip6addrlbl_valid_dump_req(const struct nlmsghdr *nlh, struct netlink_ext_ack *extack) { struct ifaddrlblmsg *ifal; ifal = nlmsg_payload(nlh, sizeof(*ifal)); if (!ifal) { NL_SET_ERR_MSG_MOD(extack, "Invalid header for address label dump request"); return -EINVAL; } if (ifal->__ifal_reserved || ifal->ifal_prefixlen || ifal->ifal_flags || ifal->ifal_index || ifal->ifal_seq) { NL_SET_ERR_MSG_MOD(extack, "Invalid values in header for address label dump request"); return -EINVAL; } if (nlmsg_attrlen(nlh, sizeof(*ifal))) { NL_SET_ERR_MSG_MOD(extack, "Invalid data after header for address label dump request"); return -EINVAL; } return 0; } static int ip6addrlbl_dump(struct sk_buff *skb, struct netlink_callback *cb) { const struct nlmsghdr *nlh = cb->nlh; struct net *net = sock_net(skb->sk); struct ip6addrlbl_entry *p; int idx = 0, s_idx = cb->args[0]; int err = 0; u32 lseq; if (cb->strict_check) { err = ip6addrlbl_valid_dump_req(nlh, cb->extack); if (err < 0) return err; } rcu_read_lock(); lseq = READ_ONCE(net->ipv6.ip6addrlbl_table.seq); hlist_for_each_entry_rcu(p, &net->ipv6.ip6addrlbl_table.head, list) { if (idx >= s_idx) { err = ip6addrlbl_fill(skb, p, lseq, NETLINK_CB(cb->skb).portid, nlh->nlmsg_seq, RTM_NEWADDRLABEL, NLM_F_MULTI); if (err < 0) break; } idx++; } rcu_read_unlock(); cb->args[0] = idx; return err; } static inline int ip6addrlbl_msgsize(void) { return NLMSG_ALIGN(sizeof(struct ifaddrlblmsg)) + nla_total_size(16) /* IFAL_ADDRESS */ + nla_total_size(4); /* IFAL_LABEL */ } static int ip6addrlbl_valid_get_req(struct sk_buff *skb, const struct nlmsghdr *nlh, struct nlattr **tb, struct netlink_ext_ack *extack) { struct ifaddrlblmsg *ifal; int i, err; ifal = nlmsg_payload(nlh, sizeof(*ifal)); if (!ifal) { NL_SET_ERR_MSG_MOD(extack, "Invalid header for addrlabel get request"); return -EINVAL; } if (!netlink_strict_get_check(skb)) return nlmsg_parse_deprecated(nlh, sizeof(*ifal), tb, IFAL_MAX, ifal_policy, extack); if (ifal->__ifal_reserved || ifal->ifal_flags || ifal->ifal_seq) { NL_SET_ERR_MSG_MOD(extack, "Invalid values in header for addrlabel get request"); return -EINVAL; } err = nlmsg_parse_deprecated_strict(nlh, sizeof(*ifal), tb, IFAL_MAX, ifal_policy, extack); if (err) return err; for (i = 0; i <= IFAL_MAX; i++) { if (!tb[i]) continue; switch (i) { case IFAL_ADDRESS: break; default: NL_SET_ERR_MSG_MOD(extack, "Unsupported attribute in addrlabel get request"); return -EINVAL; } } return 0; } static int ip6addrlbl_get(struct sk_buff *in_skb, struct nlmsghdr *nlh, struct netlink_ext_ack *extack) { struct net *net = sock_net(in_skb->sk); struct ifaddrlblmsg *ifal; struct nlattr *tb[IFAL_MAX+1]; struct in6_addr *addr; u32 lseq; int err = 0; struct ip6addrlbl_entry *p; struct sk_buff *skb; err = ip6addrlbl_valid_get_req(in_skb, nlh, tb, extack); if (err < 0) return err; ifal = nlmsg_data(nlh); if (ifal->ifal_family != AF_INET6 || ifal->ifal_prefixlen != 128) return -EINVAL; if (ifal->ifal_index && !addrlbl_ifindex_exists(net, ifal->ifal_index)) return -EINVAL; if (!tb[IFAL_ADDRESS]) return -EINVAL; addr = nla_data(tb[IFAL_ADDRESS]); skb = nlmsg_new(ip6addrlbl_msgsize(), GFP_KERNEL); if (!skb) return -ENOBUFS; err = -ESRCH; rcu_read_lock(); p = __ipv6_addr_label(net, addr, ipv6_addr_type(addr), 
ifal->ifal_index); lseq = READ_ONCE(net->ipv6.ip6addrlbl_table.seq); if (p) err = ip6addrlbl_fill(skb, p, lseq, NETLINK_CB(in_skb).portid, nlh->nlmsg_seq, RTM_NEWADDRLABEL, 0); rcu_read_unlock(); if (err < 0) { WARN_ON(err == -EMSGSIZE); kfree_skb(skb); } else { err = rtnl_unicast(skb, net, NETLINK_CB(in_skb).portid); } return err; } static const struct rtnl_msg_handler ipv6_adddr_label_rtnl_msg_handlers[] __initconst_or_module = { {.owner = THIS_MODULE, .protocol = PF_INET6, .msgtype = RTM_NEWADDRLABEL, .doit = ip6addrlbl_newdel, .flags = RTNL_FLAG_DOIT_UNLOCKED}, {.owner = THIS_MODULE, .protocol = PF_INET6, .msgtype = RTM_DELADDRLABEL, .doit = ip6addrlbl_newdel, .flags = RTNL_FLAG_DOIT_UNLOCKED}, {.owner = THIS_MODULE, .protocol = PF_INET6, .msgtype = RTM_GETADDRLABEL, .doit = ip6addrlbl_get, .dumpit = ip6addrlbl_dump, .flags = RTNL_FLAG_DOIT_UNLOCKED | RTNL_FLAG_DUMP_UNLOCKED}, }; int __init ipv6_addr_label_rtnl_register(void) { return rtnl_register_many(ipv6_adddr_label_rtnl_msg_handlers); } |
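Because __ip6addrlbl_add() keeps the list ordered by decreasing prefix length, __ipv6_addr_label() effectively returns the label of the longest matching prefix in the table. The program below is a standalone userspace illustration, not kernel code: it replays that lookup over the same default prefixes and labels, with the helper names and sample addresses invented for the example.

/* Standalone sketch of the longest-prefix label lookup performed by
 * __ipv6_addr_label() over the default policy table above. */
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>

struct policy { const char *prefix; int plen; unsigned int label; };

/* Same prefixes and labels as ip6addrlbl_init_table[] above, already
 * sorted by decreasing prefix length so the first hit wins. */
static const struct policy table[] = {
	{ "::1",        128, 0 },
	{ "::",          96, 3 },
	{ "::ffff:0:0",  96, 4 },
	{ "2001::",      32, 6 },
	{ "2001:10::",   28, 7 },
	{ "2002::",      16, 2 },
	{ "3ffe::",      16, 12 },
	{ "fec0::",      10, 11 },
	{ "fc00::",       7, 5 },
	{ "::",           0, 1 },
};

/* Plain reimplementation of the ipv6_prefix_equal() check. */
static int prefix_match(const struct in6_addr *a, const struct in6_addr *p, int plen)
{
	int bytes = plen / 8, bits = plen % 8;

	if (memcmp(a->s6_addr, p->s6_addr, bytes))
		return 0;
	if (bits) {
		unsigned char mask = 0xff << (8 - bits);

		if ((a->s6_addr[bytes] ^ p->s6_addr[bytes]) & mask)
			return 0;
	}
	return 1;
}

int main(void)
{
	const char *samples[] = { "2002:c000:204::1", "::ffff:192.0.2.4", "fd00::1" };

	for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
		struct in6_addr addr, pfx;

		inet_pton(AF_INET6, samples[i], &addr);
		for (size_t j = 0; j < sizeof(table) / sizeof(table[0]); j++) {
			inet_pton(AF_INET6, table[j].prefix, &pfx);
			if (prefix_match(&addr, &pfx, table[j].plen)) {
				printf("%s -> label %u\n", samples[i], table[j].label);
				break;
			}
		}
	}
	return 0;
}

With the default table this prints label 2 for the 6to4 address, label 4 for the IPv4-mapped address, and label 5 for the ULA address, matching the comments above the table.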
/* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_HIGHMEM_INTERNAL_H #define _LINUX_HIGHMEM_INTERNAL_H /* * Outside of CONFIG_HIGHMEM to support X86 32bit iomap_atomic() cruft. */ #ifdef CONFIG_KMAP_LOCAL void *__kmap_local_pfn_prot(unsigned long pfn, pgprot_t prot); void *__kmap_local_page_prot(struct page *page, pgprot_t prot); void kunmap_local_indexed(const void *vaddr); void kmap_local_fork(struct task_struct *tsk); void __kmap_local_sched_out(void); void __kmap_local_sched_in(void); static inline void kmap_assert_nomap(void) { DEBUG_LOCKS_WARN_ON(current->kmap_ctrl.idx); } #else static inline void kmap_local_fork(struct task_struct *tsk) { } static inline void kmap_assert_nomap(void) { } #endif #ifdef CONFIG_HIGHMEM #include <asm/highmem.h> #ifndef ARCH_HAS_KMAP_FLUSH_TLB static inline void kmap_flush_tlb(unsigned long addr) { } #endif #ifndef kmap_prot #define kmap_prot PAGE_KERNEL #endif void *kmap_high(struct page *page); void kunmap_high(struct page *page); void __kmap_flush_unused(void); struct page *__kmap_to_page(void *addr); static inline void *kmap(struct page *page) { void *addr; might_sleep(); if (!PageHighMem(page)) addr = page_address(page); else addr = kmap_high(page); kmap_flush_tlb((unsigned long)addr); return addr; } static inline void kunmap(struct page *page) { might_sleep(); if (!PageHighMem(page)) return; kunmap_high(page); } static inline struct page *kmap_to_page(void *addr) { return __kmap_to_page(addr); } static inline void kmap_flush_unused(void) { __kmap_flush_unused(); } static inline void *kmap_local_page(struct page *page) { return __kmap_local_page_prot(page, kmap_prot); } static inline void *kmap_local_page_try_from_panic(struct page *page) { if (!PageHighMem(page)) return page_address(page); /* If the page is in HighMem, it's not safe to kmap it.*/ return NULL; } static inline void *kmap_local_folio(struct folio *folio, size_t offset) { struct page *page = folio_page(folio, offset / PAGE_SIZE); return __kmap_local_page_prot(page, kmap_prot) + offset % PAGE_SIZE; } static inline void *kmap_local_page_prot(struct page *page, pgprot_t prot) { return __kmap_local_page_prot(page, prot); } static inline void *kmap_local_pfn(unsigned long pfn) { return __kmap_local_pfn_prot(pfn, kmap_prot); } static inline void __kunmap_local(const void *vaddr) { kunmap_local_indexed(vaddr); } static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot) { if
(IS_ENABLED(CONFIG_PREEMPT_RT)) migrate_disable(); else preempt_disable(); pagefault_disable(); return __kmap_local_page_prot(page, prot); } static inline void *kmap_atomic(struct page *page) { return kmap_atomic_prot(page, kmap_prot); } static inline void *kmap_atomic_pfn(unsigned long pfn) { if (IS_ENABLED(CONFIG_PREEMPT_RT)) migrate_disable(); else preempt_disable(); pagefault_disable(); return __kmap_local_pfn_prot(pfn, kmap_prot); } static inline void __kunmap_atomic(const void *addr) { kunmap_local_indexed(addr); pagefault_enable(); if (IS_ENABLED(CONFIG_PREEMPT_RT)) migrate_enable(); else preempt_enable(); } unsigned long __nr_free_highpages(void); unsigned long __totalhigh_pages(void); static inline unsigned long nr_free_highpages(void) { return __nr_free_highpages(); } static inline unsigned long totalhigh_pages(void) { return __totalhigh_pages(); } static inline bool is_kmap_addr(const void *x) { unsigned long addr = (unsigned long)x; return (addr >= PKMAP_ADDR(0) && addr < PKMAP_ADDR(LAST_PKMAP)) || (addr >= __fix_to_virt(FIX_KMAP_END) && addr < __fix_to_virt(FIX_KMAP_BEGIN)); } #else /* CONFIG_HIGHMEM */ static inline struct page *kmap_to_page(void *addr) { return virt_to_page(addr); } static inline void *kmap(struct page *page) { might_sleep(); return page_address(page); } static inline void kunmap_high(struct page *page) { } static inline void kmap_flush_unused(void) { } static inline void kunmap(struct page *page) { #ifdef ARCH_HAS_FLUSH_ON_KUNMAP kunmap_flush_on_unmap(page_address(page)); #endif } static inline void *kmap_local_page(struct page *page) { return page_address(page); } static inline void *kmap_local_page_try_from_panic(struct page *page) { return page_address(page); } static inline void *kmap_local_folio(struct folio *folio, size_t offset) { return page_address(&folio->page) + offset; } static inline void *kmap_local_page_prot(struct page *page, pgprot_t prot) { return kmap_local_page(page); } static inline void *kmap_local_pfn(unsigned long pfn) { return kmap_local_page(pfn_to_page(pfn)); } static inline void __kunmap_local(const void *addr) { #ifdef ARCH_HAS_FLUSH_ON_KUNMAP kunmap_flush_on_unmap(PTR_ALIGN_DOWN(addr, PAGE_SIZE)); #endif } static inline void *kmap_atomic(struct page *page) { if (IS_ENABLED(CONFIG_PREEMPT_RT)) migrate_disable(); else preempt_disable(); pagefault_disable(); return page_address(page); } static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot) { return kmap_atomic(page); } static inline void *kmap_atomic_pfn(unsigned long pfn) { return kmap_atomic(pfn_to_page(pfn)); } static inline void __kunmap_atomic(const void *addr) { #ifdef ARCH_HAS_FLUSH_ON_KUNMAP kunmap_flush_on_unmap(PTR_ALIGN_DOWN(addr, PAGE_SIZE)); #endif pagefault_enable(); if (IS_ENABLED(CONFIG_PREEMPT_RT)) migrate_enable(); else preempt_enable(); } static inline unsigned long nr_free_highpages(void) { return 0; } static inline unsigned long totalhigh_pages(void) { return 0; } static inline bool is_kmap_addr(const void *x) { return false; } #endif /* CONFIG_HIGHMEM */ /** * kunmap_atomic - Unmap the virtual address mapped by kmap_atomic() - deprecated! * @__addr: Virtual address to be unmapped * * Unmaps an address previously mapped by kmap_atomic() and re-enables * pagefaults. Depending on PREEMP_RT configuration, re-enables also * migration and preemption. Users should not count on these side effects. * * Mappings should be unmapped in the reverse order that they were mapped. * See kmap_local_page() for details on nesting. 
* * @__addr can be any address within the mapped page, so there is no need * to subtract any offset that has been added. In contrast to kunmap(), * this function takes the address returned from kmap_atomic(), not the * page passed to it. The compiler will warn you if you pass the page. */ #define kunmap_atomic(__addr) \ do { \ BUILD_BUG_ON(__same_type((__addr), struct page *)); \ __kunmap_atomic(__addr); \ } while (0) /** * kunmap_local - Unmap a page mapped via kmap_local_page(). * @__addr: An address within the page mapped * * @__addr can be any address within the mapped page. Commonly it is the * address return from kmap_local_page(), but it can also include offsets. * * Unmapping should be done in the reverse order of the mapping. See * kmap_local_page() for details. */ #define kunmap_local(__addr) \ do { \ BUILD_BUG_ON(__same_type((__addr), struct page *)); \ __kunmap_local(__addr); \ } while (0) #endif |
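As a usage note for the kmap_local_page()/kunmap_local() pair documented above, the fragment below is a minimal illustrative sketch (not part of this header): it maps a possibly-highmem page, copies a few bytes out, and unmaps in reverse order. It assumes kernel context and that offset + len stays within one page.

/*
 * Minimal usage sketch: kmap_local_page() returns a CPU-local mapping
 * valid only in this context; it must be released with kunmap_local()
 * on the returned address, and nested mappings are released in reverse
 * order, as required by the comments above.
 */
static void copy_from_page_example(struct page *page, size_t offset,
				   void *dst, size_t len)
{
	char *vaddr = kmap_local_page(page);

	memcpy(dst, vaddr + offset, len);
	kunmap_local(vaddr);
}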
// SPDX-License-Identifier: GPL-2.0-only /****************************************************************************** ******************************************************************************* ** ** Copyright (C) Sistina Software, Inc. 1997-2003 All rights reserved. ** Copyright (C) 2004-2011 Red Hat, Inc. All rights reserved. ** ** ******************************************************************************* ******************************************************************************/ #include <linux/kernel.h> #include <linux/init.h> #include <linux/configfs.h> #include <linux/slab.h> #include <linux/in.h> #include <linux/in6.h> #include <linux/dlmconstants.h> #include <net/ipv6.h> #include <net/sock.h> #include "config.h" #include "midcomms.h" #include "lowcomms.h" /* * /config/dlm/<cluster>/spaces/<space>/nodes/<node>/nodeid (refers to <node>) * /config/dlm/<cluster>/spaces/<space>/nodes/<node>/weight * /config/dlm/<cluster>/comms/<comm>/nodeid (refers to <comm>) * /config/dlm/<cluster>/comms/<comm>/local * /config/dlm/<cluster>/comms/<comm>/addr (write only) * /config/dlm/<cluster>/comms/<comm>/addr_list (read only) * The <cluster> level is useless, but I haven't figured out how to avoid it. */ static struct config_group *space_list; static struct config_group *comm_list; static struct dlm_comm *local_comm; static uint32_t dlm_comm_count; struct dlm_clusters; struct dlm_cluster; struct dlm_spaces; struct dlm_space; struct dlm_comms; struct dlm_comm; struct dlm_nodes; struct dlm_node; static struct config_group *make_cluster(struct config_group *, const char *); static void drop_cluster(struct config_group *, struct config_item *); static void release_cluster(struct config_item *); static struct config_group *make_space(struct config_group *, const char *); static void drop_space(struct config_group *, struct config_item *); static void release_space(struct config_item *); static struct config_item *make_comm(struct config_group *, const char *); static void drop_comm(struct config_group *, struct config_item *); static void release_comm(struct config_item *); static struct config_item *make_node(struct config_group *, const char *); static void drop_node(struct config_group *, struct config_item *); static void release_node(struct config_item *); static struct configfs_attribute *comm_attrs[]; static struct configfs_attribute *node_attrs[]; const struct rhashtable_params dlm_rhash_rsb_params = { .nelem_hint = 3, /* start small */ .key_len = DLM_RESNAME_MAXLEN, .key_offset = offsetof(struct dlm_rsb, res_name), .head_offset = offsetof(struct dlm_rsb, res_node), .automatic_shrinking = true, }; struct dlm_cluster { struct config_group group; struct dlm_spaces *sps; struct dlm_comms *cms; }; static struct dlm_cluster *config_item_to_cluster(struct config_item *i) { return i ?
container_of(to_config_group(i), struct dlm_cluster, group) : NULL; } enum { CLUSTER_ATTR_TCP_PORT = 0, CLUSTER_ATTR_BUFFER_SIZE, CLUSTER_ATTR_RSBTBL_SIZE, CLUSTER_ATTR_RECOVER_TIMER, CLUSTER_ATTR_TOSS_SECS, CLUSTER_ATTR_SCAN_SECS, CLUSTER_ATTR_LOG_DEBUG, CLUSTER_ATTR_LOG_INFO, CLUSTER_ATTR_PROTOCOL, CLUSTER_ATTR_MARK, CLUSTER_ATTR_NEW_RSB_COUNT, CLUSTER_ATTR_RECOVER_CALLBACKS, CLUSTER_ATTR_CLUSTER_NAME, }; static ssize_t cluster_cluster_name_show(struct config_item *item, char *buf) { return sprintf(buf, "%s\n", dlm_config.ci_cluster_name); } static ssize_t cluster_cluster_name_store(struct config_item *item, const char *buf, size_t len) { strscpy(dlm_config.ci_cluster_name, buf, sizeof(dlm_config.ci_cluster_name)); return len; } CONFIGFS_ATTR(cluster_, cluster_name); static ssize_t cluster_tcp_port_show(struct config_item *item, char *buf) { return sprintf(buf, "%u\n", be16_to_cpu(dlm_config.ci_tcp_port)); } static int dlm_check_zero_and_dlm_running(unsigned int x) { if (!x) return -EINVAL; if (dlm_lowcomms_is_running()) return -EBUSY; return 0; } static ssize_t cluster_tcp_port_store(struct config_item *item, const char *buf, size_t len) { int rc; u16 x; if (!capable(CAP_SYS_ADMIN)) return -EPERM; rc = kstrtou16(buf, 0, &x); if (rc) return rc; rc = dlm_check_zero_and_dlm_running(x); if (rc) return rc; dlm_config.ci_tcp_port = cpu_to_be16(x); return len; } CONFIGFS_ATTR(cluster_, tcp_port); static ssize_t cluster_set(unsigned int *info_field, int (*check_cb)(unsigned int x), const char *buf, size_t len) { unsigned int x; int rc; if (!capable(CAP_SYS_ADMIN)) return -EPERM; rc = kstrtouint(buf, 0, &x); if (rc) return rc; if (check_cb) { rc = check_cb(x); if (rc) return rc; } *info_field = x; return len; } #define CLUSTER_ATTR(name, check_cb) \ static ssize_t cluster_##name##_store(struct config_item *item, \ const char *buf, size_t len) \ { \ return cluster_set(&dlm_config.ci_##name, check_cb, buf, len); \ } \ static ssize_t cluster_##name##_show(struct config_item *item, char *buf) \ { \ return snprintf(buf, PAGE_SIZE, "%u\n", dlm_config.ci_##name); \ } \ CONFIGFS_ATTR(cluster_, name); static int dlm_check_protocol_and_dlm_running(unsigned int x) { switch (x) { case 0: /* TCP */ break; case 1: /* SCTP */ if (!IS_ENABLED(CONFIG_IP_SCTP)) return -EOPNOTSUPP; break; default: return -EINVAL; } if (dlm_lowcomms_is_running()) return -EBUSY; return 0; } static int dlm_check_zero(unsigned int x) { if (!x) return -EINVAL; return 0; } static int dlm_check_buffer_size(unsigned int x) { if (x < DLM_MAX_SOCKET_BUFSIZE) return -EINVAL; return 0; } CLUSTER_ATTR(buffer_size, dlm_check_buffer_size); CLUSTER_ATTR(rsbtbl_size, dlm_check_zero); CLUSTER_ATTR(recover_timer, dlm_check_zero); CLUSTER_ATTR(toss_secs, dlm_check_zero); CLUSTER_ATTR(scan_secs, dlm_check_zero); CLUSTER_ATTR(log_debug, NULL); CLUSTER_ATTR(log_info, NULL); CLUSTER_ATTR(protocol, dlm_check_protocol_and_dlm_running); CLUSTER_ATTR(mark, NULL); CLUSTER_ATTR(new_rsb_count, NULL); CLUSTER_ATTR(recover_callbacks, NULL); static struct configfs_attribute *cluster_attrs[] = { [CLUSTER_ATTR_TCP_PORT] = &cluster_attr_tcp_port, [CLUSTER_ATTR_BUFFER_SIZE] = &cluster_attr_buffer_size, [CLUSTER_ATTR_RSBTBL_SIZE] = &cluster_attr_rsbtbl_size, [CLUSTER_ATTR_RECOVER_TIMER] = &cluster_attr_recover_timer, [CLUSTER_ATTR_TOSS_SECS] = &cluster_attr_toss_secs, [CLUSTER_ATTR_SCAN_SECS] = &cluster_attr_scan_secs, [CLUSTER_ATTR_LOG_DEBUG] = &cluster_attr_log_debug, [CLUSTER_ATTR_LOG_INFO] = &cluster_attr_log_info, [CLUSTER_ATTR_PROTOCOL] = &cluster_attr_protocol, 
[CLUSTER_ATTR_MARK] = &cluster_attr_mark, [CLUSTER_ATTR_NEW_RSB_COUNT] = &cluster_attr_new_rsb_count, [CLUSTER_ATTR_RECOVER_CALLBACKS] = &cluster_attr_recover_callbacks, [CLUSTER_ATTR_CLUSTER_NAME] = &cluster_attr_cluster_name, NULL, }; enum { COMM_ATTR_NODEID = 0, COMM_ATTR_LOCAL, COMM_ATTR_ADDR, COMM_ATTR_ADDR_LIST, COMM_ATTR_MARK, }; enum { NODE_ATTR_NODEID = 0, NODE_ATTR_WEIGHT, }; struct dlm_clusters { struct configfs_subsystem subsys; }; struct dlm_spaces { struct config_group ss_group; }; struct dlm_space { struct config_group group; struct list_head members; struct mutex members_lock; int members_count; struct dlm_nodes *nds; }; struct dlm_comms { struct config_group cs_group; }; struct dlm_comm { struct config_item item; int seq; int nodeid; int local; int addr_count; unsigned int mark; struct sockaddr_storage *addr[DLM_MAX_ADDR_COUNT]; }; struct dlm_nodes { struct config_group ns_group; }; struct dlm_node { struct config_item item; struct list_head list; /* space->members */ int nodeid; int weight; int new; int comm_seq; /* copy of cm->seq when nd->nodeid is set */ }; static struct configfs_group_operations clusters_ops = { .make_group = make_cluster, .drop_item = drop_cluster, }; static struct configfs_item_operations cluster_ops = { .release = release_cluster, }; static struct configfs_group_operations spaces_ops = { .make_group = make_space, .drop_item = drop_space, }; static struct configfs_item_operations space_ops = { .release = release_space, }; static struct configfs_group_operations comms_ops = { .make_item = make_comm, .drop_item = drop_comm, }; static struct configfs_item_operations comm_ops = { .release = release_comm, }; static struct configfs_group_operations nodes_ops = { .make_item = make_node, .drop_item = drop_node, }; static struct configfs_item_operations node_ops = { .release = release_node, }; static const struct config_item_type clusters_type = { .ct_group_ops = &clusters_ops, .ct_owner = THIS_MODULE, }; static const struct config_item_type cluster_type = { .ct_item_ops = &cluster_ops, .ct_attrs = cluster_attrs, .ct_owner = THIS_MODULE, }; static const struct config_item_type spaces_type = { .ct_group_ops = &spaces_ops, .ct_owner = THIS_MODULE, }; static const struct config_item_type space_type = { .ct_item_ops = &space_ops, .ct_owner = THIS_MODULE, }; static const struct config_item_type comms_type = { .ct_group_ops = &comms_ops, .ct_owner = THIS_MODULE, }; static const struct config_item_type comm_type = { .ct_item_ops = &comm_ops, .ct_attrs = comm_attrs, .ct_owner = THIS_MODULE, }; static const struct config_item_type nodes_type = { .ct_group_ops = &nodes_ops, .ct_owner = THIS_MODULE, }; static const struct config_item_type node_type = { .ct_item_ops = &node_ops, .ct_attrs = node_attrs, .ct_owner = THIS_MODULE, }; static struct dlm_space *config_item_to_space(struct config_item *i) { return i ? container_of(to_config_group(i), struct dlm_space, group) : NULL; } static struct dlm_comm *config_item_to_comm(struct config_item *i) { return i ? container_of(i, struct dlm_comm, item) : NULL; } static struct dlm_node *config_item_to_node(struct config_item *i) { return i ? 
container_of(i, struct dlm_node, item) : NULL; } static struct config_group *make_cluster(struct config_group *g, const char *name) { struct dlm_cluster *cl = NULL; struct dlm_spaces *sps = NULL; struct dlm_comms *cms = NULL; cl = kzalloc(sizeof(struct dlm_cluster), GFP_NOFS); sps = kzalloc(sizeof(struct dlm_spaces), GFP_NOFS); cms = kzalloc(sizeof(struct dlm_comms), GFP_NOFS); if (!cl || !sps || !cms) goto fail; cl->sps = sps; cl->cms = cms; config_group_init_type_name(&cl->group, name, &cluster_type); config_group_init_type_name(&sps->ss_group, "spaces", &spaces_type); config_group_init_type_name(&cms->cs_group, "comms", &comms_type); configfs_add_default_group(&sps->ss_group, &cl->group); configfs_add_default_group(&cms->cs_group, &cl->group); space_list = &sps->ss_group; comm_list = &cms->cs_group; return &cl->group; fail: kfree(cl); kfree(sps); kfree(cms); return ERR_PTR(-ENOMEM); } static void drop_cluster(struct config_group *g, struct config_item *i) { struct dlm_cluster *cl = config_item_to_cluster(i); configfs_remove_default_groups(&cl->group); space_list = NULL; comm_list = NULL; config_item_put(i); } static void release_cluster(struct config_item *i) { struct dlm_cluster *cl = config_item_to_cluster(i); kfree(cl->sps); kfree(cl->cms); kfree(cl); } static struct config_group *make_space(struct config_group *g, const char *name) { struct dlm_space *sp = NULL; struct dlm_nodes *nds = NULL; sp = kzalloc(sizeof(struct dlm_space), GFP_NOFS); nds = kzalloc(sizeof(struct dlm_nodes), GFP_NOFS); if (!sp || !nds) goto fail; config_group_init_type_name(&sp->group, name, &space_type); config_group_init_type_name(&nds->ns_group, "nodes", &nodes_type); configfs_add_default_group(&nds->ns_group, &sp->group); INIT_LIST_HEAD(&sp->members); mutex_init(&sp->members_lock); sp->members_count = 0; sp->nds = nds; return &sp->group; fail: kfree(sp); kfree(nds); return ERR_PTR(-ENOMEM); } static void drop_space(struct config_group *g, struct config_item *i) { struct dlm_space *sp = config_item_to_space(i); /* assert list_empty(&sp->members) */ configfs_remove_default_groups(&sp->group); config_item_put(i); } static void release_space(struct config_item *i) { struct dlm_space *sp = config_item_to_space(i); kfree(sp->nds); kfree(sp); } static struct config_item *make_comm(struct config_group *g, const char *name) { struct dlm_comm *cm; unsigned int nodeid; int rv; rv = kstrtouint(name, 0, &nodeid); if (rv) return ERR_PTR(rv); cm = kzalloc(sizeof(struct dlm_comm), GFP_NOFS); if (!cm) return ERR_PTR(-ENOMEM); config_item_init_type_name(&cm->item, name, &comm_type); cm->seq = dlm_comm_count++; if (!cm->seq) cm->seq = dlm_comm_count++; cm->nodeid = nodeid; cm->local = 0; cm->addr_count = 0; cm->mark = 0; return &cm->item; } static void drop_comm(struct config_group *g, struct config_item *i) { struct dlm_comm *cm = config_item_to_comm(i); if (local_comm == cm) local_comm = NULL; dlm_midcomms_close(cm->nodeid); while (cm->addr_count--) kfree(cm->addr[cm->addr_count]); config_item_put(i); } static void release_comm(struct config_item *i) { struct dlm_comm *cm = config_item_to_comm(i); kfree(cm); } static struct config_item *make_node(struct config_group *g, const char *name) { struct dlm_space *sp = config_item_to_space(g->cg_item.ci_parent); unsigned int nodeid; struct dlm_node *nd; uint32_t seq = 0; int rv; rv = kstrtouint(name, 0, &nodeid); if (rv) return ERR_PTR(rv); nd = kzalloc(sizeof(struct dlm_node), GFP_NOFS); if (!nd) return ERR_PTR(-ENOMEM); config_item_init_type_name(&nd->item, name, &node_type); 
nd->nodeid = nodeid; nd->weight = 1; /* default weight of 1 if none is set */ nd->new = 1; /* set to 0 once it's been read by dlm_nodeid_list() */ dlm_comm_seq(nodeid, &seq, true); nd->comm_seq = seq; mutex_lock(&sp->members_lock); list_add(&nd->list, &sp->members); sp->members_count++; mutex_unlock(&sp->members_lock); return &nd->item; } static void drop_node(struct config_group *g, struct config_item *i) { struct dlm_space *sp = config_item_to_space(g->cg_item.ci_parent); struct dlm_node *nd = config_item_to_node(i); mutex_lock(&sp->members_lock); list_del(&nd->list); sp->members_count--; mutex_unlock(&sp->members_lock); config_item_put(i); } static void release_node(struct config_item *i) { struct dlm_node *nd = config_item_to_node(i); kfree(nd); } static struct dlm_clusters clusters_root = { .subsys = { .su_group = { .cg_item = { .ci_namebuf = "dlm", .ci_type = &clusters_type, }, }, }, }; int __init dlm_config_init(void) { config_group_init(&clusters_root.subsys.su_group); mutex_init(&clusters_root.subsys.su_mutex); return configfs_register_subsystem(&clusters_root.subsys); } void dlm_config_exit(void) { configfs_unregister_subsystem(&clusters_root.subsys); } /* * Functions for user space to read/write attributes */ static ssize_t comm_nodeid_show(struct config_item *item, char *buf) { unsigned int nodeid; int rv; rv = kstrtouint(config_item_name(item), 0, &nodeid); if (WARN_ON(rv)) return rv; return sprintf(buf, "%u\n", nodeid); } static ssize_t comm_nodeid_store(struct config_item *item, const char *buf, size_t len) { return len; } static ssize_t comm_local_show(struct config_item *item, char *buf) { return sprintf(buf, "%d\n", config_item_to_comm(item)->local); } static ssize_t comm_local_store(struct config_item *item, const char *buf, size_t len) { struct dlm_comm *cm = config_item_to_comm(item); int rc = kstrtoint(buf, 0, &cm->local); if (rc) return rc; if (cm->local && !local_comm) local_comm = cm; return len; } static ssize_t comm_addr_store(struct config_item *item, const char *buf, size_t len) { struct dlm_comm *cm = config_item_to_comm(item); struct sockaddr_storage *addr; int rv; if (len != sizeof(struct sockaddr_storage)) return -EINVAL; if (cm->addr_count >= DLM_MAX_ADDR_COUNT) return -ENOSPC; addr = kzalloc(sizeof(*addr), GFP_NOFS); if (!addr) return -ENOMEM; memcpy(addr, buf, len); rv = dlm_midcomms_addr(cm->nodeid, addr); if (rv) { kfree(addr); return rv; } cm->addr[cm->addr_count++] = addr; return len; } static ssize_t comm_addr_list_show(struct config_item *item, char *buf) { struct dlm_comm *cm = config_item_to_comm(item); ssize_t s; ssize_t allowance; int i; struct sockaddr_storage *addr; struct sockaddr_in *addr_in; struct sockaddr_in6 *addr_in6; /* Taken from ip6_addr_string() defined in lib/vsprintf.c */ char buf0[sizeof("AF_INET6 xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:255.255.255.255\n")]; /* Derived from SIMPLE_ATTR_SIZE of fs/configfs/file.c */ allowance = 4096; buf[0] = '\0'; for (i = 0; i < cm->addr_count; i++) { addr = cm->addr[i]; switch(addr->ss_family) { case AF_INET: addr_in = (struct sockaddr_in *)addr; s = sprintf(buf0, "AF_INET %pI4\n", &addr_in->sin_addr.s_addr); break; case AF_INET6: addr_in6 = (struct sockaddr_in6 *)addr; s = sprintf(buf0, "AF_INET6 %pI6\n", &addr_in6->sin6_addr); break; default: s = sprintf(buf0, "%s\n", "<UNKNOWN>"); break; } allowance -= s; if (allowance >= 0) strcat(buf, buf0); else { allowance += s; break; } } return 4096 - allowance; } static ssize_t comm_mark_show(struct config_item *item, char *buf) { return sprintf(buf, "%u\n", 
config_item_to_comm(item)->mark); } static ssize_t comm_mark_store(struct config_item *item, const char *buf, size_t len) { struct dlm_comm *comm; unsigned int mark; int rc; rc = kstrtouint(buf, 0, &mark); if (rc) return rc; if (mark == 0) mark = dlm_config.ci_mark; comm = config_item_to_comm(item); rc = dlm_lowcomms_nodes_set_mark(comm->nodeid, mark); if (rc) return rc; comm->mark = mark; return len; } CONFIGFS_ATTR(comm_, nodeid); CONFIGFS_ATTR(comm_, local); CONFIGFS_ATTR(comm_, mark); CONFIGFS_ATTR_WO(comm_, addr); CONFIGFS_ATTR_RO(comm_, addr_list); static struct configfs_attribute *comm_attrs[] = { [COMM_ATTR_NODEID] = &comm_attr_nodeid, [COMM_ATTR_LOCAL] = &comm_attr_local, [COMM_ATTR_ADDR] = &comm_attr_addr, [COMM_ATTR_ADDR_LIST] = &comm_attr_addr_list, [COMM_ATTR_MARK] = &comm_attr_mark, NULL, }; static ssize_t node_nodeid_show(struct config_item *item, char *buf) { unsigned int nodeid; int rv; rv = kstrtouint(config_item_name(item), 0, &nodeid); if (WARN_ON(rv)) return rv; return sprintf(buf, "%u\n", nodeid); } static ssize_t node_nodeid_store(struct config_item *item, const char *buf, size_t len) { return len; } static ssize_t node_weight_show(struct config_item *item, char *buf) { return sprintf(buf, "%d\n", config_item_to_node(item)->weight); } static ssize_t node_weight_store(struct config_item *item, const char *buf, size_t len) { int rc = kstrtoint(buf, 0, &config_item_to_node(item)->weight); if (rc) return rc; return len; } CONFIGFS_ATTR(node_, nodeid); CONFIGFS_ATTR(node_, weight); static struct configfs_attribute *node_attrs[] = { [NODE_ATTR_NODEID] = &node_attr_nodeid, [NODE_ATTR_WEIGHT] = &node_attr_weight, NULL, }; /* * Functions for the dlm to get the info that's been configured */ static struct dlm_space *get_space(char *name) { struct config_item *i; if (!space_list) return NULL; mutex_lock(&space_list->cg_subsys->su_mutex); i = config_group_find_item(space_list, name); mutex_unlock(&space_list->cg_subsys->su_mutex); return config_item_to_space(i); } static void put_space(struct dlm_space *sp) { config_item_put(&sp->group.cg_item); } static struct dlm_comm *get_comm(int nodeid) { struct config_item *i; struct dlm_comm *cm = NULL; int found = 0; if (!comm_list) return NULL; WARN_ON_ONCE(!mutex_is_locked(&clusters_root.subsys.su_mutex)); list_for_each_entry(i, &comm_list->cg_children, ci_entry) { cm = config_item_to_comm(i); if (cm->nodeid != nodeid) continue; found = 1; config_item_get(i); break; } if (!found) cm = NULL; return cm; } static void put_comm(struct dlm_comm *cm) { config_item_put(&cm->item); } /* caller must free mem */ int dlm_config_nodes(char *lsname, struct dlm_config_node **nodes_out, int *count_out) { struct dlm_space *sp; struct dlm_node *nd; struct dlm_config_node *nodes, *node; int rv, count; sp = get_space(lsname); if (!sp) return -EEXIST; mutex_lock(&sp->members_lock); if (!sp->members_count) { rv = -EINVAL; printk(KERN_ERR "dlm: zero members_count\n"); goto out; } count = sp->members_count; nodes = kcalloc(count, sizeof(struct dlm_config_node), GFP_NOFS); if (!nodes) { rv = -ENOMEM; goto out; } node = nodes; list_for_each_entry(nd, &sp->members, list) { node->nodeid = nd->nodeid; node->weight = nd->weight; node->new = nd->new; node->comm_seq = nd->comm_seq; node++; nd->new = 0; } *count_out = count; *nodes_out = nodes; rv = 0; out: mutex_unlock(&sp->members_lock); put_space(sp); return rv; } int dlm_comm_seq(int nodeid, uint32_t *seq, bool locked) { struct dlm_comm *cm; if (locked) { cm = get_comm(nodeid); } else { 
mutex_lock(&clusters_root.subsys.su_mutex); cm = get_comm(nodeid); mutex_unlock(&clusters_root.subsys.su_mutex); } if (!cm) return -ENOENT; *seq = cm->seq; put_comm(cm); return 0; } int dlm_our_nodeid(void) { return local_comm->nodeid; } /* num 0 is first addr, num 1 is second addr */ int dlm_our_addr(struct sockaddr_storage *addr, int num) { if (!local_comm) return -1; if (num + 1 > local_comm->addr_count) return -1; memcpy(addr, local_comm->addr[num], sizeof(*addr)); return 0; } /* Config file defaults */ #define DEFAULT_TCP_PORT 21064 #define DEFAULT_RSBTBL_SIZE 1024 #define DEFAULT_RECOVER_TIMER 5 #define DEFAULT_TOSS_SECS 10 #define DEFAULT_SCAN_SECS 5 #define DEFAULT_LOG_DEBUG 0 #define DEFAULT_LOG_INFO 1 #define DEFAULT_PROTOCOL DLM_PROTO_TCP #define DEFAULT_MARK 0 #define DEFAULT_NEW_RSB_COUNT 128 #define DEFAULT_RECOVER_CALLBACKS 0 #define DEFAULT_CLUSTER_NAME "" struct dlm_config_info dlm_config = { .ci_tcp_port = cpu_to_be16(DEFAULT_TCP_PORT), .ci_buffer_size = DLM_MAX_SOCKET_BUFSIZE, .ci_rsbtbl_size = DEFAULT_RSBTBL_SIZE, .ci_recover_timer = DEFAULT_RECOVER_TIMER, .ci_toss_secs = DEFAULT_TOSS_SECS, .ci_scan_secs = DEFAULT_SCAN_SECS, .ci_log_debug = DEFAULT_LOG_DEBUG, .ci_log_info = DEFAULT_LOG_INFO, .ci_protocol = DEFAULT_PROTOCOL, .ci_mark = DEFAULT_MARK, .ci_new_rsb_count = DEFAULT_NEW_RSB_COUNT, .ci_recover_callbacks = DEFAULT_RECOVER_CALLBACKS, .ci_cluster_name = DEFAULT_CLUSTER_NAME }; |
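comm_addr_store() above rejects any write whose length is not exactly sizeof(struct sockaddr_storage), so a cluster manager writes the binary structure rather than a text address. The program below is an illustrative userspace sketch only: the configfs mount point, the cluster name "mycluster", and the comm directory "1" are assumptions, and the directories are presumed to have been created beforehand.

/* Userspace sketch: push one IPv4 address for node 1 into the
 * write-only "addr" attribute handled by comm_addr_store().  The
 * kernel side requires len == sizeof(struct sockaddr_storage), so the
 * whole structure is written. */
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	struct sockaddr_storage ss;
	struct sockaddr_in *sin = (struct sockaddr_in *)&ss;
	const char *path = "/sys/kernel/config/dlm/mycluster/comms/1/addr";
	int fd;

	memset(&ss, 0, sizeof(ss));
	sin->sin_family = AF_INET;
	inet_pton(AF_INET, "192.0.2.10", &sin->sin_addr);

	fd = open(path, O_WRONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (write(fd, &ss, sizeof(ss)) != sizeof(ss)) {
		perror("write");
		close(fd);
		return 1;
	}
	close(fd);
	return 0;
}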
2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 2533 2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569 2570 2571 2572 2573 2574 2575 2576 2577 2578 2579 2580 2581 2582 2583 2584 2585 2586 2587 2588 2589 2590 2591 2592 2593 2594 2595 2596 2597 2598 2599 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 2701 2702 2703 2704 2705 2706 2707 2708 2709 2710 2711 2712 2713 2714 2715 2716 2717 2718 2719 2720 2721 2722 2723 2724 2725 2726 2727 2728 2729 2730 2731 2732 2733 2734 2735 2736 2737 2738 2739 2740 2741 2742 2743 2744 2745 2746 2747 2748 2749 2750 2751 2752 2753 2754 2755 2756 2757 2758 2759 2760 2761 2762 2763 2764 2765 2766 2767 2768 2769 2770 2771 2772 2773 2774 2775 2776 2777 2778 2779 2780 2781 2782 2783 2784 2785 2786 2787 2788 2789 2790 2791 2792 2793 2794 2795 2796 2797 2798 2799 2800 2801 2802 2803 2804 2805 2806 2807 2808 2809 2810 2811 2812 2813 2814 2815 2816 2817 2818 2819 2820 2821 2822 2823 2824 2825 2826 2827 2828 2829 2830 2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843 2844 2845 2846 2847 2848 2849 2850 2851 2852 2853 2854 2855 2856 2857 2858 2859 2860 2861 2862 2863 2864 2865 2866 2867 2868 2869 2870 2871 2872 2873 2874 2875 2876 2877 2878 2879 
2880 2881 2882 2883 2884 2885 2886 2887 2888 2889 2890 2891 2892 2893 2894 2895 2896 2897 2898 2899 2900 2901 2902 2903 2904 2905 2906 2907 2908 2909 2910 2911 2912 2913 2914 2915 2916 2917 2918 2919 2920 2921 2922 2923 2924 2925 2926 2927 2928 2929 2930 2931 2932 2933 2934 2935 2936 2937 2938 2939 2940 2941 2942 2943 2944 2945 2946 2947 2948 2949 2950 2951 2952 2953 2954 2955 2956 2957 2958 2959 2960 2961 2962 2963 2964 2965 2966 2967 2968 2969 2970 2971 2972 2973 2974 2975 2976 2977 2978 2979 2980 2981 2982 2983 2984 2985 2986 2987 2988 2989 2990 2991 2992 2993 2994 2995 2996 2997 2998 2999 3000 3001 3002 3003 3004 3005 3006 3007 3008 3009 3010 3011 3012 3013 3014 3015 3016 3017 3018 3019 3020 3021 3022 3023 3024 3025 3026 3027 3028 3029 3030 3031 3032 3033 3034 3035 3036 3037 3038 3039 3040 3041 3042 3043 3044 3045 3046 3047 3048 3049 3050 3051 3052 3053 3054 3055 3056 3057 3058 3059 3060 3061 3062 3063 3064 3065 3066 3067 3068 3069 3070 3071 3072 3073 3074 3075 3076 3077 3078 3079 3080 3081 3082 3083 3084 3085 3086 3087 3088 3089 3090 3091 3092 3093 3094 3095 3096 3097 3098 3099 3100 3101 3102 3103 3104 3105 3106 3107 3108 3109 3110 3111 3112 3113 3114 3115 3116 3117 3118 3119 3120 3121 3122 3123 3124 3125 3126 3127 3128 3129 3130 3131 3132 3133 3134 3135 3136 3137 3138 3139 3140 3141 3142 3143 3144 3145 3146 3147 3148 3149 3150 3151 3152 3153 3154 3155 3156 3157 3158 3159 3160 3161 3162 3163 3164 3165 3166 3167 3168 3169 3170 3171 3172 3173 3174 3175 3176 3177 3178 3179 3180 3181 3182 3183 3184 3185 3186 3187 3188 3189 3190 3191 3192 3193 3194 3195 3196 3197 3198 3199 3200 3201 3202 3203 3204 3205 3206 3207 3208 3209 3210 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220 3221 3222 3223 3224 3225 3226 3227 3228 3229 3230 3231 3232 3233 3234 3235 3236 3237 3238 3239 3240 3241 3242 3243 3244 3245 3246 3247 3248 3249 3250 3251 3252 3253 3254 3255 3256 3257 3258 3259 3260 3261 3262 3263 3264 3265 3266 3267 3268 3269 3270 3271 3272 3273 3274 3275 3276 3277 3278 3279 3280 3281 3282 3283 3284 3285 3286 3287 3288 3289 3290 3291 3292 3293 3294 3295 3296 3297 3298 3299 3300 3301 3302 3303 3304 3305 3306 3307 3308 3309 3310 3311 3312 3313 3314 3315 3316 3317 3318 3319 3320 3321 3322 3323 3324 3325 3326 3327 3328 3329 3330 3331 3332 3333 3334 3335 3336 3337 3338 3339 3340 3341 3342 3343 3344 3345 3346 3347 3348 3349 3350 3351 3352 3353 3354 3355 3356 3357 3358 3359 3360 3361 3362 3363 3364 3365 3366 3367 3368 3369 3370 3371 3372 3373 3374 3375 3376 3377 3378 3379 3380 3381 3382 3383 3384 3385 3386 3387 3388 3389 3390 3391 3392 3393 3394 3395 3396 3397 3398 3399 3400 3401 3402 3403 3404 3405 3406 3407 3408 3409 3410 3411 3412 3413 3414 3415 3416 3417 3418 3419 3420 3421 3422 3423 3424 3425 3426 3427 3428 3429 3430 3431 3432 3433 3434 3435 3436 3437 3438 3439 3440 3441 3442 3443 3444 3445 3446 3447 3448 3449 3450 3451 3452 3453 3454 3455 3456 3457 3458 3459 3460 3461 3462 3463 3464 3465 3466 3467 3468 3469 3470 3471 3472 3473 3474 3475 3476 3477 3478 3479 3480 3481 3482 3483 3484 3485 3486 3487 3488 3489 3490 3491 3492 3493 3494 3495 3496 3497 3498 3499 3500 3501 3502 3503 3504 3505 3506 3507 3508 3509 3510 3511 3512 3513 3514 3515 3516 3517 3518 3519 3520 3521 3522 3523 3524 3525 3526 3527 3528 3529 3530 3531 3532 3533 3534 3535 3536 3537 3538 3539 3540 3541 3542 3543 3544 3545 3546 3547 3548 3549 3550 3551 3552 3553 3554 3555 3556 3557 3558 3559 3560 3561 3562 3563 3564 3565 3566 3567 3568 3569 3570 3571 3572 3573 3574 3575 3576 3577 3578 3579 3580 3581 3582 3583 3584 3585 3586 3587 3588 3589 3590 
/*
 * Copyright (C) 2014 Red Hat
 * Copyright (C) 2014 Intel Corp.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
 * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
 * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
 * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
 * OTHER DEALINGS IN THE SOFTWARE.
 *
 * Authors:
 * Rob Clark <robdclark@gmail.com>
 * Daniel Vetter <daniel.vetter@ffwll.ch>
 */

#include <linux/dma-fence.h>
#include <linux/ktime.h>

#include <drm/drm_atomic.h>
#include <drm/drm_atomic_helper.h>
#include <drm/drm_atomic_uapi.h>
#include <drm/drm_blend.h>
#include <drm/drm_bridge.h>
#include <drm/drm_damage_helper.h>
#include <drm/drm_device.h>
#include <drm/drm_drv.h>
#include <drm/drm_framebuffer.h>
#include <drm/drm_gem_atomic_helper.h>
#include <drm/drm_panic.h>
#include <drm/drm_print.h>
#include <drm/drm_self_refresh_helper.h>
#include <drm/drm_vblank.h>
#include <drm/drm_writeback.h>

#include "drm_crtc_helper_internal.h"
#include "drm_crtc_internal.h"

/**
 * DOC: overview
 *
 * This helper library provides implementations of check and commit functions on
 * top of the CRTC modeset helper callbacks and the plane helper callbacks.
It * also provides convenience implementations for the atomic state handling * callbacks for drivers which don't need to subclass the drm core structures to * add their own additional internal state. * * This library also provides default implementations for the check callback in * drm_atomic_helper_check() and for the commit callback with * drm_atomic_helper_commit(). But the individual stages and callbacks are * exposed to allow drivers to mix and match and e.g. use the plane helpers only * together with a driver private modeset implementation. * * This library also provides implementations for all the legacy driver * interfaces on top of the atomic interface. See drm_atomic_helper_set_config(), * drm_atomic_helper_disable_plane(), and the various functions to implement * set_property callbacks. New drivers must not implement these functions * themselves but must use the provided helpers. * * The atomic helper uses the same function table structures as all other * modesetting helpers. See the documentation for &struct drm_crtc_helper_funcs, * struct &drm_encoder_helper_funcs and &struct drm_connector_helper_funcs. It * also shares the &struct drm_plane_helper_funcs function table with the plane * helpers. */ static void drm_atomic_helper_plane_changed(struct drm_atomic_state *state, struct drm_plane_state *old_plane_state, struct drm_plane_state *plane_state, struct drm_plane *plane) { struct drm_crtc_state *crtc_state; if (old_plane_state->crtc) { crtc_state = drm_atomic_get_new_crtc_state(state, old_plane_state->crtc); if (WARN_ON(!crtc_state)) return; crtc_state->planes_changed = true; } if (plane_state->crtc) { crtc_state = drm_atomic_get_new_crtc_state(state, plane_state->crtc); if (WARN_ON(!crtc_state)) return; crtc_state->planes_changed = true; } } static int handle_conflicting_encoders(struct drm_atomic_state *state, bool disable_conflicting_encoders) { struct drm_connector_state *new_conn_state; struct drm_connector *connector; struct drm_connector_list_iter conn_iter; struct drm_encoder *encoder; unsigned int encoder_mask = 0; int i, ret = 0; /* * First loop, find all newly assigned encoders from the connectors * part of the state. If the same encoder is assigned to multiple * connectors bail out. */ for_each_new_connector_in_state(state, connector, new_conn_state, i) { const struct drm_connector_helper_funcs *funcs = connector->helper_private; struct drm_encoder *new_encoder; if (!new_conn_state->crtc) continue; if (funcs->atomic_best_encoder) new_encoder = funcs->atomic_best_encoder(connector, state); else if (funcs->best_encoder) new_encoder = funcs->best_encoder(connector); else new_encoder = drm_connector_get_single_encoder(connector); if (new_encoder) { if (encoder_mask & drm_encoder_mask(new_encoder)) { drm_dbg_atomic(connector->dev, "[ENCODER:%d:%s] on [CONNECTOR:%d:%s] already assigned\n", new_encoder->base.id, new_encoder->name, connector->base.id, connector->name); return -EINVAL; } encoder_mask |= drm_encoder_mask(new_encoder); } } if (!encoder_mask) return 0; /* * Second loop, iterate over all connectors not part of the state. * * If a conflicting encoder is found and disable_conflicting_encoders * is not set, an error is returned. Userspace can provide a solution * through the atomic ioctl. * * If the flag is set conflicting connectors are removed from the CRTC * and the CRTC is disabled if no encoder is left. This preserves * compatibility with the legacy set_config behavior. 
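 *
 * Connectors that are already part of the atomic state were handled by
 * the first loop above and are simply skipped here.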
*/ drm_connector_list_iter_begin(state->dev, &conn_iter); drm_for_each_connector_iter(connector, &conn_iter) { struct drm_crtc_state *crtc_state; if (drm_atomic_get_new_connector_state(state, connector)) continue; encoder = connector->state->best_encoder; if (!encoder || !(encoder_mask & drm_encoder_mask(encoder))) continue; if (!disable_conflicting_encoders) { drm_dbg_atomic(connector->dev, "[ENCODER:%d:%s] in use on [CRTC:%d:%s] by [CONNECTOR:%d:%s]\n", encoder->base.id, encoder->name, connector->state->crtc->base.id, connector->state->crtc->name, connector->base.id, connector->name); ret = -EINVAL; goto out; } new_conn_state = drm_atomic_get_connector_state(state, connector); if (IS_ERR(new_conn_state)) { ret = PTR_ERR(new_conn_state); goto out; } drm_dbg_atomic(connector->dev, "[ENCODER:%d:%s] in use on [CRTC:%d:%s], disabling [CONNECTOR:%d:%s]\n", encoder->base.id, encoder->name, new_conn_state->crtc->base.id, new_conn_state->crtc->name, connector->base.id, connector->name); crtc_state = drm_atomic_get_new_crtc_state(state, new_conn_state->crtc); ret = drm_atomic_set_crtc_for_connector(new_conn_state, NULL); if (ret) goto out; if (!crtc_state->connector_mask) { ret = drm_atomic_set_mode_prop_for_crtc(crtc_state, NULL); if (ret < 0) goto out; crtc_state->active = false; } } out: drm_connector_list_iter_end(&conn_iter); return ret; } static void set_best_encoder(struct drm_atomic_state *state, struct drm_connector_state *conn_state, struct drm_encoder *encoder) { struct drm_crtc_state *crtc_state; struct drm_crtc *crtc; if (conn_state->best_encoder) { /* Unset the encoder_mask in the old crtc state. */ crtc = conn_state->connector->state->crtc; /* A NULL crtc is an error here because we should have * duplicated a NULL best_encoder when crtc was NULL. * As an exception restoring duplicated atomic state * during resume is allowed, so don't warn when * best_encoder is equal to encoder we intend to set. 
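	 *
	 * The WARN_ON below encodes exactly that: a missing CRTC is only
	 * tolerated when the encoder stays unchanged.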
*/ WARN_ON(!crtc && encoder != conn_state->best_encoder); if (crtc) { crtc_state = drm_atomic_get_new_crtc_state(state, crtc); crtc_state->encoder_mask &= ~drm_encoder_mask(conn_state->best_encoder); } } if (encoder) { crtc = conn_state->crtc; WARN_ON(!crtc); if (crtc) { crtc_state = drm_atomic_get_new_crtc_state(state, crtc); crtc_state->encoder_mask |= drm_encoder_mask(encoder); } } conn_state->best_encoder = encoder; } static void steal_encoder(struct drm_atomic_state *state, struct drm_encoder *encoder) { struct drm_crtc_state *crtc_state; struct drm_connector *connector; struct drm_connector_state *old_connector_state, *new_connector_state; int i; for_each_oldnew_connector_in_state(state, connector, old_connector_state, new_connector_state, i) { struct drm_crtc *encoder_crtc; if (new_connector_state->best_encoder != encoder) continue; encoder_crtc = old_connector_state->crtc; drm_dbg_atomic(encoder->dev, "[ENCODER:%d:%s] in use on [CRTC:%d:%s], stealing it\n", encoder->base.id, encoder->name, encoder_crtc->base.id, encoder_crtc->name); set_best_encoder(state, new_connector_state, NULL); crtc_state = drm_atomic_get_new_crtc_state(state, encoder_crtc); crtc_state->connectors_changed = true; return; } } static int update_connector_routing(struct drm_atomic_state *state, struct drm_connector *connector, struct drm_connector_state *old_connector_state, struct drm_connector_state *new_connector_state, bool added_by_user) { const struct drm_connector_helper_funcs *funcs; struct drm_encoder *new_encoder; struct drm_crtc_state *crtc_state; drm_dbg_atomic(connector->dev, "Updating routing for [CONNECTOR:%d:%s]\n", connector->base.id, connector->name); if (old_connector_state->crtc != new_connector_state->crtc) { if (old_connector_state->crtc) { crtc_state = drm_atomic_get_new_crtc_state(state, old_connector_state->crtc); crtc_state->connectors_changed = true; } if (new_connector_state->crtc) { crtc_state = drm_atomic_get_new_crtc_state(state, new_connector_state->crtc); crtc_state->connectors_changed = true; } } if (!new_connector_state->crtc) { drm_dbg_atomic(connector->dev, "Disabling [CONNECTOR:%d:%s]\n", connector->base.id, connector->name); set_best_encoder(state, new_connector_state, NULL); return 0; } crtc_state = drm_atomic_get_new_crtc_state(state, new_connector_state->crtc); /* * For compatibility with legacy users, we want to make sure that * we allow DPMS On->Off modesets on unregistered connectors. Modesets * which would result in anything else must be considered invalid, to * avoid turning on new displays on dead connectors. * * Since the connector can be unregistered at any point during an * atomic check or commit, this is racy. But that's OK: all we care * about is ensuring that userspace can't do anything but shut off the * display on a connector that was destroyed after it's been notified, * not before. * * Additionally, we also want to ignore connector registration when * we're trying to restore an atomic state during system resume since * there's a chance the connector may have been destroyed during the * process, but it's better to ignore that then cause * drm_atomic_helper_resume() to fail. * * Last, we want to ignore connector registration when the connector * was not pulled in the atomic state by user-space (ie, was pulled * in by the driver, e.g. when updating a DP-MST stream). 
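	 *
	 * Taken together, these exceptions reduce to the single check below:
	 * only reject when the state is not a duplicated (resume) state, the
	 * connector was pulled in by userspace, it is unregistered and the
	 * target CRTC would be active.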
*/ if (!state->duplicated && drm_connector_is_unregistered(connector) && added_by_user && crtc_state->active) { drm_dbg_atomic(connector->dev, "[CONNECTOR:%d:%s] is not registered\n", connector->base.id, connector->name); return -EINVAL; } funcs = connector->helper_private; if (funcs->atomic_best_encoder) new_encoder = funcs->atomic_best_encoder(connector, state); else if (funcs->best_encoder) new_encoder = funcs->best_encoder(connector); else new_encoder = drm_connector_get_single_encoder(connector); if (!new_encoder) { drm_dbg_atomic(connector->dev, "No suitable encoder found for [CONNECTOR:%d:%s]\n", connector->base.id, connector->name); return -EINVAL; } if (!drm_encoder_crtc_ok(new_encoder, new_connector_state->crtc)) { drm_dbg_atomic(connector->dev, "[ENCODER:%d:%s] incompatible with [CRTC:%d:%s]\n", new_encoder->base.id, new_encoder->name, new_connector_state->crtc->base.id, new_connector_state->crtc->name); return -EINVAL; } if (new_encoder == new_connector_state->best_encoder) { set_best_encoder(state, new_connector_state, new_encoder); drm_dbg_atomic(connector->dev, "[CONNECTOR:%d:%s] keeps [ENCODER:%d:%s], now on [CRTC:%d:%s]\n", connector->base.id, connector->name, new_encoder->base.id, new_encoder->name, new_connector_state->crtc->base.id, new_connector_state->crtc->name); return 0; } steal_encoder(state, new_encoder); set_best_encoder(state, new_connector_state, new_encoder); crtc_state->connectors_changed = true; drm_dbg_atomic(connector->dev, "[CONNECTOR:%d:%s] using [ENCODER:%d:%s] on [CRTC:%d:%s]\n", connector->base.id, connector->name, new_encoder->base.id, new_encoder->name, new_connector_state->crtc->base.id, new_connector_state->crtc->name); return 0; } static int mode_fixup(struct drm_atomic_state *state) { struct drm_crtc *crtc; struct drm_crtc_state *new_crtc_state; struct drm_connector *connector; struct drm_connector_state *new_conn_state; int i; int ret; for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) { if (!new_crtc_state->mode_changed && !new_crtc_state->connectors_changed) continue; drm_mode_copy(&new_crtc_state->adjusted_mode, &new_crtc_state->mode); } for_each_new_connector_in_state(state, connector, new_conn_state, i) { const struct drm_encoder_helper_funcs *funcs; struct drm_encoder *encoder; struct drm_bridge *bridge; WARN_ON(!!new_conn_state->best_encoder != !!new_conn_state->crtc); if (!new_conn_state->crtc || !new_conn_state->best_encoder) continue; new_crtc_state = drm_atomic_get_new_crtc_state(state, new_conn_state->crtc); /* * Each encoder has at most one connector (since we always steal * it away), so we won't call ->mode_fixup twice. 
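		 *
		 * The bridges attached to the encoder are checked first, then
		 * the encoder's own atomic_check() (or, as a fallback,
		 * mode_fixup()) hook.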
*/ encoder = new_conn_state->best_encoder; funcs = encoder->helper_private; bridge = drm_bridge_chain_get_first_bridge(encoder); ret = drm_atomic_bridge_chain_check(bridge, new_crtc_state, new_conn_state); if (ret) { drm_dbg_atomic(encoder->dev, "Bridge atomic check failed\n"); return ret; } if (funcs && funcs->atomic_check) { ret = funcs->atomic_check(encoder, new_crtc_state, new_conn_state); if (ret) { drm_dbg_atomic(encoder->dev, "[ENCODER:%d:%s] check failed\n", encoder->base.id, encoder->name); return ret; } } else if (funcs && funcs->mode_fixup) { ret = funcs->mode_fixup(encoder, &new_crtc_state->mode, &new_crtc_state->adjusted_mode); if (!ret) { drm_dbg_atomic(encoder->dev, "[ENCODER:%d:%s] fixup failed\n", encoder->base.id, encoder->name); return -EINVAL; } } } for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) { const struct drm_crtc_helper_funcs *funcs; if (!new_crtc_state->enable) continue; if (!new_crtc_state->mode_changed && !new_crtc_state->connectors_changed) continue; funcs = crtc->helper_private; if (!funcs || !funcs->mode_fixup) continue; ret = funcs->mode_fixup(crtc, &new_crtc_state->mode, &new_crtc_state->adjusted_mode); if (!ret) { drm_dbg_atomic(crtc->dev, "[CRTC:%d:%s] fixup failed\n", crtc->base.id, crtc->name); return -EINVAL; } } return 0; } static enum drm_mode_status mode_valid_path(struct drm_connector *connector, struct drm_encoder *encoder, struct drm_crtc *crtc, const struct drm_display_mode *mode) { struct drm_bridge *bridge; enum drm_mode_status ret; ret = drm_encoder_mode_valid(encoder, mode); if (ret != MODE_OK) { drm_dbg_atomic(encoder->dev, "[ENCODER:%d:%s] mode_valid() failed\n", encoder->base.id, encoder->name); return ret; } bridge = drm_bridge_chain_get_first_bridge(encoder); ret = drm_bridge_chain_mode_valid(bridge, &connector->display_info, mode); if (ret != MODE_OK) { drm_dbg_atomic(encoder->dev, "[BRIDGE] mode_valid() failed\n"); return ret; } ret = drm_crtc_mode_valid(crtc, mode); if (ret != MODE_OK) { drm_dbg_atomic(encoder->dev, "[CRTC:%d:%s] mode_valid() failed\n", crtc->base.id, crtc->name); return ret; } return ret; } static int mode_valid(struct drm_atomic_state *state) { struct drm_connector_state *conn_state; struct drm_connector *connector; int i; for_each_new_connector_in_state(state, connector, conn_state, i) { struct drm_encoder *encoder = conn_state->best_encoder; struct drm_crtc *crtc = conn_state->crtc; struct drm_crtc_state *crtc_state; enum drm_mode_status mode_status; const struct drm_display_mode *mode; if (!crtc || !encoder) continue; crtc_state = drm_atomic_get_new_crtc_state(state, crtc); if (!crtc_state) continue; if (!crtc_state->mode_changed && !crtc_state->connectors_changed) continue; mode = &crtc_state->mode; mode_status = mode_valid_path(connector, encoder, crtc, mode); if (mode_status != MODE_OK) return -EINVAL; } return 0; } static int drm_atomic_check_valid_clones(struct drm_atomic_state *state, struct drm_crtc *crtc) { struct drm_encoder *drm_enc; struct drm_crtc_state *crtc_state = drm_atomic_get_new_crtc_state(state, crtc); drm_for_each_encoder_mask(drm_enc, crtc->dev, crtc_state->encoder_mask) { if (!drm_enc->possible_clones) { DRM_DEBUG("enc%d possible_clones is 0\n", drm_enc->base.id); continue; } if ((crtc_state->encoder_mask & drm_enc->possible_clones) != crtc_state->encoder_mask) { DRM_DEBUG("crtc%d failed valid clone check for mask 0x%x\n", crtc->base.id, crtc_state->encoder_mask); return -EINVAL; } } return 0; } /** * drm_atomic_helper_check_modeset - validate state object for modeset changes 
 * @dev: DRM device
 * @state: the driver state object
 *
 * Check the state object to see if the requested state is physically possible.
 * This does all the CRTC and connector related computations for an atomic
 * update and adds any additional connectors needed for full modesets. It calls
 * the various per-object callbacks in the following order:
 *
 * 1. &drm_connector_helper_funcs.atomic_best_encoder for determining the new encoder.
 * 2. &drm_connector_helper_funcs.atomic_check to validate the connector state.
 * 3. If it's determined a modeset is needed then all connectors on the affected
 *    CRTC are added and &drm_connector_helper_funcs.atomic_check is run on them.
 * 4. &drm_encoder_helper_funcs.mode_valid, &drm_bridge_funcs.mode_valid and
 *    &drm_crtc_helper_funcs.mode_valid are called on the affected components.
 * 5. &drm_bridge_funcs.mode_fixup is called on all encoder bridges.
 * 6. &drm_encoder_helper_funcs.atomic_check is called to validate any encoder state.
 *    This function is only called when the encoder will be part of a configured CRTC;
 *    it must not be used for implementing connector property validation.
 *    If this function is NULL, &drm_encoder_helper_funcs.mode_fixup is called
 *    instead.
 * 7. &drm_crtc_helper_funcs.mode_fixup is called last, to fix up the mode with CRTC constraints.
 *
 * &drm_crtc_state.mode_changed is set when the input mode is changed.
 * &drm_crtc_state.connectors_changed is set when a connector is added or
 * removed from the CRTC. &drm_crtc_state.active_changed is set when
 * &drm_crtc_state.active changes, which is used for DPMS.
 * &drm_crtc_state.no_vblank is set from the result of drm_dev_has_vblank().
 * See also: drm_atomic_crtc_needs_modeset()
 *
 * IMPORTANT:
 *
 * Drivers which set &drm_crtc_state.mode_changed (e.g. in their
 * &drm_plane_helper_funcs.atomic_check hooks if a plane update can't be done
 * without a full modeset) _must_ call this function after that change. It is
 * permitted to call this function multiple times for the same update, e.g.
 * when the &drm_crtc_helper_funcs.atomic_check functions depend upon the
 * adjusted dotclock for fifo space allocation and watermark computation.
 *
 * RETURNS:
 * Zero for success or -errno
 */
int drm_atomic_helper_check_modeset(struct drm_device *dev,
				    struct drm_atomic_state *state)
{
	struct drm_crtc *crtc;
	struct drm_crtc_state *old_crtc_state, *new_crtc_state;
	struct drm_connector *connector;
	struct drm_connector_state *old_connector_state, *new_connector_state;
	int i, ret;
	unsigned int connectors_mask = 0, user_connectors_mask = 0;

	for_each_oldnew_connector_in_state(state, connector, old_connector_state,
					   new_connector_state, i)
		user_connectors_mask |= BIT(i);

	for_each_oldnew_crtc_in_state(state, crtc, old_crtc_state, new_crtc_state, i) {
		bool has_connectors =
			!!new_crtc_state->connector_mask;

		WARN_ON(!drm_modeset_is_locked(&crtc->mutex));

		if (!drm_mode_equal(&old_crtc_state->mode, &new_crtc_state->mode)) {
			drm_dbg_atomic(dev, "[CRTC:%d:%s] mode changed\n",
				       crtc->base.id, crtc->name);
			new_crtc_state->mode_changed = true;
		}

		if (old_crtc_state->enable != new_crtc_state->enable) {
			drm_dbg_atomic(dev, "[CRTC:%d:%s] enable changed\n",
				       crtc->base.id, crtc->name);

			/*
			 * For clarity this assignment is done here, but
			 * enable == 0 is only true when there are no
			 * connectors and a NULL mode.
			 *
			 * The other way around is true as well. enable != 0
			 * implies that connectors are attached and a mode is set.
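			 *
			 * That invariant is enforced further down: if enable
			 * does not match the presence of connectors, the
			 * helper bails out with -EINVAL.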
*/ new_crtc_state->mode_changed = true; new_crtc_state->connectors_changed = true; } if (old_crtc_state->active != new_crtc_state->active) { drm_dbg_atomic(dev, "[CRTC:%d:%s] active changed\n", crtc->base.id, crtc->name); new_crtc_state->active_changed = true; } if (new_crtc_state->enable != has_connectors) { drm_dbg_atomic(dev, "[CRTC:%d:%s] enabled/connectors mismatch (%d/%d)\n", crtc->base.id, crtc->name, new_crtc_state->enable, has_connectors); return -EINVAL; } if (drm_dev_has_vblank(dev)) new_crtc_state->no_vblank = false; else new_crtc_state->no_vblank = true; } ret = handle_conflicting_encoders(state, false); if (ret) return ret; for_each_oldnew_connector_in_state(state, connector, old_connector_state, new_connector_state, i) { const struct drm_connector_helper_funcs *funcs = connector->helper_private; WARN_ON(!drm_modeset_is_locked(&dev->mode_config.connection_mutex)); /* * This only sets crtc->connectors_changed for routing changes, * drivers must set crtc->connectors_changed themselves when * connector properties need to be updated. */ ret = update_connector_routing(state, connector, old_connector_state, new_connector_state, BIT(i) & user_connectors_mask); if (ret) return ret; if (old_connector_state->crtc) { new_crtc_state = drm_atomic_get_new_crtc_state(state, old_connector_state->crtc); if (old_connector_state->link_status != new_connector_state->link_status) new_crtc_state->connectors_changed = true; if (old_connector_state->max_requested_bpc != new_connector_state->max_requested_bpc) new_crtc_state->connectors_changed = true; } if (funcs->atomic_check) ret = funcs->atomic_check(connector, state); if (ret) { drm_dbg_atomic(dev, "[CONNECTOR:%d:%s] driver check failed\n", connector->base.id, connector->name); return ret; } connectors_mask |= BIT(i); } /* * After all the routing has been prepared we need to add in any * connector which is itself unchanged, but whose CRTC changes its * configuration. This must be done before calling mode_fixup in case a * crtc only changed its mode but has the same set of connectors. */ for_each_oldnew_crtc_in_state(state, crtc, old_crtc_state, new_crtc_state, i) { if (!drm_atomic_crtc_needs_modeset(new_crtc_state)) continue; drm_dbg_atomic(dev, "[CRTC:%d:%s] needs all connectors, enable: %c, active: %c\n", crtc->base.id, crtc->name, new_crtc_state->enable ? 'y' : 'n', new_crtc_state->active ? 'y' : 'n'); ret = drm_atomic_add_affected_connectors(state, crtc); if (ret != 0) return ret; ret = drm_atomic_add_affected_planes(state, crtc); if (ret != 0) return ret; ret = drm_atomic_check_valid_clones(state, crtc); if (ret != 0) return ret; } /* * Iterate over all connectors again, to make sure atomic_check() * has been called on them when a modeset is forced. */ for_each_oldnew_connector_in_state(state, connector, old_connector_state, new_connector_state, i) { const struct drm_connector_helper_funcs *funcs = connector->helper_private; if (connectors_mask & BIT(i)) continue; if (funcs->atomic_check) ret = funcs->atomic_check(connector, state); if (ret) { drm_dbg_atomic(dev, "[CONNECTOR:%d:%s] driver check failed\n", connector->base.id, connector->name); return ret; } } /* * Iterate over all connectors again, and add all affected bridges to * the state. 
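	 *
	 * Both the old and the new best encoder of each connector are covered,
	 * so bridges are not missed when a connector switches encoders.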
*/ for_each_oldnew_connector_in_state(state, connector, old_connector_state, new_connector_state, i) { struct drm_encoder *encoder; encoder = old_connector_state->best_encoder; ret = drm_atomic_add_encoder_bridges(state, encoder); if (ret) return ret; encoder = new_connector_state->best_encoder; ret = drm_atomic_add_encoder_bridges(state, encoder); if (ret) return ret; } ret = mode_valid(state); if (ret) return ret; return mode_fixup(state); } EXPORT_SYMBOL(drm_atomic_helper_check_modeset); /** * drm_atomic_helper_check_wb_connector_state() - Check writeback connector state * @connector: corresponding connector * @state: the driver state object * * Checks if the writeback connector state is valid, and returns an error if it * isn't. * * RETURNS: * Zero for success or -errno */ int drm_atomic_helper_check_wb_connector_state(struct drm_connector *connector, struct drm_atomic_state *state) { struct drm_connector_state *conn_state = drm_atomic_get_new_connector_state(state, connector); struct drm_writeback_job *wb_job = conn_state->writeback_job; struct drm_property_blob *pixel_format_blob; struct drm_framebuffer *fb; size_t i, nformats; u32 *formats; if (!wb_job || !wb_job->fb) return 0; pixel_format_blob = wb_job->connector->pixel_formats_blob_ptr; nformats = pixel_format_blob->length / sizeof(u32); formats = pixel_format_blob->data; fb = wb_job->fb; for (i = 0; i < nformats; i++) if (fb->format->format == formats[i]) return 0; drm_dbg_kms(connector->dev, "Invalid pixel format %p4cc\n", &fb->format->format); return -EINVAL; } EXPORT_SYMBOL(drm_atomic_helper_check_wb_connector_state); /** * drm_atomic_helper_check_plane_state() - Check plane state for validity * @plane_state: plane state to check * @crtc_state: CRTC state to check * @min_scale: minimum @src:@dest scaling factor in 16.16 fixed point * @max_scale: maximum @src:@dest scaling factor in 16.16 fixed point * @can_position: is it legal to position the plane such that it * doesn't cover the entire CRTC? This will generally * only be false for primary planes. * @can_update_disabled: can the plane be updated while the CRTC * is disabled? * * Checks that a desired plane update is valid, and updates various * bits of derived state (clipped coordinates etc.). Drivers that provide * their own plane handling rather than helper-provided implementations may * still wish to call this function to avoid duplication of error checking * code. 
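 *
 * For illustration only, a driver's &drm_plane_helper_funcs.atomic_check
 * hook might call it roughly like this (the foo_* name is made up and the
 * scaling limits are hardware specific):
 *
 *	static int foo_plane_atomic_check(struct drm_plane *plane,
 *					  struct drm_atomic_state *state)
 *	{
 *		struct drm_plane_state *new_state =
 *			drm_atomic_get_new_plane_state(state, plane);
 *		struct drm_crtc_state *crtc_state;
 *
 *		if (!new_state->crtc)
 *			return 0;
 *
 *		crtc_state = drm_atomic_get_new_crtc_state(state, new_state->crtc);
 *
 *		return drm_atomic_helper_check_plane_state(new_state, crtc_state,
 *							   DRM_PLANE_NO_SCALING,
 *							   DRM_PLANE_NO_SCALING,
 *							   false, true);
 *	}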
* * RETURNS: * Zero if update appears valid, error code on failure */ int drm_atomic_helper_check_plane_state(struct drm_plane_state *plane_state, const struct drm_crtc_state *crtc_state, int min_scale, int max_scale, bool can_position, bool can_update_disabled) { struct drm_framebuffer *fb = plane_state->fb; struct drm_rect *src = &plane_state->src; struct drm_rect *dst = &plane_state->dst; unsigned int rotation = plane_state->rotation; struct drm_rect clip = {}; int hscale, vscale; WARN_ON(plane_state->crtc && plane_state->crtc != crtc_state->crtc); *src = drm_plane_state_src(plane_state); *dst = drm_plane_state_dest(plane_state); if (!fb) { plane_state->visible = false; return 0; } /* crtc should only be NULL when disabling (i.e., !fb) */ if (WARN_ON(!plane_state->crtc)) { plane_state->visible = false; return 0; } if (!crtc_state->enable && !can_update_disabled) { drm_dbg_kms(plane_state->plane->dev, "Cannot update plane of a disabled CRTC.\n"); return -EINVAL; } drm_rect_rotate(src, fb->width << 16, fb->height << 16, rotation); /* Check scaling */ hscale = drm_rect_calc_hscale(src, dst, min_scale, max_scale); vscale = drm_rect_calc_vscale(src, dst, min_scale, max_scale); if (hscale < 0 || vscale < 0) { drm_dbg_kms(plane_state->plane->dev, "Invalid scaling of plane\n"); drm_rect_debug_print("src: ", &plane_state->src, true); drm_rect_debug_print("dst: ", &plane_state->dst, false); return -ERANGE; } if (crtc_state->enable) drm_mode_get_hv_timing(&crtc_state->mode, &clip.x2, &clip.y2); plane_state->visible = drm_rect_clip_scaled(src, dst, &clip); drm_rect_rotate_inv(src, fb->width << 16, fb->height << 16, rotation); if (!plane_state->visible) /* * Plane isn't visible; some drivers can handle this * so we just return success here. Drivers that can't * (including those that use the primary plane helper's * update function) will return an error from their * update_plane handler. */ return 0; if (!can_position && !drm_rect_equals(dst, &clip)) { drm_dbg_kms(plane_state->plane->dev, "Plane must cover entire CRTC\n"); drm_rect_debug_print("dst: ", dst, false); drm_rect_debug_print("clip: ", &clip, false); return -EINVAL; } return 0; } EXPORT_SYMBOL(drm_atomic_helper_check_plane_state); /** * drm_atomic_helper_check_crtc_primary_plane() - Check CRTC state for primary plane * @crtc_state: CRTC state to check * * Checks that a CRTC has at least one primary plane attached to it, which is * a requirement on some hardware. Note that this only involves the CRTC side * of the test. To test if the primary plane is visible or if it can be updated * without the CRTC being enabled, use drm_atomic_helper_check_plane_state() in * the plane's atomic check. * * RETURNS: * 0 if a primary plane is attached to the CRTC, or an error code otherwise */ int drm_atomic_helper_check_crtc_primary_plane(struct drm_crtc_state *crtc_state) { struct drm_crtc *crtc = crtc_state->crtc; struct drm_device *dev = crtc->dev; struct drm_plane *plane; /* needs at least one primary plane to be enabled */ drm_for_each_plane_mask(plane, dev, crtc_state->plane_mask) { if (plane->type == DRM_PLANE_TYPE_PRIMARY) return 0; } drm_dbg_atomic(dev, "[CRTC:%d:%s] primary plane missing\n", crtc->base.id, crtc->name); return -EINVAL; } EXPORT_SYMBOL(drm_atomic_helper_check_crtc_primary_plane); /** * drm_atomic_helper_check_planes - validate state object for planes changes * @dev: DRM device * @state: the driver state object * * Check the state object to see if the requested state is physically possible. 
 * This does all the plane update related checks by calling into the
 * &drm_crtc_helper_funcs.atomic_check and &drm_plane_helper_funcs.atomic_check
 * hooks provided by the driver.
 *
 * It also sets &drm_crtc_state.planes_changed to indicate that a CRTC has
 * updated planes.
 *
 * RETURNS:
 * Zero for success or -errno
 */
int drm_atomic_helper_check_planes(struct drm_device *dev,
				   struct drm_atomic_state *state)
{
	struct drm_crtc *crtc;
	struct drm_crtc_state *new_crtc_state;
	struct drm_plane *plane;
	struct drm_plane_state *new_plane_state, *old_plane_state;
	int i, ret = 0;

	for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
		const struct drm_plane_helper_funcs *funcs;

		WARN_ON(!drm_modeset_is_locked(&plane->mutex));

		funcs = plane->helper_private;

		drm_atomic_helper_plane_changed(state, old_plane_state, new_plane_state, plane);

		drm_atomic_helper_check_plane_damage(state, new_plane_state);

		if (!funcs || !funcs->atomic_check)
			continue;

		ret = funcs->atomic_check(plane, state);
		if (ret) {
			drm_dbg_atomic(plane->dev,
				       "[PLANE:%d:%s] atomic driver check failed\n",
				       plane->base.id, plane->name);
			return ret;
		}
	}

	for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) {
		const struct drm_crtc_helper_funcs *funcs;

		funcs = crtc->helper_private;

		if (!funcs || !funcs->atomic_check)
			continue;

		ret = funcs->atomic_check(crtc, state);
		if (ret) {
			drm_dbg_atomic(crtc->dev,
				       "[CRTC:%d:%s] atomic driver check failed\n",
				       crtc->base.id, crtc->name);
			return ret;
		}
	}

	return ret;
}
EXPORT_SYMBOL(drm_atomic_helper_check_planes);

/**
 * drm_atomic_helper_check - validate state object
 * @dev: DRM device
 * @state: the driver state object
 *
 * Check the state object to see if the requested state is physically possible.
 * Only CRTCs and planes have check callbacks, so for any additional (global)
 * checking that a driver needs it can simply wrap that around this function.
 * Drivers without such needs can directly use this as their
 * &drm_mode_config_funcs.atomic_check callback.
 *
 * This just wraps the two parts of the state checking for planes and modeset
 * state in the default order: First it calls drm_atomic_helper_check_modeset()
 * and then drm_atomic_helper_check_planes(). The assumption is that the
 * &drm_plane_helper_funcs.atomic_check and &drm_crtc_helper_funcs.atomic_check
 * functions depend upon an updated adjusted_mode.clock to e.g. properly compute
 * watermarks.
 *
 * Note that zpos normalization will add all enabled planes to the state, which
 * might not be desired for some drivers. For example, enabling or disabling a
 * cursor plane with a fixed zpos value would force all other enabled planes
 * into the state change.
 *
 * IMPORTANT:
 *
 * As this function calls drm_atomic_helper_check_modeset() internally, its
 * restrictions also apply:
 * Drivers which set &drm_crtc_state.mode_changed (e.g. in their
 * &drm_plane_helper_funcs.atomic_check hooks if a plane update can't be done
 * without a full modeset) _must_ call drm_atomic_helper_check_modeset()
 * again after that change.
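 *
 * For illustration only, a driver that needs an extra device-wide check on
 * top of the helpers might wrap it like this (the foo_* names are made up):
 *
 *	static int foo_atomic_check(struct drm_device *dev,
 *				    struct drm_atomic_state *state)
 *	{
 *		int ret;
 *
 *		ret = drm_atomic_helper_check(dev, state);
 *		if (ret)
 *			return ret;
 *
 *		return foo_check_global_bandwidth(dev, state);
 *	}
 *
 *	static const struct drm_mode_config_funcs foo_mode_config_funcs = {
 *		.fb_create = drm_gem_fb_create,
 *		.atomic_check = foo_atomic_check,
 *		.atomic_commit = drm_atomic_helper_commit,
 *	};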
* * RETURNS: * Zero for success or -errno */ int drm_atomic_helper_check(struct drm_device *dev, struct drm_atomic_state *state) { int ret; ret = drm_atomic_helper_check_modeset(dev, state); if (ret) return ret; if (dev->mode_config.normalize_zpos) { ret = drm_atomic_normalize_zpos(dev, state); if (ret) return ret; } ret = drm_atomic_helper_check_planes(dev, state); if (ret) return ret; if (state->legacy_cursor_update) state->async_update = !drm_atomic_helper_async_check(dev, state); drm_self_refresh_helper_alter_state(state); return ret; } EXPORT_SYMBOL(drm_atomic_helper_check); static bool crtc_needs_disable(struct drm_crtc_state *old_state, struct drm_crtc_state *new_state) { /* * No new_state means the CRTC is off, so the only criteria is whether * it's currently active or in self refresh mode. */ if (!new_state) return drm_atomic_crtc_effectively_active(old_state); /* * We need to disable bridge(s) and CRTC if we're transitioning out of * self-refresh and changing CRTCs at the same time, because the * bridge tracks self-refresh status via CRTC state. */ if (old_state->self_refresh_active && old_state->crtc != new_state->crtc) return true; /* * We also need to run through the crtc_funcs->disable() function if * the CRTC is currently on, if it's transitioning to self refresh * mode, or if it's in self refresh mode and needs to be fully * disabled. */ return old_state->active || (old_state->self_refresh_active && !new_state->active) || new_state->self_refresh_active; } static void disable_outputs(struct drm_device *dev, struct drm_atomic_state *state) { struct drm_connector *connector; struct drm_connector_state *old_conn_state, *new_conn_state; struct drm_crtc *crtc; struct drm_crtc_state *old_crtc_state, *new_crtc_state; int i; for_each_oldnew_connector_in_state(state, connector, old_conn_state, new_conn_state, i) { const struct drm_encoder_helper_funcs *funcs; struct drm_encoder *encoder; struct drm_bridge *bridge; /* * Shut down everything that's in the changeset and currently * still on. So need to check the old, saved state. */ if (!old_conn_state->crtc) continue; old_crtc_state = drm_atomic_get_old_crtc_state(state, old_conn_state->crtc); if (new_conn_state->crtc) new_crtc_state = drm_atomic_get_new_crtc_state( state, new_conn_state->crtc); else new_crtc_state = NULL; if (!crtc_needs_disable(old_crtc_state, new_crtc_state) || !drm_atomic_crtc_needs_modeset(old_conn_state->crtc->state)) continue; encoder = old_conn_state->best_encoder; /* We shouldn't get this far if we didn't previously have * an encoder.. but WARN_ON() rather than explode. */ if (WARN_ON(!encoder)) continue; funcs = encoder->helper_private; drm_dbg_atomic(dev, "disabling [ENCODER:%d:%s]\n", encoder->base.id, encoder->name); /* * Each encoder has at most one connector (since we always steal * it away), so we won't call disable hooks twice. */ bridge = drm_bridge_chain_get_first_bridge(encoder); drm_atomic_bridge_chain_disable(bridge, state); /* Right function depends upon target state. */ if (funcs) { if (funcs->atomic_disable) funcs->atomic_disable(encoder, state); else if (new_conn_state->crtc && funcs->prepare) funcs->prepare(encoder); else if (funcs->disable) funcs->disable(encoder); else if (funcs->dpms) funcs->dpms(encoder, DRM_MODE_DPMS_OFF); } drm_atomic_bridge_chain_post_disable(bridge, state); } for_each_oldnew_crtc_in_state(state, crtc, old_crtc_state, new_crtc_state, i) { const struct drm_crtc_helper_funcs *funcs; int ret; /* Shut down everything that needs a full modeset. 
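		 *
		 * CRTCs that only see plane updates are skipped; whether the
		 * disable hooks actually need to run (including the
		 * self-refresh transitions) is decided by crtc_needs_disable().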
*/ if (!drm_atomic_crtc_needs_modeset(new_crtc_state)) continue; if (!crtc_needs_disable(old_crtc_state, new_crtc_state)) continue; funcs = crtc->helper_private; drm_dbg_atomic(dev, "disabling [CRTC:%d:%s]\n", crtc->base.id, crtc->name); /* Right function depends upon target state. */ if (new_crtc_state->enable && funcs->prepare) funcs->prepare(crtc); else if (funcs->atomic_disable) funcs->atomic_disable(crtc, state); else if (funcs->disable) funcs->disable(crtc); else if (funcs->dpms) funcs->dpms(crtc, DRM_MODE_DPMS_OFF); if (!drm_dev_has_vblank(dev)) continue; ret = drm_crtc_vblank_get(crtc); /* * Self-refresh is not a true "disable"; ensure vblank remains * enabled. */ if (new_crtc_state->self_refresh_active) WARN_ONCE(ret != 0, "driver disabled vblank in self-refresh\n"); else WARN_ONCE(ret != -EINVAL, "driver forgot to call drm_crtc_vblank_off()\n"); if (ret == 0) drm_crtc_vblank_put(crtc); } } /** * drm_atomic_helper_update_legacy_modeset_state - update legacy modeset state * @dev: DRM device * @state: atomic state object being committed * * This function updates all the various legacy modeset state pointers in * connectors, encoders and CRTCs. * * Drivers can use this for building their own atomic commit if they don't have * a pure helper-based modeset implementation. * * Since these updates are not synchronized with lockings, only code paths * called from &drm_mode_config_helper_funcs.atomic_commit_tail can look at the * legacy state filled out by this helper. Defacto this means this helper and * the legacy state pointers are only really useful for transitioning an * existing driver to the atomic world. */ void drm_atomic_helper_update_legacy_modeset_state(struct drm_device *dev, struct drm_atomic_state *state) { struct drm_connector *connector; struct drm_connector_state *old_conn_state, *new_conn_state; struct drm_crtc *crtc; struct drm_crtc_state *new_crtc_state; int i; /* clear out existing links and update dpms */ for_each_oldnew_connector_in_state(state, connector, old_conn_state, new_conn_state, i) { if (connector->encoder) { WARN_ON(!connector->encoder->crtc); connector->encoder->crtc = NULL; connector->encoder = NULL; } crtc = new_conn_state->crtc; if ((!crtc && old_conn_state->crtc) || (crtc && drm_atomic_crtc_needs_modeset(crtc->state))) { int mode = DRM_MODE_DPMS_OFF; if (crtc && crtc->state->active) mode = DRM_MODE_DPMS_ON; connector->dpms = mode; } } /* set new links */ for_each_new_connector_in_state(state, connector, new_conn_state, i) { if (!new_conn_state->crtc) continue; if (WARN_ON(!new_conn_state->best_encoder)) continue; connector->encoder = new_conn_state->best_encoder; connector->encoder->crtc = new_conn_state->crtc; } /* set legacy state in the crtc structure */ for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) { struct drm_plane *primary = crtc->primary; struct drm_plane_state *new_plane_state; crtc->mode = new_crtc_state->mode; crtc->enabled = new_crtc_state->enable; new_plane_state = drm_atomic_get_new_plane_state(state, primary); if (new_plane_state && new_plane_state->crtc == crtc) { crtc->x = new_plane_state->src_x >> 16; crtc->y = new_plane_state->src_y >> 16; } } } EXPORT_SYMBOL(drm_atomic_helper_update_legacy_modeset_state); /** * drm_atomic_helper_calc_timestamping_constants - update vblank timestamping constants * @state: atomic state object * * Updates the timestamping constants used for precise vblank timestamps * by calling drm_calc_timestamping_constants() for all enabled crtcs in @state. 
*/ void drm_atomic_helper_calc_timestamping_constants(struct drm_atomic_state *state) { struct drm_crtc_state *new_crtc_state; struct drm_crtc *crtc; int i; for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) { if (new_crtc_state->enable) drm_calc_timestamping_constants(crtc, &new_crtc_state->adjusted_mode); } } EXPORT_SYMBOL(drm_atomic_helper_calc_timestamping_constants); static void crtc_set_mode(struct drm_device *dev, struct drm_atomic_state *state) { struct drm_crtc *crtc; struct drm_crtc_state *new_crtc_state; struct drm_connector *connector; struct drm_connector_state *new_conn_state; int i; for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) { const struct drm_crtc_helper_funcs *funcs; if (!new_crtc_state->mode_changed) continue; funcs = crtc->helper_private; if (new_crtc_state->enable && funcs->mode_set_nofb) { drm_dbg_atomic(dev, "modeset on [CRTC:%d:%s]\n", crtc->base.id, crtc->name); funcs->mode_set_nofb(crtc); } } for_each_new_connector_in_state(state, connector, new_conn_state, i) { const struct drm_encoder_helper_funcs *funcs; struct drm_encoder *encoder; struct drm_display_mode *mode, *adjusted_mode; struct drm_bridge *bridge; if (!new_conn_state->best_encoder) continue; encoder = new_conn_state->best_encoder; funcs = encoder->helper_private; new_crtc_state = new_conn_state->crtc->state; mode = &new_crtc_state->mode; adjusted_mode = &new_crtc_state->adjusted_mode; if (!new_crtc_state->mode_changed && !new_crtc_state->connectors_changed) continue; drm_dbg_atomic(dev, "modeset on [ENCODER:%d:%s]\n", encoder->base.id, encoder->name); /* * Each encoder has at most one connector (since we always steal * it away), so we won't call mode_set hooks twice. */ if (funcs && funcs->atomic_mode_set) { funcs->atomic_mode_set(encoder, new_crtc_state, new_conn_state); } else if (funcs && funcs->mode_set) { funcs->mode_set(encoder, mode, adjusted_mode); } bridge = drm_bridge_chain_get_first_bridge(encoder); drm_bridge_chain_mode_set(bridge, mode, adjusted_mode); } } /** * drm_atomic_helper_commit_modeset_disables - modeset commit to disable outputs * @dev: DRM device * @state: atomic state object being committed * * This function shuts down all the outputs that need to be shut down and * prepares them (if required) with the new mode. * * For compatibility with legacy CRTC helpers this should be called before * drm_atomic_helper_commit_planes(), which is what the default commit function * does. But drivers with different needs can group the modeset commits together * and do the plane commits at the end. This is useful for drivers doing runtime * PM since planes updates then only happen when the CRTC is actually enabled. 
*/ void drm_atomic_helper_commit_modeset_disables(struct drm_device *dev, struct drm_atomic_state *state) { disable_outputs(dev, state); drm_atomic_helper_update_legacy_modeset_state(dev, state); drm_atomic_helper_calc_timestamping_constants(state); crtc_set_mode(dev, state); } EXPORT_SYMBOL(drm_atomic_helper_commit_modeset_disables); static void drm_atomic_helper_commit_writebacks(struct drm_device *dev, struct drm_atomic_state *state) { struct drm_connector *connector; struct drm_connector_state *new_conn_state; int i; for_each_new_connector_in_state(state, connector, new_conn_state, i) { const struct drm_connector_helper_funcs *funcs; funcs = connector->helper_private; if (!funcs->atomic_commit) continue; if (new_conn_state->writeback_job && new_conn_state->writeback_job->fb) { WARN_ON(connector->connector_type != DRM_MODE_CONNECTOR_WRITEBACK); funcs->atomic_commit(connector, state); } } } /** * drm_atomic_helper_commit_modeset_enables - modeset commit to enable outputs * @dev: DRM device * @state: atomic state object being committed * * This function enables all the outputs with the new configuration which had to * be turned off for the update. * * For compatibility with legacy CRTC helpers this should be called after * drm_atomic_helper_commit_planes(), which is what the default commit function * does. But drivers with different needs can group the modeset commits together * and do the plane commits at the end. This is useful for drivers doing runtime * PM since planes updates then only happen when the CRTC is actually enabled. */ void drm_atomic_helper_commit_modeset_enables(struct drm_device *dev, struct drm_atomic_state *state) { struct drm_crtc *crtc; struct drm_crtc_state *old_crtc_state; struct drm_crtc_state *new_crtc_state; struct drm_connector *connector; struct drm_connector_state *new_conn_state; int i; for_each_oldnew_crtc_in_state(state, crtc, old_crtc_state, new_crtc_state, i) { const struct drm_crtc_helper_funcs *funcs; /* Need to filter out CRTCs where only planes change. */ if (!drm_atomic_crtc_needs_modeset(new_crtc_state)) continue; if (!new_crtc_state->active) continue; funcs = crtc->helper_private; if (new_crtc_state->enable) { drm_dbg_atomic(dev, "enabling [CRTC:%d:%s]\n", crtc->base.id, crtc->name); if (funcs->atomic_enable) funcs->atomic_enable(crtc, state); else if (funcs->commit) funcs->commit(crtc); } } for_each_new_connector_in_state(state, connector, new_conn_state, i) { const struct drm_encoder_helper_funcs *funcs; struct drm_encoder *encoder; struct drm_bridge *bridge; if (!new_conn_state->best_encoder) continue; if (!new_conn_state->crtc->state->active || !drm_atomic_crtc_needs_modeset(new_conn_state->crtc->state)) continue; encoder = new_conn_state->best_encoder; funcs = encoder->helper_private; drm_dbg_atomic(dev, "enabling [ENCODER:%d:%s]\n", encoder->base.id, encoder->name); /* * Each encoder has at most one connector (since we always steal * it away), so we won't call enable hooks twice. 
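		 *
		 * The bridge chain is pre-enabled before the encoder and
		 * enabled after it, mirroring the disable/post-disable
		 * ordering in disable_outputs().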
*/ bridge = drm_bridge_chain_get_first_bridge(encoder); drm_atomic_bridge_chain_pre_enable(bridge, state); if (funcs) { if (funcs->atomic_enable) funcs->atomic_enable(encoder, state); else if (funcs->enable) funcs->enable(encoder); else if (funcs->commit) funcs->commit(encoder); } drm_atomic_bridge_chain_enable(bridge, state); } drm_atomic_helper_commit_writebacks(dev, state); } EXPORT_SYMBOL(drm_atomic_helper_commit_modeset_enables); /* * For atomic updates which touch just a single CRTC, calculate the time of the * next vblank, and inform all the fences of the deadline. */ static void set_fence_deadline(struct drm_device *dev, struct drm_atomic_state *state) { struct drm_crtc *crtc; struct drm_crtc_state *new_crtc_state; struct drm_plane *plane; struct drm_plane_state *new_plane_state; ktime_t vbltime = 0; int i; for_each_new_crtc_in_state (state, crtc, new_crtc_state, i) { ktime_t v; if (drm_atomic_crtc_needs_modeset(new_crtc_state)) continue; if (!new_crtc_state->active) continue; if (drm_crtc_next_vblank_start(crtc, &v)) continue; if (!vbltime || ktime_before(v, vbltime)) vbltime = v; } /* If no CRTCs updated, then nothing to do: */ if (!vbltime) return; for_each_new_plane_in_state (state, plane, new_plane_state, i) { if (!new_plane_state->fence) continue; dma_fence_set_deadline(new_plane_state->fence, vbltime); } } /** * drm_atomic_helper_wait_for_fences - wait for fences stashed in plane state * @dev: DRM device * @state: atomic state object with old state structures * @pre_swap: If true, do an interruptible wait, and @state is the new state. * Otherwise @state is the old state. * * For implicit sync, driver should fish the exclusive fence out from the * incoming fb's and stash it in the drm_plane_state. This is called after * drm_atomic_helper_swap_state() so it uses the current plane state (and * just uses the atomic state to find the changed planes) * * Note that @pre_swap is needed since the point where we block for fences moves * around depending upon whether an atomic commit is blocking or * non-blocking. For non-blocking commit all waiting needs to happen after * drm_atomic_helper_swap_state() is called, but for blocking commits we want * to wait **before** we do anything that can't be easily rolled back. That is * before we call drm_atomic_helper_swap_state(). * * Returns zero if success or < 0 if dma_fence_wait() fails. */ int drm_atomic_helper_wait_for_fences(struct drm_device *dev, struct drm_atomic_state *state, bool pre_swap) { struct drm_plane *plane; struct drm_plane_state *new_plane_state; int i, ret; set_fence_deadline(dev, state); for_each_new_plane_in_state(state, plane, new_plane_state, i) { if (!new_plane_state->fence) continue; WARN_ON(!new_plane_state->fb); /* * If waiting for fences pre-swap (ie: nonblock), userspace can * still interrupt the operation. Instead of blocking until the * timer expires, make the wait interruptible. */ ret = dma_fence_wait(new_plane_state->fence, pre_swap); if (ret) return ret; dma_fence_put(new_plane_state->fence); new_plane_state->fence = NULL; } return 0; } EXPORT_SYMBOL(drm_atomic_helper_wait_for_fences); /** * drm_atomic_helper_wait_for_vblanks - wait for vblank on CRTCs * @dev: DRM device * @state: atomic state object being committed * * Helper to, after atomic commit, wait for vblanks on all affected * CRTCs (ie. before cleaning up old framebuffers using * drm_atomic_helper_cleanup_planes()). 
It will only wait on CRTCs where the * framebuffers have actually changed to optimize for the legacy cursor and * plane update use-case. * * Drivers using the nonblocking commit tracking support initialized by calling * drm_atomic_helper_setup_commit() should look at * drm_atomic_helper_wait_for_flip_done() as an alternative. */ void drm_atomic_helper_wait_for_vblanks(struct drm_device *dev, struct drm_atomic_state *state) { struct drm_crtc *crtc; struct drm_crtc_state *old_crtc_state, *new_crtc_state; int i, ret; unsigned int crtc_mask = 0; /* * Legacy cursor ioctls are completely unsynced, and userspace * relies on that (by doing tons of cursor updates). */ if (state->legacy_cursor_update) return; for_each_oldnew_crtc_in_state(state, crtc, old_crtc_state, new_crtc_state, i) { if (!new_crtc_state->active) continue; ret = drm_crtc_vblank_get(crtc); if (ret != 0) continue; crtc_mask |= drm_crtc_mask(crtc); state->crtcs[i].last_vblank_count = drm_crtc_vblank_count(crtc); } for_each_old_crtc_in_state(state, crtc, old_crtc_state, i) { if (!(crtc_mask & drm_crtc_mask(crtc))) continue; ret = wait_event_timeout(dev->vblank[i].queue, state->crtcs[i].last_vblank_count != drm_crtc_vblank_count(crtc), msecs_to_jiffies(100)); WARN(!ret, "[CRTC:%d:%s] vblank wait timed out\n", crtc->base.id, crtc->name); drm_crtc_vblank_put(crtc); } } EXPORT_SYMBOL(drm_atomic_helper_wait_for_vblanks); /** * drm_atomic_helper_wait_for_flip_done - wait for all page flips to be done * @dev: DRM device * @state: atomic state object being committed * * Helper to, after atomic commit, wait for page flips on all affected * crtcs (ie. before cleaning up old framebuffers using * drm_atomic_helper_cleanup_planes()). Compared to * drm_atomic_helper_wait_for_vblanks() this waits for the completion on all * CRTCs, assuming that cursors-only updates are signalling their completion * immediately (or using a different path). * * This requires that drivers use the nonblocking commit tracking support * initialized using drm_atomic_helper_setup_commit(). */ void drm_atomic_helper_wait_for_flip_done(struct drm_device *dev, struct drm_atomic_state *state) { struct drm_crtc *crtc; int i; for (i = 0; i < dev->mode_config.num_crtc; i++) { struct drm_crtc_commit *commit = state->crtcs[i].commit; int ret; crtc = state->crtcs[i].ptr; if (!crtc || !commit) continue; ret = wait_for_completion_timeout(&commit->flip_done, 10 * HZ); if (ret == 0) drm_err(dev, "[CRTC:%d:%s] flip_done timed out\n", crtc->base.id, crtc->name); } if (state->fake_commit) complete_all(&state->fake_commit->flip_done); } EXPORT_SYMBOL(drm_atomic_helper_wait_for_flip_done); /** * drm_atomic_helper_commit_tail - commit atomic update to hardware * @state: atomic state object being committed * * This is the default implementation for the * &drm_mode_config_helper_funcs.atomic_commit_tail hook, for drivers * that do not support runtime_pm or do not need the CRTC to be * enabled to perform a commit. Otherwise, see * drm_atomic_helper_commit_tail_rpm(). * * Note that the default ordering of how the various stages are called is to * match the legacy modeset helper library closest. 
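 *
 * As an illustration of a driver-specific ordering, a driver that uses the
 * nonblocking commit tracking and prefers flip-done waits over vblank waits
 * could provide its own tail roughly as follows (a sketch only, the foo_*
 * names are hypothetical):
 *
 *     static void foo_atomic_commit_tail(struct drm_atomic_state *state)
 *     {
 *             struct drm_device *dev = state->dev;
 *
 *             drm_atomic_helper_commit_modeset_disables(dev, state);
 *             drm_atomic_helper_commit_planes(dev, state, 0);
 *             drm_atomic_helper_commit_modeset_enables(dev, state);
 *
 *             drm_atomic_helper_fake_vblank(state);
 *             drm_atomic_helper_commit_hw_done(state);
 *
 *             // only valid with drm_atomic_helper_setup_commit() based tracking
 *             drm_atomic_helper_wait_for_flip_done(dev, state);
 *
 *             drm_atomic_helper_cleanup_planes(dev, state);
 *     }
 *
 *     static const struct drm_mode_config_helper_funcs foo_mode_config_helpers = {
 *             .atomic_commit_tail = foo_atomic_commit_tail,
 *     };
 *
 * installed by pointing &drm_mode_config.helper_private at it during driver
 * initialization.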
*/ void drm_atomic_helper_commit_tail(struct drm_atomic_state *state) { struct drm_device *dev = state->dev; drm_atomic_helper_commit_modeset_disables(dev, state); drm_atomic_helper_commit_planes(dev, state, 0); drm_atomic_helper_commit_modeset_enables(dev, state); drm_atomic_helper_fake_vblank(state); drm_atomic_helper_commit_hw_done(state); drm_atomic_helper_wait_for_vblanks(dev, state); drm_atomic_helper_cleanup_planes(dev, state); } EXPORT_SYMBOL(drm_atomic_helper_commit_tail); /** * drm_atomic_helper_commit_tail_rpm - commit atomic update to hardware * @state: new modeset state to be committed * * This is an alternative implementation for the * &drm_mode_config_helper_funcs.atomic_commit_tail hook, for drivers * that support runtime_pm or need the CRTC to be enabled to perform a * commit. Otherwise, one should use the default implementation * drm_atomic_helper_commit_tail(). */ void drm_atomic_helper_commit_tail_rpm(struct drm_atomic_state *state) { struct drm_device *dev = state->dev; drm_atomic_helper_commit_modeset_disables(dev, state); drm_atomic_helper_commit_modeset_enables(dev, state); drm_atomic_helper_commit_planes(dev, state, DRM_PLANE_COMMIT_ACTIVE_ONLY); drm_atomic_helper_fake_vblank(state); drm_atomic_helper_commit_hw_done(state); drm_atomic_helper_wait_for_vblanks(dev, state); drm_atomic_helper_cleanup_planes(dev, state); } EXPORT_SYMBOL(drm_atomic_helper_commit_tail_rpm); static void commit_tail(struct drm_atomic_state *state) { struct drm_device *dev = state->dev; const struct drm_mode_config_helper_funcs *funcs; struct drm_crtc_state *new_crtc_state; struct drm_crtc *crtc; ktime_t start; s64 commit_time_ms; unsigned int i, new_self_refresh_mask = 0; funcs = dev->mode_config.helper_private; /* * We're measuring the _entire_ commit, so the time will vary depending * on how many fences and objects are involved. For the purposes of self * refresh, this is desirable since it'll give us an idea of how * congested things are. This will inform our decision on how often we * should enter self refresh after idle. * * These times will be averaged out in the self refresh helpers to avoid * overreacting over one outlier frame */ start = ktime_get(); drm_atomic_helper_wait_for_fences(dev, state, false); drm_atomic_helper_wait_for_dependencies(state); /* * We cannot safely access new_crtc_state after * drm_atomic_helper_commit_hw_done() so figure out which crtc's have * self-refresh active beforehand: */ for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) if (new_crtc_state->self_refresh_active) new_self_refresh_mask |= BIT(i); if (funcs && funcs->atomic_commit_tail) funcs->atomic_commit_tail(state); else drm_atomic_helper_commit_tail(state); commit_time_ms = ktime_ms_delta(ktime_get(), start); if (commit_time_ms > 0) drm_self_refresh_helper_update_avg_times(state, (unsigned long)commit_time_ms, new_self_refresh_mask); drm_atomic_helper_commit_cleanup_done(state); drm_atomic_state_put(state); } static void commit_work(struct work_struct *work) { struct drm_atomic_state *state = container_of(work, struct drm_atomic_state, commit_work); commit_tail(state); } /** * drm_atomic_helper_async_check - check if state can be committed asynchronously * @dev: DRM device * @state: the driver state object * * This helper will check if it is possible to commit the state asynchronously. * Async commits are not supposed to swap the states like normal sync commits * but just do in-place changes on the current state. 
* * It will return 0 if the commit can happen in an asynchronous fashion or error * if not. Note that error just mean it can't be committed asynchronously, if it * fails the commit should be treated like a normal synchronous commit. */ int drm_atomic_helper_async_check(struct drm_device *dev, struct drm_atomic_state *state) { struct drm_crtc *crtc; struct drm_crtc_state *crtc_state; struct drm_plane *plane = NULL; struct drm_plane_state *old_plane_state = NULL; struct drm_plane_state *new_plane_state = NULL; const struct drm_plane_helper_funcs *funcs; int i, ret, n_planes = 0; for_each_new_crtc_in_state(state, crtc, crtc_state, i) { if (drm_atomic_crtc_needs_modeset(crtc_state)) return -EINVAL; } for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) n_planes++; /* FIXME: we support only single plane updates for now */ if (n_planes != 1) { drm_dbg_atomic(dev, "only single plane async updates are supported\n"); return -EINVAL; } if (!new_plane_state->crtc || old_plane_state->crtc != new_plane_state->crtc) { drm_dbg_atomic(dev, "[PLANE:%d:%s] async update cannot change CRTC\n", plane->base.id, plane->name); return -EINVAL; } funcs = plane->helper_private; if (!funcs->atomic_async_update) { drm_dbg_atomic(dev, "[PLANE:%d:%s] driver does not support async updates\n", plane->base.id, plane->name); return -EINVAL; } if (new_plane_state->fence) { drm_dbg_atomic(dev, "[PLANE:%d:%s] missing fence for async update\n", plane->base.id, plane->name); return -EINVAL; } /* * Don't do an async update if there is an outstanding commit modifying * the plane. This prevents our async update's changes from getting * overridden by a previous synchronous update's state. */ if (old_plane_state->commit && !try_wait_for_completion(&old_plane_state->commit->hw_done)) { drm_dbg_atomic(dev, "[PLANE:%d:%s] inflight previous commit preventing async commit\n", plane->base.id, plane->name); return -EBUSY; } ret = funcs->atomic_async_check(plane, state, false); if (ret != 0) drm_dbg_atomic(dev, "[PLANE:%d:%s] driver async check failed\n", plane->base.id, plane->name); return ret; } EXPORT_SYMBOL(drm_atomic_helper_async_check); /** * drm_atomic_helper_async_commit - commit state asynchronously * @dev: DRM device * @state: the driver state object * * This function commits a state asynchronously, i.e., not vblank * synchronized. It should be used on a state only when * drm_atomic_async_check() succeeds. Async commits are not supposed to swap * the states like normal sync commits, but just do in-place changes on the * current state. * * TODO: Implement full swap instead of doing in-place changes. */ void drm_atomic_helper_async_commit(struct drm_device *dev, struct drm_atomic_state *state) { struct drm_plane *plane; struct drm_plane_state *plane_state; const struct drm_plane_helper_funcs *funcs; int i; for_each_new_plane_in_state(state, plane, plane_state, i) { struct drm_framebuffer *new_fb = plane_state->fb; struct drm_framebuffer *old_fb = plane->state->fb; funcs = plane->helper_private; funcs->atomic_async_update(plane, state); /* * ->atomic_async_update() is supposed to update the * plane->state in-place, make sure at least common * properties have been properly updated. 
*/ WARN_ON_ONCE(plane->state->fb != new_fb); WARN_ON_ONCE(plane->state->crtc_x != plane_state->crtc_x); WARN_ON_ONCE(plane->state->crtc_y != plane_state->crtc_y); WARN_ON_ONCE(plane->state->src_x != plane_state->src_x); WARN_ON_ONCE(plane->state->src_y != plane_state->src_y); /* * Make sure the FBs have been swapped so that cleanups in the * new_state performs a cleanup in the old FB. */ WARN_ON_ONCE(plane_state->fb != old_fb); } } EXPORT_SYMBOL(drm_atomic_helper_async_commit); /** * drm_atomic_helper_commit - commit validated state object * @dev: DRM device * @state: the driver state object * @nonblock: whether nonblocking behavior is requested. * * This function commits a with drm_atomic_helper_check() pre-validated state * object. This can still fail when e.g. the framebuffer reservation fails. This * function implements nonblocking commits, using * drm_atomic_helper_setup_commit() and related functions. * * Committing the actual hardware state is done through the * &drm_mode_config_helper_funcs.atomic_commit_tail callback, or its default * implementation drm_atomic_helper_commit_tail(). * * RETURNS: * Zero for success or -errno. */ int drm_atomic_helper_commit(struct drm_device *dev, struct drm_atomic_state *state, bool nonblock) { int ret; if (state->async_update) { ret = drm_atomic_helper_prepare_planes(dev, state); if (ret) return ret; drm_atomic_helper_async_commit(dev, state); drm_atomic_helper_unprepare_planes(dev, state); return 0; } ret = drm_atomic_helper_setup_commit(state, nonblock); if (ret) return ret; INIT_WORK(&state->commit_work, commit_work); ret = drm_atomic_helper_prepare_planes(dev, state); if (ret) return ret; if (!nonblock) { ret = drm_atomic_helper_wait_for_fences(dev, state, true); if (ret) goto err; } /* * This is the point of no return - everything below never fails except * when the hw goes bonghits. Which means we can commit the new state on * the software side now. */ ret = drm_atomic_helper_swap_state(state, true); if (ret) goto err; /* * Everything below can be run asynchronously without the need to grab * any modeset locks at all under one condition: It must be guaranteed * that the asynchronous work has either been cancelled (if the driver * supports it, which at least requires that the framebuffers get * cleaned up with drm_atomic_helper_cleanup_planes()) or completed * before the new state gets committed on the software side with * drm_atomic_helper_swap_state(). * * This scheme allows new atomic state updates to be prepared and * checked in parallel to the asynchronous completion of the previous * update. Which is important since compositors need to figure out the * composition of the next frame right after having submitted the * current layout. * * NOTE: Commit work has multiple phases, first hardware commit, then * cleanup. We want them to overlap, hence need system_unbound_wq to * make sure work items don't artificially stall on each another. */ drm_atomic_state_get(state); if (nonblock) queue_work(system_unbound_wq, &state->commit_work); else commit_tail(state); return 0; err: drm_atomic_helper_unprepare_planes(dev, state); return ret; } EXPORT_SYMBOL(drm_atomic_helper_commit); /** * DOC: implementing nonblocking commit * * Nonblocking atomic commits should use struct &drm_crtc_commit to sequence * different operations against each another. Locks, especially struct * &drm_modeset_lock, should not be held in worker threads or any other * asynchronous context used to commit the hardware state. 
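 *
 * Drivers that are happy with the recommended sequence do not need to
 * open-code any of this and can simply plug the helpers into their
 * &drm_mode_config_funcs (a sketch; foo_mode_config_funcs and foo_fb_create()
 * are hypothetical names):
 *
 *     static const struct drm_mode_config_funcs foo_mode_config_funcs = {
 *             .fb_create = foo_fb_create,
 *             .atomic_check = drm_atomic_helper_check,
 *             .atomic_commit = drm_atomic_helper_commit,
 *     };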
* * drm_atomic_helper_commit() implements the recommended sequence for * nonblocking commits, using drm_atomic_helper_setup_commit() internally: * * 1. Run drm_atomic_helper_prepare_planes(). Since this can fail and we * need to propagate out of memory/VRAM errors to userspace, it must be called * synchronously. * * 2. Synchronize with any outstanding nonblocking commit worker threads which * might be affected by the new state update. This is handled by * drm_atomic_helper_setup_commit(). * * Asynchronous workers need to have sufficient parallelism to be able to run * different atomic commits on different CRTCs in parallel. The simplest way to * achieve this is by running them on the &system_unbound_wq work queue. Note * that drivers are not required to split up atomic commits and run an * individual commit in parallel - userspace is supposed to do that if it cares. * But it might be beneficial to do that for modesets, since those necessarily * must be done as one global operation, and enabling or disabling a CRTC can * take a long time. But even that is not required. * * IMPORTANT: A &drm_atomic_state update for multiple CRTCs is sequenced * against all CRTCs therein. Therefore for atomic state updates which only flip * planes the driver must not get the struct &drm_crtc_state of unrelated CRTCs * in its atomic check code: This would prevent committing of atomic updates to * multiple CRTCs in parallel. In general, adding additional state structures * should be avoided as much as possible, because this reduces parallelism in * (nonblocking) commits, both due to locking and due to commit sequencing * requirements. * * 3. The software state is updated synchronously with * drm_atomic_helper_swap_state(). Doing this under the protection of all modeset * locks means concurrent callers never see inconsistent state. Note that commit * workers do not hold any locks; their access is only coordinated through * ordering. If workers would access state only through the pointers in the * free-standing state objects (currently not the case for any driver) then even * multiple pending commits could be in-flight at the same time. * * 4. Schedule a work item to do all subsequent steps, using the split-out * commit helpers: a) pre-plane commit b) plane commit c) post-plane commit and * then cleaning up the framebuffers after the old framebuffer is no longer * being displayed. The scheduled work should synchronize against other workers * using the &drm_crtc_commit infrastructure as needed. See * drm_atomic_helper_setup_commit() for more details. */ static int stall_checks(struct drm_crtc *crtc, bool nonblock) { struct drm_crtc_commit *commit, *stall_commit = NULL; bool completed = true; int i; long ret = 0; spin_lock(&crtc->commit_lock); i = 0; list_for_each_entry(commit, &crtc->commit_list, commit_entry) { if (i == 0) { completed = try_wait_for_completion(&commit->flip_done); /* * Userspace is not allowed to get ahead of the previous * commit with nonblocking ones. */ if (!completed && nonblock) { spin_unlock(&crtc->commit_lock); drm_dbg_atomic(crtc->dev, "[CRTC:%d:%s] busy with a previous commit\n", crtc->base.id, crtc->name); return -EBUSY; } } else if (i == 1) { stall_commit = drm_crtc_commit_get(commit); break; } i++; } spin_unlock(&crtc->commit_lock); if (!stall_commit) return 0; /* We don't want to let commits get ahead of cleanup work too much, * stalling on 2nd previous commit means triple-buffer won't ever stall. 
*/ ret = wait_for_completion_interruptible_timeout(&stall_commit->cleanup_done, 10*HZ); if (ret == 0) drm_err(crtc->dev, "[CRTC:%d:%s] cleanup_done timed out\n", crtc->base.id, crtc->name); drm_crtc_commit_put(stall_commit); return ret < 0 ? ret : 0; } static void release_crtc_commit(struct completion *completion) { struct drm_crtc_commit *commit = container_of(completion, typeof(*commit), flip_done); drm_crtc_commit_put(commit); } static void init_commit(struct drm_crtc_commit *commit, struct drm_crtc *crtc) { init_completion(&commit->flip_done); init_completion(&commit->hw_done); init_completion(&commit->cleanup_done); INIT_LIST_HEAD(&commit->commit_entry); kref_init(&commit->ref); commit->crtc = crtc; } static struct drm_crtc_commit * crtc_or_fake_commit(struct drm_atomic_state *state, struct drm_crtc *crtc) { if (crtc) { struct drm_crtc_state *new_crtc_state; new_crtc_state = drm_atomic_get_new_crtc_state(state, crtc); return new_crtc_state->commit; } if (!state->fake_commit) { state->fake_commit = kzalloc(sizeof(*state->fake_commit), GFP_KERNEL); if (!state->fake_commit) return NULL; init_commit(state->fake_commit, NULL); } return state->fake_commit; } /** * drm_atomic_helper_setup_commit - setup possibly nonblocking commit * @state: new modeset state to be committed * @nonblock: whether nonblocking behavior is requested. * * This function prepares @state to be used by the atomic helper's support for * nonblocking commits. Drivers using the nonblocking commit infrastructure * should always call this function from their * &drm_mode_config_funcs.atomic_commit hook. * * Drivers that need to extend the commit setup to private objects can use the * &drm_mode_config_helper_funcs.atomic_commit_setup hook. * * To be able to use this support drivers need to use a few more helper * functions. drm_atomic_helper_wait_for_dependencies() must be called before * actually committing the hardware state, and for nonblocking commits this call * must be placed in the async worker. See also drm_atomic_helper_swap_state() * and its stall parameter, for when a driver's commit hooks look at the * &drm_crtc.state, &drm_plane.state or &drm_connector.state pointer directly. * * Completion of the hardware commit step must be signalled using * drm_atomic_helper_commit_hw_done(). After this step the driver is not allowed * to read or change any permanent software or hardware modeset state. The only * exception is state protected by other means than &drm_modeset_lock locks. * Only the free standing @state with pointers to the old state structures can * be inspected, e.g. to clean up old buffers using * drm_atomic_helper_cleanup_planes(). * * At the very end, before cleaning up @state drivers must call * drm_atomic_helper_commit_cleanup_done(). * * This is all implemented by in drm_atomic_helper_commit(), giving drivers a * complete and easy-to-use default implementation of the atomic_commit() hook. * * The tracking of asynchronously executed and still pending commits is done * using the core structure &drm_crtc_commit. * * By default there's no need to clean up resources allocated by this function * explicitly: drm_atomic_state_default_clear() will take care of that * automatically. * * Returns: * 0 on success. -EBUSY when userspace schedules nonblocking commits too fast, * -ENOMEM on allocation failures and -EINTR when a signal is pending. 
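 *
 * For drivers that cannot use drm_atomic_helper_commit() as-is, the following
 * heavily condensed sketch shows how the individual pieces are meant to fit
 * together (error paths and the blocking-commit fence wait are omitted, and
 * all foo_* names are hypothetical):
 *
 *     static void foo_commit_work(struct work_struct *work)
 *     {
 *             struct drm_atomic_state *state =
 *                     container_of(work, struct drm_atomic_state, commit_work);
 *             struct drm_device *dev = state->dev;
 *
 *             drm_atomic_helper_wait_for_fences(dev, state, false);
 *             drm_atomic_helper_wait_for_dependencies(state);
 *
 *             // hardware commit phases, signals hw_done internally
 *             drm_atomic_helper_commit_tail(state);
 *
 *             drm_atomic_helper_commit_cleanup_done(state);
 *             drm_atomic_state_put(state);
 *     }
 *
 *     static int foo_atomic_commit(struct drm_device *dev,
 *                                  struct drm_atomic_state *state,
 *                                  bool nonblock)
 *     {
 *             int ret;
 *
 *             ret = drm_atomic_helper_setup_commit(state, nonblock);
 *             if (ret)
 *                     return ret;
 *
 *             INIT_WORK(&state->commit_work, foo_commit_work);
 *
 *             ret = drm_atomic_helper_prepare_planes(dev, state);
 *             if (ret)
 *                     return ret;
 *
 *             ret = drm_atomic_helper_swap_state(state, true);
 *             if (ret) {
 *                     drm_atomic_helper_unprepare_planes(dev, state);
 *                     return ret;
 *             }
 *
 *             drm_atomic_state_get(state);
 *             if (nonblock)
 *                     queue_work(system_unbound_wq, &state->commit_work);
 *             else
 *                     foo_commit_work(&state->commit_work);
 *
 *             return 0;
 *     }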
*/ int drm_atomic_helper_setup_commit(struct drm_atomic_state *state, bool nonblock) { struct drm_crtc *crtc; struct drm_crtc_state *old_crtc_state, *new_crtc_state; struct drm_connector *conn; struct drm_connector_state *old_conn_state, *new_conn_state; struct drm_plane *plane; struct drm_plane_state *old_plane_state, *new_plane_state; struct drm_crtc_commit *commit; const struct drm_mode_config_helper_funcs *funcs; int i, ret; funcs = state->dev->mode_config.helper_private; for_each_oldnew_crtc_in_state(state, crtc, old_crtc_state, new_crtc_state, i) { commit = kzalloc(sizeof(*commit), GFP_KERNEL); if (!commit) return -ENOMEM; init_commit(commit, crtc); new_crtc_state->commit = commit; ret = stall_checks(crtc, nonblock); if (ret) return ret; /* * Drivers only send out events when at least either current or * new CRTC state is active. Complete right away if everything * stays off. */ if (!old_crtc_state->active && !new_crtc_state->active) { complete_all(&commit->flip_done); continue; } /* Legacy cursor updates are fully unsynced. */ if (state->legacy_cursor_update) { complete_all(&commit->flip_done); continue; } if (!new_crtc_state->event) { commit->event = kzalloc(sizeof(*commit->event), GFP_KERNEL); if (!commit->event) return -ENOMEM; new_crtc_state->event = commit->event; } new_crtc_state->event->base.completion = &commit->flip_done; new_crtc_state->event->base.completion_release = release_crtc_commit; drm_crtc_commit_get(commit); commit->abort_completion = true; state->crtcs[i].commit = commit; drm_crtc_commit_get(commit); } for_each_oldnew_connector_in_state(state, conn, old_conn_state, new_conn_state, i) { /* * Userspace is not allowed to get ahead of the previous * commit with nonblocking ones. */ if (nonblock && old_conn_state->commit && !try_wait_for_completion(&old_conn_state->commit->flip_done)) { drm_dbg_atomic(conn->dev, "[CONNECTOR:%d:%s] busy with a previous commit\n", conn->base.id, conn->name); return -EBUSY; } /* Always track connectors explicitly for e.g. link retraining. */ commit = crtc_or_fake_commit(state, new_conn_state->crtc ?: old_conn_state->crtc); if (!commit) return -ENOMEM; new_conn_state->commit = drm_crtc_commit_get(commit); } for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) { /* * Userspace is not allowed to get ahead of the previous * commit with nonblocking ones. */ if (nonblock && old_plane_state->commit && !try_wait_for_completion(&old_plane_state->commit->flip_done)) { drm_dbg_atomic(plane->dev, "[PLANE:%d:%s] busy with a previous commit\n", plane->base.id, plane->name); return -EBUSY; } /* Always track planes explicitly for async pageflip support. */ commit = crtc_or_fake_commit(state, new_plane_state->crtc ?: old_plane_state->crtc); if (!commit) return -ENOMEM; new_plane_state->commit = drm_crtc_commit_get(commit); } if (funcs && funcs->atomic_commit_setup) return funcs->atomic_commit_setup(state); return 0; } EXPORT_SYMBOL(drm_atomic_helper_setup_commit); /** * drm_atomic_helper_wait_for_dependencies - wait for required preceding commits * @state: atomic state object being committed * * This function waits for all preceding commits that touch the same CRTC as * @state to both be committed to the hardware (as signalled by * drm_atomic_helper_commit_hw_done()) and executed by the hardware (as signalled * by calling drm_crtc_send_vblank_event() on the &drm_crtc_state.event). * * This is part of the atomic helper support for nonblocking commits, see * drm_atomic_helper_setup_commit() for an overview. 
*/ void drm_atomic_helper_wait_for_dependencies(struct drm_atomic_state *state) { struct drm_crtc *crtc; struct drm_crtc_state *old_crtc_state; struct drm_plane *plane; struct drm_plane_state *old_plane_state; struct drm_connector *conn; struct drm_connector_state *old_conn_state; int i; long ret; for_each_old_crtc_in_state(state, crtc, old_crtc_state, i) { ret = drm_crtc_commit_wait(old_crtc_state->commit); if (ret) drm_err(crtc->dev, "[CRTC:%d:%s] commit wait timed out\n", crtc->base.id, crtc->name); } for_each_old_connector_in_state(state, conn, old_conn_state, i) { ret = drm_crtc_commit_wait(old_conn_state->commit); if (ret) drm_err(conn->dev, "[CONNECTOR:%d:%s] commit wait timed out\n", conn->base.id, conn->name); } for_each_old_plane_in_state(state, plane, old_plane_state, i) { ret = drm_crtc_commit_wait(old_plane_state->commit); if (ret) drm_err(plane->dev, "[PLANE:%d:%s] commit wait timed out\n", plane->base.id, plane->name); } } EXPORT_SYMBOL(drm_atomic_helper_wait_for_dependencies); /** * drm_atomic_helper_fake_vblank - fake VBLANK events if needed * @state: atomic state object being committed * * This function walks all CRTCs and fakes VBLANK events on those with * &drm_crtc_state.no_vblank set to true and &drm_crtc_state.event != NULL. * The primary use of this function is writeback connectors working in oneshot * mode and faking VBLANK events. In this case they only fake the VBLANK event * when a job is queued, and any change to the pipeline that does not touch the * connector is leading to timeouts when calling * drm_atomic_helper_wait_for_vblanks() or * drm_atomic_helper_wait_for_flip_done(). In addition to writeback * connectors, this function can also fake VBLANK events for CRTCs without * VBLANK interrupt. * * This is part of the atomic helper support for nonblocking commits, see * drm_atomic_helper_setup_commit() for an overview. */ void drm_atomic_helper_fake_vblank(struct drm_atomic_state *state) { struct drm_crtc_state *new_crtc_state; struct drm_crtc *crtc; int i; for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) { unsigned long flags; if (!new_crtc_state->no_vblank) continue; spin_lock_irqsave(&state->dev->event_lock, flags); if (new_crtc_state->event) { drm_crtc_send_vblank_event(crtc, new_crtc_state->event); new_crtc_state->event = NULL; } spin_unlock_irqrestore(&state->dev->event_lock, flags); } } EXPORT_SYMBOL(drm_atomic_helper_fake_vblank); /** * drm_atomic_helper_commit_hw_done - setup possible nonblocking commit * @state: atomic state object being committed * * This function is used to signal completion of the hardware commit step. After * this step the driver is not allowed to read or change any permanent software * or hardware modeset state. The only exception is state protected by other * means than &drm_modeset_lock locks. * * Drivers should try to postpone any expensive or delayed cleanup work after * this function is called. * * This is part of the atomic helper support for nonblocking commits, see * drm_atomic_helper_setup_commit() for an overview. */ void drm_atomic_helper_commit_hw_done(struct drm_atomic_state *state) { struct drm_crtc *crtc; struct drm_crtc_state *old_crtc_state, *new_crtc_state; struct drm_crtc_commit *commit; int i; for_each_oldnew_crtc_in_state(state, crtc, old_crtc_state, new_crtc_state, i) { commit = new_crtc_state->commit; if (!commit) continue; /* * copy new_crtc_state->commit to old_crtc_state->commit, * it's unsafe to touch new_crtc_state after hw_done, * but we still need to do so in cleanup_done(). 
*/ if (old_crtc_state->commit) drm_crtc_commit_put(old_crtc_state->commit); old_crtc_state->commit = drm_crtc_commit_get(commit); /* backend must have consumed any event by now */ WARN_ON(new_crtc_state->event); complete_all(&commit->hw_done); } if (state->fake_commit) { complete_all(&state->fake_commit->hw_done); complete_all(&state->fake_commit->flip_done); } } EXPORT_SYMBOL(drm_atomic_helper_commit_hw_done); /** * drm_atomic_helper_commit_cleanup_done - signal completion of commit * @state: atomic state object being committed * * This signals completion of the atomic update @state, including any * cleanup work. If used, it must be called right before calling * drm_atomic_state_put(). * * This is part of the atomic helper support for nonblocking commits, see * drm_atomic_helper_setup_commit() for an overview. */ void drm_atomic_helper_commit_cleanup_done(struct drm_atomic_state *state) { struct drm_crtc *crtc; struct drm_crtc_state *old_crtc_state; struct drm_crtc_commit *commit; int i; for_each_old_crtc_in_state(state, crtc, old_crtc_state, i) { commit = old_crtc_state->commit; if (WARN_ON(!commit)) continue; complete_all(&commit->cleanup_done); WARN_ON(!try_wait_for_completion(&commit->hw_done)); spin_lock(&crtc->commit_lock); list_del(&commit->commit_entry); spin_unlock(&crtc->commit_lock); } if (state->fake_commit) { complete_all(&state->fake_commit->cleanup_done); WARN_ON(!try_wait_for_completion(&state->fake_commit->hw_done)); } } EXPORT_SYMBOL(drm_atomic_helper_commit_cleanup_done); /** * drm_atomic_helper_prepare_planes - prepare plane resources before commit * @dev: DRM device * @state: atomic state object with new state structures * * This function prepares plane state, specifically framebuffers, for the new * configuration, by calling &drm_plane_helper_funcs.prepare_fb. If any failure * is encountered this function will call &drm_plane_helper_funcs.cleanup_fb on * any already successfully prepared framebuffer. * * Returns: * 0 on success, negative error code on failure. 
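 *
 * GEM based drivers can often leave &drm_plane_helper_funcs.prepare_fb at NULL
 * and rely on the drm_gem_plane_helper_prepare_fb() fallback applied by this
 * function. A sketch of a driver-provided pair, with hypothetical foo_* names,
 * could look like this:
 *
 *     static int foo_plane_prepare_fb(struct drm_plane *plane,
 *                                     struct drm_plane_state *new_state)
 *     {
 *             if (!new_state->fb)
 *                     return 0;
 *
 *             // pin the backing storage and stash the implicit fence in
 *             // new_state->fence; the GEM helper below does both
 *             return drm_gem_plane_helper_prepare_fb(plane, new_state);
 *     }
 *
 *     static void foo_plane_cleanup_fb(struct drm_plane *plane,
 *                                      struct drm_plane_state *old_state)
 *     {
 *             // undo whatever foo_plane_prepare_fb() pinned or mapped
 *     }
 *
 *     static const struct drm_plane_helper_funcs foo_plane_helper_funcs = {
 *             .prepare_fb = foo_plane_prepare_fb,
 *             .cleanup_fb = foo_plane_cleanup_fb,
 *     };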
*/ int drm_atomic_helper_prepare_planes(struct drm_device *dev, struct drm_atomic_state *state) { struct drm_connector *connector; struct drm_connector_state *new_conn_state; struct drm_plane *plane; struct drm_plane_state *new_plane_state; int ret, i, j; for_each_new_connector_in_state(state, connector, new_conn_state, i) { if (!new_conn_state->writeback_job) continue; ret = drm_writeback_prepare_job(new_conn_state->writeback_job); if (ret < 0) return ret; } for_each_new_plane_in_state(state, plane, new_plane_state, i) { const struct drm_plane_helper_funcs *funcs; funcs = plane->helper_private; if (funcs->prepare_fb) { ret = funcs->prepare_fb(plane, new_plane_state); if (ret) goto fail_prepare_fb; } else { WARN_ON_ONCE(funcs->cleanup_fb); if (!drm_core_check_feature(dev, DRIVER_GEM)) continue; ret = drm_gem_plane_helper_prepare_fb(plane, new_plane_state); if (ret) goto fail_prepare_fb; } } for_each_new_plane_in_state(state, plane, new_plane_state, i) { const struct drm_plane_helper_funcs *funcs = plane->helper_private; if (funcs->begin_fb_access) { ret = funcs->begin_fb_access(plane, new_plane_state); if (ret) goto fail_begin_fb_access; } } return 0; fail_begin_fb_access: for_each_new_plane_in_state(state, plane, new_plane_state, j) { const struct drm_plane_helper_funcs *funcs = plane->helper_private; if (j >= i) continue; if (funcs->end_fb_access) funcs->end_fb_access(plane, new_plane_state); } i = j; /* set i to upper limit to cleanup all planes */ fail_prepare_fb: for_each_new_plane_in_state(state, plane, new_plane_state, j) { const struct drm_plane_helper_funcs *funcs; if (j >= i) continue; funcs = plane->helper_private; if (funcs->cleanup_fb) funcs->cleanup_fb(plane, new_plane_state); } return ret; } EXPORT_SYMBOL(drm_atomic_helper_prepare_planes); /** * drm_atomic_helper_unprepare_planes - release plane resources on aborts * @dev: DRM device * @state: atomic state object with old state structures * * This function cleans up plane state, specifically framebuffers, from the * atomic state. It undoes the effects of drm_atomic_helper_prepare_planes() * when aborting an atomic commit. For cleaning up after a successful commit * use drm_atomic_helper_cleanup_planes(). */ void drm_atomic_helper_unprepare_planes(struct drm_device *dev, struct drm_atomic_state *state) { struct drm_plane *plane; struct drm_plane_state *new_plane_state; int i; for_each_new_plane_in_state(state, plane, new_plane_state, i) { const struct drm_plane_helper_funcs *funcs = plane->helper_private; if (funcs->end_fb_access) funcs->end_fb_access(plane, new_plane_state); } for_each_new_plane_in_state(state, plane, new_plane_state, i) { const struct drm_plane_helper_funcs *funcs = plane->helper_private; if (funcs->cleanup_fb) funcs->cleanup_fb(plane, new_plane_state); } } EXPORT_SYMBOL(drm_atomic_helper_unprepare_planes); static bool plane_crtc_active(const struct drm_plane_state *state) { return state->crtc && state->crtc->state->active; } /** * drm_atomic_helper_commit_planes - commit plane state * @dev: DRM device * @state: atomic state object being committed * @flags: flags for committing plane state * * This function commits the new plane state using the plane and atomic helper * functions for planes and CRTCs. It assumes that the atomic state has already * been pushed into the relevant object state pointers, since this step can no * longer fail. * * It still requires the global state object @state to know which planes and * crtcs need to be updated though. 
* * Note that this function does all plane updates across all CRTCs in one step. * If the hardware can't support this approach look at * drm_atomic_helper_commit_planes_on_crtc() instead. * * Plane parameters can be updated by applications while the associated CRTC is * disabled. The DRM/KMS core will store the parameters in the plane state, * which will be available to the driver when the CRTC is turned on. As a result * most drivers don't need to be immediately notified of plane updates for a * disabled CRTC. * * Unless otherwise needed, drivers are advised to set the ACTIVE_ONLY flag in * @flags in order not to receive plane update notifications related to a * disabled CRTC. This avoids the need to manually ignore plane updates in * driver code when the driver and/or hardware can't or just don't need to deal * with updates on disabled CRTCs, for example when supporting runtime PM. * * Drivers may set the NO_DISABLE_AFTER_MODESET flag in @flags if the relevant * display controllers require to disable a CRTC's planes when the CRTC is * disabled. This function would skip the &drm_plane_helper_funcs.atomic_disable * call for a plane if the CRTC of the old plane state needs a modesetting * operation. Of course, the drivers need to disable the planes in their CRTC * disable callbacks since no one else would do that. * * The drm_atomic_helper_commit() default implementation doesn't set the * ACTIVE_ONLY flag to most closely match the behaviour of the legacy helpers. * This should not be copied blindly by drivers. */ void drm_atomic_helper_commit_planes(struct drm_device *dev, struct drm_atomic_state *state, uint32_t flags) { struct drm_crtc *crtc; struct drm_crtc_state *old_crtc_state, *new_crtc_state; struct drm_plane *plane; struct drm_plane_state *old_plane_state, *new_plane_state; int i; bool active_only = flags & DRM_PLANE_COMMIT_ACTIVE_ONLY; bool no_disable = flags & DRM_PLANE_COMMIT_NO_DISABLE_AFTER_MODESET; for_each_oldnew_crtc_in_state(state, crtc, old_crtc_state, new_crtc_state, i) { const struct drm_crtc_helper_funcs *funcs; funcs = crtc->helper_private; if (!funcs || !funcs->atomic_begin) continue; if (active_only && !new_crtc_state->active) continue; funcs->atomic_begin(crtc, state); } for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) { const struct drm_plane_helper_funcs *funcs; bool disabling; funcs = plane->helper_private; if (!funcs) continue; disabling = drm_atomic_plane_disabling(old_plane_state, new_plane_state); if (active_only) { /* * Skip planes related to inactive CRTCs. If the plane * is enabled use the state of the current CRTC. If the * plane is being disabled use the state of the old * CRTC to avoid skipping planes being disabled on an * active CRTC. */ if (!disabling && !plane_crtc_active(new_plane_state)) continue; if (disabling && !plane_crtc_active(old_plane_state)) continue; } /* * Special-case disabling the plane if drivers support it. 
*/ if (disabling && funcs->atomic_disable) { struct drm_crtc_state *crtc_state; crtc_state = old_plane_state->crtc->state; if (drm_atomic_crtc_needs_modeset(crtc_state) && no_disable) continue; funcs->atomic_disable(plane, state); } else if (new_plane_state->crtc || disabling) { funcs->atomic_update(plane, state); if (!disabling && funcs->atomic_enable) { if (drm_atomic_plane_enabling(old_plane_state, new_plane_state)) funcs->atomic_enable(plane, state); } } } for_each_oldnew_crtc_in_state(state, crtc, old_crtc_state, new_crtc_state, i) { const struct drm_crtc_helper_funcs *funcs; funcs = crtc->helper_private; if (!funcs || !funcs->atomic_flush) continue; if (active_only && !new_crtc_state->active) continue; funcs->atomic_flush(crtc, state); } /* * Signal end of framebuffer access here before hw_done. After hw_done, * a later commit might have already released the plane state. */ for_each_old_plane_in_state(state, plane, old_plane_state, i) { const struct drm_plane_helper_funcs *funcs = plane->helper_private; if (funcs->end_fb_access) funcs->end_fb_access(plane, old_plane_state); } } EXPORT_SYMBOL(drm_atomic_helper_commit_planes); /** * drm_atomic_helper_commit_planes_on_crtc - commit plane state for a CRTC * @old_crtc_state: atomic state object with the old CRTC state * * This function commits the new plane state using the plane and atomic helper * functions for planes on the specific CRTC. It assumes that the atomic state * has already been pushed into the relevant object state pointers, since this * step can no longer fail. * * This function is useful when plane updates should be done CRTC-by-CRTC * instead of one global step like drm_atomic_helper_commit_planes() does. * * This function can only be safely used when planes are not allowed to move * between different CRTCs because this function doesn't handle inter-CRTC * dependencies. Callers need to ensure that either no such dependencies exist, * resolve them through ordering of commit calls or through some other means.
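 *
 * A rough sketch of a driver-specific commit tail built around this per-CRTC
 * variant (foo_atomic_commit_tail() is a hypothetical name):
 *
 *     static void foo_atomic_commit_tail(struct drm_atomic_state *state)
 *     {
 *             struct drm_device *dev = state->dev;
 *             struct drm_crtc_state *old_crtc_state;
 *             struct drm_crtc *crtc;
 *             int i;
 *
 *             drm_atomic_helper_commit_modeset_disables(dev, state);
 *             drm_atomic_helper_commit_modeset_enables(dev, state);
 *
 *             // commit plane updates one CRTC at a time
 *             for_each_old_crtc_in_state(state, crtc, old_crtc_state, i)
 *                     drm_atomic_helper_commit_planes_on_crtc(old_crtc_state);
 *
 *             drm_atomic_helper_fake_vblank(state);
 *             drm_atomic_helper_commit_hw_done(state);
 *             drm_atomic_helper_wait_for_vblanks(dev, state);
 *             drm_atomic_helper_cleanup_planes(dev, state);
 *     }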
*/ void drm_atomic_helper_commit_planes_on_crtc(struct drm_crtc_state *old_crtc_state) { const struct drm_crtc_helper_funcs *crtc_funcs; struct drm_crtc *crtc = old_crtc_state->crtc; struct drm_atomic_state *old_state = old_crtc_state->state; struct drm_crtc_state *new_crtc_state = drm_atomic_get_new_crtc_state(old_state, crtc); struct drm_plane *plane; unsigned int plane_mask; plane_mask = old_crtc_state->plane_mask; plane_mask |= new_crtc_state->plane_mask; crtc_funcs = crtc->helper_private; if (crtc_funcs && crtc_funcs->atomic_begin) crtc_funcs->atomic_begin(crtc, old_state); drm_for_each_plane_mask(plane, crtc->dev, plane_mask) { struct drm_plane_state *old_plane_state = drm_atomic_get_old_plane_state(old_state, plane); struct drm_plane_state *new_plane_state = drm_atomic_get_new_plane_state(old_state, plane); const struct drm_plane_helper_funcs *plane_funcs; bool disabling; plane_funcs = plane->helper_private; if (!old_plane_state || !plane_funcs) continue; WARN_ON(new_plane_state->crtc && new_plane_state->crtc != crtc); disabling = drm_atomic_plane_disabling(old_plane_state, new_plane_state); if (disabling && plane_funcs->atomic_disable) { plane_funcs->atomic_disable(plane, old_state); } else if (new_plane_state->crtc || disabling) { plane_funcs->atomic_update(plane, old_state); if (!disabling && plane_funcs->atomic_enable) { if (drm_atomic_plane_enabling(old_plane_state, new_plane_state)) plane_funcs->atomic_enable(plane, old_state); } } } if (crtc_funcs && crtc_funcs->atomic_flush) crtc_funcs->atomic_flush(crtc, old_state); } EXPORT_SYMBOL(drm_atomic_helper_commit_planes_on_crtc); /** * drm_atomic_helper_disable_planes_on_crtc - helper to disable CRTC's planes * @old_crtc_state: atomic state object with the old CRTC state * @atomic: if set, synchronize with CRTC's atomic_begin/flush hooks * * Disables all planes associated with the given CRTC. This can be * used for instance in the CRTC helper atomic_disable callback to disable * all planes. * * If the atomic-parameter is set the function calls the CRTC's * atomic_begin hook before and atomic_flush hook after disabling the * planes. * * It is a bug to call this function without having implemented the * &drm_plane_helper_funcs.atomic_disable plane hook. */ void drm_atomic_helper_disable_planes_on_crtc(struct drm_crtc_state *old_crtc_state, bool atomic) { struct drm_crtc *crtc = old_crtc_state->crtc; const struct drm_crtc_helper_funcs *crtc_funcs = crtc->helper_private; struct drm_plane *plane; if (atomic && crtc_funcs && crtc_funcs->atomic_begin) crtc_funcs->atomic_begin(crtc, NULL); drm_atomic_crtc_state_for_each_plane(plane, old_crtc_state) { const struct drm_plane_helper_funcs *plane_funcs = plane->helper_private; if (!plane_funcs) continue; WARN_ON(!plane_funcs->atomic_disable); if (plane_funcs->atomic_disable) plane_funcs->atomic_disable(plane, NULL); } if (atomic && crtc_funcs && crtc_funcs->atomic_flush) crtc_funcs->atomic_flush(crtc, NULL); } EXPORT_SYMBOL(drm_atomic_helper_disable_planes_on_crtc); /** * drm_atomic_helper_cleanup_planes - cleanup plane resources after commit * @dev: DRM device * @state: atomic state object being committed * * This function cleans up plane state, specifically framebuffers, from the old * configuration. Hence the old configuration must be preserved in @state to * be able to call this function. * * This function may not be called on the new state when the atomic update * fails at any point after calling drm_atomic_helper_prepare_planes(). Use * drm_atomic_helper_unprepare_planes() in this case.
*/ void drm_atomic_helper_cleanup_planes(struct drm_device *dev, struct drm_atomic_state *state) { struct drm_plane *plane; struct drm_plane_state *old_plane_state; int i; for_each_old_plane_in_state(state, plane, old_plane_state, i) { const struct drm_plane_helper_funcs *funcs = plane->helper_private; if (funcs->cleanup_fb) funcs->cleanup_fb(plane, old_plane_state); } } EXPORT_SYMBOL(drm_atomic_helper_cleanup_planes); /** * drm_atomic_helper_swap_state - store atomic state into current sw state * @state: atomic state * @stall: stall for preceding commits * * This function stores the atomic state into the current state pointers in all * driver objects. It should be called after all failing steps have been done * and succeeded, but before the actual hardware state is committed. * * For cleanup and error recovery the current state for all changed objects will * be swapped into @state. * * With that sequence it fits perfectly into the plane prepare/cleanup sequence: * * 1. Call drm_atomic_helper_prepare_planes() with the staged atomic state. * * 2. Do any other steps that might fail. * * 3. Put the staged state into the current state pointers with this function. * * 4. Actually commit the hardware state. * * 5. Call drm_atomic_helper_cleanup_planes() with @state, which since step 3 * contains the old state. Also do any other cleanup required with that state. * * @stall must be set when nonblocking commits for this driver directly access * the &drm_plane.state, &drm_crtc.state or &drm_connector.state pointer. With * the current atomic helpers this is almost always the case, since the helpers * don't pass the right state structures to the callbacks. * * Returns: * Returns 0 on success. Can return -ERESTARTSYS when @stall is true and the * waiting for the previous commits has been interrupted. */ int drm_atomic_helper_swap_state(struct drm_atomic_state *state, bool stall) { int i, ret; unsigned long flags = 0; struct drm_connector *connector; struct drm_connector_state *old_conn_state, *new_conn_state; struct drm_crtc *crtc; struct drm_crtc_state *old_crtc_state, *new_crtc_state; struct drm_plane *plane; struct drm_plane_state *old_plane_state, *new_plane_state; struct drm_crtc_commit *commit; struct drm_private_obj *obj; struct drm_private_state *old_obj_state, *new_obj_state; if (stall) { /* * We have to stall for hw_done here before * drm_atomic_helper_wait_for_dependencies() because flip * depth > 1 is not yet supported by all drivers. As long as * obj->state is directly dereferenced anywhere in the drivers * atomic_commit_tail function, then it's unsafe to swap state * before drm_atomic_helper_commit_hw_done() is called. 
*/ for_each_old_crtc_in_state(state, crtc, old_crtc_state, i) { commit = old_crtc_state->commit; if (!commit) continue; ret = wait_for_completion_interruptible(&commit->hw_done); if (ret) return ret; } for_each_old_connector_in_state(state, connector, old_conn_state, i) { commit = old_conn_state->commit; if (!commit) continue; ret = wait_for_completion_interruptible(&commit->hw_done); if (ret) return ret; } for_each_old_plane_in_state(state, plane, old_plane_state, i) { commit = old_plane_state->commit; if (!commit) continue; ret = wait_for_completion_interruptible(&commit->hw_done); if (ret) return ret; } } for_each_oldnew_connector_in_state(state, connector, old_conn_state, new_conn_state, i) { WARN_ON(connector->state != old_conn_state); old_conn_state->state = state; new_conn_state->state = NULL; state->connectors[i].state = old_conn_state; connector->state = new_conn_state; } for_each_oldnew_crtc_in_state(state, crtc, old_crtc_state, new_crtc_state, i) { WARN_ON(crtc->state != old_crtc_state); old_crtc_state->state = state; new_crtc_state->state = NULL; state->crtcs[i].state = old_crtc_state; crtc->state = new_crtc_state; if (new_crtc_state->commit) { spin_lock(&crtc->commit_lock); list_add(&new_crtc_state->commit->commit_entry, &crtc->commit_list); spin_unlock(&crtc->commit_lock); new_crtc_state->commit->event = NULL; } } drm_panic_lock(state->dev, flags); for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) { WARN_ON(plane->state != old_plane_state); old_plane_state->state = state; new_plane_state->state = NULL; state->planes[i].state = old_plane_state; plane->state = new_plane_state; } drm_panic_unlock(state->dev, flags); for_each_oldnew_private_obj_in_state(state, obj, old_obj_state, new_obj_state, i) { WARN_ON(obj->state != old_obj_state); old_obj_state->state = state; new_obj_state->state = NULL; state->private_objs[i].state = old_obj_state; obj->state = new_obj_state; } return 0; } EXPORT_SYMBOL(drm_atomic_helper_swap_state); /** * drm_atomic_helper_update_plane - Helper for primary plane update using atomic * @plane: plane object to update * @crtc: owning CRTC of owning plane * @fb: framebuffer to flip onto plane * @crtc_x: x offset of primary plane on @crtc * @crtc_y: y offset of primary plane on @crtc * @crtc_w: width of primary plane rectangle on @crtc * @crtc_h: height of primary plane rectangle on @crtc * @src_x: x offset of @fb for panning * @src_y: y offset of @fb for panning * @src_w: width of source rectangle in @fb * @src_h: height of source rectangle in @fb * @ctx: lock acquire context * * Provides a default plane update handler using the atomic driver interface. 
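 *
 * This is normally plugged directly into the plane's vtable together with the
 * matching disable helper (a sketch; foo_plane_funcs is a hypothetical name):
 *
 *     static const struct drm_plane_funcs foo_plane_funcs = {
 *             .update_plane = drm_atomic_helper_update_plane,
 *             .disable_plane = drm_atomic_helper_disable_plane,
 *             .destroy = drm_plane_cleanup,
 *             .reset = drm_atomic_helper_plane_reset,
 *             .atomic_duplicate_state = drm_atomic_helper_plane_duplicate_state,
 *             .atomic_destroy_state = drm_atomic_helper_plane_destroy_state,
 *     };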
* * RETURNS: * Zero on success, error code on failure */ int drm_atomic_helper_update_plane(struct drm_plane *plane, struct drm_crtc *crtc, struct drm_framebuffer *fb, int crtc_x, int crtc_y, unsigned int crtc_w, unsigned int crtc_h, uint32_t src_x, uint32_t src_y, uint32_t src_w, uint32_t src_h, struct drm_modeset_acquire_ctx *ctx) { struct drm_atomic_state *state; struct drm_plane_state *plane_state; int ret = 0; state = drm_atomic_state_alloc(plane->dev); if (!state) return -ENOMEM; state->acquire_ctx = ctx; plane_state = drm_atomic_get_plane_state(state, plane); if (IS_ERR(plane_state)) { ret = PTR_ERR(plane_state); goto fail; } ret = drm_atomic_set_crtc_for_plane(plane_state, crtc); if (ret != 0) goto fail; drm_atomic_set_fb_for_plane(plane_state, fb); plane_state->crtc_x = crtc_x; plane_state->crtc_y = crtc_y; plane_state->crtc_w = crtc_w; plane_state->crtc_h = crtc_h; plane_state->src_x = src_x; plane_state->src_y = src_y; plane_state->src_w = src_w; plane_state->src_h = src_h; if (plane == crtc->cursor) state->legacy_cursor_update = true; ret = drm_atomic_commit(state); fail: drm_atomic_state_put(state); return ret; } EXPORT_SYMBOL(drm_atomic_helper_update_plane); /** * drm_atomic_helper_disable_plane - Helper for primary plane disable using atomic * @plane: plane to disable * @ctx: lock acquire context * * Provides a default plane disable handler using the atomic driver interface. * * RETURNS: * Zero on success, error code on failure */ int drm_atomic_helper_disable_plane(struct drm_plane *plane, struct drm_modeset_acquire_ctx *ctx) { struct drm_atomic_state *state; struct drm_plane_state *plane_state; int ret = 0; state = drm_atomic_state_alloc(plane->dev); if (!state) return -ENOMEM; state->acquire_ctx = ctx; plane_state = drm_atomic_get_plane_state(state, plane); if (IS_ERR(plane_state)) { ret = PTR_ERR(plane_state); goto fail; } if (plane_state->crtc && plane_state->crtc->cursor == plane) plane_state->state->legacy_cursor_update = true; ret = __drm_atomic_helper_disable_plane(plane, plane_state); if (ret != 0) goto fail; ret = drm_atomic_commit(state); fail: drm_atomic_state_put(state); return ret; } EXPORT_SYMBOL(drm_atomic_helper_disable_plane); /** * drm_atomic_helper_set_config - set a new config from userspace * @set: mode set configuration * @ctx: lock acquisition context * * Provides a default CRTC set_config handler using the atomic driver interface. * * NOTE: For backwards compatibility with old userspace this automatically * resets the "link-status" property to GOOD, to force any link * re-training. The SETCRTC ioctl does not define whether an update does * need a full modeset or just a plane update, hence we're allowed to do * that. See also drm_connector_set_link_status_property(). * * Returns: * Returns 0 on success, negative errno numbers on failure. 
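 *
 * Like the plane helpers above, this is normally wired straight into the CRTC
 * vtable (a sketch; foo_crtc_funcs is a hypothetical name):
 *
 *     static const struct drm_crtc_funcs foo_crtc_funcs = {
 *             .set_config = drm_atomic_helper_set_config,
 *             .page_flip = drm_atomic_helper_page_flip,
 *             .destroy = drm_crtc_cleanup,
 *             .reset = drm_atomic_helper_crtc_reset,
 *             .atomic_duplicate_state = drm_atomic_helper_crtc_duplicate_state,
 *             .atomic_destroy_state = drm_atomic_helper_crtc_destroy_state,
 *     };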
*/ int drm_atomic_helper_set_config(struct drm_mode_set *set, struct drm_modeset_acquire_ctx *ctx) { struct drm_atomic_state *state; struct drm_crtc *crtc = set->crtc; int ret = 0; state = drm_atomic_state_alloc(crtc->dev); if (!state) return -ENOMEM; state->acquire_ctx = ctx; ret = __drm_atomic_helper_set_config(set, state); if (ret != 0) goto fail; ret = handle_conflicting_encoders(state, true); if (ret) goto fail; ret = drm_atomic_commit(state); fail: drm_atomic_state_put(state); return ret; } EXPORT_SYMBOL(drm_atomic_helper_set_config); /** * drm_atomic_helper_disable_all - disable all currently active outputs * @dev: DRM device * @ctx: lock acquisition context * * Loops through all connectors, finding those that aren't turned off and then * turns them off by setting their DPMS mode to OFF and deactivating the CRTC * that they are connected to. * * This is used for example in suspend/resume to disable all currently active * functions when suspending. If you just want to shut down everything at e.g. * driver unload, look at drm_atomic_helper_shutdown(). * * Note that if callers haven't already acquired all modeset locks this might * return -EDEADLK, which must be handled by calling drm_modeset_backoff(). * * Returns: * 0 on success or a negative error code on failure. * * See also: * drm_atomic_helper_suspend(), drm_atomic_helper_resume() and * drm_atomic_helper_shutdown(). */ int drm_atomic_helper_disable_all(struct drm_device *dev, struct drm_modeset_acquire_ctx *ctx) { struct drm_atomic_state *state; struct drm_connector_state *conn_state; struct drm_connector *conn; struct drm_plane_state *plane_state; struct drm_plane *plane; struct drm_crtc_state *crtc_state; struct drm_crtc *crtc; int ret, i; state = drm_atomic_state_alloc(dev); if (!state) return -ENOMEM; state->acquire_ctx = ctx; drm_for_each_crtc(crtc, dev) { crtc_state = drm_atomic_get_crtc_state(state, crtc); if (IS_ERR(crtc_state)) { ret = PTR_ERR(crtc_state); goto free; } crtc_state->active = false; ret = drm_atomic_set_mode_prop_for_crtc(crtc_state, NULL); if (ret < 0) goto free; ret = drm_atomic_add_affected_planes(state, crtc); if (ret < 0) goto free; ret = drm_atomic_add_affected_connectors(state, crtc); if (ret < 0) goto free; } for_each_new_connector_in_state(state, conn, conn_state, i) { ret = drm_atomic_set_crtc_for_connector(conn_state, NULL); if (ret < 0) goto free; } for_each_new_plane_in_state(state, plane, plane_state, i) { ret = drm_atomic_set_crtc_for_plane(plane_state, NULL); if (ret < 0) goto free; drm_atomic_set_fb_for_plane(plane_state, NULL); } ret = drm_atomic_commit(state); free: drm_atomic_state_put(state); return ret; } EXPORT_SYMBOL(drm_atomic_helper_disable_all); /** * drm_atomic_helper_reset_crtc - reset the active outputs of a CRTC * @crtc: DRM CRTC * @ctx: lock acquisition context * * Reset the active outputs by indicating that connectors have changed. * This implies a reset of all active components available between the CRTC and * connectors. * * A variant of this function exists with * drm_bridge_helper_reset_crtc(), dedicated to bridges. * * NOTE: This relies on resetting &drm_crtc_state.connectors_changed. * For drivers which optimize out unnecessary modesets this will result in * a no-op commit, achieving nothing. * * Returns: * 0 on success or a negative error code on failure. 
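 *
 * Since the caller must hold the relevant modeset locks, a typical call site
 * wraps this in a lock acquisition loop, for example from a hotplug or link
 * retraining handler (a sketch; dev and crtc are assumed to be in scope):
 *
 *     struct drm_modeset_acquire_ctx ctx;
 *     int ret;
 *
 *     DRM_MODESET_LOCK_ALL_BEGIN(dev, ctx, 0, ret);
 *     ret = drm_atomic_helper_reset_crtc(crtc, &ctx);
 *     DRM_MODESET_LOCK_ALL_END(dev, ctx, ret);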
*/ int drm_atomic_helper_reset_crtc(struct drm_crtc *crtc, struct drm_modeset_acquire_ctx *ctx) { struct drm_atomic_state *state; struct drm_crtc_state *crtc_state; int ret; state = drm_atomic_state_alloc(crtc->dev); if (!state) return -ENOMEM; state->acquire_ctx = ctx; crtc_state = drm_atomic_get_crtc_state(state, crtc); if (IS_ERR(crtc_state)) { ret = PTR_ERR(crtc_state); goto out; } crtc_state->connectors_changed = true; ret = drm_atomic_commit(state); out: drm_atomic_state_put(state); return ret; } EXPORT_SYMBOL(drm_atomic_helper_reset_crtc); /** * drm_atomic_helper_shutdown - shutdown all CRTC * @dev: DRM device * * This shuts down all CRTC, which is useful for driver unloading. Shutdown on * suspend should instead be handled with drm_atomic_helper_suspend(), since * that also takes a snapshot of the modeset state to be restored on resume. * * This is just a convenience wrapper around drm_atomic_helper_disable_all(), * and it is the atomic version of drm_helper_force_disable_all(). */ void drm_atomic_helper_shutdown(struct drm_device *dev) { struct drm_modeset_acquire_ctx ctx; int ret; if (dev == NULL) return; DRM_MODESET_LOCK_ALL_BEGIN(dev, ctx, 0, ret); ret = drm_atomic_helper_disable_all(dev, &ctx); if (ret) drm_err(dev, "Disabling all crtc's during unload failed with %i\n", ret); DRM_MODESET_LOCK_ALL_END(dev, ctx, ret); } EXPORT_SYMBOL(drm_atomic_helper_shutdown); /** * drm_atomic_helper_duplicate_state - duplicate an atomic state object * @dev: DRM device * @ctx: lock acquisition context * * Makes a copy of the current atomic state by looping over all objects and * duplicating their respective states. This is used for example by suspend/ * resume support code to save the state prior to suspend such that it can * be restored upon resume. * * Note that this treats atomic state as persistent between save and restore. * Drivers must make sure that this is possible and won't result in confusion * or erroneous behaviour. * * Note that if callers haven't already acquired all modeset locks this might * return -EDEADLK, which must be handled by calling drm_modeset_backoff(). * * Returns: * A pointer to the copy of the atomic state object on success or an * ERR_PTR()-encoded error code on failure. 
* * See also: * drm_atomic_helper_suspend(), drm_atomic_helper_resume() */ struct drm_atomic_state * drm_atomic_helper_duplicate_state(struct drm_device *dev, struct drm_modeset_acquire_ctx *ctx) { struct drm_atomic_state *state; struct drm_connector *conn; struct drm_connector_list_iter conn_iter; struct drm_plane *plane; struct drm_crtc *crtc; int err = 0; state = drm_atomic_state_alloc(dev); if (!state) return ERR_PTR(-ENOMEM); state->acquire_ctx = ctx; state->duplicated = true; drm_for_each_crtc(crtc, dev) { struct drm_crtc_state *crtc_state; crtc_state = drm_atomic_get_crtc_state(state, crtc); if (IS_ERR(crtc_state)) { err = PTR_ERR(crtc_state); goto free; } } drm_for_each_plane(plane, dev) { struct drm_plane_state *plane_state; plane_state = drm_atomic_get_plane_state(state, plane); if (IS_ERR(plane_state)) { err = PTR_ERR(plane_state); goto free; } } drm_connector_list_iter_begin(dev, &conn_iter); drm_for_each_connector_iter(conn, &conn_iter) { struct drm_connector_state *conn_state; conn_state = drm_atomic_get_connector_state(state, conn); if (IS_ERR(conn_state)) { err = PTR_ERR(conn_state); drm_connector_list_iter_end(&conn_iter); goto free; } } drm_connector_list_iter_end(&conn_iter); /* clear the acquire context so that it isn't accidentally reused */ state->acquire_ctx = NULL; free: if (err < 0) { drm_atomic_state_put(state); state = ERR_PTR(err); } return state; } EXPORT_SYMBOL(drm_atomic_helper_duplicate_state); /** * drm_atomic_helper_suspend - subsystem-level suspend helper * @dev: DRM device * * Duplicates the current atomic state, disables all active outputs and then * returns a pointer to the original atomic state to the caller. Drivers can * pass this pointer to the drm_atomic_helper_resume() helper upon resume to * restore the output configuration that was active at the time the system * entered suspend. * * Note that it is potentially unsafe to use this. The atomic state object * returned by this function is assumed to be persistent. Drivers must ensure * that this holds true. Before calling this function, drivers must make sure * to suspend fbdev emulation so that nothing can be using the device. * * Returns: * A pointer to a copy of the state before suspend on success or an ERR_PTR()- * encoded error code on failure. Drivers should store the returned atomic * state object and pass it to the drm_atomic_helper_resume() helper upon * resume. * * See also: * drm_atomic_helper_duplicate_state(), drm_atomic_helper_disable_all(), * drm_atomic_helper_resume(), drm_atomic_helper_commit_duplicated_state() */ struct drm_atomic_state *drm_atomic_helper_suspend(struct drm_device *dev) { struct drm_modeset_acquire_ctx ctx; struct drm_atomic_state *state; int err; /* This can never be returned, but it makes the compiler happy */ state = ERR_PTR(-EINVAL); DRM_MODESET_LOCK_ALL_BEGIN(dev, ctx, 0, err); state = drm_atomic_helper_duplicate_state(dev, &ctx); if (IS_ERR(state)) goto unlock; err = drm_atomic_helper_disable_all(dev, &ctx); if (err < 0) { drm_atomic_state_put(state); state = ERR_PTR(err); goto unlock; } unlock: DRM_MODESET_LOCK_ALL_END(dev, ctx, err); if (err) return ERR_PTR(err); return state; } EXPORT_SYMBOL(drm_atomic_helper_suspend); /** * drm_atomic_helper_commit_duplicated_state - commit duplicated state * @state: duplicated atomic state to commit * @ctx: pointer to acquire_ctx to use for commit. * * The state returned by drm_atomic_helper_duplicate_state() and * drm_atomic_helper_suspend() is partially invalid, and needs to * be fixed up before commit. 
* * Returns: * 0 on success or a negative error code on failure. * * See also: * drm_atomic_helper_suspend() */ int drm_atomic_helper_commit_duplicated_state(struct drm_atomic_state *state, struct drm_modeset_acquire_ctx *ctx) { int i, ret; struct drm_plane *plane; struct drm_plane_state *new_plane_state; struct drm_connector *connector; struct drm_connector_state *new_conn_state; struct drm_crtc *crtc; struct drm_crtc_state *new_crtc_state; state->acquire_ctx = ctx; for_each_new_plane_in_state(state, plane, new_plane_state, i) state->planes[i].old_state = plane->state; for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) state->crtcs[i].old_state = crtc->state; for_each_new_connector_in_state(state, connector, new_conn_state, i) state->connectors[i].old_state = connector->state; ret = drm_atomic_commit(state); state->acquire_ctx = NULL; return ret; } EXPORT_SYMBOL(drm_atomic_helper_commit_duplicated_state); /** * drm_atomic_helper_resume - subsystem-level resume helper * @dev: DRM device * @state: atomic state to resume to * * Calls drm_mode_config_reset() to synchronize hardware and software states, * grabs all modeset locks and commits the atomic state object. This can be * used in conjunction with the drm_atomic_helper_suspend() helper to * implement suspend/resume for drivers that support atomic mode-setting. * * Returns: * 0 on success or a negative error code on failure. * * See also: * drm_atomic_helper_suspend() */ int drm_atomic_helper_resume(struct drm_device *dev, struct drm_atomic_state *state) { struct drm_modeset_acquire_ctx ctx; int err; drm_mode_config_reset(dev); DRM_MODESET_LOCK_ALL_BEGIN(dev, ctx, 0, err); err = drm_atomic_helper_commit_duplicated_state(state, &ctx); DRM_MODESET_LOCK_ALL_END(dev, ctx, err); drm_atomic_state_put(state); return err; } EXPORT_SYMBOL(drm_atomic_helper_resume); static int page_flip_common(struct drm_atomic_state *state, struct drm_crtc *crtc, struct drm_framebuffer *fb, struct drm_pending_vblank_event *event, uint32_t flags) { struct drm_plane *plane = crtc->primary; struct drm_plane_state *plane_state; struct drm_crtc_state *crtc_state; int ret = 0; crtc_state = drm_atomic_get_crtc_state(state, crtc); if (IS_ERR(crtc_state)) return PTR_ERR(crtc_state); crtc_state->event = event; crtc_state->async_flip = flags & DRM_MODE_PAGE_FLIP_ASYNC; plane_state = drm_atomic_get_plane_state(state, plane); if (IS_ERR(plane_state)) return PTR_ERR(plane_state); ret = drm_atomic_set_crtc_for_plane(plane_state, crtc); if (ret != 0) return ret; drm_atomic_set_fb_for_plane(plane_state, fb); /* Make sure we don't accidentally do a full modeset. */ state->allow_modeset = false; if (!crtc_state->active) { drm_dbg_atomic(crtc->dev, "[CRTC:%d:%s] disabled, rejecting legacy flip\n", crtc->base.id, crtc->name); return -EINVAL; } return ret; } /** * drm_atomic_helper_page_flip - execute a legacy page flip * @crtc: DRM CRTC * @fb: DRM framebuffer * @event: optional DRM event to signal upon completion * @flags: flip flags for non-vblank sync'ed updates * @ctx: lock acquisition context * * Provides a default &drm_crtc_funcs.page_flip implementation * using the atomic driver interface. * * Returns: * Returns 0 on success, negative errno numbers on failure. 
* * See also: * drm_atomic_helper_page_flip_target() */ int drm_atomic_helper_page_flip(struct drm_crtc *crtc, struct drm_framebuffer *fb, struct drm_pending_vblank_event *event, uint32_t flags, struct drm_modeset_acquire_ctx *ctx) { struct drm_plane *plane = crtc->primary; struct drm_atomic_state *state; int ret = 0; state = drm_atomic_state_alloc(plane->dev); if (!state) return -ENOMEM; state->acquire_ctx = ctx; ret = page_flip_common(state, crtc, fb, event, flags); if (ret != 0) goto fail; ret = drm_atomic_nonblocking_commit(state); fail: drm_atomic_state_put(state); return ret; } EXPORT_SYMBOL(drm_atomic_helper_page_flip); /** * drm_atomic_helper_page_flip_target - do page flip on target vblank period. * @crtc: DRM CRTC * @fb: DRM framebuffer * @event: optional DRM event to signal upon completion * @flags: flip flags for non-vblank sync'ed updates * @target: specifying the target vblank period when the flip to take effect * @ctx: lock acquisition context * * Provides a default &drm_crtc_funcs.page_flip_target implementation. * Similar to drm_atomic_helper_page_flip() with extra parameter to specify * target vblank period to flip. * * Returns: * Returns 0 on success, negative errno numbers on failure. */ int drm_atomic_helper_page_flip_target(struct drm_crtc *crtc, struct drm_framebuffer *fb, struct drm_pending_vblank_event *event, uint32_t flags, uint32_t target, struct drm_modeset_acquire_ctx *ctx) { struct drm_plane *plane = crtc->primary; struct drm_atomic_state *state; struct drm_crtc_state *crtc_state; int ret = 0; state = drm_atomic_state_alloc(plane->dev); if (!state) return -ENOMEM; state->acquire_ctx = ctx; ret = page_flip_common(state, crtc, fb, event, flags); if (ret != 0) goto fail; crtc_state = drm_atomic_get_new_crtc_state(state, crtc); if (WARN_ON(!crtc_state)) { ret = -EINVAL; goto fail; } crtc_state->target_vblank = target; ret = drm_atomic_nonblocking_commit(state); fail: drm_atomic_state_put(state); return ret; } EXPORT_SYMBOL(drm_atomic_helper_page_flip_target); /** * drm_atomic_helper_bridge_propagate_bus_fmt() - Propagate output format to * the input end of a bridge * @bridge: bridge control structure * @bridge_state: new bridge state * @crtc_state: new CRTC state * @conn_state: new connector state * @output_fmt: tested output bus format * @num_input_fmts: will contain the size of the returned array * * This helper is a pluggable implementation of the * &drm_bridge_funcs.atomic_get_input_bus_fmts operation for bridges that don't * modify the bus configuration between their input and their output. It * returns an array of input formats with a single element set to @output_fmt. * * RETURNS: * a valid format array of size @num_input_fmts, or NULL if the allocation * failed */ u32 * drm_atomic_helper_bridge_propagate_bus_fmt(struct drm_bridge *bridge, struct drm_bridge_state *bridge_state, struct drm_crtc_state *crtc_state, struct drm_connector_state *conn_state, u32 output_fmt, unsigned int *num_input_fmts) { u32 *input_fmts; input_fmts = kzalloc(sizeof(*input_fmts), GFP_KERNEL); if (!input_fmts) { *num_input_fmts = 0; return NULL; } *num_input_fmts = 1; input_fmts[0] = output_fmt; return input_fmts; } EXPORT_SYMBOL(drm_atomic_helper_bridge_propagate_bus_fmt); |
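/*
 * Illustrative sketch (not part of the helper library above): drivers normally
 * plug these helpers straight into their &drm_crtc_funcs, using
 * drm_atomic_helper_set_config() for .set_config and drm_atomic_helper_page_flip()
 * for .page_flip, together with the state helpers from drm_atomic_state_helper.h.
 * The foo_crtc_funcs name is made up for the example.
 */
static const struct drm_crtc_funcs foo_crtc_funcs = {
	.reset			= drm_atomic_helper_crtc_reset,
	.destroy		= drm_crtc_cleanup,
	.set_config		= drm_atomic_helper_set_config,
	.page_flip		= drm_atomic_helper_page_flip,
	.atomic_duplicate_state	= drm_atomic_helper_crtc_duplicate_state,
	.atomic_destroy_state	= drm_atomic_helper_crtc_destroy_state,
};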
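/*
 * Illustrative sketch (not part of the helper library above): one way a driver
 * could wire drm_atomic_helper_suspend()/drm_atomic_helper_resume() into its
 * dev_pm_ops and call drm_atomic_helper_shutdown() on removal. The foo_* names
 * and the private-data layout are hypothetical; the usual DRM and PM headers
 * are assumed.
 */
struct foo_drm_private {
	struct drm_device drm;
	struct drm_atomic_state *suspend_state;	/* saved across suspend */
};

static int foo_pm_suspend(struct device *dev)
{
	struct foo_drm_private *priv = dev_get_drvdata(dev);
	struct drm_atomic_state *state;

	/* Duplicate the current state and turn all outputs off. */
	state = drm_atomic_helper_suspend(&priv->drm);
	if (IS_ERR(state))
		return PTR_ERR(state);

	priv->suspend_state = state;
	return 0;
}

static int foo_pm_resume(struct device *dev)
{
	struct foo_drm_private *priv = dev_get_drvdata(dev);

	/* Re-commit the duplicated state; the helper drops the reference. */
	return drm_atomic_helper_resume(&priv->drm, priv->suspend_state);
}

static void foo_remove(struct platform_device *pdev)
{
	struct foo_drm_private *priv = platform_get_drvdata(pdev);

	drm_dev_unregister(&priv->drm);
	/* Disable all outputs before the hardware goes away. */
	drm_atomic_helper_shutdown(&priv->drm);
}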
/* SPDX-License-Identifier: GPL-2.0-or-later */ /* memcontrol.h - Memory Controller * * Copyright IBM Corporation, 2007 * Author Balbir Singh <balbir@linux.vnet.ibm.com> * * Copyright 2007 OpenVZ SWsoft Inc * Author: Pavel Emelianov <xemul@openvz.org> */ #ifndef _LINUX_MEMCONTROL_H #define _LINUX_MEMCONTROL_H #include <linux/cgroup.h> #include <linux/vm_event_item.h> #include <linux/hardirq.h> #include <linux/jump_label.h> #include <linux/kernel.h> #include <linux/page_counter.h> #include <linux/vmpressure.h> #include <linux/eventfd.h> #include <linux/mm.h> #include <linux/vmstat.h> #include <linux/writeback.h> #include <linux/page-flags.h> #include <linux/shrinker.h> struct mem_cgroup; struct obj_cgroup; struct page; struct mm_struct; struct kmem_cache; /* Cgroup-specific page state, on top of universal node page state */ enum memcg_stat_item { MEMCG_SWAP = NR_VM_NODE_STAT_ITEMS, MEMCG_SOCK, MEMCG_PERCPU_B, MEMCG_VMALLOC, MEMCG_KMEM, MEMCG_ZSWAP_B, MEMCG_ZSWAPPED, MEMCG_NR_STAT, }; enum memcg_memory_event { MEMCG_LOW, MEMCG_HIGH, MEMCG_MAX, MEMCG_OOM, MEMCG_OOM_KILL, MEMCG_OOM_GROUP_KILL, MEMCG_SWAP_HIGH, MEMCG_SWAP_MAX, MEMCG_SWAP_FAIL, MEMCG_NR_MEMORY_EVENTS, }; struct mem_cgroup_reclaim_cookie { pg_data_t *pgdat; int generation; }; #ifdef CONFIG_MEMCG #define MEM_CGROUP_ID_SHIFT 16 struct mem_cgroup_id { int id; refcount_t ref; }; struct memcg_vmstats_percpu; struct memcg1_events_percpu; struct memcg_vmstats; struct lruvec_stats_percpu; struct lruvec_stats; struct mem_cgroup_reclaim_iter { struct mem_cgroup *position; /* scan generation, increased every round-trip */ atomic_t generation; }; /* * per-node information in memory controller.
*/ struct mem_cgroup_per_node { /* Keep the read-only fields at the start */ struct mem_cgroup *memcg; /* Back pointer, we cannot */ /* use container_of */ struct lruvec_stats_percpu __percpu *lruvec_stats_percpu; struct lruvec_stats *lruvec_stats; struct shrinker_info __rcu *shrinker_info; #ifdef CONFIG_MEMCG_V1 /* * Memcg-v1 only stuff in middle as buffer between read mostly fields * and update often fields to avoid false sharing. If v1 stuff is * not present, an explicit padding is needed. */ struct rb_node tree_node; /* RB tree node */ unsigned long usage_in_excess;/* Set to the value by which */ /* the soft limit is exceeded*/ bool on_tree; #else CACHELINE_PADDING(_pad1_); #endif /* Fields which get updated often at the end. */ struct lruvec lruvec; CACHELINE_PADDING(_pad2_); unsigned long lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS]; struct mem_cgroup_reclaim_iter iter; #ifdef CONFIG_MEMCG_NMI_SAFETY_REQUIRES_ATOMIC /* slab stats for nmi context */ atomic_t slab_reclaimable; atomic_t slab_unreclaimable; #endif }; struct mem_cgroup_threshold { struct eventfd_ctx *eventfd; unsigned long threshold; }; /* For threshold */ struct mem_cgroup_threshold_ary { /* An array index points to threshold just below or equal to usage. */ int current_threshold; /* Size of entries[] */ unsigned int size; /* Array of thresholds */ struct mem_cgroup_threshold entries[] __counted_by(size); }; struct mem_cgroup_thresholds { /* Primary thresholds array */ struct mem_cgroup_threshold_ary *primary; /* * Spare threshold array. * This is needed to make mem_cgroup_unregister_event() "never fail". * It must be able to store at least primary->size - 1 entries. */ struct mem_cgroup_threshold_ary *spare; }; /* * Remember four most recent foreign writebacks with dirty pages in this * cgroup. Inode sharing is expected to be uncommon and, even if we miss * one in a given round, we're likely to catch it later if it keeps * foreign-dirtying, so a fairly low count should be enough. * * See mem_cgroup_track_foreign_dirty_slowpath() for details. */ #define MEMCG_CGWB_FRN_CNT 4 struct memcg_cgwb_frn { u64 bdi_id; /* bdi->id of the foreign inode */ int memcg_id; /* memcg->css.id of foreign inode */ u64 at; /* jiffies_64 at the time of dirtying */ struct wb_completion done; /* tracks in-flight foreign writebacks */ }; /* * Bucket for arbitrarily byte-sized objects charged to a memory * cgroup. The bucket can be reparented in one piece when the cgroup * is destroyed, without having to round up the individual references * of all live memory objects in the wild. */ struct obj_cgroup { struct percpu_ref refcnt; struct mem_cgroup *memcg; atomic_t nr_charged_bytes; union { struct list_head list; /* protected by objcg_lock */ struct rcu_head rcu; }; }; /* * The memory controller data structure. The memory controller controls both * page cache and RSS per cgroup. We would eventually like to provide * statistics based on the statistics developed by Rik Van Riel for clock-pro, * to help the administrator determine what knobs to tune. */ struct mem_cgroup { struct cgroup_subsys_state css; /* Private memcg ID. 
Used to ID objects that outlive the cgroup */ struct mem_cgroup_id id; /* Accounted resources */ struct page_counter memory; /* Both v1 & v2 */ union { struct page_counter swap; /* v2 only */ struct page_counter memsw; /* v1 only */ }; /* registered local peak watchers */ struct list_head memory_peaks; struct list_head swap_peaks; spinlock_t peaks_lock; /* Range enforcement for interrupt charges */ struct work_struct high_work; #ifdef CONFIG_ZSWAP unsigned long zswap_max; /* * Prevent pages from this memcg from being written back from zswap to * swap, and from being swapped out on zswap store failures. */ bool zswap_writeback; #endif /* vmpressure notifications */ struct vmpressure vmpressure; /* * Should the OOM killer kill all tasks belonging to this cgroup if it has to * kill one of them? */ bool oom_group; int swappiness; /* memory.events and memory.events.local */ struct cgroup_file events_file; struct cgroup_file events_local_file; /* handle for "memory.swap.events" */ struct cgroup_file swap_events_file; /* memory.stat */ struct memcg_vmstats *vmstats; /* memory.events */ atomic_long_t memory_events[MEMCG_NR_MEMORY_EVENTS]; atomic_long_t memory_events_local[MEMCG_NR_MEMORY_EVENTS]; #ifdef CONFIG_MEMCG_NMI_SAFETY_REQUIRES_ATOMIC /* MEMCG_KMEM for nmi context */ atomic_t kmem_stat; #endif /* * Hint of reclaim pressure for socket memory management. Note * that this indicator should NOT be used in legacy cgroup mode * where socket memory is accounted/charged separately. */ unsigned long socket_pressure; int kmemcg_id; /* * memcg->objcg is wiped out as part of the objcg reparenting * process. memcg->orig_objcg preserves a pointer (and a reference) * to the original objcg for the remainder of the memcg's lifetime. */ struct obj_cgroup __rcu *objcg; struct obj_cgroup *orig_objcg; /* list of inherited objcgs, protected by objcg_lock */ struct list_head objcg_list; struct memcg_vmstats_percpu __percpu *vmstats_percpu; #ifdef CONFIG_CGROUP_WRITEBACK struct list_head cgwb_list; struct wb_domain cgwb_domain; struct memcg_cgwb_frn cgwb_frn[MEMCG_CGWB_FRN_CNT]; #endif #ifdef CONFIG_TRANSPARENT_HUGEPAGE struct deferred_split deferred_split_queue; #endif #ifdef CONFIG_LRU_GEN_WALKS_MMU /* per-memcg mm_struct list */ struct lru_gen_mm_list mm_list; #endif #ifdef CONFIG_MEMCG_V1 /* Legacy consumer-oriented counters */ struct page_counter kmem; /* v1 only */ struct page_counter tcpmem; /* v1 only */ struct memcg1_events_percpu __percpu *events_percpu; unsigned long soft_limit; /* protected by memcg_oom_lock */ bool oom_lock; int under_oom; /* OOM-Killer disable */ int oom_kill_disable; /* protect arrays of thresholds */ struct mutex thresholds_lock; /* thresholds for memory usage. RCU-protected */ struct mem_cgroup_thresholds thresholds; /* thresholds for mem+swap usage. RCU-protected */ struct mem_cgroup_thresholds memsw_thresholds; /* For oom notifier event fd */ struct list_head oom_notify; /* Legacy tcp memory accounting */ bool tcpmem_active; int tcpmem_pressure; /* List of events which userspace wants to receive */ struct list_head event_list; spinlock_t event_list_lock; #endif /* CONFIG_MEMCG_V1 */ struct mem_cgroup_per_node *nodeinfo[]; }; /* * Size of the first charge trial. * TODO: it may be necessary to use larger values on big systems, or to make * this dynamic based on the workload.
*/ #define MEMCG_CHARGE_BATCH 64U extern struct mem_cgroup *root_mem_cgroup; enum page_memcg_data_flags { /* page->memcg_data is a pointer to an slabobj_ext vector */ MEMCG_DATA_OBJEXTS = (1UL << 0), /* page has been accounted as a non-slab kernel page */ MEMCG_DATA_KMEM = (1UL << 1), /* the next bit after the last actual flag */ __NR_MEMCG_DATA_FLAGS = (1UL << 2), }; #define __FIRST_OBJEXT_FLAG __NR_MEMCG_DATA_FLAGS #else /* CONFIG_MEMCG */ #define __FIRST_OBJEXT_FLAG (1UL << 0) #endif /* CONFIG_MEMCG */ enum objext_flags { /* slabobj_ext vector failed to allocate */ OBJEXTS_ALLOC_FAIL = __FIRST_OBJEXT_FLAG, /* the next bit after the last actual flag */ __NR_OBJEXTS_FLAGS = (__FIRST_OBJEXT_FLAG << 1), }; #define OBJEXTS_FLAGS_MASK (__NR_OBJEXTS_FLAGS - 1) #ifdef CONFIG_MEMCG static inline bool folio_memcg_kmem(struct folio *folio); /* * After the initialization objcg->memcg is always pointing at * a valid memcg, but can be atomically swapped to the parent memcg. * * The caller must ensure that the returned memcg won't be released. */ static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg) { lockdep_assert_once(rcu_read_lock_held() || lockdep_is_held(&cgroup_mutex)); return READ_ONCE(objcg->memcg); } /* * __folio_memcg - Get the memory cgroup associated with a non-kmem folio * @folio: Pointer to the folio. * * Returns a pointer to the memory cgroup associated with the folio, * or NULL. This function assumes that the folio is known to have a * proper memory cgroup pointer. It's not safe to call this function * against some type of folios, e.g. slab folios or ex-slab folios or * kmem folios. */ static inline struct mem_cgroup *__folio_memcg(struct folio *folio) { unsigned long memcg_data = folio->memcg_data; VM_BUG_ON_FOLIO(folio_test_slab(folio), folio); VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJEXTS, folio); VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_KMEM, folio); return (struct mem_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK); } /* * __folio_objcg - get the object cgroup associated with a kmem folio. * @folio: Pointer to the folio. * * Returns a pointer to the object cgroup associated with the folio, * or NULL. This function assumes that the folio is known to have a * proper object cgroup pointer. It's not safe to call this function * against some type of folios, e.g. slab folios or ex-slab folios or * LRU folios. */ static inline struct obj_cgroup *__folio_objcg(struct folio *folio) { unsigned long memcg_data = folio->memcg_data; VM_BUG_ON_FOLIO(folio_test_slab(folio), folio); VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJEXTS, folio); VM_BUG_ON_FOLIO(!(memcg_data & MEMCG_DATA_KMEM), folio); return (struct obj_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK); } /* * folio_memcg - Get the memory cgroup associated with a folio. * @folio: Pointer to the folio. * * Returns a pointer to the memory cgroup associated with the folio, * or NULL. This function assumes that the folio is known to have a * proper memory cgroup pointer. It's not safe to call this function * against some type of folios, e.g. slab folios or ex-slab folios. * * For a non-kmem folio any of the following ensures folio and memcg binding * stability: * * - the folio lock * - LRU isolation * - exclusive reference * * For a kmem folio a caller should hold an rcu read lock to protect memcg * associated with a kmem folio from being released. 
*/ static inline struct mem_cgroup *folio_memcg(struct folio *folio) { if (folio_memcg_kmem(folio)) return obj_cgroup_memcg(__folio_objcg(folio)); return __folio_memcg(folio); } /* * folio_memcg_charged - If a folio is charged to a memory cgroup. * @folio: Pointer to the folio. * * Returns true if folio is charged to a memory cgroup, otherwise returns false. */ static inline bool folio_memcg_charged(struct folio *folio) { return folio->memcg_data != 0; } /* * folio_memcg_check - Get the memory cgroup associated with a folio. * @folio: Pointer to the folio. * * Returns a pointer to the memory cgroup associated with the folio, * or NULL. This function unlike folio_memcg() can take any folio * as an argument. It has to be used in cases when it's not known if a folio * has an associated memory cgroup pointer or an object cgroups vector or * an object cgroup. * * For a non-kmem folio any of the following ensures folio and memcg binding * stability: * * - the folio lock * - LRU isolation * - exclusive reference * * For a kmem folio a caller should hold an rcu read lock to protect memcg * associated with a kmem folio from being released. */ static inline struct mem_cgroup *folio_memcg_check(struct folio *folio) { /* * Because folio->memcg_data might be changed asynchronously * for slabs, READ_ONCE() should be used here. */ unsigned long memcg_data = READ_ONCE(folio->memcg_data); if (memcg_data & MEMCG_DATA_OBJEXTS) return NULL; if (memcg_data & MEMCG_DATA_KMEM) { struct obj_cgroup *objcg; objcg = (void *)(memcg_data & ~OBJEXTS_FLAGS_MASK); return obj_cgroup_memcg(objcg); } return (struct mem_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK); } static inline struct mem_cgroup *page_memcg_check(struct page *page) { if (PageTail(page)) return NULL; return folio_memcg_check((struct folio *)page); } static inline struct mem_cgroup *get_mem_cgroup_from_objcg(struct obj_cgroup *objcg) { struct mem_cgroup *memcg; rcu_read_lock(); retry: memcg = obj_cgroup_memcg(objcg); if (unlikely(!css_tryget(&memcg->css))) goto retry; rcu_read_unlock(); return memcg; } /* * folio_memcg_kmem - Check if the folio has the memcg_kmem flag set. * @folio: Pointer to the folio. * * Checks if the folio has MemcgKmem flag set. The caller must ensure * that the folio has an associated memory cgroup. It's not safe to call * this function against some types of folios, e.g. slab folios. */ static inline bool folio_memcg_kmem(struct folio *folio) { VM_BUG_ON_PGFLAGS(PageTail(&folio->page), &folio->page); VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJEXTS, folio); return folio->memcg_data & MEMCG_DATA_KMEM; } static inline bool PageMemcgKmem(struct page *page) { return folio_memcg_kmem(page_folio(page)); } static inline bool mem_cgroup_is_root(struct mem_cgroup *memcg) { return (memcg == root_mem_cgroup); } static inline bool mem_cgroup_disabled(void) { return !cgroup_subsys_enabled(memory_cgrp_subsys); } static inline void mem_cgroup_protection(struct mem_cgroup *root, struct mem_cgroup *memcg, unsigned long *min, unsigned long *low) { *min = *low = 0; if (mem_cgroup_disabled()) return; /* * There is no reclaim protection applied to a targeted reclaim. * We are special casing this specific case here because * mem_cgroup_calculate_protection is not robust enough to keep * the protection invariant for calculated effective values for * parallel reclaimers with different reclaim target. 
This is * especially a problem for tail memcgs (as they have pages on LRU) * which would want to have effective values 0 for targeted reclaim * but a different value for external reclaim. * * Example * Let's have global and A's reclaim in parallel: * | * A (low=2G, usage = 3G, max = 3G, children_low_usage = 1.5G) * |\ * | C (low = 1G, usage = 2.5G) * B (low = 1G, usage = 0.5G) * * For the global reclaim * A.elow = A.low * B.elow = min(B.usage, B.low) because children_low_usage <= A.elow * C.elow = min(C.usage, C.low) * * With the effective values resetting we have A reclaim * A.elow = 0 * B.elow = B.low * C.elow = C.low * * If the global reclaim races with A's reclaim then * B.elow = C.elow = 0 because children_low_usage > A.elow) * is possible and reclaiming B would be violating the protection. * */ if (root == memcg) return; *min = READ_ONCE(memcg->memory.emin); *low = READ_ONCE(memcg->memory.elow); } void mem_cgroup_calculate_protection(struct mem_cgroup *root, struct mem_cgroup *memcg); static inline bool mem_cgroup_unprotected(struct mem_cgroup *target, struct mem_cgroup *memcg) { /* * The root memcg doesn't account charges, and doesn't support * protection. The target memcg's protection is ignored, see * mem_cgroup_calculate_protection() and mem_cgroup_protection() */ return mem_cgroup_disabled() || mem_cgroup_is_root(memcg) || memcg == target; } static inline bool mem_cgroup_below_low(struct mem_cgroup *target, struct mem_cgroup *memcg) { if (mem_cgroup_unprotected(target, memcg)) return false; return READ_ONCE(memcg->memory.elow) >= page_counter_read(&memcg->memory); } static inline bool mem_cgroup_below_min(struct mem_cgroup *target, struct mem_cgroup *memcg) { if (mem_cgroup_unprotected(target, memcg)) return false; return READ_ONCE(memcg->memory.emin) >= page_counter_read(&memcg->memory); } int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp); /** * mem_cgroup_charge - Charge a newly allocated folio to a cgroup. * @folio: Folio to charge. * @mm: mm context of the allocating task. * @gfp: Reclaim mode. * * Try to charge @folio to the memcg that @mm belongs to, reclaiming * pages according to @gfp if necessary. If @mm is NULL, try to * charge to the active memcg. * * Do not use this for folios allocated for swapin. * * Return: 0 on success. Otherwise, an error code is returned. */ static inline int mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp) { if (mem_cgroup_disabled()) return 0; return __mem_cgroup_charge(folio, mm, gfp); } int mem_cgroup_charge_hugetlb(struct folio* folio, gfp_t gfp); int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm, gfp_t gfp, swp_entry_t entry); void __mem_cgroup_uncharge(struct folio *folio); /** * mem_cgroup_uncharge - Uncharge a folio. * @folio: Folio to uncharge. * * Uncharge a folio previously charged with mem_cgroup_charge(). 
*/ static inline void mem_cgroup_uncharge(struct folio *folio) { if (mem_cgroup_disabled()) return; __mem_cgroup_uncharge(folio); } void __mem_cgroup_uncharge_folios(struct folio_batch *folios); static inline void mem_cgroup_uncharge_folios(struct folio_batch *folios) { if (mem_cgroup_disabled()) return; __mem_cgroup_uncharge_folios(folios); } void mem_cgroup_replace_folio(struct folio *old, struct folio *new); void mem_cgroup_migrate(struct folio *old, struct folio *new); /** * mem_cgroup_lruvec - get the lru list vector for a memcg & node * @memcg: memcg of the wanted lruvec * @pgdat: pglist_data * * Returns the lru list vector holding pages for a given @memcg & * @pgdat combination. This can be the node lruvec, if the memory * controller is disabled. */ static inline struct lruvec *mem_cgroup_lruvec(struct mem_cgroup *memcg, struct pglist_data *pgdat) { struct mem_cgroup_per_node *mz; struct lruvec *lruvec; if (mem_cgroup_disabled()) { lruvec = &pgdat->__lruvec; goto out; } if (!memcg) memcg = root_mem_cgroup; mz = memcg->nodeinfo[pgdat->node_id]; lruvec = &mz->lruvec; out: /* * Since a node can be onlined after the mem_cgroup was created, * we have to be prepared to initialize lruvec->pgdat here; * and if offlined then reonlined, we need to reinitialize it. */ if (unlikely(lruvec->pgdat != pgdat)) lruvec->pgdat = pgdat; return lruvec; } /** * folio_lruvec - return lruvec for isolating/putting an LRU folio * @folio: Pointer to the folio. * * This function relies on folio->mem_cgroup being stable. */ static inline struct lruvec *folio_lruvec(struct folio *folio) { struct mem_cgroup *memcg = folio_memcg(folio); VM_WARN_ON_ONCE_FOLIO(!memcg && !mem_cgroup_disabled(), folio); return mem_cgroup_lruvec(memcg, folio_pgdat(folio)); } struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p); struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm); struct mem_cgroup *get_mem_cgroup_from_current(void); struct mem_cgroup *get_mem_cgroup_from_folio(struct folio *folio); struct lruvec *folio_lruvec_lock(struct folio *folio); struct lruvec *folio_lruvec_lock_irq(struct folio *folio); struct lruvec *folio_lruvec_lock_irqsave(struct folio *folio, unsigned long *flags); #ifdef CONFIG_DEBUG_VM void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio); #else static inline void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio) { } #endif static inline struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css){ return css ? 
container_of(css, struct mem_cgroup, css) : NULL; } static inline bool obj_cgroup_tryget(struct obj_cgroup *objcg) { return percpu_ref_tryget(&objcg->refcnt); } static inline void obj_cgroup_get(struct obj_cgroup *objcg) { percpu_ref_get(&objcg->refcnt); } static inline void obj_cgroup_get_many(struct obj_cgroup *objcg, unsigned long nr) { percpu_ref_get_many(&objcg->refcnt, nr); } static inline void obj_cgroup_put(struct obj_cgroup *objcg) { if (objcg) percpu_ref_put(&objcg->refcnt); } static inline bool mem_cgroup_tryget(struct mem_cgroup *memcg) { return !memcg || css_tryget(&memcg->css); } static inline bool mem_cgroup_tryget_online(struct mem_cgroup *memcg) { return !memcg || css_tryget_online(&memcg->css); } static inline void mem_cgroup_put(struct mem_cgroup *memcg) { if (memcg) css_put(&memcg->css); } #define mem_cgroup_from_counter(counter, member) \ container_of(counter, struct mem_cgroup, member) struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *, struct mem_cgroup *, struct mem_cgroup_reclaim_cookie *); void mem_cgroup_iter_break(struct mem_cgroup *, struct mem_cgroup *); void mem_cgroup_scan_tasks(struct mem_cgroup *memcg, int (*)(struct task_struct *, void *), void *arg); static inline unsigned short mem_cgroup_id(struct mem_cgroup *memcg) { if (mem_cgroup_disabled()) return 0; return memcg->id.id; } struct mem_cgroup *mem_cgroup_from_id(unsigned short id); #ifdef CONFIG_SHRINKER_DEBUG static inline unsigned long mem_cgroup_ino(struct mem_cgroup *memcg) { return memcg ? cgroup_ino(memcg->css.cgroup) : 0; } struct mem_cgroup *mem_cgroup_get_from_ino(unsigned long ino); #endif static inline struct mem_cgroup *mem_cgroup_from_seq(struct seq_file *m) { return mem_cgroup_from_css(seq_css(m)); } static inline struct mem_cgroup *lruvec_memcg(struct lruvec *lruvec) { struct mem_cgroup_per_node *mz; if (mem_cgroup_disabled()) return NULL; mz = container_of(lruvec, struct mem_cgroup_per_node, lruvec); return mz->memcg; } /** * parent_mem_cgroup - find the accounting parent of a memcg * @memcg: memcg whose parent to find * * Returns the parent memcg, or NULL if this is the root. 
*/ static inline struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg) { return mem_cgroup_from_css(memcg->css.parent); } static inline bool mem_cgroup_is_descendant(struct mem_cgroup *memcg, struct mem_cgroup *root) { if (root == memcg) return true; return cgroup_is_descendant(memcg->css.cgroup, root->css.cgroup); } static inline bool mm_match_cgroup(struct mm_struct *mm, struct mem_cgroup *memcg) { struct mem_cgroup *task_memcg; bool match = false; rcu_read_lock(); task_memcg = mem_cgroup_from_task(rcu_dereference(mm->owner)); if (task_memcg) match = mem_cgroup_is_descendant(task_memcg, memcg); rcu_read_unlock(); return match; } struct cgroup_subsys_state *mem_cgroup_css_from_folio(struct folio *folio); ino_t page_cgroup_ino(struct page *page); static inline bool mem_cgroup_online(struct mem_cgroup *memcg) { if (mem_cgroup_disabled()) return true; return !!(memcg->css.flags & CSS_ONLINE); } void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru, int zid, int nr_pages); static inline unsigned long mem_cgroup_get_zone_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx) { struct mem_cgroup_per_node *mz; mz = container_of(lruvec, struct mem_cgroup_per_node, lruvec); return READ_ONCE(mz->lru_zone_size[zone_idx][lru]); } void mem_cgroup_handle_over_high(gfp_t gfp_mask); unsigned long mem_cgroup_get_max(struct mem_cgroup *memcg); unsigned long mem_cgroup_size(struct mem_cgroup *memcg); void mem_cgroup_print_oom_context(struct mem_cgroup *memcg, struct task_struct *p); void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg); struct mem_cgroup *mem_cgroup_get_oom_group(struct task_struct *victim, struct mem_cgroup *oom_domain); void mem_cgroup_print_oom_group(struct mem_cgroup *memcg); /* idx can be of type enum memcg_stat_item or node_stat_item */ void mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx, int val); static inline void mod_memcg_page_state(struct page *page, enum memcg_stat_item idx, int val) { struct mem_cgroup *memcg; if (mem_cgroup_disabled()) return; rcu_read_lock(); memcg = folio_memcg(page_folio(page)); if (memcg) mod_memcg_state(memcg, idx, val); rcu_read_unlock(); } unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx); unsigned long lruvec_page_state(struct lruvec *lruvec, enum node_stat_item idx); unsigned long lruvec_page_state_local(struct lruvec *lruvec, enum node_stat_item idx); void mem_cgroup_flush_stats(struct mem_cgroup *memcg); void mem_cgroup_flush_stats_ratelimited(struct mem_cgroup *memcg); void __mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val); static inline void mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val) { unsigned long flags; local_irq_save(flags); __mod_lruvec_kmem_state(p, idx, val); local_irq_restore(flags); } void count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx, unsigned long count); static inline void count_memcg_folio_events(struct folio *folio, enum vm_event_item idx, unsigned long nr) { struct mem_cgroup *memcg = folio_memcg(folio); if (memcg) count_memcg_events(memcg, idx, nr); } static inline void count_memcg_events_mm(struct mm_struct *mm, enum vm_event_item idx, unsigned long count) { struct mem_cgroup *memcg; if (mem_cgroup_disabled()) return; rcu_read_lock(); memcg = mem_cgroup_from_task(rcu_dereference(mm->owner)); if (likely(memcg)) count_memcg_events(memcg, idx, count); rcu_read_unlock(); } static inline void count_memcg_event_mm(struct mm_struct *mm, enum vm_event_item idx) { count_memcg_events_mm(mm, idx, 1); } 
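/*
 * Illustrative sketch (not part of this header): the typical pattern for the
 * charge/uncharge API documented above, as used when a freshly allocated
 * folio is about to be inserted somewhere. foo_add_new_folio() and
 * foo_insert_folio() are hypothetical names used only for the example.
 */
static inline int foo_add_new_folio(struct folio *folio, struct mm_struct *mm,
				    gfp_t gfp)
{
	int err;

	/* Charge first, before the folio becomes reachable by others. */
	err = mem_cgroup_charge(folio, mm, gfp);
	if (err)
		return err;

	err = foo_insert_folio(folio);	/* hypothetical insertion step */
	if (err)
		/* Insertion failed: give the charge back. */
		mem_cgroup_uncharge(folio);

	return err;
}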
static inline void memcg_memory_event(struct mem_cgroup *memcg, enum memcg_memory_event event) { bool swap_event = event == MEMCG_SWAP_HIGH || event == MEMCG_SWAP_MAX || event == MEMCG_SWAP_FAIL; atomic_long_inc(&memcg->memory_events_local[event]); if (!swap_event) cgroup_file_notify(&memcg->events_local_file); do { atomic_long_inc(&memcg->memory_events[event]); if (swap_event) cgroup_file_notify(&memcg->swap_events_file); else cgroup_file_notify(&memcg->events_file); if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) break; if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS) break; } while ((memcg = parent_mem_cgroup(memcg)) && !mem_cgroup_is_root(memcg)); } static inline void memcg_memory_event_mm(struct mm_struct *mm, enum memcg_memory_event event) { struct mem_cgroup *memcg; if (mem_cgroup_disabled()) return; rcu_read_lock(); memcg = mem_cgroup_from_task(rcu_dereference(mm->owner)); if (likely(memcg)) memcg_memory_event(memcg, event); rcu_read_unlock(); } void split_page_memcg(struct page *first, unsigned order); void folio_split_memcg_refs(struct folio *folio, unsigned old_order, unsigned new_order); static inline u64 cgroup_id_from_mm(struct mm_struct *mm) { struct mem_cgroup *memcg; u64 id; if (mem_cgroup_disabled()) return 0; rcu_read_lock(); memcg = mem_cgroup_from_task(rcu_dereference(mm->owner)); if (!memcg) memcg = root_mem_cgroup; id = cgroup_id(memcg->css.cgroup); rcu_read_unlock(); return id; } extern int mem_cgroup_init(void); #else /* CONFIG_MEMCG */ #define MEM_CGROUP_ID_SHIFT 0 static inline struct mem_cgroup *folio_memcg(struct folio *folio) { return NULL; } static inline bool folio_memcg_charged(struct folio *folio) { return false; } static inline struct mem_cgroup *folio_memcg_check(struct folio *folio) { return NULL; } static inline struct mem_cgroup *page_memcg_check(struct page *page) { return NULL; } static inline struct mem_cgroup *get_mem_cgroup_from_objcg(struct obj_cgroup *objcg) { return NULL; } static inline bool folio_memcg_kmem(struct folio *folio) { return false; } static inline bool PageMemcgKmem(struct page *page) { return false; } static inline bool mem_cgroup_is_root(struct mem_cgroup *memcg) { return true; } static inline bool mem_cgroup_disabled(void) { return true; } static inline void memcg_memory_event(struct mem_cgroup *memcg, enum memcg_memory_event event) { } static inline void memcg_memory_event_mm(struct mm_struct *mm, enum memcg_memory_event event) { } static inline void mem_cgroup_protection(struct mem_cgroup *root, struct mem_cgroup *memcg, unsigned long *min, unsigned long *low) { *min = *low = 0; } static inline void mem_cgroup_calculate_protection(struct mem_cgroup *root, struct mem_cgroup *memcg) { } static inline bool mem_cgroup_unprotected(struct mem_cgroup *target, struct mem_cgroup *memcg) { return true; } static inline bool mem_cgroup_below_low(struct mem_cgroup *target, struct mem_cgroup *memcg) { return false; } static inline bool mem_cgroup_below_min(struct mem_cgroup *target, struct mem_cgroup *memcg) { return false; } static inline int mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp) { return 0; } static inline int mem_cgroup_charge_hugetlb(struct folio* folio, gfp_t gfp) { return 0; } static inline int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm, gfp_t gfp, swp_entry_t entry) { return 0; } static inline void mem_cgroup_uncharge(struct folio *folio) { } static inline void mem_cgroup_uncharge_folios(struct folio_batch *folios) { } static inline void 
mem_cgroup_replace_folio(struct folio *old, struct folio *new) { } static inline void mem_cgroup_migrate(struct folio *old, struct folio *new) { } static inline struct lruvec *mem_cgroup_lruvec(struct mem_cgroup *memcg, struct pglist_data *pgdat) { return &pgdat->__lruvec; } static inline struct lruvec *folio_lruvec(struct folio *folio) { struct pglist_data *pgdat = folio_pgdat(folio); return &pgdat->__lruvec; } static inline void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio) { } static inline struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg) { return NULL; } static inline bool mm_match_cgroup(struct mm_struct *mm, struct mem_cgroup *memcg) { return true; } static inline struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm) { return NULL; } static inline struct mem_cgroup *get_mem_cgroup_from_current(void) { return NULL; } static inline struct mem_cgroup *get_mem_cgroup_from_folio(struct folio *folio) { return NULL; } static inline struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css) { return NULL; } static inline void obj_cgroup_get(struct obj_cgroup *objcg) { } static inline void obj_cgroup_put(struct obj_cgroup *objcg) { } static inline bool mem_cgroup_tryget(struct mem_cgroup *memcg) { return true; } static inline bool mem_cgroup_tryget_online(struct mem_cgroup *memcg) { return true; } static inline void mem_cgroup_put(struct mem_cgroup *memcg) { } static inline struct lruvec *folio_lruvec_lock(struct folio *folio) { struct pglist_data *pgdat = folio_pgdat(folio); spin_lock(&pgdat->__lruvec.lru_lock); return &pgdat->__lruvec; } static inline struct lruvec *folio_lruvec_lock_irq(struct folio *folio) { struct pglist_data *pgdat = folio_pgdat(folio); spin_lock_irq(&pgdat->__lruvec.lru_lock); return &pgdat->__lruvec; } static inline struct lruvec *folio_lruvec_lock_irqsave(struct folio *folio, unsigned long *flagsp) { struct pglist_data *pgdat = folio_pgdat(folio); spin_lock_irqsave(&pgdat->__lruvec.lru_lock, *flagsp); return &pgdat->__lruvec; } static inline struct mem_cgroup * mem_cgroup_iter(struct mem_cgroup *root, struct mem_cgroup *prev, struct mem_cgroup_reclaim_cookie *reclaim) { return NULL; } static inline void mem_cgroup_iter_break(struct mem_cgroup *root, struct mem_cgroup *prev) { } static inline void mem_cgroup_scan_tasks(struct mem_cgroup *memcg, int (*fn)(struct task_struct *, void *), void *arg) { } static inline unsigned short mem_cgroup_id(struct mem_cgroup *memcg) { return 0; } static inline struct mem_cgroup *mem_cgroup_from_id(unsigned short id) { WARN_ON_ONCE(id); /* XXX: This should always return root_mem_cgroup */ return NULL; } #ifdef CONFIG_SHRINKER_DEBUG static inline unsigned long mem_cgroup_ino(struct mem_cgroup *memcg) { return 0; } static inline struct mem_cgroup *mem_cgroup_get_from_ino(unsigned long ino) { return NULL; } #endif static inline struct mem_cgroup *mem_cgroup_from_seq(struct seq_file *m) { return NULL; } static inline struct mem_cgroup *lruvec_memcg(struct lruvec *lruvec) { return NULL; } static inline bool mem_cgroup_online(struct mem_cgroup *memcg) { return true; } static inline unsigned long mem_cgroup_get_zone_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx) { return 0; } static inline unsigned long mem_cgroup_get_max(struct mem_cgroup *memcg) { return 0; } static inline unsigned long mem_cgroup_size(struct mem_cgroup *memcg) { return 0; } static inline void mem_cgroup_print_oom_context(struct mem_cgroup *memcg, struct task_struct *p) { } static inline void 
mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg) { } static inline void mem_cgroup_handle_over_high(gfp_t gfp_mask) { } static inline struct mem_cgroup *mem_cgroup_get_oom_group( struct task_struct *victim, struct mem_cgroup *oom_domain) { return NULL; } static inline void mem_cgroup_print_oom_group(struct mem_cgroup *memcg) { } static inline void mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx, int nr) { } static inline void mod_memcg_page_state(struct page *page, enum memcg_stat_item idx, int val) { } static inline unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx) { return 0; } static inline unsigned long lruvec_page_state(struct lruvec *lruvec, enum node_stat_item idx) { return node_page_state(lruvec_pgdat(lruvec), idx); } static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec, enum node_stat_item idx) { return node_page_state(lruvec_pgdat(lruvec), idx); } static inline void mem_cgroup_flush_stats(struct mem_cgroup *memcg) { } static inline void mem_cgroup_flush_stats_ratelimited(struct mem_cgroup *memcg) { } static inline void __mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val) { struct page *page = virt_to_head_page(p); __mod_node_page_state(page_pgdat(page), idx, val); } static inline void mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val) { struct page *page = virt_to_head_page(p); mod_node_page_state(page_pgdat(page), idx, val); } static inline void count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx, unsigned long count) { } static inline void count_memcg_folio_events(struct folio *folio, enum vm_event_item idx, unsigned long nr) { } static inline void count_memcg_events_mm(struct mm_struct *mm, enum vm_event_item idx, unsigned long count) { } static inline void count_memcg_event_mm(struct mm_struct *mm, enum vm_event_item idx) { } static inline void split_page_memcg(struct page *first, unsigned order) { } static inline void folio_split_memcg_refs(struct folio *folio, unsigned old_order, unsigned new_order) { } static inline u64 cgroup_id_from_mm(struct mm_struct *mm) { return 0; } static inline int mem_cgroup_init(void) { return 0; } #endif /* CONFIG_MEMCG */ /* * Extended information for slab objects stored as an array in page->memcg_data * if MEMCG_DATA_OBJEXTS is set. 
*/ struct slabobj_ext { #ifdef CONFIG_MEMCG struct obj_cgroup *objcg; #endif #ifdef CONFIG_MEM_ALLOC_PROFILING union codetag_ref ref; #endif } __aligned(8); static inline void __inc_lruvec_kmem_state(void *p, enum node_stat_item idx) { __mod_lruvec_kmem_state(p, idx, 1); } static inline void __dec_lruvec_kmem_state(void *p, enum node_stat_item idx) { __mod_lruvec_kmem_state(p, idx, -1); } static inline struct lruvec *parent_lruvec(struct lruvec *lruvec) { struct mem_cgroup *memcg; memcg = lruvec_memcg(lruvec); if (!memcg) return NULL; memcg = parent_mem_cgroup(memcg); if (!memcg) return NULL; return mem_cgroup_lruvec(memcg, lruvec_pgdat(lruvec)); } static inline void unlock_page_lruvec(struct lruvec *lruvec) { spin_unlock(&lruvec->lru_lock); } static inline void unlock_page_lruvec_irq(struct lruvec *lruvec) { spin_unlock_irq(&lruvec->lru_lock); } static inline void unlock_page_lruvec_irqrestore(struct lruvec *lruvec, unsigned long flags) { spin_unlock_irqrestore(&lruvec->lru_lock, flags); } /* Test requires a stable folio->memcg binding, see folio_memcg() */ static inline bool folio_matches_lruvec(struct folio *folio, struct lruvec *lruvec) { return lruvec_pgdat(lruvec) == folio_pgdat(folio) && lruvec_memcg(lruvec) == folio_memcg(folio); } /* Don't lock again iff page's lruvec locked */ static inline struct lruvec *folio_lruvec_relock_irq(struct folio *folio, struct lruvec *locked_lruvec) { if (locked_lruvec) { if (folio_matches_lruvec(folio, locked_lruvec)) return locked_lruvec; unlock_page_lruvec_irq(locked_lruvec); } return folio_lruvec_lock_irq(folio); } /* Don't lock again iff folio's lruvec locked */ static inline void folio_lruvec_relock_irqsave(struct folio *folio, struct lruvec **lruvecp, unsigned long *flags) { if (*lruvecp) { if (folio_matches_lruvec(folio, *lruvecp)) return; unlock_page_lruvec_irqrestore(*lruvecp, *flags); } *lruvecp = folio_lruvec_lock_irqsave(folio, flags); } #ifdef CONFIG_CGROUP_WRITEBACK struct wb_domain *mem_cgroup_wb_domain(struct bdi_writeback *wb); void mem_cgroup_wb_stats(struct bdi_writeback *wb, unsigned long *pfilepages, unsigned long *pheadroom, unsigned long *pdirty, unsigned long *pwriteback); void mem_cgroup_track_foreign_dirty_slowpath(struct folio *folio, struct bdi_writeback *wb); static inline void mem_cgroup_track_foreign_dirty(struct folio *folio, struct bdi_writeback *wb) { struct mem_cgroup *memcg; if (mem_cgroup_disabled()) return; memcg = folio_memcg(folio); if (unlikely(memcg && &memcg->css != wb->memcg_css)) mem_cgroup_track_foreign_dirty_slowpath(folio, wb); } void mem_cgroup_flush_foreign(struct bdi_writeback *wb); #else /* CONFIG_CGROUP_WRITEBACK */ static inline struct wb_domain *mem_cgroup_wb_domain(struct bdi_writeback *wb) { return NULL; } static inline void mem_cgroup_wb_stats(struct bdi_writeback *wb, unsigned long *pfilepages, unsigned long *pheadroom, unsigned long *pdirty, unsigned long *pwriteback) { } static inline void mem_cgroup_track_foreign_dirty(struct folio *folio, struct bdi_writeback *wb) { } static inline void mem_cgroup_flush_foreign(struct bdi_writeback *wb) { } #endif /* CONFIG_CGROUP_WRITEBACK */ struct sock; bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages, gfp_t gfp_mask); void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages); #ifdef CONFIG_MEMCG extern struct static_key_false memcg_sockets_enabled_key; #define mem_cgroup_sockets_enabled static_branch_unlikely(&memcg_sockets_enabled_key) void mem_cgroup_sk_alloc(struct sock *sk); void 
mem_cgroup_sk_free(struct sock *sk); static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg) { #ifdef CONFIG_MEMCG_V1 if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) return !!memcg->tcpmem_pressure; #endif /* CONFIG_MEMCG_V1 */ do { if (time_before(jiffies, READ_ONCE(memcg->socket_pressure))) return true; } while ((memcg = parent_mem_cgroup(memcg))); return false; } int alloc_shrinker_info(struct mem_cgroup *memcg); void free_shrinker_info(struct mem_cgroup *memcg); void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id); void reparent_shrinker_deferred(struct mem_cgroup *memcg); #else #define mem_cgroup_sockets_enabled 0 static inline void mem_cgroup_sk_alloc(struct sock *sk) { }; static inline void mem_cgroup_sk_free(struct sock *sk) { }; static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg) { return false; } static inline void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id) { } #endif #ifdef CONFIG_MEMCG bool mem_cgroup_kmem_disabled(void); int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order); void __memcg_kmem_uncharge_page(struct page *page, int order); /* * The returned objcg pointer is safe to use without additional * protection within a scope. The scope is defined either by * the current task (similar to the "current" global variable) * or by set_active_memcg() pair. * Please, use obj_cgroup_get() to get a reference if the pointer * needs to be used outside of the local scope. */ struct obj_cgroup *current_obj_cgroup(void); struct obj_cgroup *get_obj_cgroup_from_folio(struct folio *folio); static inline struct obj_cgroup *get_obj_cgroup_from_current(void) { struct obj_cgroup *objcg = current_obj_cgroup(); if (objcg) obj_cgroup_get(objcg); return objcg; } int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size); void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size); extern struct static_key_false memcg_bpf_enabled_key; static inline bool memcg_bpf_enabled(void) { return static_branch_likely(&memcg_bpf_enabled_key); } extern struct static_key_false memcg_kmem_online_key; static inline bool memcg_kmem_online(void) { return static_branch_likely(&memcg_kmem_online_key); } static inline int memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order) { if (memcg_kmem_online()) return __memcg_kmem_charge_page(page, gfp, order); return 0; } static inline void memcg_kmem_uncharge_page(struct page *page, int order) { if (memcg_kmem_online()) __memcg_kmem_uncharge_page(page, order); } /* * A helper for accessing memcg's kmem_id, used for getting * corresponding LRU lists. */ static inline int memcg_kmem_id(struct mem_cgroup *memcg) { return memcg ? 
memcg->kmemcg_id : -1; } struct mem_cgroup *mem_cgroup_from_slab_obj(void *p); static inline void count_objcg_events(struct obj_cgroup *objcg, enum vm_event_item idx, unsigned long count) { struct mem_cgroup *memcg; if (!memcg_kmem_online()) return; rcu_read_lock(); memcg = obj_cgroup_memcg(objcg); count_memcg_events(memcg, idx, count); rcu_read_unlock(); } bool mem_cgroup_node_allowed(struct mem_cgroup *memcg, int nid); #else static inline bool mem_cgroup_kmem_disabled(void) { return true; } static inline int memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order) { return 0; } static inline void memcg_kmem_uncharge_page(struct page *page, int order) { } static inline int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order) { return 0; } static inline void __memcg_kmem_uncharge_page(struct page *page, int order) { } static inline struct obj_cgroup *get_obj_cgroup_from_folio(struct folio *folio) { return NULL; } static inline bool memcg_bpf_enabled(void) { return false; } static inline bool memcg_kmem_online(void) { return false; } static inline int memcg_kmem_id(struct mem_cgroup *memcg) { return -1; } static inline struct mem_cgroup *mem_cgroup_from_slab_obj(void *p) { return NULL; } static inline void count_objcg_events(struct obj_cgroup *objcg, enum vm_event_item idx, unsigned long count) { } static inline ino_t page_cgroup_ino(struct page *page) { return 0; } static inline bool mem_cgroup_node_allowed(struct mem_cgroup *memcg, int nid) { return true; } #endif /* CONFIG_MEMCG */ #if defined(CONFIG_MEMCG) && defined(CONFIG_ZSWAP) bool obj_cgroup_may_zswap(struct obj_cgroup *objcg); void obj_cgroup_charge_zswap(struct obj_cgroup *objcg, size_t size); void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size); bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg); #else static inline bool obj_cgroup_may_zswap(struct obj_cgroup *objcg) { return true; } static inline void obj_cgroup_charge_zswap(struct obj_cgroup *objcg, size_t size) { } static inline void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size) { } static inline bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg) { /* if zswap is disabled, do not block pages going to the swapping device */ return true; } #endif /* Cgroup v1-related declarations */ #ifdef CONFIG_MEMCG_V1 unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order, gfp_t gfp_mask, unsigned long *total_scanned); bool mem_cgroup_oom_synchronize(bool wait); static inline bool task_in_memcg_oom(struct task_struct *p) { return p->memcg_in_oom; } static inline void mem_cgroup_enter_user_fault(void) { WARN_ON(current->in_user_fault); current->in_user_fault = 1; } static inline void mem_cgroup_exit_user_fault(void) { WARN_ON(!current->in_user_fault); current->in_user_fault = 0; } void memcg1_swapout(struct folio *folio, swp_entry_t entry); void memcg1_swapin(swp_entry_t entry, unsigned int nr_pages); #else /* CONFIG_MEMCG_V1 */ static inline unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order, gfp_t gfp_mask, unsigned long *total_scanned) { return 0; } static inline bool task_in_memcg_oom(struct task_struct *p) { return false; } static inline bool mem_cgroup_oom_synchronize(bool wait) { return false; } static inline void mem_cgroup_enter_user_fault(void) { } static inline void mem_cgroup_exit_user_fault(void) { } static inline void memcg1_swapout(struct folio *folio, swp_entry_t entry) { } static inline void memcg1_swapin(swp_entry_t entry, unsigned int nr_pages) { } #endif /* 
CONFIG_MEMCG_V1 */ #endif /* _LINUX_MEMCONTROL_H */
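The lruvec locking helpers above (folio_matches_lruvec(), folio_lruvec_relock_irqsave(), unlock_page_lruvec_irqrestore()) exist so that a batched walk over folios only drops and re-takes the lru_lock when the next folio belongs to a different lruvec. Below is a minimal sketch of that pattern; example_walk_folios() and its private folio list are hypothetical and not part of the header.

/* Sketch only: batched LRU walk that re-locks per lruvec, not per folio. */
#include <linux/memcontrol.h>
#include <linux/mm.h>

static void example_walk_folios(struct list_head *folios)
{
	struct lruvec *lruvec = NULL;
	unsigned long flags;
	struct folio *folio;

	list_for_each_entry(folio, folios, lru) {
		/* drops the old lock and takes a new one only on a lruvec change */
		folio_lruvec_relock_irqsave(folio, &lruvec, &flags);

		/* ... update LRU state for @folio under lruvec->lru_lock ... */
	}
	if (lruvec)
		unlock_page_lruvec_irqrestore(lruvec, flags);
}

The relock helper is effectively a no-op when consecutive folios share a lruvec, which is the common case, so the lock is held across whole runs of same-cgroup folios instead of being bounced on every iteration.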
/* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_TRACE_EVENT_H #define _LINUX_TRACE_EVENT_H #include <linux/ring_buffer.h> #include <linux/trace_seq.h> #include <linux/percpu.h> #include <linux/hardirq.h> #include <linux/perf_event.h> #include <linux/tracepoint.h> struct trace_array; struct array_buffer; struct tracer; struct dentry; struct bpf_prog; union bpf_attr; /* Used for event string fields when they are NULL */ #define EVENT_NULL_STR "(null)" const char *trace_print_flags_seq(struct trace_seq *p, const char *delim, unsigned long flags, const struct trace_print_flags *flag_array); const char *trace_print_symbols_seq(struct trace_seq *p, unsigned long val, const struct trace_print_flags *symbol_array); #if BITS_PER_LONG == 32 const char *trace_print_flags_seq_u64(struct trace_seq *p, const char *delim, unsigned long long flags, const struct trace_print_flags_u64 *flag_array); const char *trace_print_symbols_seq_u64(struct trace_seq *p, unsigned long long val, const struct trace_print_flags_u64 *symbol_array); #endif const char *trace_print_bitmask_seq(struct trace_seq *p, void *bitmask_ptr, unsigned int bitmask_size); const char *trace_print_hex_seq(struct trace_seq *p, const unsigned char *buf, int len, bool concatenate); const char *trace_print_array_seq(struct trace_seq *p, const void *buf, int count, size_t el_size); const char * trace_print_hex_dump_seq(struct trace_seq *p, const char *prefix_str, int prefix_type, int rowsize, int groupsize, const void *buf, size_t len, bool ascii); struct trace_iterator; struct trace_event; int trace_raw_output_prep(struct trace_iterator *iter, struct trace_event *event); extern __printf(2, 3) void trace_event_printf(struct trace_iterator *iter, const char *fmt, ...); /* Used to find the offset and length of dynamic fields in trace events */ struct trace_dynamic_info { #ifdef CONFIG_CPU_BIG_ENDIAN u16 len; u16 offset; #else u16 offset; u16 len; #endif } __packed; /* * The trace entry - the most basic unit of tracing. This is what * is printed in the end as a single line in the trace output, such as: * * bash-15816 [01] 235.197585: idle_cpu <- irq_enter */ struct trace_entry { unsigned short type; unsigned char flags; unsigned char preempt_count; int pid; }; #define TRACE_EVENT_TYPE_MAX \ ((1 << (sizeof(((struct trace_entry *)0)->type) * 8)) - 1) /* * Trace iterator - used by printout routines who present trace * results to users and which routines might sleep, etc: */ struct trace_iterator { struct trace_array *tr; struct tracer *trace; struct array_buffer *array_buffer; void *private; int cpu_file; struct mutex mutex; struct ring_buffer_iter **buffer_iter; unsigned long iter_flags; void *temp; /* temp holder */ unsigned int temp_size; char *fmt; /* modified format holder */ unsigned int fmt_size; atomic_t wait_index; /* trace_seq for __print_flags() and __print_symbolic() etc.
*/ struct trace_seq tmp_seq; cpumask_var_t started; /* Set when the file is closed to prevent new waiters */ bool closed; /* it's true when current open file is snapshot */ bool snapshot; /* The below is zeroed out in pipe_read */ struct trace_seq seq; struct trace_entry *ent; unsigned long lost_events; int leftover; int ent_size; int cpu; u64 ts; loff_t pos; long idx; /* All new field here will be zeroed out in pipe_read */ }; enum trace_iter_flags { TRACE_FILE_LAT_FMT = 1, TRACE_FILE_ANNOTATE = 2, TRACE_FILE_TIME_IN_NS = 4, }; typedef enum print_line_t (*trace_print_func)(struct trace_iterator *iter, int flags, struct trace_event *event); struct trace_event_functions { trace_print_func trace; trace_print_func raw; trace_print_func hex; trace_print_func binary; }; struct trace_event { struct hlist_node node; int type; struct trace_event_functions *funcs; }; extern int register_trace_event(struct trace_event *event); extern int unregister_trace_event(struct trace_event *event); /* Return values for print_line callback */ enum print_line_t { TRACE_TYPE_PARTIAL_LINE = 0, /* Retry after flushing the seq */ TRACE_TYPE_HANDLED = 1, TRACE_TYPE_UNHANDLED = 2, /* Relay to other output functions */ TRACE_TYPE_NO_CONSUME = 3 /* Handled but ask to not consume */ }; enum print_line_t trace_handle_return(struct trace_seq *s); static inline void tracing_generic_entry_update(struct trace_entry *entry, unsigned short type, unsigned int trace_ctx) { entry->preempt_count = trace_ctx & 0xff; entry->pid = current->pid; entry->type = type; entry->flags = trace_ctx >> 16; } unsigned int tracing_gen_ctx_irq_test(unsigned int irqs_status); enum trace_flag_type { TRACE_FLAG_IRQS_OFF = 0x01, TRACE_FLAG_NEED_RESCHED_LAZY = 0x02, TRACE_FLAG_NEED_RESCHED = 0x04, TRACE_FLAG_HARDIRQ = 0x08, TRACE_FLAG_SOFTIRQ = 0x10, TRACE_FLAG_PREEMPT_RESCHED = 0x20, TRACE_FLAG_NMI = 0x40, TRACE_FLAG_BH_OFF = 0x80, }; static inline unsigned int tracing_gen_ctx_flags(unsigned long irqflags) { unsigned int irq_status = irqs_disabled_flags(irqflags) ? TRACE_FLAG_IRQS_OFF : 0; return tracing_gen_ctx_irq_test(irq_status); } static inline unsigned int tracing_gen_ctx(void) { unsigned long irqflags; local_save_flags(irqflags); return tracing_gen_ctx_flags(irqflags); } static inline unsigned int tracing_gen_ctx_dec(void) { unsigned int trace_ctx; trace_ctx = tracing_gen_ctx(); /* * Subtract one from the preemption counter if preemption is enabled, * see trace_event_buffer_reserve()for details. */ if (IS_ENABLED(CONFIG_PREEMPTION)) trace_ctx--; return trace_ctx; } struct trace_event_file; struct ring_buffer_event * trace_event_buffer_lock_reserve(struct trace_buffer **current_buffer, struct trace_event_file *trace_file, int type, unsigned long len, unsigned int trace_ctx); #define TRACE_RECORD_CMDLINE BIT(0) #define TRACE_RECORD_TGID BIT(1) void tracing_record_taskinfo(struct task_struct *task, int flags); void tracing_record_taskinfo_sched_switch(struct task_struct *prev, struct task_struct *next, int flags); void tracing_record_cmdline(struct task_struct *task); void tracing_record_tgid(struct task_struct *task); int trace_output_call(struct trace_iterator *iter, char *name, char *fmt, ...) 
__printf(3, 4); struct event_filter; enum trace_reg { TRACE_REG_REGISTER, TRACE_REG_UNREGISTER, #ifdef CONFIG_PERF_EVENTS TRACE_REG_PERF_REGISTER, TRACE_REG_PERF_UNREGISTER, TRACE_REG_PERF_OPEN, TRACE_REG_PERF_CLOSE, /* * These (ADD/DEL) use a 'boolean' return value, where 1 (true) means a * custom action was taken and the default action is not to be * performed. */ TRACE_REG_PERF_ADD, TRACE_REG_PERF_DEL, #endif }; struct trace_event_call; #define TRACE_FUNCTION_TYPE ((const char *)~0UL) struct trace_event_fields { const char *type; union { struct { const char *name; const int size; const int align; const unsigned int is_signed:1; unsigned int needs_test:1; const int filter_type; const int len; }; int (*define_fields)(struct trace_event_call *); }; }; struct trace_event_class { const char *system; void *probe; #ifdef CONFIG_PERF_EVENTS void *perf_probe; #endif int (*reg)(struct trace_event_call *event, enum trace_reg type, void *data); struct trace_event_fields *fields_array; struct list_head *(*get_fields)(struct trace_event_call *); struct list_head fields; int (*raw_init)(struct trace_event_call *); }; extern int trace_event_reg(struct trace_event_call *event, enum trace_reg type, void *data); struct trace_event_buffer { struct trace_buffer *buffer; struct ring_buffer_event *event; struct trace_event_file *trace_file; void *entry; unsigned int trace_ctx; struct pt_regs *regs; }; void *trace_event_buffer_reserve(struct trace_event_buffer *fbuffer, struct trace_event_file *trace_file, unsigned long len); void trace_event_buffer_commit(struct trace_event_buffer *fbuffer); enum { TRACE_EVENT_FL_CAP_ANY_BIT, TRACE_EVENT_FL_NO_SET_FILTER_BIT, TRACE_EVENT_FL_IGNORE_ENABLE_BIT, TRACE_EVENT_FL_TRACEPOINT_BIT, TRACE_EVENT_FL_DYNAMIC_BIT, TRACE_EVENT_FL_KPROBE_BIT, TRACE_EVENT_FL_UPROBE_BIT, TRACE_EVENT_FL_EPROBE_BIT, TRACE_EVENT_FL_FPROBE_BIT, TRACE_EVENT_FL_CUSTOM_BIT, TRACE_EVENT_FL_TEST_STR_BIT, }; /* * Event flags: * CAP_ANY - Any user can enable for perf * NO_SET_FILTER - Set when filter has error and is to be ignored * IGNORE_ENABLE - For trace internal events, do not enable with debugfs file * TRACEPOINT - Event is a tracepoint * DYNAMIC - Event is a dynamic event (created at run time) * KPROBE - Event is a kprobe * UPROBE - Event is a uprobe * EPROBE - Event is an event probe * FPROBE - Event is a function probe * CUSTOM - Event is a custom event (to be attached to an existing tracepoint) * This is set when the custom event has not been attached * to a tracepoint yet, then it is cleared when it is.
* TEST_STR - The event has a "%s" that points to a string outside the event */ enum { TRACE_EVENT_FL_CAP_ANY = (1 << TRACE_EVENT_FL_CAP_ANY_BIT), TRACE_EVENT_FL_NO_SET_FILTER = (1 << TRACE_EVENT_FL_NO_SET_FILTER_BIT), TRACE_EVENT_FL_IGNORE_ENABLE = (1 << TRACE_EVENT_FL_IGNORE_ENABLE_BIT), TRACE_EVENT_FL_TRACEPOINT = (1 << TRACE_EVENT_FL_TRACEPOINT_BIT), TRACE_EVENT_FL_DYNAMIC = (1 << TRACE_EVENT_FL_DYNAMIC_BIT), TRACE_EVENT_FL_KPROBE = (1 << TRACE_EVENT_FL_KPROBE_BIT), TRACE_EVENT_FL_UPROBE = (1 << TRACE_EVENT_FL_UPROBE_BIT), TRACE_EVENT_FL_EPROBE = (1 << TRACE_EVENT_FL_EPROBE_BIT), TRACE_EVENT_FL_FPROBE = (1 << TRACE_EVENT_FL_FPROBE_BIT), TRACE_EVENT_FL_CUSTOM = (1 << TRACE_EVENT_FL_CUSTOM_BIT), TRACE_EVENT_FL_TEST_STR = (1 << TRACE_EVENT_FL_TEST_STR_BIT), }; #define TRACE_EVENT_FL_UKPROBE (TRACE_EVENT_FL_KPROBE | TRACE_EVENT_FL_UPROBE) struct trace_event_call { struct list_head list; struct trace_event_class *class; union { const char *name; /* Set TRACE_EVENT_FL_TRACEPOINT flag when using "tp" */ struct tracepoint *tp; }; struct trace_event event; char *print_fmt; /* * Static events can disappear with modules, * whereas dynamic ones need their own ref count. */ union { void *module; atomic_t refcnt; }; void *data; /* See the TRACE_EVENT_FL_* flags above */ int flags; /* static flags of different events */ #ifdef CONFIG_PERF_EVENTS int perf_refcount; struct hlist_head __percpu *perf_events; struct bpf_prog_array __rcu *prog_array; int (*perf_perm)(struct trace_event_call *, struct perf_event *); #endif }; #ifdef CONFIG_DYNAMIC_EVENTS bool trace_event_dyn_try_get_ref(struct trace_event_call *call); void trace_event_dyn_put_ref(struct trace_event_call *call); bool trace_event_dyn_busy(struct trace_event_call *call); #else static inline bool trace_event_dyn_try_get_ref(struct trace_event_call *call) { /* Without DYNAMIC_EVENTS configured, nothing should be calling this */ return false; } static inline void trace_event_dyn_put_ref(struct trace_event_call *call) { } static inline bool trace_event_dyn_busy(struct trace_event_call *call) { /* Nothing should call this without DYNAMIC_EVENTS configured. */ return true; } #endif static inline bool trace_event_try_get_ref(struct trace_event_call *call) { if (call->flags & TRACE_EVENT_FL_DYNAMIC) return trace_event_dyn_try_get_ref(call); else return try_module_get(call->module); } static inline void trace_event_put_ref(struct trace_event_call *call) { if (call->flags & TRACE_EVENT_FL_DYNAMIC) trace_event_dyn_put_ref(call); else module_put(call->module); } #ifdef CONFIG_PERF_EVENTS static inline bool bpf_prog_array_valid(struct trace_event_call *call) { /* * This inline function checks whether call->prog_array * is valid or not. The function is called in various places, * outside rcu_read_lock/unlock, as a heuristic to speed up execution. * * If this function returns true, and later call->prog_array * becomes false inside rcu_read_lock/unlock region, * we bail out then. If this function returns false, * there is a risk that we might miss a few events if the checking * were delayed until inside rcu_read_lock/unlock region and * call->prog_array happened to become non-NULL then. * * Here, READ_ONCE() is used instead of rcu_access_pointer(). * rcu_access_pointer() requires the actual definition of * "struct bpf_prog_array" while READ_ONCE() only needs * a declaration of the same type.
*/ return !!READ_ONCE(call->prog_array); } #endif static inline const char * trace_event_name(struct trace_event_call *call) { if (call->flags & TRACE_EVENT_FL_CUSTOM) return call->name; else if (call->flags & TRACE_EVENT_FL_TRACEPOINT) return call->tp ? call->tp->name : NULL; else return call->name; } static inline struct list_head * trace_get_fields(struct trace_event_call *event_call) { if (!event_call->class->get_fields) return &event_call->class->fields; return event_call->class->get_fields(event_call); } struct trace_subsystem_dir; enum { EVENT_FILE_FL_ENABLED_BIT, EVENT_FILE_FL_RECORDED_CMD_BIT, EVENT_FILE_FL_RECORDED_TGID_BIT, EVENT_FILE_FL_FILTERED_BIT, EVENT_FILE_FL_NO_SET_FILTER_BIT, EVENT_FILE_FL_SOFT_MODE_BIT, EVENT_FILE_FL_SOFT_DISABLED_BIT, EVENT_FILE_FL_TRIGGER_MODE_BIT, EVENT_FILE_FL_TRIGGER_COND_BIT, EVENT_FILE_FL_PID_FILTER_BIT, EVENT_FILE_FL_WAS_ENABLED_BIT, EVENT_FILE_FL_FREED_BIT, }; extern struct trace_event_file *trace_get_event_file(const char *instance, const char *system, const char *event); extern void trace_put_event_file(struct trace_event_file *file); #define MAX_DYNEVENT_CMD_LEN (2048) enum dynevent_type { DYNEVENT_TYPE_SYNTH = 1, DYNEVENT_TYPE_KPROBE, DYNEVENT_TYPE_NONE, }; struct dynevent_cmd; typedef int (*dynevent_create_fn_t)(struct dynevent_cmd *cmd); struct dynevent_cmd { struct seq_buf seq; const char *event_name; unsigned int n_fields; enum dynevent_type type; dynevent_create_fn_t run_command; void *private_data; }; extern int dynevent_create(struct dynevent_cmd *cmd); extern int synth_event_delete(const char *name); extern void synth_event_cmd_init(struct dynevent_cmd *cmd, char *buf, int maxlen); extern int __synth_event_gen_cmd_start(struct dynevent_cmd *cmd, const char *name, struct module *mod, ...); #define synth_event_gen_cmd_start(cmd, name, mod, ...) 
\ __synth_event_gen_cmd_start(cmd, name, mod, ## __VA_ARGS__, NULL) struct synth_field_desc { const char *type; const char *name; }; extern int synth_event_gen_cmd_array_start(struct dynevent_cmd *cmd, const char *name, struct module *mod, struct synth_field_desc *fields, unsigned int n_fields); extern int synth_event_create(const char *name, struct synth_field_desc *fields, unsigned int n_fields, struct module *mod); extern int synth_event_add_field(struct dynevent_cmd *cmd, const char *type, const char *name); extern int synth_event_add_field_str(struct dynevent_cmd *cmd, const char *type_name); extern int synth_event_add_fields(struct dynevent_cmd *cmd, struct synth_field_desc *fields, unsigned int n_fields); #define synth_event_gen_cmd_end(cmd) \ dynevent_create(cmd) struct synth_event; struct synth_event_trace_state { struct trace_event_buffer fbuffer; struct synth_trace_event *entry; struct trace_buffer *buffer; struct synth_event *event; unsigned int cur_field; unsigned int n_u64; bool disabled; bool add_next; bool add_name; }; extern int synth_event_trace(struct trace_event_file *file, unsigned int n_vals, ...); extern int synth_event_trace_array(struct trace_event_file *file, u64 *vals, unsigned int n_vals); extern int synth_event_trace_start(struct trace_event_file *file, struct synth_event_trace_state *trace_state); extern int synth_event_add_next_val(u64 val, struct synth_event_trace_state *trace_state); extern int synth_event_add_val(const char *field_name, u64 val, struct synth_event_trace_state *trace_state); extern int synth_event_trace_end(struct synth_event_trace_state *trace_state); extern int kprobe_event_delete(const char *name); extern void kprobe_event_cmd_init(struct dynevent_cmd *cmd, char *buf, int maxlen); #define kprobe_event_gen_cmd_start(cmd, name, loc, ...) \ __kprobe_event_gen_cmd_start(cmd, false, name, loc, ## __VA_ARGS__, NULL) #define kretprobe_event_gen_cmd_start(cmd, name, loc, ...) \ __kprobe_event_gen_cmd_start(cmd, true, name, loc, ## __VA_ARGS__, NULL) extern int __kprobe_event_gen_cmd_start(struct dynevent_cmd *cmd, bool kretprobe, const char *name, const char *loc, ...); #define kprobe_event_add_fields(cmd, ...) 
\ __kprobe_event_add_fields(cmd, ## __VA_ARGS__, NULL) #define kprobe_event_add_field(cmd, field) \ __kprobe_event_add_fields(cmd, field, NULL) extern int __kprobe_event_add_fields(struct dynevent_cmd *cmd, ...); #define kprobe_event_gen_cmd_end(cmd) \ dynevent_create(cmd) #define kretprobe_event_gen_cmd_end(cmd) \ dynevent_create(cmd) /* * Event file flags: * ENABLED - The event is enabled * RECORDED_CMD - The comms should be recorded at sched_switch * RECORDED_TGID - The tgids should be recorded at sched_switch * FILTERED - The event has a filter attached * NO_SET_FILTER - Set when filter has error and is to be ignored * SOFT_MODE - The event is enabled/disabled by SOFT_DISABLED * SOFT_DISABLED - When set, do not trace the event (even though its * tracepoint may be enabled) * TRIGGER_MODE - When set, invoke the triggers associated with the event * TRIGGER_COND - When set, one or more triggers has an associated filter * PID_FILTER - When set, the event is filtered based on pid * WAS_ENABLED - Set when enabled to know to clear trace on module removal * FREED - File descriptor is freed, all fields should be considered invalid */ enum { EVENT_FILE_FL_ENABLED = (1 << EVENT_FILE_FL_ENABLED_BIT), EVENT_FILE_FL_RECORDED_CMD = (1 << EVENT_FILE_FL_RECORDED_CMD_BIT), EVENT_FILE_FL_RECORDED_TGID = (1 << EVENT_FILE_FL_RECORDED_TGID_BIT), EVENT_FILE_FL_FILTERED = (1 << EVENT_FILE_FL_FILTERED_BIT), EVENT_FILE_FL_NO_SET_FILTER = (1 << EVENT_FILE_FL_NO_SET_FILTER_BIT), EVENT_FILE_FL_SOFT_MODE = (1 << EVENT_FILE_FL_SOFT_MODE_BIT), EVENT_FILE_FL_SOFT_DISABLED = (1 << EVENT_FILE_FL_SOFT_DISABLED_BIT), EVENT_FILE_FL_TRIGGER_MODE = (1 << EVENT_FILE_FL_TRIGGER_MODE_BIT), EVENT_FILE_FL_TRIGGER_COND = (1 << EVENT_FILE_FL_TRIGGER_COND_BIT), EVENT_FILE_FL_PID_FILTER = (1 << EVENT_FILE_FL_PID_FILTER_BIT), EVENT_FILE_FL_WAS_ENABLED = (1 << EVENT_FILE_FL_WAS_ENABLED_BIT), EVENT_FILE_FL_FREED = (1 << EVENT_FILE_FL_FREED_BIT), }; struct trace_event_file { struct list_head list; struct trace_event_call *event_call; struct event_filter __rcu *filter; struct eventfs_inode *ei; struct trace_array *tr; struct trace_subsystem_dir *system; struct list_head triggers; /* * 32 bit flags: * bit 0: enabled * bit 1: enabled cmd record * bit 2: enable/disable with the soft disable bit * bit 3: soft disabled * bit 4: trigger enabled * * Note: The bits must be set atomically to prevent races * from other writers. Reads of flags do not need to be in * sync as they occur in critical sections. But the way flags * is currently used, these changes do not affect the code * except that when a change is made, it may have a slight * delay in propagating the changes to other CPUs due to * caching and such. Which is mostly OK ;-) */ unsigned long flags; refcount_t ref; /* ref count for opened files */ atomic_t sm_ref; /* soft-mode reference counter */ atomic_t tm_ref; /* trigger-mode reference counter */ }; #ifdef CONFIG_HIST_TRIGGERS extern struct irq_work hist_poll_work; extern wait_queue_head_t hist_poll_wq; static inline void hist_poll_wakeup(void) { if (wq_has_sleeper(&hist_poll_wq)) irq_work_queue(&hist_poll_work); } #define hist_poll_wait(file, wait) \ poll_wait(file, &hist_poll_wq, wait) #endif #define __TRACE_EVENT_FLAGS(name, value) \ static int __init trace_init_flags_##name(void) \ { \ event_##name.flags |= value; \ return 0; \ } \ early_initcall(trace_init_flags_##name); #define __TRACE_EVENT_PERF_PERM(name, expr...) 
\ static int perf_perm_##name(struct trace_event_call *tp_event, \ struct perf_event *p_event) \ { \ return ({ expr; }); \ } \ static int __init trace_init_perf_perm_##name(void) \ { \ event_##name.perf_perm = &perf_perm_##name; \ return 0; \ } \ early_initcall(trace_init_perf_perm_##name); #define PERF_MAX_TRACE_SIZE 8192 #define MAX_FILTER_STR_VAL 256U /* Should handle KSYM_SYMBOL_LEN */ enum event_trigger_type { ETT_NONE = (0), ETT_TRACE_ONOFF = (1 << 0), ETT_SNAPSHOT = (1 << 1), ETT_STACKTRACE = (1 << 2), ETT_EVENT_ENABLE = (1 << 3), ETT_EVENT_HIST = (1 << 4), ETT_HIST_ENABLE = (1 << 5), ETT_EVENT_EPROBE = (1 << 6), }; extern int filter_match_preds(struct event_filter *filter, void *rec); extern enum event_trigger_type event_triggers_call(struct trace_event_file *file, struct trace_buffer *buffer, void *rec, struct ring_buffer_event *event); extern void event_triggers_post_call(struct trace_event_file *file, enum event_trigger_type tt); bool trace_event_ignore_this_pid(struct trace_event_file *trace_file); bool __trace_trigger_soft_disabled(struct trace_event_file *file); /** * trace_trigger_soft_disabled - do triggers and test if soft disabled * @file: The file pointer of the event to test * * If any triggers without filters are attached to this event, they * will be called here. If the event is soft disabled and has no * triggers that require testing the fields, it will return true, * otherwise false. */ static __always_inline bool trace_trigger_soft_disabled(struct trace_event_file *file) { unsigned long eflags = file->flags; if (likely(!(eflags & (EVENT_FILE_FL_TRIGGER_MODE | EVENT_FILE_FL_SOFT_DISABLED | EVENT_FILE_FL_PID_FILTER)))) return false; if (likely(eflags & EVENT_FILE_FL_TRIGGER_COND)) return false; return __trace_trigger_soft_disabled(file); } #ifdef CONFIG_BPF_EVENTS unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx); int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie); void perf_event_detach_bpf_prog(struct perf_event *event); int perf_event_query_prog_array(struct perf_event *event, void __user *info); struct bpf_raw_tp_link; int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link); int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link); struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name); void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp); int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id, u32 *fd_type, const char **buf, u64 *probe_offset, u64 *probe_addr, unsigned long *missed); int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog); int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog); #else static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx) { return 1; } static inline int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie) { return -EOPNOTSUPP; } static inline void perf_event_detach_bpf_prog(struct perf_event *event) { } static inline int perf_event_query_prog_array(struct perf_event *event, void __user *info) { return -EOPNOTSUPP; } struct bpf_raw_tp_link; static inline int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link) { return -EOPNOTSUPP; } static inline int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link) { return -EOPNOTSUPP; } static inline struct bpf_raw_event_map *bpf_get_raw_tracepoint(const 
char *name) { return NULL; } static inline void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp) { } static inline int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id, u32 *fd_type, const char **buf, u64 *probe_offset, u64 *probe_addr, unsigned long *missed) { return -EOPNOTSUPP; } static inline int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) { return -EOPNOTSUPP; } static inline int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) { return -EOPNOTSUPP; } #endif enum { FILTER_OTHER = 0, FILTER_STATIC_STRING, FILTER_DYN_STRING, FILTER_RDYN_STRING, FILTER_PTR_STRING, FILTER_TRACE_FN, FILTER_CPUMASK, FILTER_COMM, FILTER_CPU, FILTER_STACKTRACE, }; extern int trace_event_raw_init(struct trace_event_call *call); extern int trace_define_field(struct trace_event_call *call, const char *type, const char *name, int offset, int size, int is_signed, int filter_type); extern int trace_add_event_call(struct trace_event_call *call); extern int trace_remove_event_call(struct trace_event_call *call); extern int trace_event_get_offsets(struct trace_event_call *call); int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set); int trace_set_clr_event(const char *system, const char *event, int set); int trace_array_set_clr_event(struct trace_array *tr, const char *system, const char *event, bool enable); #ifdef CONFIG_PERF_EVENTS struct perf_event; DECLARE_PER_CPU(struct pt_regs, perf_trace_regs); extern int perf_trace_init(struct perf_event *event); extern void perf_trace_destroy(struct perf_event *event); extern int perf_trace_add(struct perf_event *event, int flags); extern void perf_trace_del(struct perf_event *event, int flags); #ifdef CONFIG_KPROBE_EVENTS extern int perf_kprobe_init(struct perf_event *event, bool is_retprobe); extern void perf_kprobe_destroy(struct perf_event *event); extern int bpf_get_kprobe_info(const struct perf_event *event, u32 *fd_type, const char **symbol, u64 *probe_offset, u64 *probe_addr, unsigned long *missed, bool perf_type_tracepoint); #endif #ifdef CONFIG_UPROBE_EVENTS extern int perf_uprobe_init(struct perf_event *event, unsigned long ref_ctr_offset, bool is_retprobe); extern void perf_uprobe_destroy(struct perf_event *event); extern int bpf_get_uprobe_info(const struct perf_event *event, u32 *fd_type, const char **filename, u64 *probe_offset, u64 *probe_addr, bool perf_type_tracepoint); #endif extern int ftrace_profile_set_filter(struct perf_event *event, int event_id, char *filter_str); extern void ftrace_profile_free_filter(struct perf_event *event); void perf_trace_buf_update(void *record, u16 type); void *perf_trace_buf_alloc(int size, struct pt_regs **regs, int *rctxp); int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie); void perf_event_free_bpf_prog(struct perf_event *event); void bpf_trace_run1(struct bpf_raw_tp_link *link, u64 arg1); void bpf_trace_run2(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2); void bpf_trace_run3(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2, u64 arg3); void bpf_trace_run4(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2, u64 arg3, u64 arg4); void bpf_trace_run5(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2, u64 arg3, u64 arg4, u64 arg5); void bpf_trace_run6(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2, u64 arg3, u64 arg4, u64 arg5, u64 arg6); void bpf_trace_run7(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2, u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7); void 
bpf_trace_run8(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2, u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7, u64 arg8); void bpf_trace_run9(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2, u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7, u64 arg8, u64 arg9); void bpf_trace_run10(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2, u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7, u64 arg8, u64 arg9, u64 arg10); void bpf_trace_run11(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2, u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7, u64 arg8, u64 arg9, u64 arg10, u64 arg11); void bpf_trace_run12(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2, u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7, u64 arg8, u64 arg9, u64 arg10, u64 arg11, u64 arg12); void perf_trace_run_bpf_submit(void *raw_data, int size, int rctx, struct trace_event_call *call, u64 count, struct pt_regs *regs, struct hlist_head *head, struct task_struct *task); static inline void perf_trace_buf_submit(void *raw_data, int size, int rctx, u16 type, u64 count, struct pt_regs *regs, void *head, struct task_struct *task) { perf_tp_event(type, count, raw_data, size, regs, head, rctx, task); } #endif #define TRACE_EVENT_STR_MAX 512 /* * gcc warns that you cannot use a va_list in an inlined * function. But it lets me make it into a macro :-/ */ #define __trace_event_vstr_len(fmt, va) \ ({ \ va_list __ap; \ int __ret; \ \ va_copy(__ap, *(va)); \ __ret = vsnprintf(NULL, 0, fmt, __ap) + 1; \ va_end(__ap); \ \ min(__ret, TRACE_EVENT_STR_MAX); \ }) #endif /* _LINUX_TRACE_EVENT_H */ /* * Note: we keep the TRACE_CUSTOM_EVENT outside the include file ifdef protection. * This is due to the way trace custom events work. If a file includes two * trace event headers under one "CREATE_CUSTOM_TRACE_EVENTS" the first include * will override the TRACE_CUSTOM_EVENT and break the second include. */ #ifndef TRACE_CUSTOM_EVENT #define DECLARE_CUSTOM_EVENT_CLASS(name, proto, args, tstruct, assign, print) #define DEFINE_CUSTOM_EVENT(template, name, proto, args) #define TRACE_CUSTOM_EVENT(name, proto, args, struct, assign, print) #endif /* ifdef TRACE_CUSTOM_EVENT (see note above) */
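The dynamic/synthetic event API declared above (synth_event_create(), trace_get_event_file(), synth_event_trace_array(), trace_put_event_file(), synth_event_delete()) can be exercised from a module in a handful of calls. The sketch below is illustrative only: the event name "example_synth", its field layout and the recorded values are made up, and the event still has to be enabled (for example via tracefs) before synth_event_trace_array() records anything.

#include <linux/module.h>
#include <linux/sched.h>
#include <linux/trace_events.h>

/* field layout of the hypothetical "example_synth" synthetic event */
static struct synth_field_desc example_fields[] = {
	{ .type = "pid_t", .name = "pid" },
	{ .type = "u64",   .name = "delta_ns" },
};

static struct trace_event_file *example_file;

static int __init example_synth_init(void)
{
	u64 vals[2];
	int ret;

	/* create the event; it shows up under events/synthetic/example_synth */
	ret = synth_event_create("example_synth", example_fields,
				 ARRAY_SIZE(example_fields), THIS_MODULE);
	if (ret)
		return ret;

	/* pin the event file so the event cannot go away while we use it */
	example_file = trace_get_event_file(NULL, "synthetic", "example_synth");
	if (IS_ERR(example_file)) {
		ret = PTR_ERR(example_file);
		synth_event_delete("example_synth");
		return ret;
	}

	/* one record; values follow the field declaration order above */
	vals[0] = current->pid;
	vals[1] = 1000;
	synth_event_trace_array(example_file, vals, ARRAY_SIZE(vals));
	return 0;
}

static void __exit example_synth_exit(void)
{
	trace_put_event_file(example_file);
	synth_event_delete("example_synth");
}

module_init(example_synth_init);
module_exit(example_synth_exit);
MODULE_LICENSE("GPL");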
// SPDX-License-Identifier: GPL-2.0-or-later /* * ALSA sequencer device management * Copyright (c) 1999 by Takashi Iwai <tiwai@suse.de> * *---------------------------------------------------------------- * * This device handler separates the card driver module from sequencer * stuff (sequencer core, synth drivers, etc), so that the user can avoid * spending unnecessary resources, e.g. if they only need to listen to * MP3s. * * The card (or lowlevel) driver creates a sequencer device entry * via snd_seq_device_new(). This is an entry point for communicating * with the sequencer device "driver", which handles the actual * communication with the sequencer core. * Each sequencer device entry has an id string, and the corresponding * driver with the same id is loaded when required. For example, * low-level code to access the emu8000 chip on the sbawe card is included * in the emu8000-synth module. To activate this module, the hardware * resources like the i/o port are passed via the snd_seq_device argument. */ #include <linux/device.h> #include <linux/init.h> #include <linux/module.h> #include <sound/core.h> #include <sound/info.h> #include <sound/seq_device.h> #include <sound/seq_kernel.h> #include <sound/initval.h> #include <linux/kmod.h> #include <linux/slab.h> #include <linux/mutex.h> MODULE_AUTHOR("Takashi Iwai <tiwai@suse.de>"); MODULE_DESCRIPTION("ALSA sequencer device management"); MODULE_LICENSE("GPL"); /* * bus definition */ static int snd_seq_bus_match(struct device *dev, const struct device_driver *drv) { struct snd_seq_device *sdev = to_seq_dev(dev); const struct snd_seq_driver *sdrv = to_seq_drv(drv); return strcmp(sdrv->id, sdev->id) == 0 && sdrv->argsize == sdev->argsize; } static const struct bus_type snd_seq_bus_type = { .name = "snd_seq", .match = snd_seq_bus_match, }; /* * proc interface -- just for compatibility */ #ifdef CONFIG_SND_PROC_FS static struct snd_info_entry *info_entry; static int print_dev_info(struct device *dev, void *data) { struct snd_seq_device *sdev = to_seq_dev(dev); struct snd_info_buffer *buffer = data; snd_iprintf(buffer, "snd-%s,%s,%d\n", sdev->id, dev->driver ? "loaded" : "empty", dev->driver ?
1 : 0); return 0; } static void snd_seq_device_info(struct snd_info_entry *entry, struct snd_info_buffer *buffer) { bus_for_each_dev(&snd_seq_bus_type, NULL, buffer, print_dev_info); } #endif /* * load all registered drivers (called from seq_clientmgr.c) */ #ifdef CONFIG_MODULES /* flag to block auto-loading */ static atomic_t snd_seq_in_init = ATOMIC_INIT(1); /* blocked as default */ static int request_seq_drv(struct device *dev, void *data) { struct snd_seq_device *sdev = to_seq_dev(dev); if (!dev->driver) request_module("snd-%s", sdev->id); return 0; } static void autoload_drivers(struct work_struct *work) { /* avoid reentrance */ if (atomic_inc_return(&snd_seq_in_init) == 1) bus_for_each_dev(&snd_seq_bus_type, NULL, NULL, request_seq_drv); atomic_dec(&snd_seq_in_init); } static DECLARE_WORK(autoload_work, autoload_drivers); static void queue_autoload_drivers(void) { schedule_work(&autoload_work); } void snd_seq_autoload_init(void) { atomic_dec(&snd_seq_in_init); #ifdef CONFIG_SND_SEQUENCER_MODULE /* initial autoload only when snd-seq is a module */ queue_autoload_drivers(); #endif } EXPORT_SYMBOL(snd_seq_autoload_init); void snd_seq_autoload_exit(void) { atomic_inc(&snd_seq_in_init); } EXPORT_SYMBOL(snd_seq_autoload_exit); void snd_seq_device_load_drivers(void) { queue_autoload_drivers(); flush_work(&autoload_work); } EXPORT_SYMBOL(snd_seq_device_load_drivers); static inline void cancel_autoload_drivers(void) { cancel_work_sync(&autoload_work); } #else static inline void queue_autoload_drivers(void) { } static inline void cancel_autoload_drivers(void) { } #endif /* * device management */ static int snd_seq_device_dev_free(struct snd_device *device) { struct snd_seq_device *dev = device->device_data; cancel_autoload_drivers(); if (dev->private_free) dev->private_free(dev); put_device(&dev->dev); return 0; } static int snd_seq_device_dev_register(struct snd_device *device) { struct snd_seq_device *dev = device->device_data; int err; err = device_add(&dev->dev); if (err < 0) return err; if (!dev->dev.driver) queue_autoload_drivers(); return 0; } static int snd_seq_device_dev_disconnect(struct snd_device *device) { struct snd_seq_device *dev = device->device_data; device_del(&dev->dev); return 0; } static void snd_seq_dev_release(struct device *dev) { kfree(to_seq_dev(dev)); } /* * register a sequencer device * card = card info * device = device number (if any) * id = id of driver * result = return pointer (NULL allowed if unnecessary) */ int snd_seq_device_new(struct snd_card *card, int device, const char *id, int argsize, struct snd_seq_device **result) { struct snd_seq_device *dev; int err; static const struct snd_device_ops dops = { .dev_free = snd_seq_device_dev_free, .dev_register = snd_seq_device_dev_register, .dev_disconnect = snd_seq_device_dev_disconnect, }; if (result) *result = NULL; if (snd_BUG_ON(!id)) return -EINVAL; dev = kzalloc(sizeof(*dev) + argsize, GFP_KERNEL); if (!dev) return -ENOMEM; /* set up device info */ dev->card = card; dev->device = device; dev->id = id; dev->argsize = argsize; device_initialize(&dev->dev); dev->dev.parent = &card->card_dev; dev->dev.bus = &snd_seq_bus_type; dev->dev.release = snd_seq_dev_release; dev_set_name(&dev->dev, "%s-%d-%d", dev->id, card->number, device); /* add this device to the list */ err = snd_device_new(card, SNDRV_DEV_SEQUENCER, dev, &dops); if (err < 0) { put_device(&dev->dev); return err; } if (result) *result = dev; return 0; } EXPORT_SYMBOL(snd_seq_device_new); /* * driver registration */ int 
__snd_seq_driver_register(struct snd_seq_driver *drv, struct module *mod) { if (WARN_ON(!drv->driver.name || !drv->id)) return -EINVAL; drv->driver.bus = &snd_seq_bus_type; drv->driver.owner = mod; return driver_register(&drv->driver); } EXPORT_SYMBOL_GPL(__snd_seq_driver_register); void snd_seq_driver_unregister(struct snd_seq_driver *drv) { driver_unregister(&drv->driver); } EXPORT_SYMBOL_GPL(snd_seq_driver_unregister); /* * module part */ static int __init seq_dev_proc_init(void) { #ifdef CONFIG_SND_PROC_FS info_entry = snd_info_create_module_entry(THIS_MODULE, "drivers", snd_seq_root); if (info_entry == NULL) return -ENOMEM; info_entry->content = SNDRV_INFO_CONTENT_TEXT; info_entry->c.text.read = snd_seq_device_info; if (snd_info_register(info_entry) < 0) { snd_info_free_entry(info_entry); return -ENOMEM; } #endif return 0; } static int __init alsa_seq_device_init(void) { int err; err = bus_register(&snd_seq_bus_type); if (err < 0) return err; err = seq_dev_proc_init(); if (err < 0) bus_unregister(&snd_seq_bus_type); return err; } static void __exit alsa_seq_device_exit(void) { #ifdef CONFIG_MODULES cancel_work_sync(&autoload_work); #endif #ifdef CONFIG_SND_PROC_FS snd_info_free_entry(info_entry); #endif bus_unregister(&snd_seq_bus_type); } subsys_initcall(alsa_seq_device_init) module_exit(alsa_seq_device_exit)
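snd_seq_bus_match() above binds a snd_seq_device to a snd_seq_driver when both the id string and argsize match, and request_seq_drv() autoloads the module named "snd-<id>" when no driver is bound yet. A minimal sketch of both sides follows, patterned after the in-tree synth drivers; the id "example-synth", the probe/remove bodies and example_attach_seq_dev() are illustrative assumptions, not code from this file.

#include <linux/device.h>
#include <linux/module.h>
#include <sound/core.h>
#include <sound/seq_device.h>

/* card side: create a device entry; a matching driver is (auto)loaded later */
static int __maybe_unused example_attach_seq_dev(struct snd_card *card)
{
	struct snd_seq_device *sdev;

	/* "example-synth" is an illustrative id; argsize carries driver arguments */
	return snd_seq_device_new(card, 0, "example-synth", 0, &sdev);
}

/* driver side: probe/remove run when a device with the matching id appears */
static int example_seq_probe(struct device *dev)
{
	dev_info(dev, "bound to %s\n", to_seq_dev(dev)->id);
	return 0;
}

static int example_seq_remove(struct device *dev)
{
	return 0;
}

static struct snd_seq_driver example_seq_driver = {
	.driver = {
		.name = KBUILD_MODNAME,
		.probe = example_seq_probe,
		.remove = example_seq_remove,
	},
	.id = "example-synth",
	.argsize = 0,
};

static int __init example_init(void)
{
	return __snd_seq_driver_register(&example_seq_driver, THIS_MODULE);
}

static void __exit example_exit(void)
{
	snd_seq_driver_unregister(&example_seq_driver);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");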
// SPDX-License-Identifier: GPL-2.0-or-later /* * Initialization routines * Copyright (c) by Jaroslav Kysela <perex@perex.cz> */ #include <linux/init.h> #include <linux/sched.h> #include <linux/module.h> #include <linux/device.h> #include <linux/file.h> #include <linux/slab.h> #include <linux/time.h> #include <linux/ctype.h> #include <linux/pm.h> #include <linux/debugfs.h> #include <linux/completion.h> #include <linux/interrupt.h> #include <sound/core.h> #include <sound/control.h> #include <sound/info.h> /* monitor files for graceful shutdown (hotplug) */ struct snd_monitor_file { struct file *file; const struct file_operations *disconnected_f_op; struct list_head shutdown_list; /* still need to shutdown */ struct list_head list; /* link of monitor files */ }; static DEFINE_SPINLOCK(shutdown_lock); static LIST_HEAD(shutdown_files); static const struct file_operations snd_shutdown_f_ops; /* locked for registering/using */ static DECLARE_BITMAP(snd_cards_lock, SNDRV_CARDS); static struct snd_card *snd_cards[SNDRV_CARDS]; static DEFINE_MUTEX(snd_card_mutex); static char *slots[SNDRV_CARDS]; module_param_array(slots, charp, NULL, 0444); MODULE_PARM_DESC(slots, "Module names assigned to the slots."); /* return non-zero if the given index is reserved for the given * module via slots option */ static int module_slot_match(struct module *module, int idx) { int match = 1; #ifdef CONFIG_MODULES const char *s1, *s2; if (!module || !*module->name || !slots[idx]) return 0; s1 = module->name; s2 = slots[idx]; if (*s2 == '!') { match = 0; /* negative match */ s2++; } /* compare module name strings * hyphens are handled as equivalent with underscore */ for (;;) { char c1 = *s1++; char c2 = *s2++; if (c1 == '-') c1 = '_'; if (c2 == '-') c2 = '_'; if (c1 != c2) return !match; if (!c1) break; } #endif /* CONFIG_MODULES */ return match; } #if IS_ENABLED(CONFIG_SND_MIXER_OSS) int (*snd_mixer_oss_notify_callback)(struct snd_card *card, int free_flag);
EXPORT_SYMBOL(snd_mixer_oss_notify_callback); #endif static int check_empty_slot(struct module *module, int slot) { return !slots[slot] || !*slots[slot]; } /* return an empty slot number (>= 0) found in the given bitmask @mask. * @mask == -1 == 0xffffffff means: take any free slot up to 32 * when no slot is available, return the original @mask as is. */ static int get_slot_from_bitmask(int mask, int (*check)(struct module *, int), struct module *module) { int slot; for (slot = 0; slot < SNDRV_CARDS; slot++) { if (slot < 32 && !(mask & (1U << slot))) continue; if (!test_bit(slot, snd_cards_lock)) { if (check(module, slot)) return slot; /* found */ } } return mask; /* unchanged */ } /* the default release callback set in snd_device_alloc() */ static void default_release_alloc(struct device *dev) { kfree(dev); } /** * snd_device_alloc - Allocate and initialize struct device for sound devices * @dev_p: pointer to store the allocated device * @card: card to assign, optional * * For releasing the allocated device, call put_device(). */ int snd_device_alloc(struct device **dev_p, struct snd_card *card) { struct device *dev; *dev_p = NULL; dev = kzalloc(sizeof(*dev), GFP_KERNEL); if (!dev) return -ENOMEM; device_initialize(dev); if (card) dev->parent = &card->card_dev; dev->class = &sound_class; dev->release = default_release_alloc; *dev_p = dev; return 0; } EXPORT_SYMBOL_GPL(snd_device_alloc); static int snd_card_init(struct snd_card *card, struct device *parent, int idx, const char *xid, struct module *module, size_t extra_size); static int snd_card_do_free(struct snd_card *card); static const struct attribute_group card_dev_attr_group; static void release_card_device(struct device *dev) { snd_card_do_free(dev_to_snd_card(dev)); } /** * snd_card_new - create and initialize a soundcard structure * @parent: the parent device object * @idx: card index (address) [0 ... (SNDRV_CARDS-1)] * @xid: card identification (ASCII string) * @module: top level module for locking * @extra_size: allocate this extra size after the main soundcard structure * @card_ret: the pointer to store the created card instance * * The function allocates snd_card instance via kzalloc with the given * space for the driver to use freely. The allocated struct is stored * in the given card_ret pointer. * * Return: Zero if successful or a negative error code. */ int snd_card_new(struct device *parent, int idx, const char *xid, struct module *module, int extra_size, struct snd_card **card_ret) { struct snd_card *card; int err; if (snd_BUG_ON(!card_ret)) return -EINVAL; *card_ret = NULL; if (extra_size < 0) extra_size = 0; card = kzalloc(sizeof(*card) + extra_size, GFP_KERNEL); if (!card) return -ENOMEM; err = snd_card_init(card, parent, idx, xid, module, extra_size); if (err < 0) return err; /* card is freed by error handler */ *card_ret = card; return 0; } EXPORT_SYMBOL(snd_card_new); static void __snd_card_release(struct device *dev, void *data) { snd_card_free(data); } /** * snd_devm_card_new - managed snd_card object creation * @parent: the parent device object * @idx: card index (address) [0 ... (SNDRV_CARDS-1)] * @xid: card identification (ASCII string) * @module: top level module for locking * @extra_size: allocate this extra size after the main soundcard structure * @card_ret: the pointer to store the created card instance * * This function works like snd_card_new() but manages the allocated resource * via devres, i.e. you don't need to free explicitly. 
* * When a snd_card object is created with this function and registered via * snd_card_register(), the very first devres action to call snd_card_free() * is added automatically. In that way, the resource disconnection is assured * at first, then released in the expected order. * * If an error happens at the probe before snd_card_register() is called and * there have been other devres resources, you'd need to free the card manually * via snd_card_free() call in the error; otherwise it may lead to UAF due to * devres call orders. You can use snd_card_free_on_error() helper for * handling it more easily. * * Return: zero if successful, or a negative error code */ int snd_devm_card_new(struct device *parent, int idx, const char *xid, struct module *module, size_t extra_size, struct snd_card **card_ret) { struct snd_card *card; int err; *card_ret = NULL; card = devres_alloc(__snd_card_release, sizeof(*card) + extra_size, GFP_KERNEL); if (!card) return -ENOMEM; card->managed = true; err = snd_card_init(card, parent, idx, xid, module, extra_size); if (err < 0) { devres_free(card); /* in managed mode, we need to free manually */ return err; } devres_add(parent, card); *card_ret = card; return 0; } EXPORT_SYMBOL_GPL(snd_devm_card_new); /** * snd_card_free_on_error - a small helper for handling devm probe errors * @dev: the managed device object * @ret: the return code from the probe callback * * This function handles the explicit snd_card_free() call at the error from * the probe callback. It's just a small helper for simplifying the error * handling with the managed devices. * * Return: zero if successful, or a negative error code */ int snd_card_free_on_error(struct device *dev, int ret) { struct snd_card *card; if (!ret) return 0; card = devres_find(dev, __snd_card_release, NULL, NULL); if (card) snd_card_free(card); return ret; } EXPORT_SYMBOL_GPL(snd_card_free_on_error); static int snd_card_init(struct snd_card *card, struct device *parent, int idx, const char *xid, struct module *module, size_t extra_size) { int err; if (extra_size > 0) card->private_data = (char *)card + sizeof(struct snd_card); if (xid) strscpy(card->id, xid, sizeof(card->id)); err = 0; scoped_guard(mutex, &snd_card_mutex) { if (idx < 0) /* first check the matching module-name slot */ idx = get_slot_from_bitmask(idx, module_slot_match, module); if (idx < 0) /* if not matched, assign an empty slot */ idx = get_slot_from_bitmask(idx, check_empty_slot, module); if (idx < 0) err = -ENODEV; else if (idx < snd_ecards_limit) { if (test_bit(idx, snd_cards_lock)) err = -EBUSY; /* invalid */ } else if (idx >= SNDRV_CARDS) err = -ENODEV; if (!err) { set_bit(idx, snd_cards_lock); /* lock it */ if (idx >= snd_ecards_limit) snd_ecards_limit = idx + 1; /* increase the limit */ } } if (err < 0) { dev_err(parent, "cannot find the slot for index %d (range 0-%i), error: %d\n", idx, snd_ecards_limit - 1, err); if (!card->managed) kfree(card); /* manually free here, as no destructor called */ return err; } card->dev = parent; card->number = idx; WARN_ON(IS_MODULE(CONFIG_SND) && !module); card->module = module; INIT_LIST_HEAD(&card->devices); init_rwsem(&card->controls_rwsem); rwlock_init(&card->controls_rwlock); INIT_LIST_HEAD(&card->controls); INIT_LIST_HEAD(&card->ctl_files); #ifdef CONFIG_SND_CTL_FAST_LOOKUP xa_init(&card->ctl_numids); xa_init(&card->ctl_hash); #endif spin_lock_init(&card->files_lock); INIT_LIST_HEAD(&card->files_list); mutex_init(&card->memory_mutex); #ifdef CONFIG_PM init_waitqueue_head(&card->power_sleep); 
init_waitqueue_head(&card->power_ref_sleep); atomic_set(&card->power_ref, 0); #endif init_waitqueue_head(&card->remove_sleep); card->sync_irq = -1; device_initialize(&card->card_dev); card->card_dev.parent = parent; card->card_dev.class = &sound_class; card->card_dev.release = release_card_device; card->card_dev.groups = card->dev_groups; card->dev_groups[0] = &card_dev_attr_group; err = kobject_set_name(&card->card_dev.kobj, "card%d", idx); if (err < 0) goto __error; snprintf(card->irq_descr, sizeof(card->irq_descr), "%s:%s", dev_driver_string(card->dev), dev_name(&card->card_dev)); /* the control interface cannot be accessed from the user space until */ /* snd_cards_bitmask and snd_cards are set with snd_card_register */ err = snd_ctl_create(card); if (err < 0) { dev_err(parent, "unable to register control minors\n"); goto __error; } err = snd_info_card_create(card); if (err < 0) { dev_err(parent, "unable to create card info\n"); goto __error_ctl; } #ifdef CONFIG_SND_DEBUG card->debugfs_root = debugfs_create_dir(dev_name(&card->card_dev), sound_debugfs_root); #endif return 0; __error_ctl: snd_device_free_all(card); __error: put_device(&card->card_dev); return err; } /** * snd_card_ref - Get the card object from the index * @idx: the card index * * Returns a card object corresponding to the given index or NULL if not found. * Release the object via snd_card_unref(). * * Return: a card object or NULL */ struct snd_card *snd_card_ref(int idx) { struct snd_card *card; guard(mutex)(&snd_card_mutex); card = snd_cards[idx]; if (card) get_device(&card->card_dev); return card; } EXPORT_SYMBOL_GPL(snd_card_ref); /* return non-zero if a card is already locked */ int snd_card_locked(int card) { guard(mutex)(&snd_card_mutex); return test_bit(card, snd_cards_lock); } static loff_t snd_disconnect_llseek(struct file *file, loff_t offset, int orig) { return -ENODEV; } static ssize_t snd_disconnect_read(struct file *file, char __user *buf, size_t count, loff_t *offset) { return -ENODEV; } static ssize_t snd_disconnect_write(struct file *file, const char __user *buf, size_t count, loff_t *offset) { return -ENODEV; } static int snd_disconnect_release(struct inode *inode, struct file *file) { struct snd_monitor_file *df = NULL, *_df; scoped_guard(spinlock, &shutdown_lock) { list_for_each_entry(_df, &shutdown_files, shutdown_list) { if (_df->file == file) { df = _df; list_del_init(&df->shutdown_list); break; } } } if (likely(df)) { if ((file->f_flags & FASYNC) && df->disconnected_f_op->fasync) df->disconnected_f_op->fasync(-1, file, 0); return df->disconnected_f_op->release(inode, file); } panic("%s(%p, %p) failed!", __func__, inode, file); } static __poll_t snd_disconnect_poll(struct file * file, poll_table * wait) { return EPOLLERR | EPOLLNVAL; } static long snd_disconnect_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { return -ENODEV; } static int snd_disconnect_mmap(struct file *file, struct vm_area_struct *vma) { return -ENODEV; } static int snd_disconnect_fasync(int fd, struct file *file, int on) { return -ENODEV; } static const struct file_operations snd_shutdown_f_ops = { .owner = THIS_MODULE, .llseek = snd_disconnect_llseek, .read = snd_disconnect_read, .write = snd_disconnect_write, .release = snd_disconnect_release, .poll = snd_disconnect_poll, .unlocked_ioctl = snd_disconnect_ioctl, #ifdef CONFIG_COMPAT .compat_ioctl = snd_disconnect_ioctl, #endif .mmap = snd_disconnect_mmap, .fasync = snd_disconnect_fasync }; /** * snd_card_disconnect - disconnect all APIs from the file-operations 
(user space) * @card: soundcard structure * * Disconnects all APIs from the file-operations (user space). * * Return: Zero, otherwise a negative error code. * * Note: The current implementation replaces all active file->f_op with special * dummy file operations (they do nothing except release). */ void snd_card_disconnect(struct snd_card *card) { struct snd_monitor_file *mfile; if (!card) return; scoped_guard(spinlock, &card->files_lock) { if (card->shutdown) return; card->shutdown = 1; /* replace file->f_op with special dummy operations */ list_for_each_entry(mfile, &card->files_list, list) { /* it's critical part, use endless loop */ /* we have no room to fail */ mfile->disconnected_f_op = mfile->file->f_op; scoped_guard(spinlock, &shutdown_lock) list_add(&mfile->shutdown_list, &shutdown_files); mfile->file->f_op = &snd_shutdown_f_ops; fops_get(mfile->file->f_op); } } #ifdef CONFIG_PM /* wake up sleepers here before other callbacks for avoiding potential * deadlocks with other locks (e.g. in kctls); * then this notifies the shutdown and sleepers would abort immediately */ wake_up_all(&card->power_sleep); #endif /* notify all connected devices about disconnection */ /* at this point, they cannot respond to any calls except release() */ #if IS_ENABLED(CONFIG_SND_MIXER_OSS) if (snd_mixer_oss_notify_callback) snd_mixer_oss_notify_callback(card, SND_MIXER_OSS_NOTIFY_DISCONNECT); #endif /* notify all devices that we are disconnected */ snd_device_disconnect_all(card); if (card->sync_irq > 0) synchronize_irq(card->sync_irq); snd_info_card_disconnect(card); #ifdef CONFIG_SND_DEBUG debugfs_remove(card->debugfs_root); card->debugfs_root = NULL; #endif if (card->registered) { device_del(&card->card_dev); card->registered = false; } /* disable fops (user space) operations for ALSA API */ scoped_guard(mutex, &snd_card_mutex) { snd_cards[card->number] = NULL; clear_bit(card->number, snd_cards_lock); } snd_power_sync_ref(card); } EXPORT_SYMBOL(snd_card_disconnect); /** * snd_card_disconnect_sync - disconnect card and wait until files get closed * @card: card object to disconnect * * This calls snd_card_disconnect() for disconnecting all belonging components * and waits until all pending files get closed. * It assures that all accesses from user-space finished so that the driver * can release its resources gracefully. */ void snd_card_disconnect_sync(struct snd_card *card) { snd_card_disconnect(card); guard(spinlock_irq)(&card->files_lock); wait_event_lock_irq(card->remove_sleep, list_empty(&card->files_list), card->files_lock); } EXPORT_SYMBOL_GPL(snd_card_disconnect_sync); static int snd_card_do_free(struct snd_card *card) { card->releasing = true; #if IS_ENABLED(CONFIG_SND_MIXER_OSS) if (snd_mixer_oss_notify_callback) snd_mixer_oss_notify_callback(card, SND_MIXER_OSS_NOTIFY_FREE); #endif snd_device_free_all(card); if (card->private_free) card->private_free(card); if (snd_info_card_free(card) < 0) { dev_warn(card->dev, "unable to free card info\n"); /* Not fatal error */ } if (card->release_completion) complete(card->release_completion); if (!card->managed) kfree(card); return 0; } /** * snd_card_free_when_closed - Disconnect the card, free it later eventually * @card: soundcard structure * * Unlike snd_card_free(), this function doesn't try to release the card * resource immediately, but tries to disconnect at first. When the card * is still in use, the function returns before freeing the resources. * The card resources will be freed when the refcount gets to zero. 
* * Return: zero if successful, or a negative error code */ void snd_card_free_when_closed(struct snd_card *card) { if (!card) return; snd_card_disconnect(card); put_device(&card->card_dev); return; } EXPORT_SYMBOL(snd_card_free_when_closed); /** * snd_card_free - frees given soundcard structure * @card: soundcard structure * * This function releases the soundcard structure and the all assigned * devices automatically. That is, you don't have to release the devices * by yourself. * * This function waits until the all resources are properly released. * * Return: Zero. Frees all associated devices and frees the control * interface associated to given soundcard. */ void snd_card_free(struct snd_card *card) { DECLARE_COMPLETION_ONSTACK(released); /* The call of snd_card_free() is allowed from various code paths; * a manual call from the driver and the call via devres_free, and * we need to avoid double-free. Moreover, the release via devres * may call snd_card_free() twice due to its nature, we need to have * the check here at the beginning. */ if (card->releasing) return; card->release_completion = &released; snd_card_free_when_closed(card); /* wait, until all devices are ready for the free operation */ wait_for_completion(&released); } EXPORT_SYMBOL(snd_card_free); /* check, if the character is in the valid ASCII range */ static inline bool safe_ascii_char(char c) { return isascii(c) && isalnum(c); } /* retrieve the last word of shortname or longname */ static const char *retrieve_id_from_card_name(const char *name) { const char *spos = name; while (*name) { if (isspace(*name) && safe_ascii_char(name[1])) spos = name + 1; name++; } return spos; } /* return true if the given id string doesn't conflict any other card ids */ static bool card_id_ok(struct snd_card *card, const char *id) { int i; if (!snd_info_check_reserved_words(id)) return false; for (i = 0; i < snd_ecards_limit; i++) { if (snd_cards[i] && snd_cards[i] != card && !strcmp(snd_cards[i]->id, id)) return false; } return true; } /* copy to card->id only with valid letters from nid */ static void copy_valid_id_string(struct snd_card *card, const char *src, const char *nid) { char *id = card->id; while (*nid && !safe_ascii_char(*nid)) nid++; if (isdigit(*nid)) *id++ = isalpha(*src) ? *src : 'D'; while (*nid && (size_t)(id - card->id) < sizeof(card->id) - 1) { if (safe_ascii_char(*nid)) *id++ = *nid; nid++; } *id = 0; } /* Set card->id from the given string * If the string conflicts with other ids, add a suffix to make it unique. */ static void snd_card_set_id_no_lock(struct snd_card *card, const char *src, const char *nid) { int len, loops; bool is_default = false; char *id; copy_valid_id_string(card, src, nid); id = card->id; again: /* use "Default" for obviously invalid strings * ("card" conflicts with proc directories) */ if (!*id || !strncmp(id, "card", 4)) { strcpy(id, "Default"); is_default = true; } len = strlen(id); for (loops = 0; loops < SNDRV_CARDS; loops++) { char *spos; char sfxstr[5]; /* "_012" */ int sfxlen; if (card_id_ok(card, id)) return; /* OK */ /* Add _XYZ suffix */ sprintf(sfxstr, "_%X", loops + 1); sfxlen = strlen(sfxstr); if (len + sfxlen >= sizeof(card->id)) spos = id + sizeof(card->id) - sfxlen - 1; else spos = id + len; strcpy(spos, sfxstr); } /* fallback to the default id */ if (!is_default) { *id = 0; goto again; } /* last resort... 
*/ dev_err(card->dev, "unable to set card id (%s)\n", id); if (card->proc_root->name) strscpy(card->id, card->proc_root->name, sizeof(card->id)); } /** * snd_card_set_id - set card identification name * @card: soundcard structure * @nid: new identification string * * This function sets the card identification and checks for name * collisions. */ void snd_card_set_id(struct snd_card *card, const char *nid) { /* check if user specified own card->id */ if (card->id[0] != '\0') return; guard(mutex)(&snd_card_mutex); snd_card_set_id_no_lock(card, nid, nid); } EXPORT_SYMBOL(snd_card_set_id); static ssize_t id_show(struct device *dev, struct device_attribute *attr, char *buf) { struct snd_card *card = container_of(dev, struct snd_card, card_dev); return sysfs_emit(buf, "%s\n", card->id); } static ssize_t id_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { struct snd_card *card = container_of(dev, struct snd_card, card_dev); char buf1[sizeof(card->id)]; size_t copy = count > sizeof(card->id) - 1 ? sizeof(card->id) - 1 : count; size_t idx; int c; for (idx = 0; idx < copy; idx++) { c = buf[idx]; if (!safe_ascii_char(c) && c != '_' && c != '-') return -EINVAL; } memcpy(buf1, buf, copy); buf1[copy] = '\0'; guard(mutex)(&snd_card_mutex); if (!card_id_ok(NULL, buf1)) return -EEXIST; strcpy(card->id, buf1); snd_info_card_id_change(card); return count; } static DEVICE_ATTR_RW(id); static ssize_t number_show(struct device *dev, struct device_attribute *attr, char *buf) { struct snd_card *card = container_of(dev, struct snd_card, card_dev); return sysfs_emit(buf, "%i\n", card->number); } static DEVICE_ATTR_RO(number); static struct attribute *card_dev_attrs[] = { &dev_attr_id.attr, &dev_attr_number.attr, NULL }; static const struct attribute_group card_dev_attr_group = { .attrs = card_dev_attrs, }; /** * snd_card_add_dev_attr - Append a new sysfs attribute group to card * @card: card instance * @group: attribute group to append * * Return: zero if successful, or a negative error code */ int snd_card_add_dev_attr(struct snd_card *card, const struct attribute_group *group) { int i; /* loop for (arraysize-1) here to keep NULL at the last entry */ for (i = 0; i < ARRAY_SIZE(card->dev_groups) - 1; i++) { if (!card->dev_groups[i]) { card->dev_groups[i] = group; return 0; } } dev_err(card->dev, "Too many groups assigned\n"); return -ENOSPC; } EXPORT_SYMBOL_GPL(snd_card_add_dev_attr); static void trigger_card_free(void *data) { snd_card_free(data); } /** * snd_card_register - register the soundcard * @card: soundcard structure * * This function registers all the devices assigned to the soundcard. * Until calling this, the ALSA control interface is blocked from the * external accesses. Thus, you should call this function at the end * of the initialization of the card. * * Return: Zero otherwise a negative error code if the registration failed. 
*/ int snd_card_register(struct snd_card *card) { int err; if (snd_BUG_ON(!card)) return -EINVAL; if (!card->registered) { err = device_add(&card->card_dev); if (err < 0) return err; card->registered = true; } else { if (card->managed) devm_remove_action(card->dev, trigger_card_free, card); } if (card->managed) { err = devm_add_action(card->dev, trigger_card_free, card); if (err < 0) return err; } err = snd_device_register_all(card); if (err < 0) return err; scoped_guard(mutex, &snd_card_mutex) { if (snd_cards[card->number]) { /* already registered */ return snd_info_card_register(card); /* register pending info */ } if (*card->id) { /* make a unique id name from the given string */ char tmpid[sizeof(card->id)]; memcpy(tmpid, card->id, sizeof(card->id)); snd_card_set_id_no_lock(card, tmpid, tmpid); } else { /* create an id from either shortname or longname */ const char *src; src = *card->shortname ? card->shortname : card->longname; snd_card_set_id_no_lock(card, src, retrieve_id_from_card_name(src)); } snd_cards[card->number] = card; } err = snd_info_card_register(card); if (err < 0) return err; #if IS_ENABLED(CONFIG_SND_MIXER_OSS) if (snd_mixer_oss_notify_callback) snd_mixer_oss_notify_callback(card, SND_MIXER_OSS_NOTIFY_REGISTER); #endif return 0; } EXPORT_SYMBOL(snd_card_register); #ifdef CONFIG_SND_PROC_FS static void snd_card_info_read(struct snd_info_entry *entry, struct snd_info_buffer *buffer) { int idx, count; struct snd_card *card; for (idx = count = 0; idx < SNDRV_CARDS; idx++) { guard(mutex)(&snd_card_mutex); card = snd_cards[idx]; if (card) { count++; snd_iprintf(buffer, "%2i [%-15s]: %s - %s\n", idx, card->id, card->driver, card->shortname); snd_iprintf(buffer, " %s\n", card->longname); } } if (!count) snd_iprintf(buffer, "--- no soundcards ---\n"); } #ifdef CONFIG_SND_OSSEMUL void snd_card_info_read_oss(struct snd_info_buffer *buffer) { int idx, count; struct snd_card *card; for (idx = count = 0; idx < SNDRV_CARDS; idx++) { guard(mutex)(&snd_card_mutex); card = snd_cards[idx]; if (card) { count++; snd_iprintf(buffer, "%s\n", card->longname); } } if (!count) { snd_iprintf(buffer, "--- no soundcards ---\n"); } } #endif #ifdef CONFIG_MODULES static void snd_card_module_info_read(struct snd_info_entry *entry, struct snd_info_buffer *buffer) { int idx; struct snd_card *card; for (idx = 0; idx < SNDRV_CARDS; idx++) { guard(mutex)(&snd_card_mutex); card = snd_cards[idx]; if (card) snd_iprintf(buffer, "%2i %s\n", idx, card->module->name); } } #endif int __init snd_card_info_init(void) { struct snd_info_entry *entry; entry = snd_info_create_module_entry(THIS_MODULE, "cards", NULL); if (! entry) return -ENOMEM; entry->c.text.read = snd_card_info_read; if (snd_info_register(entry) < 0) return -ENOMEM; /* freed in error path */ #ifdef CONFIG_MODULES entry = snd_info_create_module_entry(THIS_MODULE, "modules", NULL); if (!entry) return -ENOMEM; entry->c.text.read = snd_card_module_info_read; if (snd_info_register(entry) < 0) return -ENOMEM; /* freed in error path */ #endif return 0; } #endif /* CONFIG_SND_PROC_FS */ /** * snd_component_add - add a component string * @card: soundcard structure * @component: the component id string * * This function adds the component id string to the supported list. * The component can be referred from the alsa-lib. * * Return: Zero otherwise a negative error code. 
*/ int snd_component_add(struct snd_card *card, const char *component) { char *ptr; int len = strlen(component); ptr = strstr(card->components, component); if (ptr != NULL) { if (ptr[len] == '\0' || ptr[len] == ' ') /* already there */ return 1; } if (strlen(card->components) + 1 + len + 1 > sizeof(card->components)) { snd_BUG(); return -ENOMEM; } if (card->components[0] != '\0') strcat(card->components, " "); strcat(card->components, component); return 0; } EXPORT_SYMBOL(snd_component_add); /** * snd_card_file_add - add the file to the file list of the card * @card: soundcard structure * @file: file pointer * * This function adds the file to the file linked-list of the card. * This linked-list is used to keep tracking the connection state, * and to avoid the release of busy resources by hotplug. * * Return: zero or a negative error code. */ int snd_card_file_add(struct snd_card *card, struct file *file) { struct snd_monitor_file *mfile; mfile = kmalloc(sizeof(*mfile), GFP_KERNEL); if (mfile == NULL) return -ENOMEM; mfile->file = file; mfile->disconnected_f_op = NULL; INIT_LIST_HEAD(&mfile->shutdown_list); guard(spinlock)(&card->files_lock); if (card->shutdown) { kfree(mfile); return -ENODEV; } list_add(&mfile->list, &card->files_list); get_device(&card->card_dev); return 0; } EXPORT_SYMBOL(snd_card_file_add); /** * snd_card_file_remove - remove the file from the file list * @card: soundcard structure * @file: file pointer * * This function removes the file formerly added to the card via * snd_card_file_add() function. * If all files are removed and snd_card_free_when_closed() was * called beforehand, it processes the pending release of * resources. * * Return: Zero or a negative error code. */ int snd_card_file_remove(struct snd_card *card, struct file *file) { struct snd_monitor_file *mfile, *found = NULL; scoped_guard(spinlock, &card->files_lock) { list_for_each_entry(mfile, &card->files_list, list) { if (mfile->file == file) { list_del(&mfile->list); scoped_guard(spinlock, &shutdown_lock) list_del(&mfile->shutdown_list); if (mfile->disconnected_f_op) fops_put(mfile->disconnected_f_op); found = mfile; break; } } if (list_empty(&card->files_list)) wake_up_all(&card->remove_sleep); } if (!found) { dev_err(card->dev, "card file remove problem (%p)\n", file); return -ENOENT; } kfree(found); put_device(&card->card_dev); return 0; } EXPORT_SYMBOL(snd_card_file_remove); #ifdef CONFIG_PM /** * snd_power_ref_and_wait - wait until the card gets powered up * @card: soundcard structure * * Take the power_ref reference count of the given card, and * wait until the card gets powered up to SNDRV_CTL_POWER_D0 state. * The refcount is down again while sleeping until power-up, hence this * function can be used for syncing the floating control ops accesses, * typically around calling control ops. * * The caller needs to pull down the refcount via snd_power_unref() later * no matter whether the error is returned from this function or not. * * Return: Zero if successful, or a negative error code. */ int snd_power_ref_and_wait(struct snd_card *card) { snd_power_ref(card); if (snd_power_get_state(card) == SNDRV_CTL_POWER_D0) return 0; wait_event_cmd(card->power_sleep, card->shutdown || snd_power_get_state(card) == SNDRV_CTL_POWER_D0, snd_power_unref(card), snd_power_ref(card)); return card->shutdown ? 
-ENODEV : 0; } EXPORT_SYMBOL_GPL(snd_power_ref_and_wait); /** * snd_power_wait - wait until the card gets powered up (old form) * @card: soundcard structure * * Wait until the card gets powered up to SNDRV_CTL_POWER_D0 state. * * Return: Zero if successful, or a negative error code. */ int snd_power_wait(struct snd_card *card) { int ret; ret = snd_power_ref_and_wait(card); snd_power_unref(card); return ret; } EXPORT_SYMBOL(snd_power_wait); #endif /* CONFIG_PM */ |
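/*
 * Usage sketch (not part of sound/core/init.c above): a minimal,
 * hypothetical platform driver showing how the card API documented above
 * is typically combined. The names "mychip", "mychip_probe" and
 * "mychip_driver" are made up for illustration; only snd_devm_card_new(),
 * snd_card_register() and snd_card_free_on_error() come from the file
 * above. This is a sketch under those assumptions, not a definitive
 * driver implementation.
 */
#include <linux/module.h>
#include <linux/platform_device.h>
#include <sound/core.h>

struct mychip {
	struct snd_card *card;		/* back-pointer to the card */
	/* ... further chip-specific state would live here ... */
};

static int mychip_probe(struct platform_device *pdev)
{
	struct snd_card *card;
	struct mychip *chip;
	int err;

	/* managed allocation; idx = -1 requests the first free slot */
	err = snd_devm_card_new(&pdev->dev, -1, NULL, THIS_MODULE,
				sizeof(*chip), &card);
	if (err < 0)
		return err;
	chip = card->private_data;	/* points into the extra_size area */
	chip->card = card;

	strscpy(card->driver, "MyChip", sizeof(card->driver));
	strscpy(card->shortname, "My Chip", sizeof(card->shortname));
	snprintf(card->longname, sizeof(card->longname),
		 "My Chip at %s", dev_name(&pdev->dev));

	/* ... PCM devices, mixer controls etc. would be created here ... */

	err = snd_card_register(card);
	/*
	 * Free the card explicitly on error so it is released before any
	 * other devres resources, as recommended for snd_devm_card_new().
	 */
	return snd_card_free_on_error(&pdev->dev, err);
}

static struct platform_driver mychip_driver = {
	.probe = mychip_probe,
	.driver = {
		.name = "mychip",
	},
};
module_platform_driver(mychip_driver);

MODULE_LICENSE("GPL");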
*/ #define KCSAN_SEQLOCK_REGION_MAX 1000 static inline void __seqcount_init(seqcount_t *s, const char *name, struct lock_class_key *key) { /* * Make sure we are not reinitializing a held lock: */ lockdep_init_map(&s->dep_map, name, key, 0); s->sequence = 0; } #ifdef CONFIG_DEBUG_LOCK_ALLOC # define SEQCOUNT_DEP_MAP_INIT(lockname) \ .dep_map = { .name = #lockname } /** * seqcount_init() - runtime initializer for seqcount_t * @s: Pointer to the seqcount_t instance */ # define seqcount_init(s) \ do { \ static struct lock_class_key __key; \ __seqcount_init((s), #s, &__key); \ } while (0) static inline void seqcount_lockdep_reader_access(const seqcount_t *s) { seqcount_t *l = (seqcount_t *)s; unsigned long flags; local_irq_save(flags); seqcount_acquire_read(&l->dep_map, 0, 0, _RET_IP_); seqcount_release(&l->dep_map, _RET_IP_); local_irq_restore(flags); } #else # define SEQCOUNT_DEP_MAP_INIT(lockname) # define seqcount_init(s) __seqcount_init(s, NULL, NULL) # define seqcount_lockdep_reader_access(x) #endif /** * SEQCNT_ZERO() - static initializer for seqcount_t * @name: Name of the seqcount_t instance */ #define SEQCNT_ZERO(name) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(name) } /* * Sequence counters with associated locks (seqcount_LOCKNAME_t) * * A sequence counter which associates the lock used for writer * serialization at initialization time. This enables lockdep to validate * that the write side critical section is properly serialized. * * For associated locks which do not implicitly disable preemption, * preemption protection is enforced in the write side function. * * Lockdep is never used in any for the raw write variants. * * See Documentation/locking/seqlock.rst */ /* * typedef seqcount_LOCKNAME_t - sequence counter with LOCKNAME associated * @seqcount: The real sequence counter * @lock: Pointer to the associated lock * * A plain sequence counter with external writer synchronization by * LOCKNAME @lock. The lock is associated to the sequence counter in the * static initializer or init function. This enables lockdep to validate * that the write side critical section is properly serialized. 
* * LOCKNAME: raw_spinlock, spinlock, rwlock or mutex */ /* * seqcount_LOCKNAME_init() - runtime initializer for seqcount_LOCKNAME_t * @s: Pointer to the seqcount_LOCKNAME_t instance * @lock: Pointer to the associated lock */ #define seqcount_LOCKNAME_init(s, _lock, lockname) \ do { \ seqcount_##lockname##_t *____s = (s); \ seqcount_init(&____s->seqcount); \ __SEQ_LOCK(____s->lock = (_lock)); \ } while (0) #define seqcount_raw_spinlock_init(s, lock) seqcount_LOCKNAME_init(s, lock, raw_spinlock) #define seqcount_spinlock_init(s, lock) seqcount_LOCKNAME_init(s, lock, spinlock) #define seqcount_rwlock_init(s, lock) seqcount_LOCKNAME_init(s, lock, rwlock) #define seqcount_mutex_init(s, lock) seqcount_LOCKNAME_init(s, lock, mutex) /* * SEQCOUNT_LOCKNAME() - Instantiate seqcount_LOCKNAME_t and helpers * seqprop_LOCKNAME_*() - Property accessors for seqcount_LOCKNAME_t * * @lockname: "LOCKNAME" part of seqcount_LOCKNAME_t * @locktype: LOCKNAME canonical C data type * @preemptible: preemptibility of above locktype * @lockbase: prefix for associated lock/unlock */ #define SEQCOUNT_LOCKNAME(lockname, locktype, preemptible, lockbase) \ static __always_inline seqcount_t * \ __seqprop_##lockname##_ptr(seqcount_##lockname##_t *s) \ { \ return &s->seqcount; \ } \ \ static __always_inline const seqcount_t * \ __seqprop_##lockname##_const_ptr(const seqcount_##lockname##_t *s) \ { \ return &s->seqcount; \ } \ \ static __always_inline unsigned \ __seqprop_##lockname##_sequence(const seqcount_##lockname##_t *s) \ { \ unsigned seq = smp_load_acquire(&s->seqcount.sequence); \ \ if (!IS_ENABLED(CONFIG_PREEMPT_RT)) \ return seq; \ \ if (preemptible && unlikely(seq & 1)) { \ __SEQ_LOCK(lockbase##_lock(s->lock)); \ __SEQ_LOCK(lockbase##_unlock(s->lock)); \ \ /* \ * Re-read the sequence counter since the (possibly \ * preempted) writer made progress. 
\ */ \ seq = smp_load_acquire(&s->seqcount.sequence); \ } \ \ return seq; \ } \ \ static __always_inline bool \ __seqprop_##lockname##_preemptible(const seqcount_##lockname##_t *s) \ { \ if (!IS_ENABLED(CONFIG_PREEMPT_RT)) \ return preemptible; \ \ /* PREEMPT_RT relies on the above LOCK+UNLOCK */ \ return false; \ } \ \ static __always_inline void \ __seqprop_##lockname##_assert(const seqcount_##lockname##_t *s) \ { \ __SEQ_LOCK(lockdep_assert_held(s->lock)); \ } /* * __seqprop() for seqcount_t */ static inline seqcount_t *__seqprop_ptr(seqcount_t *s) { return s; } static inline const seqcount_t *__seqprop_const_ptr(const seqcount_t *s) { return s; } static inline unsigned __seqprop_sequence(const seqcount_t *s) { return smp_load_acquire(&s->sequence); } static inline bool __seqprop_preemptible(const seqcount_t *s) { return false; } static inline void __seqprop_assert(const seqcount_t *s) { lockdep_assert_preemption_disabled(); } #define __SEQ_RT IS_ENABLED(CONFIG_PREEMPT_RT) SEQCOUNT_LOCKNAME(raw_spinlock, raw_spinlock_t, false, raw_spin) SEQCOUNT_LOCKNAME(spinlock, spinlock_t, __SEQ_RT, spin) SEQCOUNT_LOCKNAME(rwlock, rwlock_t, __SEQ_RT, read) SEQCOUNT_LOCKNAME(mutex, struct mutex, true, mutex) #undef SEQCOUNT_LOCKNAME /* * SEQCNT_LOCKNAME_ZERO - static initializer for seqcount_LOCKNAME_t * @name: Name of the seqcount_LOCKNAME_t instance * @lock: Pointer to the associated LOCKNAME */ #define SEQCOUNT_LOCKNAME_ZERO(seq_name, assoc_lock) { \ .seqcount = SEQCNT_ZERO(seq_name.seqcount), \ __SEQ_LOCK(.lock = (assoc_lock)) \ } #define SEQCNT_RAW_SPINLOCK_ZERO(name, lock) SEQCOUNT_LOCKNAME_ZERO(name, lock) #define SEQCNT_SPINLOCK_ZERO(name, lock) SEQCOUNT_LOCKNAME_ZERO(name, lock) #define SEQCNT_RWLOCK_ZERO(name, lock) SEQCOUNT_LOCKNAME_ZERO(name, lock) #define SEQCNT_MUTEX_ZERO(name, lock) SEQCOUNT_LOCKNAME_ZERO(name, lock) #define SEQCNT_WW_MUTEX_ZERO(name, lock) SEQCOUNT_LOCKNAME_ZERO(name, lock) #define __seqprop_case(s, lockname, prop) \ seqcount_##lockname##_t: __seqprop_##lockname##_##prop #define __seqprop(s, prop) _Generic(*(s), \ seqcount_t: __seqprop_##prop, \ __seqprop_case((s), raw_spinlock, prop), \ __seqprop_case((s), spinlock, prop), \ __seqprop_case((s), rwlock, prop), \ __seqprop_case((s), mutex, prop)) #define seqprop_ptr(s) __seqprop(s, ptr)(s) #define seqprop_const_ptr(s) __seqprop(s, const_ptr)(s) #define seqprop_sequence(s) __seqprop(s, sequence)(s) #define seqprop_preemptible(s) __seqprop(s, preemptible)(s) #define seqprop_assert(s) __seqprop(s, assert)(s) /** * __read_seqcount_begin() - begin a seqcount_t read section * @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants * * Return: count to be passed to read_seqcount_retry() */ #define __read_seqcount_begin(s) \ ({ \ unsigned __seq; \ \ while (unlikely((__seq = seqprop_sequence(s)) & 1)) \ cpu_relax(); \ \ kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX); \ __seq; \ }) /** * raw_read_seqcount_begin() - begin a seqcount_t read section w/o lockdep * @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants * * Return: count to be passed to read_seqcount_retry() */ #define raw_read_seqcount_begin(s) __read_seqcount_begin(s) /** * read_seqcount_begin() - begin a seqcount_t read critical section * @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants * * Return: count to be passed to read_seqcount_retry() */ #define read_seqcount_begin(s) \ ({ \ seqcount_lockdep_reader_access(seqprop_const_ptr(s)); \ raw_read_seqcount_begin(s); \ }) /** * raw_read_seqcount() - read the raw seqcount_t 
counter value * @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants * * raw_read_seqcount opens a read critical section of the given * seqcount_t, without any lockdep checking, and without checking or * masking the sequence counter LSB. Calling code is responsible for * handling that. * * Return: count to be passed to read_seqcount_retry() */ #define raw_read_seqcount(s) \ ({ \ unsigned __seq = seqprop_sequence(s); \ \ kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX); \ __seq; \ }) /** * raw_seqcount_try_begin() - begin a seqcount_t read critical section * w/o lockdep and w/o counter stabilization * @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants * @start: count to be passed to read_seqcount_retry() * * Similar to raw_seqcount_begin(), except it enables eliding the critical * section entirely if odd, instead of doing the speculation knowing it will * fail. * * Useful when counter stabilization is more or less equivalent to taking * the lock and there is a slowpath that does that. * * If true, start will be set to the (even) sequence count read. * * Return: true when a read critical section is started. */ #define raw_seqcount_try_begin(s, start) \ ({ \ start = raw_read_seqcount(s); \ !(start & 1); \ }) /** * raw_seqcount_begin() - begin a seqcount_t read critical section w/o * lockdep and w/o counter stabilization * @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants * * raw_seqcount_begin opens a read critical section of the given * seqcount_t. Unlike read_seqcount_begin(), this function will not wait * for the count to stabilize. If a writer is active when it begins, it * will fail the read_seqcount_retry() at the end of the read critical * section instead of stabilizing at the beginning of it. * * Use this only in special kernel hot paths where the read section is * small and has a high probability of success through other external * means. It will save a single branching instruction. * * Return: count to be passed to read_seqcount_retry() */ #define raw_seqcount_begin(s) \ ({ \ /* \ * If the counter is odd, let read_seqcount_retry() fail \ * by decrementing the counter. \ */ \ raw_read_seqcount(s) & ~1; \ }) /** * __read_seqcount_retry() - end a seqcount_t read section w/o barrier * @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants * @start: count, from read_seqcount_begin() * * __read_seqcount_retry is like read_seqcount_retry, but has no smp_rmb() * barrier. Callers should ensure that smp_rmb() or equivalent ordering is * provided before actually loading any of the variables that are to be * protected in this critical section. * * Use carefully, only in critical code, and comment how the barrier is * provided. * * Return: true if a read section retry is required, else false */ #define __read_seqcount_retry(s, start) \ do___read_seqcount_retry(seqprop_const_ptr(s), start) static inline int do___read_seqcount_retry(const seqcount_t *s, unsigned start) { kcsan_atomic_next(0); return unlikely(READ_ONCE(s->sequence) != start); } /** * read_seqcount_retry() - end a seqcount_t read critical section * @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants * @start: count, from read_seqcount_begin() * * read_seqcount_retry closes the read critical section of given * seqcount_t. If the critical section was invalid, it must be ignored * (and typically retried). 
* * Return: true if a read section retry is required, else false */ #define read_seqcount_retry(s, start) \ do_read_seqcount_retry(seqprop_const_ptr(s), start) static inline int do_read_seqcount_retry(const seqcount_t *s, unsigned start) { smp_rmb(); return do___read_seqcount_retry(s, start); } /** * raw_write_seqcount_begin() - start a seqcount_t write section w/o lockdep * @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants * * Context: check write_seqcount_begin() */ #define raw_write_seqcount_begin(s) \ do { \ if (seqprop_preemptible(s)) \ preempt_disable(); \ \ do_raw_write_seqcount_begin(seqprop_ptr(s)); \ } while (0) static inline void do_raw_write_seqcount_begin(seqcount_t *s) { kcsan_nestable_atomic_begin(); s->sequence++; smp_wmb(); } /** * raw_write_seqcount_end() - end a seqcount_t write section w/o lockdep * @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants * * Context: check write_seqcount_end() */ #define raw_write_seqcount_end(s) \ do { \ do_raw_write_seqcount_end(seqprop_ptr(s)); \ \ if (seqprop_preemptible(s)) \ preempt_enable(); \ } while (0) static inline void do_raw_write_seqcount_end(seqcount_t *s) { smp_wmb(); s->sequence++; kcsan_nestable_atomic_end(); } /** * write_seqcount_begin_nested() - start a seqcount_t write section with * custom lockdep nesting level * @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants * @subclass: lockdep nesting level * * See Documentation/locking/lockdep-design.rst * Context: check write_seqcount_begin() */ #define write_seqcount_begin_nested(s, subclass) \ do { \ seqprop_assert(s); \ \ if (seqprop_preemptible(s)) \ preempt_disable(); \ \ do_write_seqcount_begin_nested(seqprop_ptr(s), subclass); \ } while (0) static inline void do_write_seqcount_begin_nested(seqcount_t *s, int subclass) { seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_); do_raw_write_seqcount_begin(s); } /** * write_seqcount_begin() - start a seqcount_t write side critical section * @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants * * Context: sequence counter write side sections must be serialized and * non-preemptible. Preemption will be automatically disabled if and * only if the seqcount write serialization lock is associated, and * preemptible. If readers can be invoked from hardirq or softirq * context, interrupts or bottom halves must be respectively disabled. */ #define write_seqcount_begin(s) \ do { \ seqprop_assert(s); \ \ if (seqprop_preemptible(s)) \ preempt_disable(); \ \ do_write_seqcount_begin(seqprop_ptr(s)); \ } while (0) static inline void do_write_seqcount_begin(seqcount_t *s) { do_write_seqcount_begin_nested(s, 0); } /** * write_seqcount_end() - end a seqcount_t write side critical section * @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants * * Context: Preemption will be automatically re-enabled if and only if * the seqcount write serialization lock is associated, and preemptible. */ #define write_seqcount_end(s) \ do { \ do_write_seqcount_end(seqprop_ptr(s)); \ \ if (seqprop_preemptible(s)) \ preempt_enable(); \ } while (0) static inline void do_write_seqcount_end(seqcount_t *s) { seqcount_release(&s->dep_map, _RET_IP_); do_raw_write_seqcount_end(s); } /** * raw_write_seqcount_barrier() - do a seqcount_t write barrier * @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants * * This can be used to provide an ordering guarantee instead of the usual * consistency guarantee. 
It is one wmb cheaper, because it can collapse * the two back-to-back wmb()s. * * Note that writes surrounding the barrier should be declared atomic (e.g. * via WRITE_ONCE): a) to ensure the writes become visible to other threads * atomically, avoiding compiler optimizations; b) to document which writes are * meant to propagate to the reader critical section. This is necessary because * neither writes before nor after the barrier are enclosed in a seq-writer * critical section that would ensure readers are aware of ongoing writes:: * * seqcount_t seq; * bool X = true, Y = false; * * void read(void) * { * bool x, y; * * do { * int s = read_seqcount_begin(&seq); * * x = X; y = Y; * * } while (read_seqcount_retry(&seq, s)); * * BUG_ON(!x && !y); * } * * void write(void) * { * WRITE_ONCE(Y, true); * * raw_write_seqcount_barrier(seq); * * WRITE_ONCE(X, false); * } */ #define raw_write_seqcount_barrier(s) \ do_raw_write_seqcount_barrier(seqprop_ptr(s)) static inline void do_raw_write_seqcount_barrier(seqcount_t *s) { kcsan_nestable_atomic_begin(); s->sequence++; smp_wmb(); s->sequence++; kcsan_nestable_atomic_end(); } /** * write_seqcount_invalidate() - invalidate in-progress seqcount_t read * side operations * @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants * * After write_seqcount_invalidate, no seqcount_t read side operations * will complete successfully and see data older than this. */ #define write_seqcount_invalidate(s) \ do_write_seqcount_invalidate(seqprop_ptr(s)) static inline void do_write_seqcount_invalidate(seqcount_t *s) { smp_wmb(); kcsan_nestable_atomic_begin(); s->sequence+=2; kcsan_nestable_atomic_end(); } /* * Latch sequence counters (seqcount_latch_t) * * A sequence counter variant where the counter even/odd value is used to * switch between two copies of protected data. This allows the read path, * typically NMIs, to safely interrupt the write side critical section. * * As the write sections are fully preemptible, no special handling for * PREEMPT_RT is needed. */ typedef struct { seqcount_t seqcount; } seqcount_latch_t; /** * SEQCNT_LATCH_ZERO() - static initializer for seqcount_latch_t * @seq_name: Name of the seqcount_latch_t instance */ #define SEQCNT_LATCH_ZERO(seq_name) { \ .seqcount = SEQCNT_ZERO(seq_name.seqcount), \ } /** * seqcount_latch_init() - runtime initializer for seqcount_latch_t * @s: Pointer to the seqcount_latch_t instance */ #define seqcount_latch_init(s) seqcount_init(&(s)->seqcount) /** * raw_read_seqcount_latch() - pick even/odd latch data copy * @s: Pointer to seqcount_latch_t * * See raw_write_seqcount_latch() for details and a full reader/writer * usage example. * * Return: sequence counter raw value. Use the lowest bit as an index for * picking which data copy to read. The full counter must then be checked * with raw_read_seqcount_latch_retry(). */ static __always_inline unsigned raw_read_seqcount_latch(const seqcount_latch_t *s) { /* * Pairs with the first smp_wmb() in raw_write_seqcount_latch(). * Due to the dependent load, a full smp_rmb() is not needed. */ return READ_ONCE(s->seqcount.sequence); } /** * read_seqcount_latch() - pick even/odd latch data copy * @s: Pointer to seqcount_latch_t * * See write_seqcount_latch() for details and a full reader/writer usage * example. * * Return: sequence counter raw value. Use the lowest bit as an index for * picking which data copy to read. The full counter must then be checked * with read_seqcount_latch_retry(). 
*/ static __always_inline unsigned read_seqcount_latch(const seqcount_latch_t *s) { kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX); return raw_read_seqcount_latch(s); } /** * raw_read_seqcount_latch_retry() - end a seqcount_latch_t read section * @s: Pointer to seqcount_latch_t * @start: count, from raw_read_seqcount_latch() * * Return: true if a read section retry is required, else false */ static __always_inline int raw_read_seqcount_latch_retry(const seqcount_latch_t *s, unsigned start) { smp_rmb(); return unlikely(READ_ONCE(s->seqcount.sequence) != start); } /** * read_seqcount_latch_retry() - end a seqcount_latch_t read section * @s: Pointer to seqcount_latch_t * @start: count, from read_seqcount_latch() * * Return: true if a read section retry is required, else false */ static __always_inline int read_seqcount_latch_retry(const seqcount_latch_t *s, unsigned start) { kcsan_atomic_next(0); return raw_read_seqcount_latch_retry(s, start); } /** * raw_write_seqcount_latch() - redirect latch readers to even/odd copy * @s: Pointer to seqcount_latch_t */ static __always_inline void raw_write_seqcount_latch(seqcount_latch_t *s) { smp_wmb(); /* prior stores before incrementing "sequence" */ s->seqcount.sequence++; smp_wmb(); /* increment "sequence" before following stores */ } /** * write_seqcount_latch_begin() - redirect latch readers to odd copy * @s: Pointer to seqcount_latch_t * * The latch technique is a multiversion concurrency control method that allows * queries during non-atomic modifications. If you can guarantee queries never * interrupt the modification -- e.g. the concurrency is strictly between CPUs * -- you most likely do not need this. * * Where the traditional RCU/lockless data structures rely on atomic * modifications to ensure queries observe either the old or the new state the * latch allows the same for non-atomic updates. The trade-off is doubling the * cost of storage; we have to maintain two copies of the entire data * structure. * * Very simply put: we first modify one copy and then the other. This ensures * there is always one copy in a stable state, ready to give us an answer. * * The basic form is a data structure like:: * * struct latch_struct { * seqcount_latch_t seq; * struct data_struct data[2]; * }; * * Where a modification, which is assumed to be externally serialized, does the * following:: * * void latch_modify(struct latch_struct *latch, ...) * { * write_seqcount_latch_begin(&latch->seq); * modify(latch->data[0], ...); * write_seqcount_latch(&latch->seq); * modify(latch->data[1], ...); * write_seqcount_latch_end(&latch->seq); * } * * The query will have a form like:: * * struct entry *latch_query(struct latch_struct *latch, ...) * { * struct entry *entry; * unsigned seq, idx; * * do { * seq = read_seqcount_latch(&latch->seq); * * idx = seq & 0x01; * entry = data_query(latch->data[idx], ...); * * // This includes needed smp_rmb() * } while (read_seqcount_latch_retry(&latch->seq, seq)); * * return entry; * } * * So during the modification, queries are first redirected to data[1]. Then we * modify data[0]. When that is complete, we redirect queries back to data[0] * and we can modify data[1]. * * NOTE: * * The non-requirement for atomic modifications does _NOT_ include * the publishing of new entries in the case where data is a dynamic * data structure. * * An iteration might start in data[0] and get suspended long enough * to miss an entire modification sequence, once it resumes it might * observe the new entry. 
* * NOTE2: * * When data is a dynamic data structure; one should use regular RCU * patterns to manage the lifetimes of the objects within. */ static __always_inline void write_seqcount_latch_begin(seqcount_latch_t *s) { kcsan_nestable_atomic_begin(); raw_write_seqcount_latch(s); } /** * write_seqcount_latch() - redirect latch readers to even copy * @s: Pointer to seqcount_latch_t */ static __always_inline void write_seqcount_latch(seqcount_latch_t *s) { raw_write_seqcount_latch(s); } /** * write_seqcount_latch_end() - end a seqcount_latch_t write section * @s: Pointer to seqcount_latch_t * * Marks the end of a seqcount_latch_t writer section, after all copies of the * latch-protected data have been updated. */ static __always_inline void write_seqcount_latch_end(seqcount_latch_t *s) { kcsan_nestable_atomic_end(); } #define __SEQLOCK_UNLOCKED(lockname) \ { \ .seqcount = SEQCNT_SPINLOCK_ZERO(lockname, &(lockname).lock), \ .lock = __SPIN_LOCK_UNLOCKED(lockname) \ } /** * seqlock_init() - dynamic initializer for seqlock_t * @sl: Pointer to the seqlock_t instance */ #define seqlock_init(sl) \ do { \ spin_lock_init(&(sl)->lock); \ seqcount_spinlock_init(&(sl)->seqcount, &(sl)->lock); \ } while (0) /** * DEFINE_SEQLOCK(sl) - Define a statically allocated seqlock_t * @sl: Name of the seqlock_t instance */ #define DEFINE_SEQLOCK(sl) \ seqlock_t sl = __SEQLOCK_UNLOCKED(sl) /** * read_seqbegin() - start a seqlock_t read side critical section * @sl: Pointer to seqlock_t * * Return: count, to be passed to read_seqretry() */ static inline unsigned read_seqbegin(const seqlock_t *sl) { return read_seqcount_begin(&sl->seqcount); } /** * read_seqretry() - end a seqlock_t read side section * @sl: Pointer to seqlock_t * @start: count, from read_seqbegin() * * read_seqretry closes the read side critical section of given seqlock_t. * If the critical section was invalid, it must be ignored (and typically * retried). * * Return: true if a read section retry is required, else false */ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start) { return read_seqcount_retry(&sl->seqcount, start); } /* * For all seqlock_t write side functions, use the internal * do_write_seqcount_begin() instead of generic write_seqcount_begin(). * This way, no redundant lockdep_assert_held() checks are added. */ /** * write_seqlock() - start a seqlock_t write side critical section * @sl: Pointer to seqlock_t * * write_seqlock opens a write side critical section for the given * seqlock_t. It also implicitly acquires the spinlock_t embedded inside * that sequential lock. All seqlock_t write side sections are thus * automatically serialized and non-preemptible. * * Context: if the seqlock_t read section, or other write side critical * sections, can be invoked from hardirq or softirq contexts, use the * _irqsave or _bh variants of this function instead. */ static inline void write_seqlock(seqlock_t *sl) { spin_lock(&sl->lock); do_write_seqcount_begin(&sl->seqcount.seqcount); } /** * write_sequnlock() - end a seqlock_t write side critical section * @sl: Pointer to seqlock_t * * write_sequnlock closes the (serialized and non-preemptible) write side * critical section of given seqlock_t. */ static inline void write_sequnlock(seqlock_t *sl) { do_write_seqcount_end(&sl->seqcount.seqcount); spin_unlock(&sl->lock); } /** * write_seqlock_bh() - start a softirqs-disabled seqlock_t write section * @sl: Pointer to seqlock_t * * _bh variant of write_seqlock(). 
Use only if the read side section, or * other write side sections, can be invoked from softirq contexts. */ static inline void write_seqlock_bh(seqlock_t *sl) { spin_lock_bh(&sl->lock); do_write_seqcount_begin(&sl->seqcount.seqcount); } /** * write_sequnlock_bh() - end a softirqs-disabled seqlock_t write section * @sl: Pointer to seqlock_t * * write_sequnlock_bh closes the serialized, non-preemptible, and * softirqs-disabled, seqlock_t write side critical section opened with * write_seqlock_bh(). */ static inline void write_sequnlock_bh(seqlock_t *sl) { do_write_seqcount_end(&sl->seqcount.seqcount); spin_unlock_bh(&sl->lock); } /** * write_seqlock_irq() - start a non-interruptible seqlock_t write section * @sl: Pointer to seqlock_t * * _irq variant of write_seqlock(). Use only if the read side section, or * other write sections, can be invoked from hardirq contexts. */ static inline void write_seqlock_irq(seqlock_t *sl) { spin_lock_irq(&sl->lock); do_write_seqcount_begin(&sl->seqcount.seqcount); } /** * write_sequnlock_irq() - end a non-interruptible seqlock_t write section * @sl: Pointer to seqlock_t * * write_sequnlock_irq closes the serialized and non-interruptible * seqlock_t write side section opened with write_seqlock_irq(). */ static inline void write_sequnlock_irq(seqlock_t *sl) { do_write_seqcount_end(&sl->seqcount.seqcount); spin_unlock_irq(&sl->lock); } static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl) { unsigned long flags; spin_lock_irqsave(&sl->lock, flags); do_write_seqcount_begin(&sl->seqcount.seqcount); return flags; } /** * write_seqlock_irqsave() - start a non-interruptible seqlock_t write * section * @lock: Pointer to seqlock_t * @flags: Stack-allocated storage for saving caller's local interrupt * state, to be passed to write_sequnlock_irqrestore(). * * _irqsave variant of write_seqlock(). Use it only if the read side * section, or other write sections, can be invoked from hardirq context. */ #define write_seqlock_irqsave(lock, flags) \ do { flags = __write_seqlock_irqsave(lock); } while (0) /** * write_sequnlock_irqrestore() - end non-interruptible seqlock_t write * section * @sl: Pointer to seqlock_t * @flags: Caller's saved interrupt state, from write_seqlock_irqsave() * * write_sequnlock_irqrestore closes the serialized and non-interruptible * seqlock_t write section previously opened with write_seqlock_irqsave(). */ static inline void write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags) { do_write_seqcount_end(&sl->seqcount.seqcount); spin_unlock_irqrestore(&sl->lock, flags); } /** * read_seqlock_excl() - begin a seqlock_t locking reader section * @sl: Pointer to seqlock_t * * read_seqlock_excl opens a seqlock_t locking reader critical section. A * locking reader exclusively locks out *both* other writers *and* other * locking readers, but it does not update the embedded sequence number. * * Locking readers act like a normal spin_lock()/spin_unlock(). * * Context: if the seqlock_t write section, *or other read sections*, can * be invoked from hardirq or softirq contexts, use the _irqsave or _bh * variant of this function instead. * * The opened read section must be closed with read_sequnlock_excl(). 
*/ static inline void read_seqlock_excl(seqlock_t *sl) { spin_lock(&sl->lock); } /** * read_sequnlock_excl() - end a seqlock_t locking reader critical section * @sl: Pointer to seqlock_t */ static inline void read_sequnlock_excl(seqlock_t *sl) { spin_unlock(&sl->lock); } /** * read_seqlock_excl_bh() - start a seqlock_t locking reader section with * softirqs disabled * @sl: Pointer to seqlock_t * * _bh variant of read_seqlock_excl(). Use this variant only if the * seqlock_t write side section, *or other read sections*, can be invoked * from softirq contexts. */ static inline void read_seqlock_excl_bh(seqlock_t *sl) { spin_lock_bh(&sl->lock); } /** * read_sequnlock_excl_bh() - stop a seqlock_t softirq-disabled locking * reader section * @sl: Pointer to seqlock_t */ static inline void read_sequnlock_excl_bh(seqlock_t *sl) { spin_unlock_bh(&sl->lock); } /** * read_seqlock_excl_irq() - start a non-interruptible seqlock_t locking * reader section * @sl: Pointer to seqlock_t * * _irq variant of read_seqlock_excl(). Use this only if the seqlock_t * write side section, *or other read sections*, can be invoked from a * hardirq context. */ static inline void read_seqlock_excl_irq(seqlock_t *sl) { spin_lock_irq(&sl->lock); } /** * read_sequnlock_excl_irq() - end an interrupts-disabled seqlock_t * locking reader section * @sl: Pointer to seqlock_t */ static inline void read_sequnlock_excl_irq(seqlock_t *sl) { spin_unlock_irq(&sl->lock); } static inline unsigned long __read_seqlock_excl_irqsave(seqlock_t *sl) { unsigned long flags; spin_lock_irqsave(&sl->lock, flags); return flags; } /** * read_seqlock_excl_irqsave() - start a non-interruptible seqlock_t * locking reader section * @lock: Pointer to seqlock_t * @flags: Stack-allocated storage for saving caller's local interrupt * state, to be passed to read_sequnlock_excl_irqrestore(). * * _irqsave variant of read_seqlock_excl(). Use this only if the seqlock_t * write side section, *or other read sections*, can be invoked from a * hardirq context. */ #define read_seqlock_excl_irqsave(lock, flags) \ do { flags = __read_seqlock_excl_irqsave(lock); } while (0) /** * read_sequnlock_excl_irqrestore() - end non-interruptible seqlock_t * locking reader section * @sl: Pointer to seqlock_t * @flags: Caller saved interrupt state, from read_seqlock_excl_irqsave() */ static inline void read_sequnlock_excl_irqrestore(seqlock_t *sl, unsigned long flags) { spin_unlock_irqrestore(&sl->lock, flags); } /** * read_seqbegin_or_lock() - begin a seqlock_t lockless or locking reader * @lock: Pointer to seqlock_t * @seq : Marker and return parameter. If the passed value is even, the * reader will become a *lockless* seqlock_t reader as in read_seqbegin(). * If the passed value is odd, the reader will become a *locking* reader * as in read_seqlock_excl(). In the first call to this function, the * caller *must* initialize and pass an even value to @seq; this way, a * lockless read can be optimistically tried first. * * read_seqbegin_or_lock is an API designed to optimistically try a normal * lockless seqlock_t read section first. If an odd counter is found, the * lockless read trial has failed, and the next read iteration transforms * itself into a full seqlock_t locking reader. * * This is typically used to avoid seqlock_t lockless readers starvation * (too much retry loops) in the case of a sharp spike in write side * activity. 
* * Context: if the seqlock_t write section, *or other read sections*, can * be invoked from hardirq or softirq contexts, use the _irqsave or _bh * variant of this function instead. * * Check Documentation/locking/seqlock.rst for template example code. * * Return: the encountered sequence counter value, through the @seq * parameter, which is overloaded as a return parameter. This returned * value must be checked with need_seqretry(). If the read section need to * be retried, this returned value must also be passed as the @seq * parameter of the next read_seqbegin_or_lock() iteration. */ static inline void read_seqbegin_or_lock(seqlock_t *lock, int *seq) { if (!(*seq & 1)) /* Even */ *seq = read_seqbegin(lock); else /* Odd */ read_seqlock_excl(lock); } /** * need_seqretry() - validate seqlock_t "locking or lockless" read section * @lock: Pointer to seqlock_t * @seq: sequence count, from read_seqbegin_or_lock() * * Return: true if a read section retry is required, false otherwise */ static inline int need_seqretry(seqlock_t *lock, int seq) { return !(seq & 1) && read_seqretry(lock, seq); } /** * done_seqretry() - end seqlock_t "locking or lockless" reader section * @lock: Pointer to seqlock_t * @seq: count, from read_seqbegin_or_lock() * * done_seqretry finishes the seqlock_t read side critical section started * with read_seqbegin_or_lock() and validated by need_seqretry(). */ static inline void done_seqretry(seqlock_t *lock, int seq) { if (seq & 1) read_sequnlock_excl(lock); } /** * read_seqbegin_or_lock_irqsave() - begin a seqlock_t lockless reader, or * a non-interruptible locking reader * @lock: Pointer to seqlock_t * @seq: Marker and return parameter. Check read_seqbegin_or_lock(). * * This is the _irqsave variant of read_seqbegin_or_lock(). Use it only if * the seqlock_t write section, *or other read sections*, can be invoked * from hardirq context. * * Note: Interrupts will be disabled only for "locking reader" mode. * * Return: * * 1. The saved local interrupts state in case of a locking reader, to * be passed to done_seqretry_irqrestore(). * * 2. The encountered sequence counter value, returned through @seq * overloaded as a return parameter. Check read_seqbegin_or_lock(). */ static inline unsigned long read_seqbegin_or_lock_irqsave(seqlock_t *lock, int *seq) { unsigned long flags = 0; if (!(*seq & 1)) /* Even */ *seq = read_seqbegin(lock); else /* Odd */ read_seqlock_excl_irqsave(lock, flags); return flags; } /** * done_seqretry_irqrestore() - end a seqlock_t lockless reader, or a * non-interruptible locking reader section * @lock: Pointer to seqlock_t * @seq: Count, from read_seqbegin_or_lock_irqsave() * @flags: Caller's saved local interrupt state in case of a locking * reader, also from read_seqbegin_or_lock_irqsave() * * This is the _irqrestore variant of done_seqretry(). The read section * must've been opened with read_seqbegin_or_lock_irqsave(), and validated * by need_seqretry(). */ static inline void done_seqretry_irqrestore(seqlock_t *lock, int seq, unsigned long flags) { if (seq & 1) read_sequnlock_excl_irqrestore(lock, flags); } #endif /* __LINUX_SEQLOCK_H */ |
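The seqlock_t read and write primitives above are meant to be paired as in the following sketch. This is illustrative only: the shared structure, its fields, and the function names are hypothetical and not part of seqlock.h; Documentation/locking/seqlock.rst has the canonical templates, including the read_seqbegin_or_lock() variant.

/* Hypothetical shared data protected by a seqlock (illustration only). */
static DEFINE_SEQLOCK(example_lock);

static struct {
	u64 sec;
	u32 nsec;
} example_time;

/* Writer: serialized against other writers by the embedded spinlock,
 * and forces lockless readers to retry via the sequence count. */
static void example_update(u64 sec, u32 nsec)
{
	write_seqlock(&example_lock);
	example_time.sec = sec;
	example_time.nsec = nsec;
	write_sequnlock(&example_lock);
}

/* Lockless reader: loops until a consistent snapshot was observed. */
static void example_read(u64 *sec, u32 *nsec)
{
	unsigned int seq;

	do {
		seq = read_seqbegin(&example_lock);
		*sec = example_time.sec;
		*nsec = example_time.nsec;
	} while (read_seqretry(&example_lock, seq));
}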
| 39 38 39 39 58 1 41 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 | /* SPDX-License-Identifier: GPL-2.0 */ /* * This header is used to share core functionality between the * standalone connection tracking module, and the compatibility layer's use * of connection tracking. * * 16 Dec 2003: Yasuyuki Kozakai @USAGI <yasuyuki.kozakai@toshiba.co.jp> * - generalize L3 protocol dependent part. * * Derived from include/linux/netfiter_ipv4/ip_conntrack_core.h */ #ifndef _NF_CONNTRACK_CORE_H #define _NF_CONNTRACK_CORE_H #include <linux/netfilter.h> #include <net/netfilter/nf_conntrack.h> #include <net/netfilter/nf_conntrack_ecache.h> #include <net/netfilter/nf_conntrack_l4proto.h> /* This header is used to share core functionality between the standalone connection tracking module, and the compatibility layer's use of connection tracking. */ unsigned int nf_conntrack_in(struct sk_buff *skb, const struct nf_hook_state *state); int nf_conntrack_init_net(struct net *net); void nf_conntrack_cleanup_net(struct net *net); void nf_conntrack_cleanup_net_list(struct list_head *net_exit_list); void nf_conntrack_proto_pernet_init(struct net *net); int nf_conntrack_proto_init(void); void nf_conntrack_proto_fini(void); int nf_conntrack_init_start(void); void nf_conntrack_cleanup_start(void); void nf_conntrack_init_end(void); void nf_conntrack_cleanup_end(void); bool nf_ct_invert_tuple(struct nf_conntrack_tuple *inverse, const struct nf_conntrack_tuple *orig); /* Find a connection corresponding to a tuple. */ struct nf_conntrack_tuple_hash * nf_conntrack_find_get(struct net *net, const struct nf_conntrack_zone *zone, const struct nf_conntrack_tuple *tuple); int __nf_conntrack_confirm(struct sk_buff *skb); /* Confirm a connection: returns NF_DROP if packet must be dropped. */ static inline int nf_conntrack_confirm(struct sk_buff *skb) { struct nf_conn *ct = (struct nf_conn *)skb_nfct(skb); int ret = NF_ACCEPT; if (ct) { if (!nf_ct_is_confirmed(ct)) { ret = __nf_conntrack_confirm(skb); if (ret == NF_ACCEPT) ct = (struct nf_conn *)skb_nfct(skb); } if (ret == NF_ACCEPT && nf_ct_ecache_exist(ct)) nf_ct_deliver_cached_events(ct); } return ret; } unsigned int nf_confirm(void *priv, struct sk_buff *skb, const struct nf_hook_state *state); void print_tuple(struct seq_file *s, const struct nf_conntrack_tuple *tuple, const struct nf_conntrack_l4proto *proto); #define CONNTRACK_LOCKS 1024 extern spinlock_t nf_conntrack_locks[CONNTRACK_LOCKS]; void nf_conntrack_lock(spinlock_t *lock); extern spinlock_t nf_conntrack_expect_lock; /* ctnetlink code shared by both ctnetlink and nf_conntrack_bpf */ static inline void __nf_ct_set_timeout(struct nf_conn *ct, u64 timeout) { if (timeout > INT_MAX) timeout = INT_MAX; if (nf_ct_is_confirmed(ct)) WRITE_ONCE(ct->timeout, nfct_time_stamp + (u32)timeout); else ct->timeout = (u32)timeout; } int __nf_ct_change_timeout(struct nf_conn *ct, u64 cta_timeout); void __nf_ct_change_status(struct nf_conn *ct, unsigned long on, unsigned long off); int nf_ct_change_status_common(struct nf_conn *ct, unsigned int status); #endif /* _NF_CONNTRACK_CORE_H */ |
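For context, nf_conntrack_confirm() above is intended to run once per packet at the end of netfilter processing; in the kernel this is driven by nf_confirm() from the last conntrack hook. The sketch below is illustrative only (example_confirm_hook is a hypothetical name) and simply shows where the helper sits in a hook function.

/* Illustrative hook body (not from this header): commit the conntrack
 * entry looked up earlier by nf_conntrack_in() and deliver any cached
 * ecache events; returns NF_DROP if the entry cannot be confirmed. */
static unsigned int example_confirm_hook(void *priv, struct sk_buff *skb,
					 const struct nf_hook_state *state)
{
	return nf_conntrack_confirm(skb);
}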
/* * llc_pdu.c - access to PDU internals * * Copyright (c) 1997 by Procom Technology, Inc. * 2001-2003 by Arnaldo Carvalho de Melo <acme@conectiva.com.br> * * This program can be redistributed or modified under the terms of the * GNU General Public License as published by the Free Software Foundation. * This program is distributed without any warranty or implied warranty * of merchantability or fitness for a particular purpose. * * See the GNU General Public License for more details. */ #include <linux/netdevice.h> #include <net/llc_pdu.h> static void llc_pdu_decode_pdu_type(struct sk_buff *skb, u8 *type); static u8 llc_pdu_get_pf_bit(struct llc_pdu_sn *pdu); void llc_pdu_set_cmd_rsp(struct sk_buff *skb, u8 pdu_type) { llc_pdu_un_hdr(skb)->ssap |= pdu_type; } /** * llc_pdu_set_pf_bit - sets poll/final bit in LLC header * @skb: Frame to set bit in * @bit_value: poll/final bit (0 or 1). * * This function sets the poll/final bit in the LLC header (based on type of PDU). * In I or S pdus, the p/f bit is the rightmost bit of the fourth header byte. In U * pdus, the p/f bit is the fifth bit of the third byte. */ void llc_pdu_set_pf_bit(struct sk_buff *skb, u8 bit_value) { u8 pdu_type; struct llc_pdu_sn *pdu; llc_pdu_decode_pdu_type(skb, &pdu_type); pdu = llc_pdu_sn_hdr(skb); switch (pdu_type) { case LLC_PDU_TYPE_I: case LLC_PDU_TYPE_S: pdu->ctrl_2 = (pdu->ctrl_2 & 0xFE) | bit_value; break; case LLC_PDU_TYPE_U: pdu->ctrl_1 |= (pdu->ctrl_1 & 0xEF) | (bit_value << 4); break; } } /** * llc_pdu_decode_pf_bit - extracts poll/final bit from LLC header * @skb: input skb that the p/f bit must be extracted from * @pf_bit: poll/final bit (0 or 1) * * This function extracts the poll/final bit from the LLC header (based on type of * PDU). In I or S pdus, the p/f bit is the rightmost bit of the fourth header byte. In * U pdus, the p/f bit is the fifth bit of the third byte. 
*/ void llc_pdu_decode_pf_bit(struct sk_buff *skb, u8 *pf_bit) { u8 pdu_type; struct llc_pdu_sn *pdu; llc_pdu_decode_pdu_type(skb, &pdu_type); pdu = llc_pdu_sn_hdr(skb); switch (pdu_type) { case LLC_PDU_TYPE_I: case LLC_PDU_TYPE_S: *pf_bit = pdu->ctrl_2 & LLC_S_PF_BIT_MASK; break; case LLC_PDU_TYPE_U: *pf_bit = (pdu->ctrl_1 & LLC_U_PF_BIT_MASK) >> 4; break; } } /** * llc_pdu_init_as_disc_cmd - Builds DISC PDU * @skb: Address of the skb to build * @p_bit: The P bit to set in the PDU * * Builds a pdu frame as a DISC command. */ void llc_pdu_init_as_disc_cmd(struct sk_buff *skb, u8 p_bit) { struct llc_pdu_un *pdu = llc_pdu_un_hdr(skb); pdu->ctrl_1 = LLC_PDU_TYPE_U; pdu->ctrl_1 |= LLC_2_PDU_CMD_DISC; pdu->ctrl_1 |= ((p_bit & 1) << 4) & LLC_U_PF_BIT_MASK; } /** * llc_pdu_init_as_i_cmd - builds I pdu * @skb: Address of the skb to build * @p_bit: The P bit to set in the PDU * @ns: The sequence number of the data PDU * @nr: The seq. number of the expected I PDU from the remote * * Builds a pdu frame as an I command. */ void llc_pdu_init_as_i_cmd(struct sk_buff *skb, u8 p_bit, u8 ns, u8 nr) { struct llc_pdu_sn *pdu = llc_pdu_sn_hdr(skb); pdu->ctrl_1 = LLC_PDU_TYPE_I; pdu->ctrl_2 = 0; pdu->ctrl_2 |= (p_bit & LLC_I_PF_BIT_MASK); /* p/f bit */ pdu->ctrl_1 |= (ns << 1) & 0xFE; /* set N(S) in bits 2..8 */ pdu->ctrl_2 |= (nr << 1) & 0xFE; /* set N(R) in bits 10..16 */ } /** * llc_pdu_init_as_rej_cmd - builds REJ PDU * @skb: Address of the skb to build * @p_bit: The P bit to set in the PDU * @nr: The seq. number of the expected I PDU from the remote * * Builds a pdu frame as a REJ command. */ void llc_pdu_init_as_rej_cmd(struct sk_buff *skb, u8 p_bit, u8 nr) { struct llc_pdu_sn *pdu = llc_pdu_sn_hdr(skb); pdu->ctrl_1 = LLC_PDU_TYPE_S; pdu->ctrl_1 |= LLC_2_PDU_CMD_REJ; pdu->ctrl_2 = 0; pdu->ctrl_2 |= p_bit & LLC_S_PF_BIT_MASK; pdu->ctrl_1 &= 0x0F; /* setting bits 5..8 to zero(reserved) */ pdu->ctrl_2 |= (nr << 1) & 0xFE; /* set N(R) in bits 10..16 */ } /** * llc_pdu_init_as_rnr_cmd - builds RNR pdu * @skb: Address of the skb to build * @p_bit: The P bit to set in the PDU * @nr: The seq. number of the expected I PDU from the remote * * Builds a pdu frame as an RNR command. */ void llc_pdu_init_as_rnr_cmd(struct sk_buff *skb, u8 p_bit, u8 nr) { struct llc_pdu_sn *pdu = llc_pdu_sn_hdr(skb); pdu->ctrl_1 = LLC_PDU_TYPE_S; pdu->ctrl_1 |= LLC_2_PDU_CMD_RNR; pdu->ctrl_2 = 0; pdu->ctrl_2 |= p_bit & LLC_S_PF_BIT_MASK; pdu->ctrl_1 &= 0x0F; /* setting bits 5..8 to zero(reserved) */ pdu->ctrl_2 |= (nr << 1) & 0xFE; /* set N(R) in bits 10..16 */ } /** * llc_pdu_init_as_rr_cmd - Builds RR pdu * @skb: Address of the skb to build * @p_bit: The P bit to set in the PDU * @nr: The seq. number of the expected I PDU from the remote * * Builds a pdu frame as an RR command. */ void llc_pdu_init_as_rr_cmd(struct sk_buff *skb, u8 p_bit, u8 nr) { struct llc_pdu_sn *pdu = llc_pdu_sn_hdr(skb); pdu->ctrl_1 = LLC_PDU_TYPE_S; pdu->ctrl_1 |= LLC_2_PDU_CMD_RR; pdu->ctrl_2 = p_bit & LLC_S_PF_BIT_MASK; pdu->ctrl_1 &= 0x0F; /* setting bits 5..8 to zero(reserved) */ pdu->ctrl_2 |= (nr << 1) & 0xFE; /* set N(R) in bits 10..16 */ } /** * llc_pdu_init_as_sabme_cmd - builds SABME pdu * @skb: Address of the skb to build * @p_bit: The P bit to set in the PDU * * Builds a pdu frame as an SABME command. 
*/ void llc_pdu_init_as_sabme_cmd(struct sk_buff *skb, u8 p_bit) { struct llc_pdu_un *pdu = llc_pdu_un_hdr(skb); pdu->ctrl_1 = LLC_PDU_TYPE_U; pdu->ctrl_1 |= LLC_2_PDU_CMD_SABME; pdu->ctrl_1 |= ((p_bit & 1) << 4) & LLC_U_PF_BIT_MASK; } /** * llc_pdu_init_as_dm_rsp - builds DM response pdu * @skb: Address of the skb to build * @f_bit: The F bit to set in the PDU * * Builds a pdu frame as a DM response. */ void llc_pdu_init_as_dm_rsp(struct sk_buff *skb, u8 f_bit) { struct llc_pdu_un *pdu = llc_pdu_un_hdr(skb); pdu->ctrl_1 = LLC_PDU_TYPE_U; pdu->ctrl_1 |= LLC_2_PDU_RSP_DM; pdu->ctrl_1 |= ((f_bit & 1) << 4) & LLC_U_PF_BIT_MASK; } /** * llc_pdu_init_as_frmr_rsp - builds FRMR response PDU * @skb: Address of the frame to build * @prev_pdu: The rejected PDU frame * @f_bit: The F bit to set in the PDU * @vs: tx state vari value for the data link conn at the rejecting LLC * @vr: rx state var value for the data link conn at the rejecting LLC * @vzyxw: completely described in the IEEE Std 802.2 document (Pg 55) * * Builds a pdu frame as a FRMR response. */ void llc_pdu_init_as_frmr_rsp(struct sk_buff *skb, struct llc_pdu_sn *prev_pdu, u8 f_bit, u8 vs, u8 vr, u8 vzyxw) { struct llc_frmr_info *frmr_info; u8 prev_pf = 0; u8 *ctrl; struct llc_pdu_sn *pdu = llc_pdu_sn_hdr(skb); pdu->ctrl_1 = LLC_PDU_TYPE_U; pdu->ctrl_1 |= LLC_2_PDU_RSP_FRMR; pdu->ctrl_1 |= ((f_bit & 1) << 4) & LLC_U_PF_BIT_MASK; frmr_info = (struct llc_frmr_info *)&pdu->ctrl_2; ctrl = (u8 *)&prev_pdu->ctrl_1; FRMR_INFO_SET_REJ_CNTRL(frmr_info,ctrl); FRMR_INFO_SET_Vs(frmr_info, vs); FRMR_INFO_SET_Vr(frmr_info, vr); prev_pf = llc_pdu_get_pf_bit(prev_pdu); FRMR_INFO_SET_C_R_BIT(frmr_info, prev_pf); FRMR_INFO_SET_INVALID_PDU_CTRL_IND(frmr_info, vzyxw); FRMR_INFO_SET_INVALID_PDU_INFO_IND(frmr_info, vzyxw); FRMR_INFO_SET_PDU_INFO_2LONG_IND(frmr_info, vzyxw); FRMR_INFO_SET_PDU_INVALID_Nr_IND(frmr_info, vzyxw); FRMR_INFO_SET_PDU_INVALID_Ns_IND(frmr_info, vzyxw); skb_put(skb, sizeof(struct llc_frmr_info)); } /** * llc_pdu_init_as_rr_rsp - builds RR response pdu * @skb: Address of the skb to build * @f_bit: The F bit to set in the PDU * @nr: The seq. number of the expected data PDU from the remote * * Builds a pdu frame as an RR response. */ void llc_pdu_init_as_rr_rsp(struct sk_buff *skb, u8 f_bit, u8 nr) { struct llc_pdu_sn *pdu = llc_pdu_sn_hdr(skb); pdu->ctrl_1 = LLC_PDU_TYPE_S; pdu->ctrl_1 |= LLC_2_PDU_RSP_RR; pdu->ctrl_2 = 0; pdu->ctrl_2 |= f_bit & LLC_S_PF_BIT_MASK; pdu->ctrl_1 &= 0x0F; /* setting bits 5..8 to zero(reserved) */ pdu->ctrl_2 |= (nr << 1) & 0xFE; /* set N(R) in bits 10..16 */ } /** * llc_pdu_init_as_rej_rsp - builds REJ response pdu * @skb: Address of the skb to build * @f_bit: The F bit to set in the PDU * @nr: The seq. number of the expected data PDU from the remote * * Builds a pdu frame as a REJ response. */ void llc_pdu_init_as_rej_rsp(struct sk_buff *skb, u8 f_bit, u8 nr) { struct llc_pdu_sn *pdu = llc_pdu_sn_hdr(skb); pdu->ctrl_1 = LLC_PDU_TYPE_S; pdu->ctrl_1 |= LLC_2_PDU_RSP_REJ; pdu->ctrl_2 = 0; pdu->ctrl_2 |= f_bit & LLC_S_PF_BIT_MASK; pdu->ctrl_1 &= 0x0F; /* setting bits 5..8 to zero(reserved) */ pdu->ctrl_2 |= (nr << 1) & 0xFE; /* set N(R) in bits 10..16 */ } /** * llc_pdu_init_as_rnr_rsp - builds RNR response pdu * @skb: Address of the frame to build * @f_bit: The F bit to set in the PDU * @nr: The seq. number of the expected data PDU from the remote * * Builds a pdu frame as an RNR response. 
*/ void llc_pdu_init_as_rnr_rsp(struct sk_buff *skb, u8 f_bit, u8 nr) { struct llc_pdu_sn *pdu = llc_pdu_sn_hdr(skb); pdu->ctrl_1 = LLC_PDU_TYPE_S; pdu->ctrl_1 |= LLC_2_PDU_RSP_RNR; pdu->ctrl_2 = 0; pdu->ctrl_2 |= f_bit & LLC_S_PF_BIT_MASK; pdu->ctrl_1 &= 0x0F; /* setting bits 5..8 to zero(reserved) */ pdu->ctrl_2 |= (nr << 1) & 0xFE; /* set N(R) in bits 10..16 */ } /** * llc_pdu_init_as_ua_rsp - builds UA response pdu * @skb: Address of the frame to build * @f_bit: The F bit to set in the PDU * * Builds a pdu frame as a UA response. */ void llc_pdu_init_as_ua_rsp(struct sk_buff *skb, u8 f_bit) { struct llc_pdu_un *pdu = llc_pdu_un_hdr(skb); pdu->ctrl_1 = LLC_PDU_TYPE_U; pdu->ctrl_1 |= LLC_2_PDU_RSP_UA; pdu->ctrl_1 |= ((f_bit & 1) << 4) & LLC_U_PF_BIT_MASK; } /** * llc_pdu_decode_pdu_type - designates PDU type * @skb: input skb that type of it must be designated. * @type: type of PDU (output argument). * * This function designates type of PDU (I, S or U). */ static void llc_pdu_decode_pdu_type(struct sk_buff *skb, u8 *type) { struct llc_pdu_un *pdu = llc_pdu_un_hdr(skb); if (pdu->ctrl_1 & 1) { if ((pdu->ctrl_1 & LLC_PDU_TYPE_U) == LLC_PDU_TYPE_U) *type = LLC_PDU_TYPE_U; else *type = LLC_PDU_TYPE_S; } else *type = LLC_PDU_TYPE_I; } /** * llc_pdu_get_pf_bit - extracts p/f bit of input PDU * @pdu: pointer to LLC header. * * This function extracts p/f bit of input PDU. at first examines type of * PDU and then extracts p/f bit. Returns the p/f bit. */ static u8 llc_pdu_get_pf_bit(struct llc_pdu_sn *pdu) { u8 pdu_type; u8 pf_bit = 0; if (pdu->ctrl_1 & 1) { if ((pdu->ctrl_1 & LLC_PDU_TYPE_U) == LLC_PDU_TYPE_U) pdu_type = LLC_PDU_TYPE_U; else pdu_type = LLC_PDU_TYPE_S; } else pdu_type = LLC_PDU_TYPE_I; switch (pdu_type) { case LLC_PDU_TYPE_I: case LLC_PDU_TYPE_S: pf_bit = pdu->ctrl_2 & LLC_S_PF_BIT_MASK; break; case LLC_PDU_TYPE_U: pf_bit = (pdu->ctrl_1 & LLC_U_PF_BIT_MASK) >> 4; break; } return pf_bit; } |
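The S-format builders above all encode the control field the same way: the command/response code goes into ctrl_1, the P/F flag into bit 1 of ctrl_2, and N(R) into the upper seven bits of ctrl_2 via (nr << 1) & 0xFE. The following standalone snippet only illustrates that arithmetic; it is plain user-space C and not part of llc_pdu.c.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint8_t nr = 5;		/* receive sequence number N(R) */
	uint8_t p_bit = 1;	/* poll bit */
	uint8_t ctrl_2 = 0;

	ctrl_2 |= p_bit & 0x01;		/* P/F bit, cf. LLC_S_PF_BIT_MASK */
	ctrl_2 |= (nr << 1) & 0xFE;	/* N(R) occupies bits 2..8 */

	/* For N(R) = 5 with the P bit set this prints ctrl_2 = 0x0b. */
	printf("ctrl_2 = 0x%02x\n", ctrl_2);
	return 0;
}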
| 12 68 68 65 7 61 73 70 40 67 72 73 70 3 74 60 60 60 61 1 61 60 45 128 72 125 72 72 70 1 70 60 61 61 76 76 115 116 60 65 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 | /* SPDX-License-Identifier: GPL-2.0-or-later */ #ifndef __SOUND_PCM_PARAMS_H #define __SOUND_PCM_PARAMS_H /* * PCM params helpers * Copyright (c) by Abramo Bagnara <abramo@alsa-project.org> */ #include <sound/pcm.h> int snd_pcm_hw_param_first(struct snd_pcm_substream *pcm, struct snd_pcm_hw_params *params, snd_pcm_hw_param_t var, int *dir); int snd_pcm_hw_param_last(struct snd_pcm_substream *pcm, struct snd_pcm_hw_params *params, snd_pcm_hw_param_t var, int *dir); int snd_pcm_hw_param_value(const struct snd_pcm_hw_params *params, snd_pcm_hw_param_t var, int *dir); #define SNDRV_MASK_BITS 64 /* we use so far 64bits only */ #define SNDRV_MASK_SIZE (SNDRV_MASK_BITS / 32) #define MASK_OFS(i) ((i) >> 5) #define MASK_BIT(i) (1U << ((i) & 31)) static inline void snd_mask_none(struct snd_mask *mask) { memset(mask, 0, sizeof(*mask)); } static inline void snd_mask_any(struct snd_mask *mask) { memset(mask, 0xff, SNDRV_MASK_SIZE * sizeof(u_int32_t)); } static inline int snd_mask_empty(const struct snd_mask *mask) { int i; for (i = 0; i < SNDRV_MASK_SIZE; i++) if (mask->bits[i]) return 0; return 1; } static inline unsigned int snd_mask_min(const struct snd_mask *mask) { int i; for (i = 0; i < SNDRV_MASK_SIZE; i++) { if (mask->bits[i]) return __ffs(mask->bits[i]) + (i << 5); } return 0; } static inline unsigned int snd_mask_max(const struct snd_mask *mask) { int i; for (i = SNDRV_MASK_SIZE - 1; i >= 0; i--) { if (mask->bits[i]) return __fls(mask->bits[i]) + (i << 5); } return 0; } static inline void snd_mask_set(struct snd_mask *mask, unsigned int val) { mask->bits[MASK_OFS(val)] |= MASK_BIT(val); } /* Most of drivers need only this one */ static inline void snd_mask_set_format(struct snd_mask *mask, snd_pcm_format_t format) { snd_mask_set(mask, (__force unsigned int)format); } static inline void snd_mask_reset(struct snd_mask *mask, unsigned int val) { mask->bits[MASK_OFS(val)] &= ~MASK_BIT(val); } static inline void snd_mask_set_range(struct snd_mask *mask, unsigned int from, unsigned int to) { unsigned int i; for (i = 
from; i <= to; i++) mask->bits[MASK_OFS(i)] |= MASK_BIT(i); } static inline void snd_mask_reset_range(struct snd_mask *mask, unsigned int from, unsigned int to) { unsigned int i; for (i = from; i <= to; i++) mask->bits[MASK_OFS(i)] &= ~MASK_BIT(i); } static inline void snd_mask_leave(struct snd_mask *mask, unsigned int val) { unsigned int v; v = mask->bits[MASK_OFS(val)] & MASK_BIT(val); snd_mask_none(mask); mask->bits[MASK_OFS(val)] = v; } static inline void snd_mask_intersect(struct snd_mask *mask, const struct snd_mask *v) { int i; for (i = 0; i < SNDRV_MASK_SIZE; i++) mask->bits[i] &= v->bits[i]; } static inline int snd_mask_eq(const struct snd_mask *mask, const struct snd_mask *v) { return ! memcmp(mask, v, SNDRV_MASK_SIZE * sizeof(u_int32_t)); } static inline void snd_mask_copy(struct snd_mask *mask, const struct snd_mask *v) { *mask = *v; } static inline int snd_mask_test(const struct snd_mask *mask, unsigned int val) { return mask->bits[MASK_OFS(val)] & MASK_BIT(val); } /* Most of drivers need only this one */ static inline int snd_mask_test_format(const struct snd_mask *mask, snd_pcm_format_t format) { return snd_mask_test(mask, (__force unsigned int)format); } static inline int snd_mask_single(const struct snd_mask *mask) { int i, c = 0; for (i = 0; i < SNDRV_MASK_SIZE; i++) { if (! mask->bits[i]) continue; if (mask->bits[i] & (mask->bits[i] - 1)) return 0; if (c) return 0; c++; } return 1; } static inline int snd_mask_refine(struct snd_mask *mask, const struct snd_mask *v) { struct snd_mask old; snd_mask_copy(&old, mask); snd_mask_intersect(mask, v); if (snd_mask_empty(mask)) return -EINVAL; return !snd_mask_eq(mask, &old); } static inline int snd_mask_refine_first(struct snd_mask *mask) { if (snd_mask_single(mask)) return 0; snd_mask_leave(mask, snd_mask_min(mask)); return 1; } static inline int snd_mask_refine_last(struct snd_mask *mask) { if (snd_mask_single(mask)) return 0; snd_mask_leave(mask, snd_mask_max(mask)); return 1; } static inline int snd_mask_refine_min(struct snd_mask *mask, unsigned int val) { if (snd_mask_min(mask) >= val) return 0; snd_mask_reset_range(mask, 0, val - 1); if (snd_mask_empty(mask)) return -EINVAL; return 1; } static inline int snd_mask_refine_max(struct snd_mask *mask, unsigned int val) { if (snd_mask_max(mask) <= val) return 0; snd_mask_reset_range(mask, val + 1, SNDRV_MASK_BITS); if (snd_mask_empty(mask)) return -EINVAL; return 1; } static inline int snd_mask_refine_set(struct snd_mask *mask, unsigned int val) { int changed; changed = !snd_mask_single(mask); snd_mask_leave(mask, val); if (snd_mask_empty(mask)) return -EINVAL; return changed; } static inline int snd_mask_value(const struct snd_mask *mask) { return snd_mask_min(mask); } static inline void snd_interval_any(struct snd_interval *i) { i->min = 0; i->openmin = 0; i->max = UINT_MAX; i->openmax = 0; i->integer = 0; i->empty = 0; } static inline void snd_interval_none(struct snd_interval *i) { i->empty = 1; } static inline int snd_interval_checkempty(const struct snd_interval *i) { return (i->min > i->max || (i->min == i->max && (i->openmin || i->openmax))); } static inline int snd_interval_empty(const struct snd_interval *i) { return i->empty; } static inline int snd_interval_single(const struct snd_interval *i) { return (i->min == i->max || (i->min + 1 == i->max && (i->openmin || i->openmax))); } static inline int snd_interval_value(const struct snd_interval *i) { if (i->openmin && !i->openmax) return i->max; return i->min; } static inline int snd_interval_min(const struct 
snd_interval *i) { return i->min; } static inline int snd_interval_max(const struct snd_interval *i) { unsigned int v; v = i->max; if (i->openmax) v--; return v; } static inline int snd_interval_test(const struct snd_interval *i, unsigned int val) { return !((i->min > val || (i->min == val && i->openmin) || i->max < val || (i->max == val && i->openmax))); } static inline void snd_interval_copy(struct snd_interval *d, const struct snd_interval *s) { *d = *s; } static inline int snd_interval_setinteger(struct snd_interval *i) { if (i->integer) return 0; if (i->openmin && i->openmax && i->min == i->max) return -EINVAL; i->integer = 1; return 1; } static inline int snd_interval_eq(const struct snd_interval *i1, const struct snd_interval *i2) { if (i1->empty) return i2->empty; if (i2->empty) return i1->empty; return i1->min == i2->min && i1->openmin == i2->openmin && i1->max == i2->max && i1->openmax == i2->openmax; } /** * params_access - get the access type from the hw params * @p: hw params */ static inline snd_pcm_access_t params_access(const struct snd_pcm_hw_params *p) { return (__force snd_pcm_access_t)snd_mask_min(hw_param_mask_c(p, SNDRV_PCM_HW_PARAM_ACCESS)); } /** * params_format - get the sample format from the hw params * @p: hw params */ static inline snd_pcm_format_t params_format(const struct snd_pcm_hw_params *p) { return (__force snd_pcm_format_t)snd_mask_min(hw_param_mask_c(p, SNDRV_PCM_HW_PARAM_FORMAT)); } /** * params_subformat - get the sample subformat from the hw params * @p: hw params */ static inline snd_pcm_subformat_t params_subformat(const struct snd_pcm_hw_params *p) { return (__force snd_pcm_subformat_t)snd_mask_min(hw_param_mask_c(p, SNDRV_PCM_HW_PARAM_SUBFORMAT)); } /** * params_period_bytes - get the period size (in bytes) from the hw params * @p: hw params */ static inline unsigned int params_period_bytes(const struct snd_pcm_hw_params *p) { return hw_param_interval_c(p, SNDRV_PCM_HW_PARAM_PERIOD_BYTES)->min; } /** * params_width - get the number of bits of the sample format from the hw params * @p: hw params * * This function returns the number of bits per sample that the selected sample * format of the hw params has. */ static inline int params_width(const struct snd_pcm_hw_params *p) { return snd_pcm_format_width(params_format(p)); } /* * params_physical_width - get the storage size of the sample format from the hw params * @p: hw params * * This functions returns the number of bits per sample that the selected sample * format of the hw params takes up in memory. This will be equal or larger than * params_width(). */ static inline int params_physical_width(const struct snd_pcm_hw_params *p) { return snd_pcm_format_physical_width(params_format(p)); } int snd_pcm_hw_params_bits(const struct snd_pcm_hw_params *p); static inline void params_set_format(struct snd_pcm_hw_params *p, snd_pcm_format_t fmt) { snd_mask_set_format(hw_param_mask(p, SNDRV_PCM_HW_PARAM_FORMAT), fmt); } #endif /* __SOUND_PCM_PARAMS_H */ |
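The params_*() accessors above are typically called from a driver's hw_params callback once the stream parameters have been negotiated. A minimal sketch follows; the callback name and debug output are illustrative, not from this header.

/* Illustrative hw_params callback using the helpers above. */
static int example_pcm_hw_params(struct snd_pcm_substream *substream,
				 struct snd_pcm_hw_params *params)
{
	snd_pcm_format_t fmt = params_format(params);
	int valid_bits = params_width(params);		/* e.g. 24 for S24_LE */
	int slot_bits = params_physical_width(params);	/* e.g. 32 for S24_LE */

	pr_debug("negotiated format %d: %d valid bits in %d-bit slots\n",
		 (__force int)fmt, valid_bits, slot_bits);

	/* ...program the hardware for fmt/valid_bits/slot_bits here... */
	return 0;
}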
// SPDX-License-Identifier: GPL-2.0-only #include <linux/ethtool_netlink.h> #include <linux/bitmap.h> #include "netlink.h" #include "bitset.h" /* Some bitmaps are internally represented as an array of unsigned long, some * as an array of u32 (some even as single u32 for now). To avoid the need of * wrappers on caller side, we provide two sets of functions: those with "32" * suffix in their names expect u32 based bitmaps, those without it expect * unsigned long bitmaps. */ static u32 ethnl_lower_bits(unsigned int n) { return ~(u32)0 >> (32 - n % 32); } static u32 ethnl_upper_bits(unsigned int n) { return ~(u32)0 << (n % 32); } /** * ethnl_bitmap32_clear() - Clear u32 based bitmap * @dst: bitmap to clear * @start: beginning of the interval * @end: end of the interval * @mod: set if bitmap was modified * * Clear bits of the bitmap with indices @start <= i < @end */ static void ethnl_bitmap32_clear(u32 *dst, unsigned int start, unsigned int end, bool *mod) { unsigned int start_word = start / 32; unsigned int end_word = end / 32; unsigned int i; u32 mask; if (end <= start) return; if (start % 32) { mask = ethnl_upper_bits(start); if (end_word == start_word) { mask &= ethnl_lower_bits(end); if (dst[start_word] & mask) { dst[start_word] &= ~mask; *mod = true; } return; } if (dst[start_word] & mask) { dst[start_word] &= ~mask; *mod = true; } start_word++; } for (i = start_word; i < end_word; i++) { if (dst[i]) { dst[i] = 0; *mod = true; } } if (end % 32) { mask = ethnl_lower_bits(end); if (dst[end_word] & mask) { dst[end_word] &= ~mask; *mod = true; } } } /** * ethnl_bitmap32_not_zero() - Check if any bit is set in an interval * @map: bitmap to test * @start: beginning of the interval * @end: end of the interval * * Return: true if there is a non-zero bit with index @start <= i < @end, * false if the whole interval is zero */ static bool ethnl_bitmap32_not_zero(const u32 *map, unsigned int start, unsigned int end) { unsigned int start_word = start / 32; unsigned int end_word = end / 32; u32 mask; if (end <= start) return true; if (start % 32) { mask = ethnl_upper_bits(start); if (end_word == start_word) { mask &= ethnl_lower_bits(end); return map[start_word] & mask; } if (map[start_word] & mask) return true; start_word++; } if (memchr_inv(map + start_word, '\0', (end_word - start_word) * sizeof(u32))) return true; if (end % 32 == 0) return false; return map[end_word] & ethnl_lower_bits(end); } /** * ethnl_bitmap32_update() - Modify u32 based bitmap according to value/mask * pair * @dst: bitmap to update * @nbits: bit size of the bitmap * @value: values to set * @mask: mask of bits to set * @mod: set to true if bitmap is modified, preserve if not * * Set bits in @dst bitmap which are set in @mask to values from @value, leave * the rest untouched. If destination bitmap was modified, set @mod to true, * leave as it is if not. */ static void ethnl_bitmap32_update(u32 *dst, unsigned int nbits, const u32 *value, const u32 *mask, bool *mod) { while (nbits > 0) { u32 real_mask = mask ? 
*mask : ~(u32)0; u32 new_value; if (nbits < 32) real_mask &= ethnl_lower_bits(nbits); new_value = (*dst & ~real_mask) | (*value & real_mask); if (new_value != *dst) { *dst = new_value; *mod = true; } if (nbits <= 32) break; dst++; nbits -= 32; value++; if (mask) mask++; } } static bool ethnl_bitmap32_test_bit(const u32 *map, unsigned int index) { return map[index / 32] & (1U << (index % 32)); } /** * ethnl_bitset32_size() - Calculate size of bitset nested attribute * @val: value bitmap (u32 based) * @mask: mask bitmap (u32 based, optional) * @nbits: bit length of the bitset * @names: array of bit names (optional) * @compact: assume compact format for output * * Estimate length of netlink attribute composed by a later call to * ethnl_put_bitset32() call with the same arguments. * * Return: negative error code or attribute length estimate */ int ethnl_bitset32_size(const u32 *val, const u32 *mask, unsigned int nbits, ethnl_string_array_t names, bool compact) { unsigned int len = 0; /* list flag */ if (!mask) len += nla_total_size(sizeof(u32)); /* size */ len += nla_total_size(sizeof(u32)); if (compact) { unsigned int nwords = DIV_ROUND_UP(nbits, 32); /* value, mask */ len += (mask ? 2 : 1) * nla_total_size(nwords * sizeof(u32)); } else { unsigned int bits_len = 0; unsigned int bit_len, i; for (i = 0; i < nbits; i++) { const char *name = names ? names[i] : NULL; if (!ethnl_bitmap32_test_bit(mask ?: val, i)) continue; /* index */ bit_len = nla_total_size(sizeof(u32)); /* name */ if (name) bit_len += ethnl_strz_size(name); /* value */ if (mask && ethnl_bitmap32_test_bit(val, i)) bit_len += nla_total_size(0); /* bit nest */ bits_len += nla_total_size(bit_len); } /* bits nest */ len += nla_total_size(bits_len); } /* outermost nest */ return nla_total_size(len); } /** * ethnl_put_bitset32() - Put a bitset nest into a message * @skb: skb with the message * @attrtype: attribute type for the bitset nest * @val: value bitmap (u32 based) * @mask: mask bitmap (u32 based, optional) * @nbits: bit length of the bitset * @names: array of bit names (optional) * @compact: use compact format for the output * * Compose a nested attribute representing a bitset. If @mask is null, simple * bitmap (bit list) is created, if @mask is provided, represent a value/mask * pair. Bit names are only used in verbose mode and when provided by calller. * * Return: 0 on success, negative error value on error */ int ethnl_put_bitset32(struct sk_buff *skb, int attrtype, const u32 *val, const u32 *mask, unsigned int nbits, ethnl_string_array_t names, bool compact) { struct nlattr *nest; struct nlattr *attr; nest = nla_nest_start(skb, attrtype); if (!nest) return -EMSGSIZE; if (!mask && nla_put_flag(skb, ETHTOOL_A_BITSET_NOMASK)) goto nla_put_failure; if (nla_put_u32(skb, ETHTOOL_A_BITSET_SIZE, nbits)) goto nla_put_failure; if (compact) { unsigned int nwords = DIV_ROUND_UP(nbits, 32); unsigned int nbytes = nwords * sizeof(u32); u32 *dst; attr = nla_reserve(skb, ETHTOOL_A_BITSET_VALUE, nbytes); if (!attr) goto nla_put_failure; dst = nla_data(attr); memcpy(dst, val, nbytes); if (nbits % 32) dst[nwords - 1] &= ethnl_lower_bits(nbits); if (mask) { attr = nla_reserve(skb, ETHTOOL_A_BITSET_MASK, nbytes); if (!attr) goto nla_put_failure; dst = nla_data(attr); memcpy(dst, mask, nbytes); if (nbits % 32) dst[nwords - 1] &= ethnl_lower_bits(nbits); } } else { struct nlattr *bits; unsigned int i; bits = nla_nest_start(skb, ETHTOOL_A_BITSET_BITS); if (!bits) goto nla_put_failure; for (i = 0; i < nbits; i++) { const char *name = names ? 
names[i] : NULL; if (!ethnl_bitmap32_test_bit(mask ?: val, i)) continue; attr = nla_nest_start(skb, ETHTOOL_A_BITSET_BITS_BIT); if (!attr) goto nla_put_failure; if (nla_put_u32(skb, ETHTOOL_A_BITSET_BIT_INDEX, i)) goto nla_put_failure; if (name && ethnl_put_strz(skb, ETHTOOL_A_BITSET_BIT_NAME, name)) goto nla_put_failure; if (mask && ethnl_bitmap32_test_bit(val, i) && nla_put_flag(skb, ETHTOOL_A_BITSET_BIT_VALUE)) goto nla_put_failure; nla_nest_end(skb, attr); } nla_nest_end(skb, bits); } nla_nest_end(skb, nest); return 0; nla_put_failure: nla_nest_cancel(skb, nest); return -EMSGSIZE; } static const struct nla_policy bitset_policy[] = { [ETHTOOL_A_BITSET_NOMASK] = { .type = NLA_FLAG }, [ETHTOOL_A_BITSET_SIZE] = NLA_POLICY_MAX(NLA_U32, ETHNL_MAX_BITSET_SIZE), [ETHTOOL_A_BITSET_BITS] = { .type = NLA_NESTED }, [ETHTOOL_A_BITSET_VALUE] = { .type = NLA_BINARY }, [ETHTOOL_A_BITSET_MASK] = { .type = NLA_BINARY }, }; static const struct nla_policy bit_policy[] = { [ETHTOOL_A_BITSET_BIT_INDEX] = { .type = NLA_U32 }, [ETHTOOL_A_BITSET_BIT_NAME] = { .type = NLA_NUL_STRING }, [ETHTOOL_A_BITSET_BIT_VALUE] = { .type = NLA_FLAG }, }; /** * ethnl_bitset_is_compact() - check if bitset attribute represents a compact * bitset * @bitset: nested attribute representing a bitset * @compact: pointer for return value * * Return: 0 on success, negative error code on failure */ int ethnl_bitset_is_compact(const struct nlattr *bitset, bool *compact) { struct nlattr *tb[ARRAY_SIZE(bitset_policy)]; int ret; ret = nla_parse_nested(tb, ARRAY_SIZE(bitset_policy) - 1, bitset, bitset_policy, NULL); if (ret < 0) return ret; if (tb[ETHTOOL_A_BITSET_BITS]) { if (tb[ETHTOOL_A_BITSET_VALUE] || tb[ETHTOOL_A_BITSET_MASK]) return -EINVAL; *compact = false; return 0; } if (!tb[ETHTOOL_A_BITSET_SIZE] || !tb[ETHTOOL_A_BITSET_VALUE]) return -EINVAL; *compact = true; return 0; } /** * ethnl_name_to_idx() - look up string index for a name * @names: array of ETH_GSTRING_LEN sized strings * @n_names: number of strings in the array * @name: name to look up * * Return: index of the string if found, -ENOENT if not found */ static int ethnl_name_to_idx(ethnl_string_array_t names, unsigned int n_names, const char *name) { unsigned int i; if (!names) return -ENOENT; for (i = 0; i < n_names; i++) { /* names[i] may not be null terminated */ if (!strncmp(names[i], name, ETH_GSTRING_LEN) && strlen(name) <= ETH_GSTRING_LEN) return i; } return -ENOENT; } static int ethnl_parse_bit(unsigned int *index, bool *val, unsigned int nbits, const struct nlattr *bit_attr, bool no_mask, ethnl_string_array_t names, struct netlink_ext_ack *extack) { struct nlattr *tb[ARRAY_SIZE(bit_policy)]; int ret, idx; ret = nla_parse_nested(tb, ARRAY_SIZE(bit_policy) - 1, bit_attr, bit_policy, extack); if (ret < 0) return ret; if (tb[ETHTOOL_A_BITSET_BIT_INDEX]) { const char *name; idx = nla_get_u32(tb[ETHTOOL_A_BITSET_BIT_INDEX]); if (idx >= nbits) { NL_SET_ERR_MSG_ATTR(extack, tb[ETHTOOL_A_BITSET_BIT_INDEX], "bit index too high"); return -EOPNOTSUPP; } name = names ? 
names[idx] : NULL; if (tb[ETHTOOL_A_BITSET_BIT_NAME] && name && strncmp(nla_data(tb[ETHTOOL_A_BITSET_BIT_NAME]), name, nla_len(tb[ETHTOOL_A_BITSET_BIT_NAME]))) { NL_SET_ERR_MSG_ATTR(extack, bit_attr, "bit index and name mismatch"); return -EINVAL; } } else if (tb[ETHTOOL_A_BITSET_BIT_NAME]) { idx = ethnl_name_to_idx(names, nbits, nla_data(tb[ETHTOOL_A_BITSET_BIT_NAME])); if (idx < 0) { NL_SET_ERR_MSG_ATTR(extack, tb[ETHTOOL_A_BITSET_BIT_NAME], "bit name not found"); return -EOPNOTSUPP; } } else { NL_SET_ERR_MSG_ATTR(extack, bit_attr, "neither bit index nor name specified"); return -EINVAL; } *index = idx; *val = no_mask || tb[ETHTOOL_A_BITSET_BIT_VALUE]; return 0; } /** * ethnl_bitmap32_equal() - Compare two bitmaps * @map1: first bitmap * @map2: second bitmap * @nbits: bit size to compare * * Return: true if first @nbits are equal, false if not */ static bool ethnl_bitmap32_equal(const u32 *map1, const u32 *map2, unsigned int nbits) { if (memcmp(map1, map2, nbits / 32 * sizeof(u32))) return false; if (nbits % 32 == 0) return true; return !((map1[nbits / 32] ^ map2[nbits / 32]) & ethnl_lower_bits(nbits % 32)); } static int ethnl_update_bitset32_verbose(u32 *bitmap, unsigned int nbits, const struct nlattr *attr, struct nlattr **tb, ethnl_string_array_t names, struct netlink_ext_ack *extack, bool *mod) { u32 *saved_bitmap = NULL; struct nlattr *bit_attr; bool no_mask; int rem; int ret; if (tb[ETHTOOL_A_BITSET_VALUE]) { NL_SET_ERR_MSG_ATTR(extack, tb[ETHTOOL_A_BITSET_VALUE], "value only allowed in compact bitset"); return -EINVAL; } if (tb[ETHTOOL_A_BITSET_MASK]) { NL_SET_ERR_MSG_ATTR(extack, tb[ETHTOOL_A_BITSET_MASK], "mask only allowed in compact bitset"); return -EINVAL; } no_mask = tb[ETHTOOL_A_BITSET_NOMASK]; if (no_mask) { unsigned int nwords = DIV_ROUND_UP(nbits, 32); unsigned int nbytes = nwords * sizeof(u32); bool dummy; /* The bitmap size is only the size of the map part without * its mask part. 
*/ saved_bitmap = kcalloc(nwords, sizeof(u32), GFP_KERNEL); if (!saved_bitmap) return -ENOMEM; memcpy(saved_bitmap, bitmap, nbytes); ethnl_bitmap32_clear(bitmap, 0, nbits, &dummy); } nla_for_each_nested(bit_attr, tb[ETHTOOL_A_BITSET_BITS], rem) { bool old_val, new_val; unsigned int idx; if (nla_type(bit_attr) != ETHTOOL_A_BITSET_BITS_BIT) { NL_SET_ERR_MSG_ATTR(extack, bit_attr, "only ETHTOOL_A_BITSET_BITS_BIT allowed in ETHTOOL_A_BITSET_BITS"); kfree(saved_bitmap); return -EINVAL; } ret = ethnl_parse_bit(&idx, &new_val, nbits, bit_attr, no_mask, names, extack); if (ret < 0) { kfree(saved_bitmap); return ret; } old_val = bitmap[idx / 32] & ((u32)1 << (idx % 32)); if (new_val != old_val) { if (new_val) bitmap[idx / 32] |= ((u32)1 << (idx % 32)); else bitmap[idx / 32] &= ~((u32)1 << (idx % 32)); if (!no_mask) *mod = true; } } if (no_mask && !ethnl_bitmap32_equal(saved_bitmap, bitmap, nbits)) *mod = true; kfree(saved_bitmap); return 0; } static int ethnl_compact_sanity_checks(unsigned int nbits, const struct nlattr *nest, struct nlattr **tb, struct netlink_ext_ack *extack) { bool no_mask = tb[ETHTOOL_A_BITSET_NOMASK]; unsigned int attr_nbits, attr_nwords; const struct nlattr *test_attr; if (no_mask && tb[ETHTOOL_A_BITSET_MASK]) { NL_SET_ERR_MSG_ATTR(extack, tb[ETHTOOL_A_BITSET_MASK], "mask not allowed in list bitset"); return -EINVAL; } if (!tb[ETHTOOL_A_BITSET_SIZE]) { NL_SET_ERR_MSG_ATTR(extack, nest, "missing size in compact bitset"); return -EINVAL; } if (!tb[ETHTOOL_A_BITSET_VALUE]) { NL_SET_ERR_MSG_ATTR(extack, nest, "missing value in compact bitset"); return -EINVAL; } if (!no_mask && !tb[ETHTOOL_A_BITSET_MASK]) { NL_SET_ERR_MSG_ATTR(extack, nest, "missing mask in compact nonlist bitset"); return -EINVAL; } attr_nbits = nla_get_u32(tb[ETHTOOL_A_BITSET_SIZE]); attr_nwords = DIV_ROUND_UP(attr_nbits, 32); if (nla_len(tb[ETHTOOL_A_BITSET_VALUE]) != attr_nwords * sizeof(u32)) { NL_SET_ERR_MSG_ATTR(extack, tb[ETHTOOL_A_BITSET_VALUE], "bitset value length does not match size"); return -EINVAL; } if (tb[ETHTOOL_A_BITSET_MASK] && nla_len(tb[ETHTOOL_A_BITSET_MASK]) != attr_nwords * sizeof(u32)) { NL_SET_ERR_MSG_ATTR(extack, tb[ETHTOOL_A_BITSET_MASK], "bitset mask length does not match size"); return -EINVAL; } if (attr_nbits <= nbits) return 0; test_attr = no_mask ? tb[ETHTOOL_A_BITSET_VALUE] : tb[ETHTOOL_A_BITSET_MASK]; if (ethnl_bitmap32_not_zero(nla_data(test_attr), nbits, attr_nbits)) { NL_SET_ERR_MSG_ATTR(extack, test_attr, "cannot modify bits past kernel bitset size"); return -EINVAL; } return 0; } /** * ethnl_update_bitset32() - Apply a bitset nest to a u32 based bitmap * @bitmap: bitmap to update * @nbits: size of the updated bitmap in bits * @attr: nest attribute to parse and apply * @names: array of bit names; may be null for compact format * @extack: extack for error reporting * @mod: set this to true if bitmap is modified, leave as it is if not * * Apply bitset netsted attribute to a bitmap. If the attribute represents * a bit list, @bitmap is set to its contents; otherwise, bits in mask are * set to values from value. Bitmaps in the attribute may be longer than * @nbits but the message must not request modifying any bits past @nbits. 
* * Return: negative error code on failure, 0 on success */ int ethnl_update_bitset32(u32 *bitmap, unsigned int nbits, const struct nlattr *attr, ethnl_string_array_t names, struct netlink_ext_ack *extack, bool *mod) { struct nlattr *tb[ARRAY_SIZE(bitset_policy)]; unsigned int change_bits; bool no_mask; int ret; if (!attr) return 0; ret = nla_parse_nested(tb, ARRAY_SIZE(bitset_policy) - 1, attr, bitset_policy, extack); if (ret < 0) return ret; if (tb[ETHTOOL_A_BITSET_BITS]) return ethnl_update_bitset32_verbose(bitmap, nbits, attr, tb, names, extack, mod); ret = ethnl_compact_sanity_checks(nbits, attr, tb, extack); if (ret < 0) return ret; no_mask = tb[ETHTOOL_A_BITSET_NOMASK]; change_bits = min_t(unsigned int, nla_get_u32(tb[ETHTOOL_A_BITSET_SIZE]), nbits); ethnl_bitmap32_update(bitmap, change_bits, nla_data(tb[ETHTOOL_A_BITSET_VALUE]), no_mask ? NULL : nla_data(tb[ETHTOOL_A_BITSET_MASK]), mod); if (no_mask && change_bits < nbits) ethnl_bitmap32_clear(bitmap, change_bits, nbits, mod); return 0; } /** * ethnl_parse_bitset() - Compute effective value and mask from bitset nest * @val: unsigned long based bitmap to put value into * @mask: unsigned long based bitmap to put mask into * @nbits: size of @val and @mask bitmaps * @attr: nest attribute to parse and apply * @names: array of bit names; may be null for compact format * @extack: extack for error reporting * * Provide @nbits size long bitmaps for value and mask so that * x = (val & mask) | (x & ~mask) would modify any @nbits sized bitmap x * the same way ethnl_update_bitset() with the same bitset attribute would. * * Return: negative error code on failure, 0 on success */ int ethnl_parse_bitset(unsigned long *val, unsigned long *mask, unsigned int nbits, const struct nlattr *attr, ethnl_string_array_t names, struct netlink_ext_ack *extack) { struct nlattr *tb[ARRAY_SIZE(bitset_policy)]; const struct nlattr *bit_attr; bool no_mask; int rem; int ret; if (!attr) return 0; ret = nla_parse_nested(tb, ARRAY_SIZE(bitset_policy) - 1, attr, bitset_policy, extack); if (ret < 0) return ret; no_mask = tb[ETHTOOL_A_BITSET_NOMASK]; if (!tb[ETHTOOL_A_BITSET_BITS]) { unsigned int change_bits; ret = ethnl_compact_sanity_checks(nbits, attr, tb, extack); if (ret < 0) return ret; change_bits = nla_get_u32(tb[ETHTOOL_A_BITSET_SIZE]); if (change_bits > nbits) change_bits = nbits; bitmap_from_arr32(val, nla_data(tb[ETHTOOL_A_BITSET_VALUE]), change_bits); if (change_bits < nbits) bitmap_clear(val, change_bits, nbits - change_bits); if (no_mask) { bitmap_fill(mask, nbits); } else { bitmap_from_arr32(mask, nla_data(tb[ETHTOOL_A_BITSET_MASK]), change_bits); if (change_bits < nbits) bitmap_clear(mask, change_bits, nbits - change_bits); } return 0; } if (tb[ETHTOOL_A_BITSET_VALUE]) { NL_SET_ERR_MSG_ATTR(extack, tb[ETHTOOL_A_BITSET_VALUE], "value only allowed in compact bitset"); return -EINVAL; } if (tb[ETHTOOL_A_BITSET_MASK]) { NL_SET_ERR_MSG_ATTR(extack, tb[ETHTOOL_A_BITSET_MASK], "mask only allowed in compact bitset"); return -EINVAL; } bitmap_zero(val, nbits); if (no_mask) bitmap_fill(mask, nbits); else bitmap_zero(mask, nbits); nla_for_each_nested(bit_attr, tb[ETHTOOL_A_BITSET_BITS], rem) { unsigned int idx; bool bit_val; ret = ethnl_parse_bit(&idx, &bit_val, nbits, bit_attr, no_mask, names, extack); if (ret < 0) return ret; if (bit_val) __set_bit(idx, val); if (!no_mask) __set_bit(idx, mask); } return 0; } #if BITS_PER_LONG == 64 && defined(__BIG_ENDIAN) /* 64-bit big endian architectures are the only case when u32 based bitmaps * and unsigned long based 
bitmaps have different memory layout so that we * cannot simply cast the latter to the former and need actual wrappers * converting the latter to the former. * * To reduce the number of slab allocations, the wrappers use fixed size local * variables for bitmaps up to ETHNL_SMALL_BITMAP_BITS bits which is the * majority of bitmaps used by ethtool. */ #define ETHNL_SMALL_BITMAP_BITS 128 #define ETHNL_SMALL_BITMAP_WORDS DIV_ROUND_UP(ETHNL_SMALL_BITMAP_BITS, 32) int ethnl_bitset_size(const unsigned long *val, const unsigned long *mask, unsigned int nbits, ethnl_string_array_t names, bool compact) { u32 small_mask32[ETHNL_SMALL_BITMAP_WORDS]; u32 small_val32[ETHNL_SMALL_BITMAP_WORDS]; u32 *mask32; u32 *val32; int ret; if (nbits > ETHNL_SMALL_BITMAP_BITS) { unsigned int nwords = DIV_ROUND_UP(nbits, 32); val32 = kmalloc_array(2 * nwords, sizeof(u32), GFP_KERNEL); if (!val32) return -ENOMEM; mask32 = val32 + nwords; } else { val32 = small_val32; mask32 = small_mask32; } bitmap_to_arr32(val32, val, nbits); if (mask) bitmap_to_arr32(mask32, mask, nbits); else mask32 = NULL; ret = ethnl_bitset32_size(val32, mask32, nbits, names, compact); if (nbits > ETHNL_SMALL_BITMAP_BITS) kfree(val32); return ret; } int ethnl_put_bitset(struct sk_buff *skb, int attrtype, const unsigned long *val, const unsigned long *mask, unsigned int nbits, ethnl_string_array_t names, bool compact) { u32 small_mask32[ETHNL_SMALL_BITMAP_WORDS]; u32 small_val32[ETHNL_SMALL_BITMAP_WORDS]; u32 *mask32; u32 *val32; int ret; if (nbits > ETHNL_SMALL_BITMAP_BITS) { unsigned int nwords = DIV_ROUND_UP(nbits, 32); val32 = kmalloc_array(2 * nwords, sizeof(u32), GFP_KERNEL); if (!val32) return -ENOMEM; mask32 = val32 + nwords; } else { val32 = small_val32; mask32 = small_mask32; } bitmap_to_arr32(val32, val, nbits); if (mask) bitmap_to_arr32(mask32, mask, nbits); else mask32 = NULL; ret = ethnl_put_bitset32(skb, attrtype, val32, mask32, nbits, names, compact); if (nbits > ETHNL_SMALL_BITMAP_BITS) kfree(val32); return ret; } int ethnl_update_bitset(unsigned long *bitmap, unsigned int nbits, const struct nlattr *attr, ethnl_string_array_t names, struct netlink_ext_ack *extack, bool *mod) { u32 small_bitmap32[ETHNL_SMALL_BITMAP_WORDS]; u32 *bitmap32 = small_bitmap32; bool u32_mod = false; int ret; if (nbits > ETHNL_SMALL_BITMAP_BITS) { unsigned int dst_words = DIV_ROUND_UP(nbits, 32); bitmap32 = kmalloc_array(dst_words, sizeof(u32), GFP_KERNEL); if (!bitmap32) return -ENOMEM; } bitmap_to_arr32(bitmap32, bitmap, nbits); ret = ethnl_update_bitset32(bitmap32, nbits, attr, names, extack, &u32_mod); if (u32_mod) { bitmap_from_arr32(bitmap, bitmap32, nbits); *mod = true; } if (nbits > ETHNL_SMALL_BITMAP_BITS) kfree(bitmap32); return ret; } #else /* On little endian 64-bit and all 32-bit architectures, an unsigned long * based bitmap can be interpreted as u32 based one using a simple cast. 
*/ int ethnl_bitset_size(const unsigned long *val, const unsigned long *mask, unsigned int nbits, ethnl_string_array_t names, bool compact) { return ethnl_bitset32_size((const u32 *)val, (const u32 *)mask, nbits, names, compact); } int ethnl_put_bitset(struct sk_buff *skb, int attrtype, const unsigned long *val, const unsigned long *mask, unsigned int nbits, ethnl_string_array_t names, bool compact) { return ethnl_put_bitset32(skb, attrtype, (const u32 *)val, (const u32 *)mask, nbits, names, compact); } int ethnl_update_bitset(unsigned long *bitmap, unsigned int nbits, const struct nlattr *attr, ethnl_string_array_t names, struct netlink_ext_ack *extack, bool *mod) { return ethnl_update_bitset32((u32 *)bitmap, nbits, attr, names, extack, mod); } #endif /* BITS_PER_LONG == 64 && defined(__BIG_ENDIAN) */ |
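The exported ethnl_bitset_size()/ethnl_put_bitset() pair above is normally used together by an ethtool netlink GET handler: the former in its reply_size() stage, the latter in fill_reply(). A hedged sketch follows; the wrapper names and parameters are hypothetical, and real callers pass e.g. link-mode or wake-on-LAN bitmaps together with their name arrays.

/* Illustrative wrappers (not from bitset.c): emit a value-only bitset,
 * i.e. mask == NULL, which produces a NOMASK bit list. */
static int example_reply_size(const unsigned long *map, unsigned int nbits,
			      ethnl_string_array_t names, bool compact)
{
	return ethnl_bitset_size(map, NULL, nbits, names, compact);
}

static int example_fill_reply(struct sk_buff *skb, int attrtype,
			      const unsigned long *map, unsigned int nbits,
			      ethnl_string_array_t names, bool compact)
{
	return ethnl_put_bitset(skb, attrtype, map, NULL, nbits, names, compact);
}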
// SPDX-License-Identifier: GPL-1.0+ /* generic HDLC line discipline for Linux * * Written by Paul Fulghum paulkf@microgate.com * for Microgate Corporation * * Microgate and
SyncLink are registered trademarks of Microgate Corporation * * Adapted from ppp.c, written by Michael Callahan <callahan@maths.ox.ac.uk>, * Al Longyear <longyear@netcom.com>, * Paul Mackerras <Paul.Mackerras@cs.anu.edu.au> * * Original release 01/11/99 * * This module implements the tty line discipline N_HDLC for use with * tty device drivers that support bit-synchronous HDLC communications. * * All HDLC data is frame oriented which means: * * 1. tty write calls represent one complete transmit frame of data. * The device driver should accept the complete frame or none of * the frame (busy) in the write method. Each write call should have * a byte count in the range of 2-65535 bytes (2 is the minimum HDLC frame * with 1 addr byte and 1 ctrl byte). The max byte count of 65535 * should include any crc bytes required. For example, when using * CCITT CRC32, 4 crc bytes are required, so the maximum size frame * the application may transmit is limited to 65531 bytes. For CCITT * CRC16, the maximum application frame size would be 65533. * * 2. receive callbacks from the device driver represent * one received frame. The device driver should bypass * the tty flip buffer and call the line discipline receive * callback directly to avoid fragmenting or concatenating * multiple frames into a single receive callback. * * The HDLC line discipline queues the receive frames in separate * buffers so complete receive frames can be returned by the * tty read calls. * * 3. tty read calls return an entire frame of data or nothing. * * 4. all send and receive data is considered raw. No processing * or translation is performed by the line discipline, regardless * of the tty flags. * * 5. When the line discipline is queried for the amount of receive * data available (FIONREAD), 0 is returned if no data is available, * otherwise the count of the next available frame is returned * (instead of the sum of all received frame counts). * * These conventions allow the standard tty programming interface * to be used for synchronous HDLC applications when used with * this line discipline (or another line discipline that is frame * oriented such as N_PPP). * * The SyncLink driver (synclink.c) implements both asynchronous * (using standard line discipline N_TTY) and synchronous HDLC * (using N_HDLC) communications, with the latter using the above * conventions. * * This implementation is very basic and does not maintain * any statistics. The main point is to enforce the raw data * and frame orientation of HDLC communications. * * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE * DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED * OF THE POSSIBILITY OF SUCH DAMAGE.
*/ #include <linux/module.h> #include <linux/init.h> #include <linux/kernel.h> #include <linux/sched.h> #include <linux/types.h> #include <linux/fcntl.h> #include <linux/interrupt.h> #include <linux/ptrace.h> #include <linux/poll.h> #include <linux/in.h> #include <linux/ioctl.h> #include <linux/slab.h> #include <linux/tty.h> #include <linux/errno.h> #include <linux/string.h> /* used in new tty drivers */ #include <linux/signal.h> /* used in new tty drivers */ #include <linux/if.h> #include <linux/bitops.h> #include <linux/uaccess.h> #include "tty.h" /* * Buffers for individual HDLC frames */ #define MAX_HDLC_FRAME_SIZE 65535 #define DEFAULT_RX_BUF_COUNT 10 #define MAX_RX_BUF_COUNT 60 #define DEFAULT_TX_BUF_COUNT 3 struct n_hdlc_buf { struct list_head list_item; size_t count; u8 buf[]; }; struct n_hdlc_buf_list { struct list_head list; int count; spinlock_t spinlock; }; /** * struct n_hdlc - per device instance data structure * @tbusy: reentrancy flag for tx wakeup code * @woke_up: tx wakeup needs to be run again as it was called while @tbusy * @tx_buf_list: list of pending transmit frame buffers * @rx_buf_list: list of received frame buffers * @tx_free_buf_list: list unused transmit frame buffers * @rx_free_buf_list: list unused received frame buffers */ struct n_hdlc { bool tbusy; bool woke_up; struct n_hdlc_buf_list tx_buf_list; struct n_hdlc_buf_list rx_buf_list; struct n_hdlc_buf_list tx_free_buf_list; struct n_hdlc_buf_list rx_free_buf_list; struct work_struct write_work; struct tty_struct *tty_for_write_work; }; /* * HDLC buffer list manipulation functions */ static void n_hdlc_buf_return(struct n_hdlc_buf_list *buf_list, struct n_hdlc_buf *buf); static void n_hdlc_buf_put(struct n_hdlc_buf_list *list, struct n_hdlc_buf *buf); static struct n_hdlc_buf *n_hdlc_buf_get(struct n_hdlc_buf_list *list); /* Local functions */ static struct n_hdlc *n_hdlc_alloc(void); static void n_hdlc_tty_write_work(struct work_struct *work); /* max frame size for memory allocations */ static int maxframe = 4096; static void flush_rx_queue(struct tty_struct *tty) { struct n_hdlc *n_hdlc = tty->disc_data; struct n_hdlc_buf *buf; while ((buf = n_hdlc_buf_get(&n_hdlc->rx_buf_list))) n_hdlc_buf_put(&n_hdlc->rx_free_buf_list, buf); } static void flush_tx_queue(struct tty_struct *tty) { struct n_hdlc *n_hdlc = tty->disc_data; struct n_hdlc_buf *buf; while ((buf = n_hdlc_buf_get(&n_hdlc->tx_buf_list))) n_hdlc_buf_put(&n_hdlc->tx_free_buf_list, buf); } static void n_hdlc_free_buf_list(struct n_hdlc_buf_list *list) { struct n_hdlc_buf *buf; do { buf = n_hdlc_buf_get(list); kfree(buf); } while (buf); } /** * n_hdlc_tty_close - line discipline close * @tty: pointer to tty info structure * * Called when the line discipline is changed to something * else, the tty is closed, or the tty detects a hangup. 
*/ static void n_hdlc_tty_close(struct tty_struct *tty) { struct n_hdlc *n_hdlc = tty->disc_data; #if defined(TTY_NO_WRITE_SPLIT) clear_bit(TTY_NO_WRITE_SPLIT, &tty->flags); #endif tty->disc_data = NULL; /* Ensure that no process is left hanging on select()/poll() */ wake_up_interruptible(&tty->read_wait); wake_up_interruptible(&tty->write_wait); cancel_work_sync(&n_hdlc->write_work); n_hdlc_free_buf_list(&n_hdlc->rx_free_buf_list); n_hdlc_free_buf_list(&n_hdlc->tx_free_buf_list); n_hdlc_free_buf_list(&n_hdlc->rx_buf_list); n_hdlc_free_buf_list(&n_hdlc->tx_buf_list); kfree(n_hdlc); } /* end of n_hdlc_tty_close() */ /** * n_hdlc_tty_open - called when line discipline changed to n_hdlc * @tty: pointer to tty info structure * * Returns 0 on success, otherwise an error code */ static int n_hdlc_tty_open(struct tty_struct *tty) { struct n_hdlc *n_hdlc = tty->disc_data; pr_debug("%s() called (device=%s)\n", __func__, tty->name); /* There should not be an existing table for this slot. */ if (n_hdlc) { pr_err("%s: tty already associated!\n", __func__); return -EEXIST; } n_hdlc = n_hdlc_alloc(); if (!n_hdlc) { pr_err("%s: n_hdlc_alloc failed\n", __func__); return -ENFILE; } INIT_WORK(&n_hdlc->write_work, n_hdlc_tty_write_work); n_hdlc->tty_for_write_work = tty; tty->disc_data = n_hdlc; tty->receive_room = 65536; /* change tty_io write() to not split large writes into 8K chunks */ set_bit(TTY_NO_WRITE_SPLIT, &tty->flags); /* flush receive data from driver */ tty_driver_flush_buffer(tty); return 0; } /* end of n_hdlc_tty_open() */ /** * n_hdlc_send_frames - send frames on pending send buffer list * @n_hdlc: pointer to ldisc instance data * @tty: pointer to tty instance data * * Send frames on the pending send buffer list until the driver does not accept * a frame (busy). This function is called after adding a frame to the send * buffer list and by the tty wakeup callback.
*/ static void n_hdlc_send_frames(struct n_hdlc *n_hdlc, struct tty_struct *tty) { unsigned long flags; struct n_hdlc_buf *tbuf; ssize_t actual; check_again: spin_lock_irqsave(&n_hdlc->tx_buf_list.spinlock, flags); if (n_hdlc->tbusy) { n_hdlc->woke_up = true; spin_unlock_irqrestore(&n_hdlc->tx_buf_list.spinlock, flags); return; } n_hdlc->tbusy = true; n_hdlc->woke_up = false; spin_unlock_irqrestore(&n_hdlc->tx_buf_list.spinlock, flags); tbuf = n_hdlc_buf_get(&n_hdlc->tx_buf_list); while (tbuf) { pr_debug("sending frame %p, count=%zu\n", tbuf, tbuf->count); /* Send the next block of data to device */ set_bit(TTY_DO_WRITE_WAKEUP, &tty->flags); actual = tty->ops->write(tty, tbuf->buf, tbuf->count); /* rollback was possible and has been done */ if (actual == -ERESTARTSYS) { n_hdlc_buf_return(&n_hdlc->tx_buf_list, tbuf); break; } /* if transmit error, throw the frame away by pretending it was accepted by the driver */ if (actual < 0) actual = tbuf->count; if (actual == tbuf->count) { pr_debug("frame %p completed\n", tbuf); /* free current transmit buffer */ n_hdlc_buf_put(&n_hdlc->tx_free_buf_list, tbuf); /* wake up sleeping writers */ wake_up_interruptible(&tty->write_wait); /* get next pending transmit buffer */ tbuf = n_hdlc_buf_get(&n_hdlc->tx_buf_list); } else { pr_debug("frame %p pending\n", tbuf); /* * the buffer was not accepted by the driver, * return it to the tx queue */ n_hdlc_buf_return(&n_hdlc->tx_buf_list, tbuf); break; } } if (!tbuf) clear_bit(TTY_DO_WRITE_WAKEUP, &tty->flags); /* Clear the re-entry flag */ spin_lock_irqsave(&n_hdlc->tx_buf_list.spinlock, flags); n_hdlc->tbusy = false; spin_unlock_irqrestore(&n_hdlc->tx_buf_list.spinlock, flags); if (n_hdlc->woke_up) goto check_again; } /* end of n_hdlc_send_frames() */ /** * n_hdlc_tty_write_work - Asynchronous callback for transmit wakeup * @work: pointer to work_struct * * Called when the low level device driver can accept more send data. */ static void n_hdlc_tty_write_work(struct work_struct *work) { struct n_hdlc *n_hdlc = container_of(work, struct n_hdlc, write_work); struct tty_struct *tty = n_hdlc->tty_for_write_work; n_hdlc_send_frames(n_hdlc, tty); } /* end of n_hdlc_tty_write_work() */ /** * n_hdlc_tty_wakeup - Callback for transmit wakeup * @tty: pointer to associated tty instance data * * Called when the low level device driver can accept more send data. */ static void n_hdlc_tty_wakeup(struct tty_struct *tty) { struct n_hdlc *n_hdlc = tty->disc_data; schedule_work(&n_hdlc->write_work); } /* end of n_hdlc_tty_wakeup() */ /** * n_hdlc_tty_receive - Called by tty driver when receive data is available * @tty: pointer to tty instance data * @data: pointer to received data * @flags: pointer to flags for data * @count: count of received data in bytes * * Called by the tty low level driver when receive data is available. Data is * interpreted as one HDLC frame.
*/ static void n_hdlc_tty_receive(struct tty_struct *tty, const u8 *data, const u8 *flags, size_t count) { register struct n_hdlc *n_hdlc = tty->disc_data; register struct n_hdlc_buf *buf; pr_debug("%s() called count=%zu\n", __func__, count); if (count > maxframe) { pr_debug("rx count>maxframesize, data discarded\n"); return; } /* get a free HDLC buffer */ buf = n_hdlc_buf_get(&n_hdlc->rx_free_buf_list); if (!buf) { /* * no buffers in free list, attempt to allocate another rx * buffer unless the maximum count has been reached */ if (n_hdlc->rx_buf_list.count < MAX_RX_BUF_COUNT) buf = kmalloc(struct_size(buf, buf, maxframe), GFP_ATOMIC); } if (!buf) { pr_debug("no more rx buffers, data discarded\n"); return; } /* copy received data to HDLC buffer */ memcpy(buf->buf, data, count); buf->count = count; /* add HDLC buffer to list of received frames */ n_hdlc_buf_put(&n_hdlc->rx_buf_list, buf); /* wake up any blocked reads and perform async signalling */ wake_up_interruptible(&tty->read_wait); if (tty->fasync != NULL) kill_fasync(&tty->fasync, SIGIO, POLL_IN); } /* end of n_hdlc_tty_receive() */ /** * n_hdlc_tty_read - Called to retrieve one frame of data (if available) * @tty: pointer to tty instance data * @file: pointer to open file object * @kbuf: pointer to returned data buffer * @nr: size of returned data buffer * @cookie: stored rbuf from previous run * @offset: offset into the data buffer * * Returns the number of bytes returned or error code. */ static ssize_t n_hdlc_tty_read(struct tty_struct *tty, struct file *file, u8 *kbuf, size_t nr, void **cookie, unsigned long offset) { struct n_hdlc *n_hdlc = tty->disc_data; int ret = 0; struct n_hdlc_buf *rbuf; DECLARE_WAITQUEUE(wait, current); /* Is this a repeated call for an rbuf we already found earlier? */ rbuf = *cookie; if (rbuf) goto have_rbuf; add_wait_queue(&tty->read_wait, &wait); for (;;) { if (test_bit(TTY_OTHER_CLOSED, &tty->flags)) { ret = -EIO; break; } if (tty_hung_up_p(file)) break; set_current_state(TASK_INTERRUPTIBLE); rbuf = n_hdlc_buf_get(&n_hdlc->rx_buf_list); if (rbuf) break; /* no data */ if (tty_io_nonblock(tty, file)) { ret = -EAGAIN; break; } schedule(); if (signal_pending(current)) { ret = -EINTR; break; } } remove_wait_queue(&tty->read_wait, &wait); __set_current_state(TASK_RUNNING); if (!rbuf) return ret; *cookie = rbuf; have_rbuf: /* Have we used it up entirely? */ if (offset >= rbuf->count) goto done_with_rbuf; /* More data to go, but can't copy any more? EOVERFLOW */ ret = -EOVERFLOW; if (!nr) goto done_with_rbuf; /* Copy as much data as possible */ ret = rbuf->count - offset; if (ret > nr) ret = nr; memcpy(kbuf, rbuf->buf+offset, ret); offset += ret; /* If we still have data left, we leave the rbuf in the cookie */ if (offset < rbuf->count) return ret; done_with_rbuf: *cookie = NULL; if (n_hdlc->rx_free_buf_list.count > DEFAULT_RX_BUF_COUNT) kfree(rbuf); else n_hdlc_buf_put(&n_hdlc->rx_free_buf_list, rbuf); return ret; } /* end of n_hdlc_tty_read() */ /** * n_hdlc_tty_write - write a single frame of data to device * @tty: pointer to associated tty device instance data * @file: pointer to file object data * @data: pointer to transmit data (one frame) * @count: size of transmit frame in bytes * * Returns the number of bytes written (or error code). 
*/ static ssize_t n_hdlc_tty_write(struct tty_struct *tty, struct file *file, const u8 *data, size_t count) { struct n_hdlc *n_hdlc = tty->disc_data; DECLARE_WAITQUEUE(wait, current); struct n_hdlc_buf *tbuf; ssize_t error = 0; pr_debug("%s() called count=%zd\n", __func__, count); /* verify frame size */ if (count > maxframe) { pr_debug("%s: truncating user packet from %zu to %d\n", __func__, count, maxframe); count = maxframe; } add_wait_queue(&tty->write_wait, &wait); for (;;) { set_current_state(TASK_INTERRUPTIBLE); tbuf = n_hdlc_buf_get(&n_hdlc->tx_free_buf_list); if (tbuf) break; if (tty_io_nonblock(tty, file)) { error = -EAGAIN; break; } schedule(); if (signal_pending(current)) { error = -EINTR; break; } } __set_current_state(TASK_RUNNING); remove_wait_queue(&tty->write_wait, &wait); if (!error) { /* Retrieve the user's buffer */ memcpy(tbuf->buf, data, count); /* Send the data */ tbuf->count = error = count; n_hdlc_buf_put(&n_hdlc->tx_buf_list, tbuf); n_hdlc_send_frames(n_hdlc, tty); } return error; } /* end of n_hdlc_tty_write() */ /** * n_hdlc_tty_ioctl - process IOCTL system call for the tty device. * @tty: pointer to tty instance data * @cmd: IOCTL command code * @arg: argument for IOCTL call (cmd dependent) * * Returns command dependent result. */ static int n_hdlc_tty_ioctl(struct tty_struct *tty, unsigned int cmd, unsigned long arg) { struct n_hdlc *n_hdlc = tty->disc_data; int error = 0; int count; unsigned long flags; struct n_hdlc_buf *buf = NULL; pr_debug("%s() called %d\n", __func__, cmd); switch (cmd) { case FIONREAD: /* report count of read data available */ /* in next available frame (if any) */ spin_lock_irqsave(&n_hdlc->rx_buf_list.spinlock, flags); buf = list_first_entry_or_null(&n_hdlc->rx_buf_list.list, struct n_hdlc_buf, list_item); if (buf) count = buf->count; else count = 0; spin_unlock_irqrestore(&n_hdlc->rx_buf_list.spinlock, flags); error = put_user(count, (int __user *)arg); break; case TIOCOUTQ: /* get the pending tx byte count in the driver */ count = tty_chars_in_buffer(tty); /* add size of next output frame in queue */ spin_lock_irqsave(&n_hdlc->tx_buf_list.spinlock, flags); buf = list_first_entry_or_null(&n_hdlc->tx_buf_list.list, struct n_hdlc_buf, list_item); if (buf) count += buf->count; spin_unlock_irqrestore(&n_hdlc->tx_buf_list.spinlock, flags); error = put_user(count, (int __user *)arg); break; case TCFLSH: switch (arg) { case TCIOFLUSH: case TCOFLUSH: flush_tx_queue(tty); } fallthrough; /* to default */ default: error = n_tty_ioctl_helper(tty, cmd, arg); break; } return error; } /* end of n_hdlc_tty_ioctl() */ /** * n_hdlc_tty_poll - TTY callback for poll system call * @tty: pointer to tty instance data * @filp: pointer to open file object for device * @wait: wait queue for operations * * Determine which operations (read/write) will not block and return info * to caller. * Returns a bit mask containing info on which ops will not block. 
*/ static __poll_t n_hdlc_tty_poll(struct tty_struct *tty, struct file *filp, poll_table *wait) { struct n_hdlc *n_hdlc = tty->disc_data; __poll_t mask = 0; /* * queue the current process into any wait queue that may awaken in the * future (read and write) */ poll_wait(filp, &tty->read_wait, wait); poll_wait(filp, &tty->write_wait, wait); /* set bits for operations that won't block */ if (!list_empty(&n_hdlc->rx_buf_list.list)) mask |= EPOLLIN | EPOLLRDNORM; /* readable */ if (test_bit(TTY_OTHER_CLOSED, &tty->flags)) mask |= EPOLLHUP; if (tty_hung_up_p(filp)) mask |= EPOLLHUP; if (!tty_is_writelocked(tty) && !list_empty(&n_hdlc->tx_free_buf_list.list)) mask |= EPOLLOUT | EPOLLWRNORM; /* writable */ return mask; } /* end of n_hdlc_tty_poll() */ static void n_hdlc_alloc_buf(struct n_hdlc_buf_list *list, unsigned int count, const char *name) { struct n_hdlc_buf *buf; unsigned int i; for (i = 0; i < count; i++) { buf = kmalloc(struct_size(buf, buf, maxframe), GFP_KERNEL); if (!buf) { pr_debug("%s(), kmalloc() failed for %s buffer %u\n", __func__, name, i); return; } n_hdlc_buf_put(list, buf); } } /** * n_hdlc_alloc - allocate an n_hdlc instance data structure * * Returns a pointer to newly created structure if success, otherwise %NULL */ static struct n_hdlc *n_hdlc_alloc(void) { struct n_hdlc *n_hdlc = kzalloc(sizeof(*n_hdlc), GFP_KERNEL); if (!n_hdlc) return NULL; spin_lock_init(&n_hdlc->rx_free_buf_list.spinlock); spin_lock_init(&n_hdlc->tx_free_buf_list.spinlock); spin_lock_init(&n_hdlc->rx_buf_list.spinlock); spin_lock_init(&n_hdlc->tx_buf_list.spinlock); INIT_LIST_HEAD(&n_hdlc->rx_free_buf_list.list); INIT_LIST_HEAD(&n_hdlc->tx_free_buf_list.list); INIT_LIST_HEAD(&n_hdlc->rx_buf_list.list); INIT_LIST_HEAD(&n_hdlc->tx_buf_list.list); n_hdlc_alloc_buf(&n_hdlc->rx_free_buf_list, DEFAULT_RX_BUF_COUNT, "rx"); n_hdlc_alloc_buf(&n_hdlc->tx_free_buf_list, DEFAULT_TX_BUF_COUNT, "tx"); return n_hdlc; } /* end of n_hdlc_alloc() */ /** * n_hdlc_buf_return - put the HDLC buffer after the head of the specified list * @buf_list: pointer to the buffer list * @buf: pointer to the buffer */ static void n_hdlc_buf_return(struct n_hdlc_buf_list *buf_list, struct n_hdlc_buf *buf) { unsigned long flags; spin_lock_irqsave(&buf_list->spinlock, flags); list_add(&buf->list_item, &buf_list->list); buf_list->count++; spin_unlock_irqrestore(&buf_list->spinlock, flags); } /** * n_hdlc_buf_put - add specified HDLC buffer to tail of specified list * @buf_list: pointer to buffer list * @buf: pointer to buffer */ static void n_hdlc_buf_put(struct n_hdlc_buf_list *buf_list, struct n_hdlc_buf *buf) { unsigned long flags; spin_lock_irqsave(&buf_list->spinlock, flags); list_add_tail(&buf->list_item, &buf_list->list); buf_list->count++; spin_unlock_irqrestore(&buf_list->spinlock, flags); } /* end of n_hdlc_buf_put() */ /** * n_hdlc_buf_get - remove and return an HDLC buffer from list * @buf_list: pointer to HDLC buffer list * * Remove and return an HDLC buffer from the head of the specified HDLC buffer * list. * Returns a pointer to HDLC buffer if available, otherwise %NULL. 
*/ static struct n_hdlc_buf *n_hdlc_buf_get(struct n_hdlc_buf_list *buf_list) { unsigned long flags; struct n_hdlc_buf *buf; spin_lock_irqsave(&buf_list->spinlock, flags); buf = list_first_entry_or_null(&buf_list->list, struct n_hdlc_buf, list_item); if (buf) { list_del(&buf->list_item); buf_list->count--; } spin_unlock_irqrestore(&buf_list->spinlock, flags); return buf; } /* end of n_hdlc_buf_get() */ static struct tty_ldisc_ops n_hdlc_ldisc = { .owner = THIS_MODULE, .num = N_HDLC, .name = "hdlc", .open = n_hdlc_tty_open, .close = n_hdlc_tty_close, .read = n_hdlc_tty_read, .write = n_hdlc_tty_write, .ioctl = n_hdlc_tty_ioctl, .poll = n_hdlc_tty_poll, .receive_buf = n_hdlc_tty_receive, .write_wakeup = n_hdlc_tty_wakeup, .flush_buffer = flush_rx_queue, }; static int __init n_hdlc_init(void) { int status; /* range check maxframe arg */ maxframe = clamp(maxframe, 4096, MAX_HDLC_FRAME_SIZE); status = tty_register_ldisc(&n_hdlc_ldisc); if (!status) pr_info("N_HDLC line discipline registered with maxframe=%d\n", maxframe); else pr_err("N_HDLC: error registering line discipline: %d\n", status); return status; } /* end of init_module() */ static void __exit n_hdlc_exit(void) { tty_unregister_ldisc(&n_hdlc_ldisc); } module_init(n_hdlc_init); module_exit(n_hdlc_exit); MODULE_DESCRIPTION("HDLC line discipline support"); MODULE_LICENSE("GPL"); MODULE_AUTHOR("Paul Fulghum paulkf@microgate.com"); module_param(maxframe, int, 0); MODULE_ALIAS_LDISC(N_HDLC); |
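The frame conventions described in the module header map directly onto the ordinary read()/write() interface once the line discipline is attached. Below is a minimal userspace sketch, assuming a tty device at the hypothetical path /dev/ttyS1 whose driver supports synchronous HDLC; N_HDLC, TIOCSETD and FIONREAD come from the standard uapi headers, each write() carries one transmit frame, and each read() returns at most one received frame.

/* Minimal userspace sketch: attach N_HDLC and exchange one frame each way. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/tty.h>		/* N_HDLC */

int main(void)
{
	unsigned char frame[16] = { 0xff, 0x03 };	/* addr + ctrl, then payload */
	unsigned char rx[65535];
	int ldisc = N_HDLC;
	int navail = 0;
	ssize_t n;
	int fd;

	fd = open("/dev/ttyS1", O_RDWR | O_NOCTTY);	/* example device path */
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Switch the tty to the HDLC line discipline. */
	if (ioctl(fd, TIOCSETD, &ldisc) < 0) {
		perror("TIOCSETD");
		return 1;
	}

	/* One write() == one complete transmit frame (2..65535 bytes). */
	n = write(fd, frame, sizeof(frame));
	printf("wrote %zd byte frame\n", n);

	/* FIONREAD reports the size of the next pending receive frame, or 0. */
	ioctl(fd, FIONREAD, &navail);
	printf("next rx frame: %d bytes\n", navail);

	/* One read() returns at most one complete receive frame. */
	n = read(fd, rx, sizeof(rx));
	printf("read %zd byte frame\n", n);

	close(fd);
	return 0;
}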
/* SPDX-License-Identifier: GPL-2.0-or-later */ /* * Cryptographic scatter and gather helpers. * * Copyright (c) 2002 James Morris <jmorris@intercode.com.au> * Copyright (c) 2002 Adam J. Richter <adam@yggdrasil.com> * Copyright (c) 2004 Jean-Luc Cooke <jlcooke@certainkey.com> * Copyright (c) 2007 Herbert Xu <herbert@gondor.apana.org.au> */ #ifndef _CRYPTO_SCATTERWALK_H #define _CRYPTO_SCATTERWALK_H #include <linux/errno.h> #include <linux/highmem.h> #include <linux/mm.h> #include <linux/scatterlist.h> #include <linux/types.h> struct scatter_walk { /* Must be the first member, see struct skcipher_walk. */ union { void *const addr; /* Private API field, do not touch. */ union crypto_no_such_thing *__addr; }; struct scatterlist *sg; unsigned int offset; }; struct skcipher_walk { union { /* Virtual address of the source. */ struct { struct { const void *const addr; } virt; } src; /* Private field for the API, do not use. */ struct scatter_walk in; }; union { /* Virtual address of the destination. */ struct { struct { void *const addr; } virt; } dst; /* Private field for the API, do not use. */ struct scatter_walk out; }; unsigned int nbytes; unsigned int total; u8 *page; u8 *buffer; u8 *oiv; void *iv; unsigned int ivsize; int flags; unsigned int blocksize; unsigned int stride; unsigned int alignmask; }; static inline void scatterwalk_crypto_chain(struct scatterlist *head, struct scatterlist *sg, int num) { if (sg) sg_chain(head, num, sg); else sg_mark_end(head); } static inline void scatterwalk_start(struct scatter_walk *walk, struct scatterlist *sg) { walk->sg = sg; walk->offset = sg->offset; } /* * This is equivalent to scatterwalk_start(walk, sg) followed by * scatterwalk_skip(walk, pos). */ static inline void scatterwalk_start_at_pos(struct scatter_walk *walk, struct scatterlist *sg, unsigned int pos) { while (pos > sg->length) { pos -= sg->length; sg = sg_next(sg); } walk->sg = sg; walk->offset = sg->offset + pos; } static inline unsigned int scatterwalk_clamp(struct scatter_walk *walk, unsigned int nbytes) { unsigned int len_this_sg; unsigned int limit; if (walk->offset >= walk->sg->offset + walk->sg->length) scatterwalk_start(walk, sg_next(walk->sg)); len_this_sg = walk->sg->offset + walk->sg->length - walk->offset; /* * HIGHMEM case: the page may have to be mapped into memory.
To avoid * the complexity of having to map multiple pages at once per sg entry, * clamp the returned length to not cross a page boundary. * * !HIGHMEM case: no mapping is needed; all pages of the sg entry are * already mapped contiguously in the kernel's direct map. For improved * performance, allow the walker to return data segments that cross a * page boundary. Do still cap the length to PAGE_SIZE, since some * users rely on that to avoid disabling preemption for too long when * using SIMD. It's also needed for when skcipher_walk uses a bounce * page due to the data not being aligned to the algorithm's alignmask. */ if (IS_ENABLED(CONFIG_HIGHMEM)) limit = PAGE_SIZE - offset_in_page(walk->offset); else limit = PAGE_SIZE; return min3(nbytes, len_this_sg, limit); } /* * Create a scatterlist that represents the remaining data in a walk. Uses * chaining to reference the original scatterlist, so this uses at most two * entries in @sg_out regardless of the number of entries in the original list. * Assumes that sg_init_table() was already done. */ static inline void scatterwalk_get_sglist(struct scatter_walk *walk, struct scatterlist sg_out[2]) { if (walk->offset >= walk->sg->offset + walk->sg->length) scatterwalk_start(walk, sg_next(walk->sg)); sg_set_page(sg_out, sg_page(walk->sg), walk->sg->offset + walk->sg->length - walk->offset, walk->offset); scatterwalk_crypto_chain(sg_out, sg_next(walk->sg), 2); } static inline void scatterwalk_map(struct scatter_walk *walk) { struct page *base_page = sg_page(walk->sg); unsigned int offset = walk->offset; void *addr; if (IS_ENABLED(CONFIG_HIGHMEM)) { struct page *page; page = nth_page(base_page, offset >> PAGE_SHIFT); offset = offset_in_page(offset); addr = kmap_local_page(page) + offset; } else { /* * When !HIGHMEM we allow the walker to return segments that * span a page boundary; see scatterwalk_clamp(). To make it * clear that in this case we're working in the linear buffer of * the whole sg entry in the kernel's direct map rather than * within the mapped buffer of a single page, compute the * address as an offset from the page_address() of the first * page of the sg entry. Either way the result is the address * in the direct map, but this makes it clearer what is really * going on. */ addr = page_address(base_page) + offset; } walk->__addr = addr; } /** * scatterwalk_next() - Get the next data buffer in a scatterlist walk * @walk: the scatter_walk * @total: the total number of bytes remaining, > 0 * * A virtual address for the next segment of data from the scatterlist will * be placed into @walk->addr. The caller must call scatterwalk_done_src() * or scatterwalk_done_dst() when it is done using this virtual address. * * Returns: the next number of bytes available, <= @total */ static inline unsigned int scatterwalk_next(struct scatter_walk *walk, unsigned int total) { unsigned int nbytes = scatterwalk_clamp(walk, total); scatterwalk_map(walk); return nbytes; } static inline void scatterwalk_unmap(struct scatter_walk *walk) { if (IS_ENABLED(CONFIG_HIGHMEM)) kunmap_local(walk->__addr); } static inline void scatterwalk_advance(struct scatter_walk *walk, unsigned int nbytes) { walk->offset += nbytes; } /** * scatterwalk_done_src() - Finish one step of a walk of source scatterlist * @walk: the scatter_walk * @nbytes: the number of bytes processed this step, less than or equal to the * number of bytes that scatterwalk_next() returned. * * Use this if the mapped address was not written to, i.e. it is source data. 
*/ static inline void scatterwalk_done_src(struct scatter_walk *walk, unsigned int nbytes) { scatterwalk_unmap(walk); scatterwalk_advance(walk, nbytes); } /** * scatterwalk_done_dst() - Finish one step of a walk of destination scatterlist * @walk: the scatter_walk * @nbytes: the number of bytes processed this step, less than or equal to the * number of bytes that scatterwalk_next() returned. * * Use this if the mapped address may have been written to, i.e. it is * destination data. */ static inline void scatterwalk_done_dst(struct scatter_walk *walk, unsigned int nbytes) { scatterwalk_unmap(walk); /* * Explicitly check ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE instead of just * relying on flush_dcache_page() being a no-op when not implemented, * since otherwise the BUG_ON in sg_page() does not get optimized out. * This also avoids having to consider whether the loop would get * reliably optimized out or not. */ if (ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE) { struct page *base_page; unsigned int offset; int start, end, i; base_page = sg_page(walk->sg); offset = walk->offset; start = offset >> PAGE_SHIFT; end = start + (nbytes >> PAGE_SHIFT); end += (offset_in_page(offset) + offset_in_page(nbytes) + PAGE_SIZE - 1) >> PAGE_SHIFT; for (i = start; i < end; i++) flush_dcache_page(nth_page(base_page, i)); } scatterwalk_advance(walk, nbytes); } void scatterwalk_skip(struct scatter_walk *walk, unsigned int nbytes); void memcpy_from_scatterwalk(void *buf, struct scatter_walk *walk, unsigned int nbytes); void memcpy_to_scatterwalk(struct scatter_walk *walk, const void *buf, unsigned int nbytes); void memcpy_from_sglist(void *buf, struct scatterlist *sg, unsigned int start, unsigned int nbytes); void memcpy_to_sglist(struct scatterlist *sg, unsigned int start, const void *buf, unsigned int nbytes); void memcpy_sglist(struct scatterlist *dst, struct scatterlist *src, unsigned int nbytes); /* In new code, please use memcpy_{from,to}_sglist() directly instead. */ static inline void scatterwalk_map_and_copy(void *buf, struct scatterlist *sg, unsigned int start, unsigned int nbytes, int out) { if (out) memcpy_to_sglist(sg, start, buf, nbytes); else memcpy_from_sglist(buf, sg, start, nbytes); } struct scatterlist *scatterwalk_ffwd(struct scatterlist dst[2], struct scatterlist *src, unsigned int len); int skcipher_walk_first(struct skcipher_walk *walk, bool atomic); int skcipher_walk_done(struct skcipher_walk *walk, int res); static inline void skcipher_walk_abort(struct skcipher_walk *walk) { skcipher_walk_done(walk, -ECANCELED); } #endif /* _CRYPTO_SCATTERWALK_H */ |
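A minimal sketch of how the mapping helpers above are meant to be used together; example_gather() is a hypothetical function (real code would simply call memcpy_from_scatterwalk(), which implements the same loop) and is shown only to make the scatterwalk_start() / scatterwalk_next() / scatterwalk_done_src() sequence explicit. It assumes the usual kernel context where memcpy() from <linux/string.h> is available.

/*
 * Hypothetical helper: gather @nbytes from @sg into the linear buffer @dst
 * using the walk API declared above.
 */
static inline void example_gather(void *dst, struct scatterlist *sg,
				  unsigned int nbytes)
{
	struct scatter_walk walk;
	u8 *d = dst;

	scatterwalk_start(&walk, sg);
	while (nbytes) {
		/* Map the next segment; len <= nbytes and, with HIGHMEM,
		 * never crosses a page boundary. */
		unsigned int len = scatterwalk_next(&walk, nbytes);

		memcpy(d, walk.addr, len);
		/* Unmap and advance; scatterwalk_done_dst() would be used
		 * instead if the mapped data had been written to. */
		scatterwalk_done_src(&walk, len);
		d += len;
		nbytes -= len;
	}
}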
// SPDX-License-Identifier: GPL-2.0-or-later /* * HID quirks support for Linux * * Copyright (c) 1999 Andreas Gal * Copyright (c) 2000-2005 Vojtech Pavlik <vojtech@suse.cz> * Copyright (c) 2005 Michael Haboustak <mike-@cinci.rr.com> for Concept2, Inc * Copyright (c) 2006-2007 Jiri Kosina * Copyright (c) 2007 Paul Walmsley */ /* */ #include <linux/hid.h> #include <linux/export.h> #include <linux/slab.h> #include <linux/mutex.h> #include <linux/input/elan-i2c-ids.h> #include "hid-ids.h" /* * Alphabetically sorted by vendor then product.
*/ static const struct hid_device_id hid_quirks[] = { { HID_USB_DEVICE(USB_VENDOR_ID_AASHIMA, USB_DEVICE_ID_AASHIMA_GAMEPAD), HID_QUIRK_BADPAD }, { HID_USB_DEVICE(USB_VENDOR_ID_AASHIMA, USB_DEVICE_ID_AASHIMA_PREDATOR), HID_QUIRK_BADPAD }, { HID_USB_DEVICE(USB_VENDOR_ID_ADATA_XPG, USB_VENDOR_ID_ADATA_XPG_WL_GAMING_MOUSE), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_ADATA_XPG, USB_VENDOR_ID_ADATA_XPG_WL_GAMING_MOUSE_DONGLE), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_AFATECH, USB_DEVICE_ID_AFATECH_AF9016), HID_QUIRK_FULLSPEED_INTERVAL }, { HID_USB_DEVICE(USB_VENDOR_ID_AIREN, USB_DEVICE_ID_AIREN_SLIMPLUS), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_AKAI_09E8, USB_DEVICE_ID_AKAI_09E8_MIDIMIX), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_AKAI, USB_DEVICE_ID_AKAI_MPKMINI2), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_ALPS, USB_DEVICE_ID_IBM_GAMEPAD), HID_QUIRK_BADPAD }, { HID_USB_DEVICE(USB_VENDOR_ID_AMI, USB_DEVICE_ID_AMI_VIRT_KEYBOARD_AND_MOUSE), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_ALU_REVB_ANSI), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_ATEN, USB_DEVICE_ID_ATEN_2PORTKVM), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_ATEN, USB_DEVICE_ID_ATEN_4PORTKVMC), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_ATEN, USB_DEVICE_ID_ATEN_4PORTKVM), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_ATEN, USB_DEVICE_ID_ATEN_CS124U), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_ATEN, USB_DEVICE_ID_ATEN_CS1758), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_ATEN, USB_DEVICE_ID_ATEN_CS682), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_ATEN, USB_DEVICE_ID_ATEN_CS692), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_ATEN, USB_DEVICE_ID_ATEN_UC100KM), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, USB_DEVICE_ID_CHICONY_MULTI_TOUCH), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, USB_DEVICE_ID_CHICONY_PIXART_USB_OPTICAL_MOUSE), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, USB_DEVICE_ID_CHICONY_PIXART_USB_OPTICAL_MOUSE2), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, USB_DEVICE_ID_CHICONY_WIRELESS), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_CHIC, USB_DEVICE_ID_CHIC_GAMEPAD), HID_QUIRK_BADPAD }, { HID_USB_DEVICE(USB_VENDOR_ID_CH, USB_DEVICE_ID_CH_3AXIS_5BUTTON_STICK), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_CH, USB_DEVICE_ID_CH_AXIS_295), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_CH, USB_DEVICE_ID_CH_COMBATSTICK), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_CH, USB_DEVICE_ID_CH_FIGHTERSTICK), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_CH, USB_DEVICE_ID_CH_FLIGHT_SIM_ECLIPSE_YOKE), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_CH, USB_DEVICE_ID_CH_FLIGHT_SIM_YOKE), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_CH, USB_DEVICE_ID_CH_PRO_PEDALS), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_CH, USB_DEVICE_ID_CH_PRO_THROTTLE), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_CORSAIR, USB_DEVICE_ID_CORSAIR_K65RGB), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_CORSAIR, USB_DEVICE_ID_CORSAIR_K65RGB_RAPIDFIRE), HID_QUIRK_NO_INIT_REPORTS | HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_CORSAIR, USB_DEVICE_ID_CORSAIR_K70RGB), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_CORSAIR, USB_DEVICE_ID_CORSAIR_K70RGB_RAPIDFIRE), HID_QUIRK_NO_INIT_REPORTS | HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_CORSAIR, 
USB_DEVICE_ID_CORSAIR_K70R), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_CORSAIR, USB_DEVICE_ID_CORSAIR_K95RGB), HID_QUIRK_NO_INIT_REPORTS | HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_CORSAIR, USB_DEVICE_ID_CORSAIR_M65RGB), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_CORSAIR, USB_DEVICE_ID_CORSAIR_GLAIVE_RGB), HID_QUIRK_NO_INIT_REPORTS | HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_CORSAIR, USB_DEVICE_ID_CORSAIR_SCIMITAR_PRO_RGB), HID_QUIRK_NO_INIT_REPORTS | HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_CORSAIR, USB_DEVICE_ID_CORSAIR_STRAFE), HID_QUIRK_NO_INIT_REPORTS | HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_CREATIVELABS, USB_DEVICE_ID_CREATIVE_SB_OMNI_SURROUND_51), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_DELL, USB_DEVICE_ID_DELL_PIXART_USB_OPTICAL_MOUSE), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_DELL, USB_DEVICE_ID_DELL_PRO_WIRELESS_KM5221W), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_DMI, USB_DEVICE_ID_DMI_ENC), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_DRACAL_RAPHNET, USB_DEVICE_ID_RAPHNET_2NES2SNES), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_DRACAL_RAPHNET, USB_DEVICE_ID_RAPHNET_4NES4SNES), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_DRAGONRISE, USB_DEVICE_ID_REDRAGON_SEYMUR2), HID_QUIRK_INCREMENT_USAGE_ON_DUPLICATE }, { HID_USB_DEVICE(USB_VENDOR_ID_DRAGONRISE, USB_DEVICE_ID_DRAGONRISE_DOLPHINBAR), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_DRAGONRISE, USB_DEVICE_ID_DRAGONRISE_GAMECUBE1), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_DRAGONRISE, USB_DEVICE_ID_DRAGONRISE_GAMECUBE3), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_DRAGONRISE, USB_DEVICE_ID_DRAGONRISE_PS3), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_DRAGONRISE, USB_DEVICE_ID_DRAGONRISE_WIIU), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_DWAV, USB_DEVICE_ID_EGALAX_TOUCHCONTROLLER), HID_QUIRK_MULTI_INPUT | HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_ELAN, HID_ANY_ID), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_ELO, USB_DEVICE_ID_ELO_TS2700), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_EMS, USB_DEVICE_ID_EMS_TRIO_LINKER_PLUS_II), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_ETURBOTOUCH, USB_DEVICE_ID_ETURBOTOUCH_2968), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_ETURBOTOUCH, USB_DEVICE_ID_ETURBOTOUCH), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_FORMOSA, USB_DEVICE_ID_FORMOSA_IR_RECEIVER), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_FREESCALE, USB_DEVICE_ID_FREESCALE_MX28), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_FUTABA, USB_DEVICE_ID_LED_DISPLAY), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_GREENASIA, USB_DEVICE_ID_GREENASIA_DUAL_SAT_ADAPTOR), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_GREENASIA, USB_DEVICE_ID_GREENASIA_DUAL_USB_JOYPAD), HID_QUIRK_MULTI_INPUT }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_GAMEVICE, USB_DEVICE_ID_GAMEVICE_GV186), HID_QUIRK_INCREMENT_USAGE_ON_DUPLICATE }, { HID_USB_DEVICE(USB_VENDOR_ID_GAMEVICE, USB_DEVICE_ID_GAMEVICE_KISHI), HID_QUIRK_INCREMENT_USAGE_ON_DUPLICATE }, { HID_USB_DEVICE(USB_VENDOR_ID_HAPP, USB_DEVICE_ID_UGCI_DRIVING), HID_QUIRK_BADPAD | HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_HAPP, USB_DEVICE_ID_UGCI_FIGHTING), HID_QUIRK_BADPAD | HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_HAPP, USB_DEVICE_ID_UGCI_FLYING), HID_QUIRK_BADPAD | 
HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_HOLTEK_ALT, USB_DEVICE_ID_HOLTEK_ALT_KEYBOARD_A096), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_HOLTEK_ALT, USB_DEVICE_ID_HOLTEK_ALT_KEYBOARD_A293), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_HP, USB_PRODUCT_ID_HP_LOGITECH_OEM_USB_OPTICAL_MOUSE_0A4A), HID_QUIRK_ALWAYS_POLL }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_HP, USB_PRODUCT_ID_HP_ELITE_PRESENTER_MOUSE_464A), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_HP, USB_PRODUCT_ID_HP_LOGITECH_OEM_USB_OPTICAL_MOUSE_0B4A), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_HP, USB_PRODUCT_ID_HP_PIXART_OEM_USB_OPTICAL_MOUSE), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_HP, USB_PRODUCT_ID_HP_PIXART_OEM_USB_OPTICAL_MOUSE_094A), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_HP, USB_PRODUCT_ID_HP_PIXART_OEM_USB_OPTICAL_MOUSE_0941), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_HP, USB_PRODUCT_ID_HP_PIXART_OEM_USB_OPTICAL_MOUSE_0641), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_HP, USB_PRODUCT_ID_HP_PIXART_OEM_USB_OPTICAL_MOUSE_1f4a), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_IDEACOM, USB_DEVICE_ID_IDEACOM_IDC6680), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_INNOMEDIA, USB_DEVICE_ID_INNEX_GENESIS_ATARI), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_KYE, USB_DEVICE_ID_PIXART_USB_OPTICAL_MOUSE_ID2), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_KYE, USB_DEVICE_ID_KYE_EASYPEN_M406), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_KYE, USB_DEVICE_ID_KYE_EASYPEN_M506), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_KYE, USB_DEVICE_ID_KYE_EASYPEN_I405X), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_KYE, USB_DEVICE_ID_KYE_MOUSEPEN_I608X), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_KYE, USB_DEVICE_ID_KYE_EASYPEN_M406W), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_KYE, USB_DEVICE_ID_KYE_EASYPEN_M610X), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_KYE, USB_DEVICE_ID_KYE_EASYPEN_340), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_KYE, USB_DEVICE_ID_KYE_PENSKETCH_M912), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_KYE, USB_DEVICE_ID_KYE_MOUSEPEN_M508WX), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_KYE, USB_DEVICE_ID_KYE_MOUSEPEN_M508X), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_KYE, USB_DEVICE_ID_KYE_EASYPEN_M406XE), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_KYE, USB_DEVICE_ID_KYE_MOUSEPEN_I608X_V2), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_KYE, USB_DEVICE_ID_KYE_PENSKETCH_T609A), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_LABTEC, USB_DEVICE_ID_LABTEC_ODDOR_HANDBRAKE), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_LENOVO, USB_DEVICE_ID_LENOVO_OPTICAL_USB_MOUSE_600E), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_LENOVO, USB_DEVICE_ID_LENOVO_PIXART_USB_MOUSE_608D), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_LENOVO, USB_DEVICE_ID_LENOVO_PIXART_USB_MOUSE_6019), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_LENOVO, USB_DEVICE_ID_LENOVO_PIXART_USB_MOUSE_602E), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_LENOVO, USB_DEVICE_ID_LENOVO_PIXART_USB_MOUSE_6093), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_C007), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_C077), HID_QUIRK_ALWAYS_POLL }, { 
HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_KEYBOARD_G710_PLUS), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_MOUSE_C01A), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_MOUSE_C05A), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_MOUSE_C06A), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_MCS, USB_DEVICE_ID_MCS_GAMEPADBLOCK), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_MOUSE_0783), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_PIXART_MOUSE), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_POWER_COVER), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_SURFACE3_COVER), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_SURFACE_PRO_2), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_TOUCH_COVER_2), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_TYPE_COVER_2), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_MOJO, USB_DEVICE_ID_RETRO_ADAPTER), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_MSI, USB_DEVICE_ID_MSI_GT683R_LED_PANEL), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_MULTIPLE_1781, USB_DEVICE_ID_RAPHNET_4NES4SNES_OLD), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_NATSU, USB_DEVICE_ID_NATSU_GAMEPAD), HID_QUIRK_BADPAD }, { HID_USB_DEVICE(USB_VENDOR_ID_NEC, USB_DEVICE_ID_NEC_USB_GAME_PAD), HID_QUIRK_BADPAD }, { HID_USB_DEVICE(USB_VENDOR_ID_NEXIO, USB_DEVICE_ID_NEXIO_MULTITOUCH_PTI0750), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_NEXTWINDOW, USB_DEVICE_ID_NEXTWINDOW_TOUCHSCREEN), HID_QUIRK_MULTI_INPUT}, { HID_USB_DEVICE(USB_VENDOR_ID_NOVATEK, USB_DEVICE_ID_NOVATEK_MOUSE), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_NTRIG, USB_DEVICE_ID_NTRIG_DUOSENSE), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_PANTHERLORD, USB_DEVICE_ID_PANTHERLORD_TWIN_USB_JOYSTICK), HID_QUIRK_MULTI_INPUT | HID_QUIRK_SKIP_OUTPUT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_PENMOUNT, USB_DEVICE_ID_PENMOUNT_1610), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_PENMOUNT, USB_DEVICE_ID_PENMOUNT_1640), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_PI_ENGINEERING, USB_DEVICE_ID_PI_ENGINEERING_VEC_USB_FOOTPEDAL), HID_QUIRK_HIDINPUT_FORCE }, { HID_USB_DEVICE(USB_VENDOR_ID_PIXART, USB_DEVICE_ID_PIXART_OPTICAL_TOUCH_SCREEN1), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_PIXART, USB_DEVICE_ID_PIXART_OPTICAL_TOUCH_SCREEN2), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_PIXART, USB_DEVICE_ID_PIXART_OPTICAL_TOUCH_SCREEN), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_PIXART, USB_DEVICE_ID_PIXART_USB_OPTICAL_MOUSE), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_PRIMAX, USB_DEVICE_ID_PRIMAX_MOUSE_4D22), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_PRIMAX, USB_DEVICE_ID_PRIMAX_MOUSE_4E2A), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_PRIMAX, USB_DEVICE_ID_PRIMAX_PIXART_MOUSE_4D0F), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_PRIMAX, USB_DEVICE_ID_PRIMAX_PIXART_MOUSE_4D65), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_PRIMAX, USB_DEVICE_ID_PRIMAX_PIXART_MOUSE_4E22), HID_QUIRK_ALWAYS_POLL }, { HID_USB_DEVICE(USB_VENDOR_ID_PRODIGE, 
USB_DEVICE_ID_PRODIGE_CORDLESS), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_QUANTA, USB_DEVICE_ID_QUANTA_OPTICAL_TOUCH_3001), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_QUANTA, USB_DEVICE_ID_QUANTA_OPTICAL_TOUCH_3003), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_QUANTA, USB_DEVICE_ID_QUANTA_OPTICAL_TOUCH_3008), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_REALTEK, USB_DEVICE_ID_REALTEK_READER), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_RETROUSB, USB_DEVICE_ID_RETROUSB_SNES_RETROPAD), HID_QUIRK_INCREMENT_USAGE_ON_DUPLICATE }, { HID_USB_DEVICE(USB_VENDOR_ID_RETROUSB, USB_DEVICE_ID_RETROUSB_SNES_RETROPORT), HID_QUIRK_INCREMENT_USAGE_ON_DUPLICATE }, { HID_USB_DEVICE(USB_VENDOR_ID_SAITEK, USB_DEVICE_ID_SAITEK_RUMBLEPAD), HID_QUIRK_BADPAD }, { HID_USB_DEVICE(USB_VENDOR_ID_SAITEK, USB_DEVICE_ID_SAITEK_X52), HID_QUIRK_INCREMENT_USAGE_ON_DUPLICATE }, { HID_USB_DEVICE(USB_VENDOR_ID_SAITEK, USB_DEVICE_ID_SAITEK_X52_2), HID_QUIRK_INCREMENT_USAGE_ON_DUPLICATE }, { HID_USB_DEVICE(USB_VENDOR_ID_SAITEK, USB_DEVICE_ID_SAITEK_X52_PRO), HID_QUIRK_INCREMENT_USAGE_ON_DUPLICATE }, { HID_USB_DEVICE(USB_VENDOR_ID_SAITEK, USB_DEVICE_ID_SAITEK_X65), HID_QUIRK_INCREMENT_USAGE_ON_DUPLICATE }, { HID_USB_DEVICE(USB_VENDOR_ID_SEMICO, USB_DEVICE_ID_SEMICO_USB_KEYKOARD2), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_SEMICO, USB_DEVICE_ID_SEMICO_USB_KEYKOARD), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_SENNHEISER, USB_DEVICE_ID_SENNHEISER_BTD500USB), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_SIGMA_MICRO, USB_DEVICE_ID_SIGMA_MICRO_KEYBOARD), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_SIGMATEL, USB_DEVICE_ID_SIGMATEL_STMP3780), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_SIS_TOUCH, USB_DEVICE_ID_SIS1030_TOUCH), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_SIS_TOUCH, USB_DEVICE_ID_SIS817_TOUCH), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_SIS_TOUCH, USB_DEVICE_ID_SIS9200_TOUCH), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_SIS_TOUCH, USB_DEVICE_ID_SIS_TS), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_SUN, USB_DEVICE_ID_RARITAN_KVM_DONGLE), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_SYMBOL, USB_DEVICE_ID_SYMBOL_SCANNER_1), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_SYMBOL, USB_DEVICE_ID_SYMBOL_SCANNER_2), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_SYNAPTICS, USB_DEVICE_ID_SYNAPTICS_HD), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_SYNAPTICS, USB_DEVICE_ID_SYNAPTICS_LTS1), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_SYNAPTICS, USB_DEVICE_ID_SYNAPTICS_LTS2), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_SYNAPTICS, USB_DEVICE_ID_SYNAPTICS_QUAD_HD), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_SYNAPTICS, USB_DEVICE_ID_SYNAPTICS_TP_V103), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_SYNAPTICS, USB_DEVICE_ID_SYNAPTICS_DELL_K12A), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_SYNAPTICS, USB_DEVICE_ID_SYNAPTICS_DELL_K15A), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_TOPMAX, USB_DEVICE_ID_TOPMAX_COBRAPAD), HID_QUIRK_BADPAD }, { HID_USB_DEVICE(USB_VENDOR_ID_TOUCHPACK, USB_DEVICE_ID_TOUCHPACK_RTS), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_TPV, USB_DEVICE_ID_TPV_OPTICAL_TOUCHSCREEN_8882), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_TPV, USB_DEVICE_ID_TPV_OPTICAL_TOUCHSCREEN_8883), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_TURBOX, 
USB_DEVICE_ID_TURBOX_KEYBOARD), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_UCLOGIC, USB_DEVICE_ID_UCLOGIC_TABLET_KNA5), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_UCLOGIC, USB_DEVICE_ID_UCLOGIC_TABLET_TWA60), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_UGTIZER, USB_DEVICE_ID_UGTIZER_TABLET_WP5540), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_WALTOP, USB_DEVICE_ID_WALTOP_MEDIA_TABLET_10_6_INCH), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_WALTOP, USB_DEVICE_ID_WALTOP_MEDIA_TABLET_14_1_INCH), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_WALTOP, USB_DEVICE_ID_WALTOP_SIRIUS_BATTERY_FREE_TABLET), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_WISEGROUP_LTD2, USB_DEVICE_ID_SMARTJOY_DUAL_PLUS), HID_QUIRK_NOGET | HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_WISEGROUP, USB_DEVICE_ID_QUAD_USB_JOYPAD), HID_QUIRK_NOGET | HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_XIN_MO, USB_DEVICE_ID_XIN_MO_DUAL_ARCADE), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_GROUP_AUDIO), HID_QUIRK_NOGET }, { 0 } }; /* * A list of devices for which there is a specialized driver on HID bus. * * Please note that for multitouch devices (driven by hid-multitouch driver), * there is a proper autodetection and autoloading in place (based on presence * of HID_DG_CONTACTID), so those devices don't need to be added to this list, * as we are doing the right thing in hid_scan_usage(). * * Autodetection for (USB) HID sensor hubs exists too. If a collection of type * physical is found inside a usage page of type sensor, hid-sensor-hub will be * used as a driver. See hid_scan_report(). */ static const struct hid_device_id hid_have_special_driver[] = { #if IS_ENABLED(CONFIG_HID_A4TECH) { HID_USB_DEVICE(USB_VENDOR_ID_A4TECH, USB_DEVICE_ID_A4TECH_WCP32PU) }, { HID_USB_DEVICE(USB_VENDOR_ID_A4TECH, USB_DEVICE_ID_A4TECH_X5_005D) }, { HID_USB_DEVICE(USB_VENDOR_ID_A4TECH, USB_DEVICE_ID_A4TECH_RP_649) }, { HID_USB_DEVICE(USB_VENDOR_ID_A4TECH, USB_DEVICE_ID_A4TECH_NB_95) }, #endif #if IS_ENABLED(CONFIG_HID_ACCUTOUCH) { HID_USB_DEVICE(USB_VENDOR_ID_ELO, USB_DEVICE_ID_ELO_ACCUTOUCH_2216) }, #endif #if IS_ENABLED(CONFIG_HID_ACRUX) { HID_USB_DEVICE(USB_VENDOR_ID_ACRUX, 0x0802) }, { HID_USB_DEVICE(USB_VENDOR_ID_ACRUX, 0xf705) }, #endif #if IS_ENABLED(CONFIG_HID_ALPS) { HID_DEVICE(HID_BUS_ANY, HID_GROUP_ANY, USB_VENDOR_ID_ALPS_JP, HID_DEVICE_ID_ALPS_U1_DUAL) }, #endif #if IS_ENABLED(CONFIG_HID_APPLE) { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_MIGHTYMOUSE) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_FOUNTAIN_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_FOUNTAIN_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_GEYSER_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_GEYSER_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_GEYSER_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_GEYSER3_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_GEYSER3_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_GEYSER3_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_GEYSER4_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_GEYSER4_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_GEYSER4_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_ALU_MINI_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, 
USB_DEVICE_ID_APPLE_ALU_MINI_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_ALU_MINI_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_ALU_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_ALU_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_ALU_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_GEYSER4_HF_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_GEYSER4_HF_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_GEYSER4_HF_JIS) }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_ALU_WIRELESS_ANSI) }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_ALU_WIRELESS_ISO) }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_ALU_WIRELESS_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING2_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING2_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING2_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING3_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING3_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING3_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING4_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING4_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING4_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING4A_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING4A_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING4A_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING5_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING5_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING5_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING5A_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING5A_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING5A_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_ALU_REVB_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_ALU_REVB_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_ALU_REVB_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING6_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING6_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING6_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING6A_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING6A_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING6A_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING7_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING7_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING7_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING7A_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING7A_ISO) }, { 
HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING7A_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING8_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING8_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING8_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING9_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING9_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING9_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRINGT2_J140K) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRINGT2_J132) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRINGT2_J680) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRINGT2_J213) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRINGT2_J214K) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRINGT2_J223) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRINGT2_J230K) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRINGT2_J152F) }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_ALU_WIRELESS_2009_ANSI) }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_ALU_WIRELESS_2009_ISO) }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_ALU_WIRELESS_2009_JIS) }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_ALU_WIRELESS_2011_ANSI) }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_ALU_WIRELESS_2011_ISO) }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_ALU_WIRELESS_2011_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_MAGIC_KEYBOARD_2015) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_FOUNTAIN_TP_ONLY) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_GEYSER1_TP_ONLY) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_MAGIC_KEYBOARD_2021) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_MAGIC_KEYBOARD_FINGERPRINT_2021) }, #endif #if IS_ENABLED(CONFIG_HID_APPLEIR) { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_IRCONTROL) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_IRCONTROL2) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_IRCONTROL3) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_IRCONTROL4) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_IRCONTROL5) }, #endif #if IS_ENABLED(CONFIG_HID_APPLETB_BL) { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_TOUCHBAR_BACKLIGHT) }, #endif #if IS_ENABLED(CONFIG_HID_APPLETB_KBD) { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_TOUCHBAR_DISPLAY) }, #endif #if IS_ENABLED(CONFIG_HID_ASUS) { HID_I2C_DEVICE(USB_VENDOR_ID_ASUSTEK, USB_DEVICE_ID_ASUSTEK_I2C_KEYBOARD) }, { HID_I2C_DEVICE(USB_VENDOR_ID_ASUSTEK, USB_DEVICE_ID_ASUSTEK_I2C_TOUCHPAD) }, { HID_USB_DEVICE(USB_VENDOR_ID_ASUSTEK, USB_DEVICE_ID_ASUSTEK_ROG_KEYBOARD1) }, { HID_USB_DEVICE(USB_VENDOR_ID_ASUSTEK, USB_DEVICE_ID_ASUSTEK_ROG_KEYBOARD2) }, { HID_USB_DEVICE(USB_VENDOR_ID_ASUSTEK, USB_DEVICE_ID_ASUSTEK_ROG_KEYBOARD3) }, { HID_USB_DEVICE(USB_VENDOR_ID_JESS, USB_DEVICE_ID_ASUS_MD_5112) }, { HID_USB_DEVICE(USB_VENDOR_ID_TURBOX, USB_DEVICE_ID_ASUS_MD_5110) }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_ASUSTEK, USB_DEVICE_ID_ASUSTEK_T100CHI_KEYBOARD) }, #endif #if IS_ENABLED(CONFIG_HID_AUREAL) { HID_USB_DEVICE(USB_VENDOR_ID_AUREAL, 
USB_DEVICE_ID_AUREAL_W01RN) }, #endif #if IS_ENABLED(CONFIG_HID_BELKIN) { HID_USB_DEVICE(USB_VENDOR_ID_BELKIN, USB_DEVICE_ID_FLIP_KVM) }, { HID_USB_DEVICE(USB_VENDOR_ID_LABTEC, USB_DEVICE_ID_LABTEC_WIRELESS_KEYBOARD) }, #endif #if IS_ENABLED(CONFIG_HID_BETOP_FF) { HID_USB_DEVICE(USB_VENDOR_ID_BETOP_2185BFM, 0x2208) }, { HID_USB_DEVICE(USB_VENDOR_ID_BETOP_2185PC, 0x5506) }, { HID_USB_DEVICE(USB_VENDOR_ID_BETOP_2185V2PC, 0x1850) }, { HID_USB_DEVICE(USB_VENDOR_ID_BETOP_2185V2BFM, 0x5500) }, #endif #if IS_ENABLED(CONFIG_HID_CHERRY) { HID_USB_DEVICE(USB_VENDOR_ID_CHERRY, USB_DEVICE_ID_CHERRY_CYMOTION) }, { HID_USB_DEVICE(USB_VENDOR_ID_CHERRY, USB_DEVICE_ID_CHERRY_CYMOTION_SOLAR) }, #endif #if IS_ENABLED(CONFIG_HID_CHICONY) { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, USB_DEVICE_ID_CHICONY_TACTICAL_PAD) }, { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, USB_DEVICE_ID_CHICONY_WIRELESS2) }, { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, USB_DEVICE_ID_ASUS_AK1D) }, { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, USB_DEVICE_ID_CHICONY_ACER_SWITCH12) }, #endif #if IS_ENABLED(CONFIG_HID_CMEDIA) { HID_USB_DEVICE(USB_VENDOR_ID_CMEDIA, USB_DEVICE_ID_CM6533) }, #endif #if IS_ENABLED(CONFIG_HID_CORSAIR) { HID_USB_DEVICE(USB_VENDOR_ID_CORSAIR, USB_DEVICE_ID_CORSAIR_K90) }, { HID_USB_DEVICE(USB_VENDOR_ID_CORSAIR, USB_DEVICE_ID_CORSAIR_GLAIVE_RGB) }, { HID_USB_DEVICE(USB_VENDOR_ID_CORSAIR, USB_DEVICE_ID_CORSAIR_SCIMITAR_PRO_RGB) }, #endif #if IS_ENABLED(CONFIG_HID_CP2112) { HID_USB_DEVICE(USB_VENDOR_ID_CYGNAL, USB_DEVICE_ID_CYGNAL_CP2112) }, #endif #if IS_ENABLED(CONFIG_HID_CYPRESS) { HID_USB_DEVICE(USB_VENDOR_ID_CYPRESS, USB_DEVICE_ID_CYPRESS_BARCODE_1) }, { HID_USB_DEVICE(USB_VENDOR_ID_CYPRESS, USB_DEVICE_ID_CYPRESS_BARCODE_2) }, { HID_USB_DEVICE(USB_VENDOR_ID_CYPRESS, USB_DEVICE_ID_CYPRESS_BARCODE_3) }, { HID_USB_DEVICE(USB_VENDOR_ID_CYPRESS, USB_DEVICE_ID_CYPRESS_BARCODE_4) }, { HID_USB_DEVICE(USB_VENDOR_ID_CYPRESS, USB_DEVICE_ID_CYPRESS_MOUSE) }, #endif #if IS_ENABLED(CONFIG_HID_DRAGONRISE) { HID_USB_DEVICE(USB_VENDOR_ID_DRAGONRISE, 0x0006) }, { HID_USB_DEVICE(USB_VENDOR_ID_DRAGONRISE, 0x0011) }, #endif #if IS_ENABLED(CONFIG_HID_ELAN) { HID_USB_DEVICE(USB_VENDOR_ID_ELAN, USB_DEVICE_ID_HP_X2_10_COVER) }, #endif #if IS_ENABLED(CONFIG_HID_ELECOM) { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_ELECOM, USB_DEVICE_ID_ELECOM_BM084) }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_ELECOM, USB_DEVICE_ID_ELECOM_M_XGL20DLBK) }, { HID_USB_DEVICE(USB_VENDOR_ID_ELECOM, USB_DEVICE_ID_ELECOM_M_XT3URBK) }, { HID_USB_DEVICE(USB_VENDOR_ID_ELECOM, USB_DEVICE_ID_ELECOM_M_XT3DRBK) }, { HID_USB_DEVICE(USB_VENDOR_ID_ELECOM, USB_DEVICE_ID_ELECOM_M_XT4DRBK) }, { HID_USB_DEVICE(USB_VENDOR_ID_ELECOM, USB_DEVICE_ID_ELECOM_M_DT1URBK) }, { HID_USB_DEVICE(USB_VENDOR_ID_ELECOM, USB_DEVICE_ID_ELECOM_M_DT1DRBK) }, { HID_USB_DEVICE(USB_VENDOR_ID_ELECOM, USB_DEVICE_ID_ELECOM_M_HT1URBK) }, { HID_USB_DEVICE(USB_VENDOR_ID_ELECOM, USB_DEVICE_ID_ELECOM_M_HT1DRBK_010D) }, { HID_USB_DEVICE(USB_VENDOR_ID_ELECOM, USB_DEVICE_ID_ELECOM_M_HT1DRBK_011C) }, #endif #if IS_ENABLED(CONFIG_HID_ELO) { HID_USB_DEVICE(USB_VENDOR_ID_ELO, 0x0009) }, { HID_USB_DEVICE(USB_VENDOR_ID_ELO, 0x0030) }, #endif #if IS_ENABLED(CONFIG_HID_EMS_FF) { HID_USB_DEVICE(USB_VENDOR_ID_EMS, USB_DEVICE_ID_EMS_TRIO_LINKER_PLUS_II) }, #endif #if IS_ENABLED(CONFIG_HID_EZKEY) { HID_USB_DEVICE(USB_VENDOR_ID_EZKEY, USB_DEVICE_ID_BTC_8193) }, #endif #if IS_ENABLED(CONFIG_HID_GEMBIRD) { HID_USB_DEVICE(USB_VENDOR_ID_GEMBIRD, USB_DEVICE_ID_GEMBIRD_JPD_DUALFORCE2) }, #endif #if IS_ENABLED(CONFIG_HID_GFRM) { HID_BLUETOOTH_DEVICE(0x58, 
0x2000) }, { HID_BLUETOOTH_DEVICE(0x471, 0x2210) }, #endif #if IS_ENABLED(CONFIG_HID_GREENASIA) { HID_USB_DEVICE(USB_VENDOR_ID_GREENASIA, 0x0012) }, #endif #if IS_ENABLED(CONFIG_HID_GT683R) { HID_USB_DEVICE(USB_VENDOR_ID_MSI, USB_DEVICE_ID_MSI_GT683R_LED_PANEL) }, #endif #if IS_ENABLED(CONFIG_HID_GYRATION) { HID_USB_DEVICE(USB_VENDOR_ID_GYRATION, USB_DEVICE_ID_GYRATION_REMOTE) }, { HID_USB_DEVICE(USB_VENDOR_ID_GYRATION, USB_DEVICE_ID_GYRATION_REMOTE_2) }, { HID_USB_DEVICE(USB_VENDOR_ID_GYRATION, USB_DEVICE_ID_GYRATION_REMOTE_3) }, #endif #if IS_ENABLED(CONFIG_HID_HOLTEK) { HID_USB_DEVICE(USB_VENDOR_ID_HOLTEK, USB_DEVICE_ID_HOLTEK_ON_LINE_GRIP) }, { HID_USB_DEVICE(USB_VENDOR_ID_HOLTEK_ALT, USB_DEVICE_ID_HOLTEK_ALT_KEYBOARD) }, { HID_USB_DEVICE(USB_VENDOR_ID_HOLTEK_ALT, USB_DEVICE_ID_HOLTEK_ALT_MOUSE_A04A) }, { HID_USB_DEVICE(USB_VENDOR_ID_HOLTEK_ALT, USB_DEVICE_ID_HOLTEK_ALT_MOUSE_A067) }, { HID_USB_DEVICE(USB_VENDOR_ID_HOLTEK_ALT, USB_DEVICE_ID_HOLTEK_ALT_MOUSE_A070) }, { HID_USB_DEVICE(USB_VENDOR_ID_HOLTEK_ALT, USB_DEVICE_ID_HOLTEK_ALT_MOUSE_A072) }, { HID_USB_DEVICE(USB_VENDOR_ID_HOLTEK_ALT, USB_DEVICE_ID_HOLTEK_ALT_MOUSE_A081) }, { HID_USB_DEVICE(USB_VENDOR_ID_HOLTEK_ALT, USB_DEVICE_ID_HOLTEK_ALT_MOUSE_A0C2) }, #endif #if IS_ENABLED(CONFIG_HID_ICADE) { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_ION, USB_DEVICE_ID_ICADE) }, #endif #if IS_ENABLED(CONFIG_HID_JABRA) { HID_USB_DEVICE(USB_VENDOR_ID_JABRA, HID_ANY_ID) }, #endif #if IS_ENABLED(CONFIG_HID_KENSINGTON) { HID_USB_DEVICE(USB_VENDOR_ID_KENSINGTON, USB_DEVICE_ID_KS_SLIMBLADE) }, #endif #if IS_ENABLED(CONFIG_HID_KEYTOUCH) { HID_USB_DEVICE(USB_VENDOR_ID_KEYTOUCH, USB_DEVICE_ID_KEYTOUCH_IEC) }, #endif #if IS_ENABLED(CONFIG_HID_KYE) { HID_USB_DEVICE(USB_VENDOR_ID_KYE, USB_DEVICE_ID_GENIUS_GILA_GAMING_MOUSE) }, { HID_USB_DEVICE(USB_VENDOR_ID_KYE, USB_DEVICE_ID_GENIUS_MANTICORE) }, { HID_USB_DEVICE(USB_VENDOR_ID_KYE, USB_DEVICE_ID_GENIUS_GX_IMPERATOR) }, { HID_USB_DEVICE(USB_VENDOR_ID_KYE, USB_DEVICE_ID_KYE_ERGO_525V) }, #endif #if IS_ENABLED(CONFIG_HID_LCPOWER) { HID_USB_DEVICE(USB_VENDOR_ID_LCPOWER, USB_DEVICE_ID_LCPOWER_LC1000) }, #endif #if IS_ENABLED(CONFIG_HID_LENOVO) { HID_USB_DEVICE(USB_VENDOR_ID_LENOVO, USB_DEVICE_ID_LENOVO_TPKBD) }, { HID_USB_DEVICE(USB_VENDOR_ID_LENOVO, USB_DEVICE_ID_LENOVO_CUSBKBD) }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_LENOVO, USB_DEVICE_ID_LENOVO_CBTKBD) }, { HID_USB_DEVICE(USB_VENDOR_ID_LENOVO, USB_DEVICE_ID_LENOVO_TPPRODOCK) }, #endif #if IS_ENABLED(CONFIG_HID_LOGITECH) { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_MX3000_RECEIVER) }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_S510_RECEIVER) }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_RECEIVER) }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_DINOVO_DESKTOP) }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_ELITE_KBD) }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_CORDLESS_DESKTOP_LX500) }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_EXTREME_3D) }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_DUAL_ACTION) }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_WHEEL) }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_RUMBLEPAD_CORD) }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_RUMBLEPAD) }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_RUMBLEPAD2_2) }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_G29_WHEEL) }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, 
USB_DEVICE_ID_LOGITECH_WINGMAN_F3D) }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_WINGMAN_FG) }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_WINGMAN_FFG) }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_FORCE3D_PRO) }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_FLIGHT_SYSTEM_G940) }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_MOMO_WHEEL) }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_MOMO_WHEEL2) }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_VIBRATION_WHEEL) }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_DFP_WHEEL) }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_DFGT_WHEEL) }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_G25_WHEEL) }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_G27_WHEEL) }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_WII_WHEEL) }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_RUMBLEPAD2) }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_SPACETRAVELLER) }, { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_SPACENAVIGATOR) }, #endif #if IS_ENABLED(CONFIG_HID_LOGITECH_HIDPP) { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_G920_WHEEL) }, #endif #if IS_ENABLED(CONFIG_HID_MAGICMOUSE) { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_MAGICMOUSE) }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_MAGICTRACKPAD) }, #endif #if IS_ENABLED(CONFIG_HID_MAYFLASH) { HID_USB_DEVICE(USB_VENDOR_ID_DRAGONRISE, USB_DEVICE_ID_DRAGONRISE_PS3) }, { HID_USB_DEVICE(USB_VENDOR_ID_DRAGONRISE, USB_DEVICE_ID_DRAGONRISE_DOLPHINBAR) }, { HID_USB_DEVICE(USB_VENDOR_ID_DRAGONRISE, USB_DEVICE_ID_DRAGONRISE_GAMECUBE1) }, { HID_USB_DEVICE(USB_VENDOR_ID_DRAGONRISE, USB_DEVICE_ID_DRAGONRISE_GAMECUBE2) }, { HID_USB_DEVICE(USB_VENDOR_ID_DRAGONRISE, USB_DEVICE_ID_DRAGONRISE_GAMECUBE3) }, #endif #if IS_ENABLED(CONFIG_HID_MICROSOFT) { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_COMFORT_MOUSE_4500) }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_COMFORT_KEYBOARD) }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_SIDEWINDER_GV) }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_NE4K) }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_NE4K_JP) }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_NE7K) }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_LK6K) }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_PRESENTER_8K_USB) }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_DIGITAL_MEDIA_3K) }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_WIRELESS_OPTICAL_DESKTOP_3_0) }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_OFFICE_KB) }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_DIGITAL_MEDIA_7K) }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_DIGITAL_MEDIA_600) }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_DIGITAL_MEDIA_3KV1) }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_POWER_COVER) }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_PRESENTER_8K_BT) }, #endif #if IS_ENABLED(CONFIG_HID_MONTEREY) { HID_USB_DEVICE(USB_VENDOR_ID_MONTEREY, USB_DEVICE_ID_GENIUS_KB29E) }, #endif #if IS_ENABLED(CONFIG_HID_MULTITOUCH) { HID_USB_DEVICE(USB_VENDOR_ID_LG, USB_DEVICE_ID_LG_MELFAS_MT) }, #endif #if 
IS_ENABLED(CONFIG_HID_WIIMOTE) { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_NINTENDO, USB_DEVICE_ID_NINTENDO_WIIMOTE) }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_NINTENDO, USB_DEVICE_ID_NINTENDO_WIIMOTE2) }, #endif #if IS_ENABLED(CONFIG_HID_NTI) { HID_USB_DEVICE(USB_VENDOR_ID_NTI, USB_DEVICE_ID_USB_SUN) }, #endif #if IS_ENABLED(CONFIG_HID_NTRIG) { HID_USB_DEVICE(USB_VENDOR_ID_NTRIG, USB_DEVICE_ID_NTRIG_TOUCH_SCREEN) }, { HID_USB_DEVICE(USB_VENDOR_ID_NTRIG, USB_DEVICE_ID_NTRIG_TOUCH_SCREEN_1) }, { HID_USB_DEVICE(USB_VENDOR_ID_NTRIG, USB_DEVICE_ID_NTRIG_TOUCH_SCREEN_2) }, { HID_USB_DEVICE(USB_VENDOR_ID_NTRIG, USB_DEVICE_ID_NTRIG_TOUCH_SCREEN_3) }, { HID_USB_DEVICE(USB_VENDOR_ID_NTRIG, USB_DEVICE_ID_NTRIG_TOUCH_SCREEN_4) }, { HID_USB_DEVICE(USB_VENDOR_ID_NTRIG, USB_DEVICE_ID_NTRIG_TOUCH_SCREEN_5) }, { HID_USB_DEVICE(USB_VENDOR_ID_NTRIG, USB_DEVICE_ID_NTRIG_TOUCH_SCREEN_6) }, { HID_USB_DEVICE(USB_VENDOR_ID_NTRIG, USB_DEVICE_ID_NTRIG_TOUCH_SCREEN_7) }, { HID_USB_DEVICE(USB_VENDOR_ID_NTRIG, USB_DEVICE_ID_NTRIG_TOUCH_SCREEN_8) }, { HID_USB_DEVICE(USB_VENDOR_ID_NTRIG, USB_DEVICE_ID_NTRIG_TOUCH_SCREEN_9) }, { HID_USB_DEVICE(USB_VENDOR_ID_NTRIG, USB_DEVICE_ID_NTRIG_TOUCH_SCREEN_10) }, { HID_USB_DEVICE(USB_VENDOR_ID_NTRIG, USB_DEVICE_ID_NTRIG_TOUCH_SCREEN_11) }, { HID_USB_DEVICE(USB_VENDOR_ID_NTRIG, USB_DEVICE_ID_NTRIG_TOUCH_SCREEN_12) }, { HID_USB_DEVICE(USB_VENDOR_ID_NTRIG, USB_DEVICE_ID_NTRIG_TOUCH_SCREEN_13) }, { HID_USB_DEVICE(USB_VENDOR_ID_NTRIG, USB_DEVICE_ID_NTRIG_TOUCH_SCREEN_14) }, { HID_USB_DEVICE(USB_VENDOR_ID_NTRIG, USB_DEVICE_ID_NTRIG_TOUCH_SCREEN_15) }, { HID_USB_DEVICE(USB_VENDOR_ID_NTRIG, USB_DEVICE_ID_NTRIG_TOUCH_SCREEN_16) }, { HID_USB_DEVICE(USB_VENDOR_ID_NTRIG, USB_DEVICE_ID_NTRIG_TOUCH_SCREEN_17) }, { HID_USB_DEVICE(USB_VENDOR_ID_NTRIG, USB_DEVICE_ID_NTRIG_TOUCH_SCREEN_18) }, #endif #if IS_ENABLED(CONFIG_HID_ORTEK) { HID_USB_DEVICE(USB_VENDOR_ID_ORTEK, USB_DEVICE_ID_ORTEK_PKB1700) }, { HID_USB_DEVICE(USB_VENDOR_ID_ORTEK, USB_DEVICE_ID_ORTEK_WKB2000) }, { HID_USB_DEVICE(USB_VENDOR_ID_ORTEK, USB_DEVICE_ID_ORTEK_IHOME_IMAC_A210S) }, { HID_USB_DEVICE(USB_VENDOR_ID_SKYCABLE, USB_DEVICE_ID_SKYCABLE_WIRELESS_PRESENTER) }, #endif #if IS_ENABLED(CONFIG_HID_PANTHERLORD) { HID_USB_DEVICE(USB_VENDOR_ID_GAMERON, USB_DEVICE_ID_GAMERON_DUAL_PSX_ADAPTOR) }, { HID_USB_DEVICE(USB_VENDOR_ID_GAMERON, USB_DEVICE_ID_GAMERON_DUAL_PCS_ADAPTOR) }, { HID_USB_DEVICE(USB_VENDOR_ID_GREENASIA, 0x0003) }, { HID_USB_DEVICE(USB_VENDOR_ID_JESS2, USB_DEVICE_ID_JESS2_COLOR_RUMBLE_PAD) }, #endif #if IS_ENABLED(CONFIG_HID_PENMOUNT) { HID_USB_DEVICE(USB_VENDOR_ID_PENMOUNT, USB_DEVICE_ID_PENMOUNT_6000) }, #endif #if IS_ENABLED(CONFIG_HID_PETALYNX) { HID_USB_DEVICE(USB_VENDOR_ID_PETALYNX, USB_DEVICE_ID_PETALYNX_MAXTER_REMOTE) }, #endif #if IS_ENABLED(CONFIG_HID_PICOLCD) { HID_USB_DEVICE(USB_VENDOR_ID_MICROCHIP, USB_DEVICE_ID_PICOLCD) }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROCHIP, USB_DEVICE_ID_PICOLCD_BOOTLOADER) }, #endif #if IS_ENABLED(CONFIG_HID_PLANTRONICS) { HID_USB_DEVICE(USB_VENDOR_ID_PLANTRONICS, HID_ANY_ID) }, #endif #if IS_ENABLED(CONFIG_HID_PLAYSTATION) { HID_USB_DEVICE(USB_VENDOR_ID_SONY, USB_DEVICE_ID_SONY_PS4_CONTROLLER) }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_SONY, USB_DEVICE_ID_SONY_PS4_CONTROLLER) }, { HID_USB_DEVICE(USB_VENDOR_ID_SONY, USB_DEVICE_ID_SONY_PS4_CONTROLLER_2) }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_SONY, USB_DEVICE_ID_SONY_PS4_CONTROLLER_2) }, { HID_USB_DEVICE(USB_VENDOR_ID_SONY, USB_DEVICE_ID_SONY_PS4_CONTROLLER_DONGLE) }, { HID_USB_DEVICE(USB_VENDOR_ID_SONY, USB_DEVICE_ID_SONY_PS5_CONTROLLER) }, { 
HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_SONY, USB_DEVICE_ID_SONY_PS5_CONTROLLER) }, { HID_USB_DEVICE(USB_VENDOR_ID_SONY, USB_DEVICE_ID_SONY_PS5_CONTROLLER_2) }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_SONY, USB_DEVICE_ID_SONY_PS5_CONTROLLER_2) }, #endif #if IS_ENABLED(CONFIG_HID_PRIMAX) { HID_USB_DEVICE(USB_VENDOR_ID_PRIMAX, USB_DEVICE_ID_PRIMAX_KEYBOARD) }, #endif #if IS_ENABLED(CONFIG_HID_PRODIKEYS) { HID_USB_DEVICE(USB_VENDOR_ID_CREATIVELABS, USB_DEVICE_ID_PRODIKEYS_PCMIDI) }, #endif #if IS_ENABLED(CONFIG_HID_RETRODE) { HID_USB_DEVICE(USB_VENDOR_ID_FUTURE_TECHNOLOGY, USB_DEVICE_ID_RETRODE2) }, #endif #if IS_ENABLED(CONFIG_HID_RMI) { HID_USB_DEVICE(USB_VENDOR_ID_LENOVO, USB_DEVICE_ID_LENOVO_X1_COVER) }, { HID_USB_DEVICE(USB_VENDOR_ID_RAZER, USB_DEVICE_ID_RAZER_BLADE_14) }, { HID_USB_DEVICE(USB_VENDOR_ID_PRIMAX, USB_DEVICE_ID_PRIMAX_REZEL) }, #endif #if IS_ENABLED(CONFIG_HID_ROCCAT) { HID_USB_DEVICE(USB_VENDOR_ID_ROCCAT, USB_DEVICE_ID_ROCCAT_ARVO) }, { HID_USB_DEVICE(USB_VENDOR_ID_ROCCAT, USB_DEVICE_ID_ROCCAT_ISKU) }, { HID_USB_DEVICE(USB_VENDOR_ID_ROCCAT, USB_DEVICE_ID_ROCCAT_ISKUFX) }, { HID_USB_DEVICE(USB_VENDOR_ID_ROCCAT, USB_DEVICE_ID_ROCCAT_KONE) }, { HID_USB_DEVICE(USB_VENDOR_ID_ROCCAT, USB_DEVICE_ID_ROCCAT_KONEPLUS) }, { HID_USB_DEVICE(USB_VENDOR_ID_ROCCAT, USB_DEVICE_ID_ROCCAT_KONEPURE) }, { HID_USB_DEVICE(USB_VENDOR_ID_ROCCAT, USB_DEVICE_ID_ROCCAT_KONEPURE_OPTICAL) }, { HID_USB_DEVICE(USB_VENDOR_ID_ROCCAT, USB_DEVICE_ID_ROCCAT_KONEXTD) }, { HID_USB_DEVICE(USB_VENDOR_ID_ROCCAT, USB_DEVICE_ID_ROCCAT_KOVAPLUS) }, { HID_USB_DEVICE(USB_VENDOR_ID_ROCCAT, USB_DEVICE_ID_ROCCAT_LUA) }, { HID_USB_DEVICE(USB_VENDOR_ID_ROCCAT, USB_DEVICE_ID_ROCCAT_PYRA_WIRED) }, { HID_USB_DEVICE(USB_VENDOR_ID_ROCCAT, USB_DEVICE_ID_ROCCAT_PYRA_WIRELESS) }, { HID_USB_DEVICE(USB_VENDOR_ID_ROCCAT, USB_DEVICE_ID_ROCCAT_RYOS_MK) }, { HID_USB_DEVICE(USB_VENDOR_ID_ROCCAT, USB_DEVICE_ID_ROCCAT_RYOS_MK_GLOW) }, { HID_USB_DEVICE(USB_VENDOR_ID_ROCCAT, USB_DEVICE_ID_ROCCAT_RYOS_MK_PRO) }, { HID_USB_DEVICE(USB_VENDOR_ID_ROCCAT, USB_DEVICE_ID_ROCCAT_SAVU) }, #endif #if IS_ENABLED(CONFIG_HID_SAITEK) { HID_USB_DEVICE(USB_VENDOR_ID_SAITEK, USB_DEVICE_ID_SAITEK_PS1000) }, { HID_USB_DEVICE(USB_VENDOR_ID_SAITEK, USB_DEVICE_ID_SAITEK_RAT7_OLD) }, { HID_USB_DEVICE(USB_VENDOR_ID_SAITEK, USB_DEVICE_ID_SAITEK_RAT7) }, { HID_USB_DEVICE(USB_VENDOR_ID_SAITEK, USB_DEVICE_ID_SAITEK_RAT9) }, { HID_USB_DEVICE(USB_VENDOR_ID_SAITEK, USB_DEVICE_ID_SAITEK_MMO7) }, { HID_USB_DEVICE(USB_VENDOR_ID_MADCATZ, USB_DEVICE_ID_MADCATZ_RAT5) }, { HID_USB_DEVICE(USB_VENDOR_ID_MADCATZ, USB_DEVICE_ID_MADCATZ_RAT9) }, { HID_USB_DEVICE(USB_VENDOR_ID_MADCATZ, USB_DEVICE_ID_MADCATZ_MMO7) }, #endif #if IS_ENABLED(CONFIG_HID_SAMSUNG) { HID_USB_DEVICE(USB_VENDOR_ID_SAMSUNG, USB_DEVICE_ID_SAMSUNG_IR_REMOTE) }, { HID_USB_DEVICE(USB_VENDOR_ID_SAMSUNG, USB_DEVICE_ID_SAMSUNG_WIRELESS_KBD_MOUSE) }, #endif #if IS_ENABLED(CONFIG_HID_SMARTJOYPLUS) { HID_USB_DEVICE(USB_VENDOR_ID_PLAYDOTCOM, USB_DEVICE_ID_PLAYDOTCOM_EMS_USBII) }, { HID_USB_DEVICE(USB_VENDOR_ID_WISEGROUP, USB_DEVICE_ID_SMARTJOY_PLUS) }, { HID_USB_DEVICE(USB_VENDOR_ID_WISEGROUP, USB_DEVICE_ID_SUPER_JOY_BOX_3) }, { HID_USB_DEVICE(USB_VENDOR_ID_WISEGROUP, USB_DEVICE_ID_DUAL_USB_JOYPAD) }, { HID_USB_DEVICE(USB_VENDOR_ID_WISEGROUP_LTD, USB_DEVICE_ID_SUPER_JOY_BOX_3_PRO) }, { HID_USB_DEVICE(USB_VENDOR_ID_WISEGROUP_LTD, USB_DEVICE_ID_SUPER_DUAL_BOX_PRO) }, { HID_USB_DEVICE(USB_VENDOR_ID_WISEGROUP_LTD, USB_DEVICE_ID_SUPER_JOY_BOX_5_PRO) }, #endif #if IS_ENABLED(CONFIG_HID_SONY) { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_LOGITECH, 
USB_DEVICE_ID_LOGITECH_HARMONY_PS3) }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_SMK, USB_DEVICE_ID_SMK_PS3_BDREMOTE) }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_SMK, USB_DEVICE_ID_SMK_NSG_MR5U_REMOTE) }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_SMK, USB_DEVICE_ID_SMK_NSG_MR7U_REMOTE) }, { HID_USB_DEVICE(USB_VENDOR_ID_SONY, USB_DEVICE_ID_SONY_BUZZ_CONTROLLER) }, { HID_USB_DEVICE(USB_VENDOR_ID_SONY, USB_DEVICE_ID_SONY_WIRELESS_BUZZ_CONTROLLER) }, { HID_USB_DEVICE(USB_VENDOR_ID_SONY, USB_DEVICE_ID_SONY_MOTION_CONTROLLER) }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_SONY, USB_DEVICE_ID_SONY_MOTION_CONTROLLER) }, { HID_USB_DEVICE(USB_VENDOR_ID_SONY, USB_DEVICE_ID_SONY_NAVIGATION_CONTROLLER) }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_SONY, USB_DEVICE_ID_SONY_NAVIGATION_CONTROLLER) }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_SONY, USB_DEVICE_ID_SONY_PS3_BDREMOTE) }, { HID_USB_DEVICE(USB_VENDOR_ID_SONY, USB_DEVICE_ID_SONY_PS3_CONTROLLER) }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_SONY, USB_DEVICE_ID_SONY_PS3_CONTROLLER) }, { HID_USB_DEVICE(USB_VENDOR_ID_SONY, USB_DEVICE_ID_SONY_VAIO_VGX_MOUSE) }, { HID_USB_DEVICE(USB_VENDOR_ID_SONY, USB_DEVICE_ID_SONY_VAIO_VGP_MOUSE) }, { HID_USB_DEVICE(USB_VENDOR_ID_SINO_LITE, USB_DEVICE_ID_SINO_LITE_CONTROLLER) }, #endif #if IS_ENABLED(CONFIG_HID_SPEEDLINK) { HID_USB_DEVICE(USB_VENDOR_ID_X_TENSIONS, USB_DEVICE_ID_SPEEDLINK_VAD_CEZANNE) }, #endif #if IS_ENABLED(CONFIG_HID_STEELSERIES) { HID_USB_DEVICE(USB_VENDOR_ID_STEELSERIES, USB_DEVICE_ID_STEELSERIES_SRWS1) }, #endif #if IS_ENABLED(CONFIG_HID_SUNPLUS) { HID_USB_DEVICE(USB_VENDOR_ID_SUNPLUS, USB_DEVICE_ID_SUNPLUS_WDESKTOP) }, #endif #if IS_ENABLED(CONFIG_HID_THRUSTMASTER) { HID_USB_DEVICE(USB_VENDOR_ID_THRUSTMASTER, 0xb300) }, { HID_USB_DEVICE(USB_VENDOR_ID_THRUSTMASTER, 0xb304) }, { HID_USB_DEVICE(USB_VENDOR_ID_THRUSTMASTER, 0xb323) }, { HID_USB_DEVICE(USB_VENDOR_ID_THRUSTMASTER, 0xb324) }, { HID_USB_DEVICE(USB_VENDOR_ID_THRUSTMASTER, 0xb605) }, { HID_USB_DEVICE(USB_VENDOR_ID_THRUSTMASTER, 0xb651) }, { HID_USB_DEVICE(USB_VENDOR_ID_THRUSTMASTER, 0xb653) }, { HID_USB_DEVICE(USB_VENDOR_ID_THRUSTMASTER, 0xb654) }, { HID_USB_DEVICE(USB_VENDOR_ID_THRUSTMASTER, 0xb65a) }, { HID_USB_DEVICE(USB_VENDOR_ID_THRUSTMASTER, 0xb65d) }, #endif #if IS_ENABLED(CONFIG_HID_TIVO) { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_TIVO, USB_DEVICE_ID_TIVO_SLIDE_BT) }, { HID_USB_DEVICE(USB_VENDOR_ID_TIVO, USB_DEVICE_ID_TIVO_SLIDE) }, { HID_USB_DEVICE(USB_VENDOR_ID_TIVO, USB_DEVICE_ID_TIVO_SLIDE_PRO) }, #endif #if IS_ENABLED(CONFIG_HID_TOPSEED) { HID_USB_DEVICE(USB_VENDOR_ID_BTC, USB_DEVICE_ID_BTC_EMPREX_REMOTE) }, { HID_USB_DEVICE(USB_VENDOR_ID_BTC, USB_DEVICE_ID_BTC_EMPREX_REMOTE_2) }, { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, USB_DEVICE_ID_CHICONY_WIRELESS) }, { HID_USB_DEVICE(USB_VENDOR_ID_TOPSEED, USB_DEVICE_ID_TOPSEED_CYBERLINK) }, { HID_USB_DEVICE(USB_VENDOR_ID_TOPSEED2, USB_DEVICE_ID_TOPSEED2_RF_COMBO) }, #endif #if IS_ENABLED(CONFIG_HID_TWINHAN) { HID_USB_DEVICE(USB_VENDOR_ID_TWINHAN, USB_DEVICE_ID_TWINHAN_IR_REMOTE) }, #endif #if IS_ENABLED(CONFIG_HID_UDRAW_PS3) { HID_USB_DEVICE(USB_VENDOR_ID_THQ, USB_DEVICE_ID_THQ_PS3_UDRAW) }, #endif #if IS_ENABLED(CONFIG_HID_XINMO) { HID_USB_DEVICE(USB_VENDOR_ID_XIN_MO, USB_DEVICE_ID_XIN_MO_DUAL_ARCADE) }, { HID_USB_DEVICE(USB_VENDOR_ID_XIN_MO, USB_DEVICE_ID_THT_2P_ARCADE) }, #endif #if IS_ENABLED(CONFIG_HID_ZEROPLUS) { HID_USB_DEVICE(USB_VENDOR_ID_ZEROPLUS, 0x0005) }, { HID_USB_DEVICE(USB_VENDOR_ID_ZEROPLUS, 0x0030) }, #endif #if IS_ENABLED(CONFIG_HID_ZYDACRON) { HID_USB_DEVICE(USB_VENDOR_ID_ZYDACRON, 
USB_DEVICE_ID_ZYDACRON_REMOTE_CONTROL) }, #endif { } }; /* a list of devices that shouldn't be handled by HID core at all */ static const struct hid_device_id hid_ignore_list[] = { { HID_USB_DEVICE(USB_VENDOR_ID_ACECAD, USB_DEVICE_ID_ACECAD_FLAIR) }, { HID_USB_DEVICE(USB_VENDOR_ID_ACECAD, USB_DEVICE_ID_ACECAD_302) }, { HID_USB_DEVICE(USB_VENDOR_ID_ADS_TECH, USB_DEVICE_ID_ADS_TECH_RADIO_SI470X) }, { HID_USB_DEVICE(USB_VENDOR_ID_AIPTEK, USB_DEVICE_ID_AIPTEK_01) }, { HID_USB_DEVICE(USB_VENDOR_ID_AIPTEK, USB_DEVICE_ID_AIPTEK_10) }, { HID_USB_DEVICE(USB_VENDOR_ID_AIPTEK, USB_DEVICE_ID_AIPTEK_20) }, { HID_USB_DEVICE(USB_VENDOR_ID_AIPTEK, USB_DEVICE_ID_AIPTEK_21) }, { HID_USB_DEVICE(USB_VENDOR_ID_AIPTEK, USB_DEVICE_ID_AIPTEK_22) }, { HID_USB_DEVICE(USB_VENDOR_ID_AIPTEK, USB_DEVICE_ID_AIPTEK_23) }, { HID_USB_DEVICE(USB_VENDOR_ID_AIPTEK, USB_DEVICE_ID_AIPTEK_24) }, { HID_USB_DEVICE(USB_VENDOR_ID_AIRCABLE, USB_DEVICE_ID_AIRCABLE1) }, { HID_USB_DEVICE(USB_VENDOR_ID_ALCOR, USB_DEVICE_ID_ALCOR_USBRS232) }, { HID_USB_DEVICE(USB_VENDOR_ID_ASUSTEK, USB_DEVICE_ID_ASUSTEK_LCM)}, { HID_USB_DEVICE(USB_VENDOR_ID_ASUSTEK, USB_DEVICE_ID_ASUSTEK_LCM2)}, { HID_USB_DEVICE(USB_VENDOR_ID_AVERMEDIA, USB_DEVICE_ID_AVER_FM_MR800) }, { HID_USB_DEVICE(USB_VENDOR_ID_AXENTIA, USB_DEVICE_ID_AXENTIA_FM_RADIO) }, { HID_USB_DEVICE(USB_VENDOR_ID_BERKSHIRE, USB_DEVICE_ID_BERKSHIRE_PCWD) }, { HID_USB_DEVICE(USB_VENDOR_ID_CIDC, 0x0103) }, { HID_USB_DEVICE(USB_VENDOR_ID_CYGNAL, USB_DEVICE_ID_CYGNAL_RADIO_SI470X) }, { HID_USB_DEVICE(USB_VENDOR_ID_CYGNAL, USB_DEVICE_ID_CYGNAL_RADIO_SI4713) }, { HID_USB_DEVICE(USB_VENDOR_ID_CMEDIA, USB_DEVICE_ID_CM109) }, { HID_USB_DEVICE(USB_VENDOR_ID_CYPRESS, USB_DEVICE_ID_CYPRESS_HIDCOM) }, { HID_USB_DEVICE(USB_VENDOR_ID_CYPRESS, USB_DEVICE_ID_CYPRESS_ULTRAMOUSE) }, { HID_USB_DEVICE(USB_VENDOR_ID_DEALEXTREAME, USB_DEVICE_ID_DEALEXTREAME_RADIO_SI4701) }, { HID_USB_DEVICE(USB_VENDOR_ID_DELORME, USB_DEVICE_ID_DELORME_EARTHMATE) }, { HID_USB_DEVICE(USB_VENDOR_ID_DELORME, USB_DEVICE_ID_DELORME_EM_LT20) }, { HID_USB_DEVICE(USB_VENDOR_ID_ESSENTIAL_REALITY, USB_DEVICE_ID_ESSENTIAL_REALITY_P5) }, { HID_USB_DEVICE(USB_VENDOR_ID_ETT, USB_DEVICE_ID_TC5UH) }, { HID_USB_DEVICE(USB_VENDOR_ID_ETT, USB_DEVICE_ID_TC4UM) }, { HID_USB_DEVICE(USB_VENDOR_ID_GENERAL_TOUCH, 0x0001) }, { HID_USB_DEVICE(USB_VENDOR_ID_GENERAL_TOUCH, 0x0002) }, { HID_USB_DEVICE(USB_VENDOR_ID_GENERAL_TOUCH, 0x0004) }, { HID_USB_DEVICE(USB_VENDOR_ID_GOTOP, USB_DEVICE_ID_SUPER_Q2) }, { HID_USB_DEVICE(USB_VENDOR_ID_GOTOP, USB_DEVICE_ID_GOGOPEN) }, { HID_USB_DEVICE(USB_VENDOR_ID_GOTOP, USB_DEVICE_ID_PENPOWER) }, { HID_USB_DEVICE(USB_VENDOR_ID_GRETAGMACBETH, USB_DEVICE_ID_GRETAGMACBETH_HUEY) }, { HID_USB_DEVICE(USB_VENDOR_ID_GRIFFIN, USB_DEVICE_ID_POWERMATE) }, { HID_USB_DEVICE(USB_VENDOR_ID_GRIFFIN, USB_DEVICE_ID_SOUNDKNOB) }, { HID_USB_DEVICE(USB_VENDOR_ID_GRIFFIN, USB_DEVICE_ID_RADIOSHARK) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_90) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_100) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_101) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_103) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_104) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_105) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_106) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_107) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_108) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_200) }, { 
HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_201) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_202) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_203) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_204) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_205) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_206) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_207) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_300) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_301) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_302) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_303) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_304) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_305) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_306) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_307) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_308) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_309) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_400) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_401) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_402) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_403) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_404) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_405) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_500) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_501) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_502) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_503) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_504) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_1000) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_1001) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_1002) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_1003) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_1004) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_1005) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_1006) }, { HID_USB_DEVICE(USB_VENDOR_ID_GTCO, USB_DEVICE_ID_GTCO_1007) }, { HID_USB_DEVICE(USB_VENDOR_ID_IMATION, USB_DEVICE_ID_DISC_STAKKA) }, { HID_USB_DEVICE(USB_VENDOR_ID_JABRA, USB_DEVICE_ID_JABRA_GN9350E) }, { HID_USB_DEVICE(USB_VENDOR_ID_KBGEAR, USB_DEVICE_ID_KBGEAR_JAMSTUDIO) }, { HID_USB_DEVICE(USB_VENDOR_ID_KWORLD, USB_DEVICE_ID_KWORLD_RADIO_FM700) }, { HID_USB_DEVICE(USB_VENDOR_ID_KYE, USB_DEVICE_ID_KYE_GPEN_560) }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_KYE, 0x0058) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_CASSY) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_CASSY2) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_POCKETCASSY) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_POCKETCASSY2) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_MOBILECASSY) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_MOBILECASSY2) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_MICROCASSYVOLTAGE) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_MICROCASSYCURRENT) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_MICROCASSYTIME) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_MICROCASSYTEMPERATURE) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_MICROCASSYPH) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, 
USB_DEVICE_ID_LD_POWERANALYSERCASSY) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_CONVERTERCONTROLLERCASSY) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_MACHINETESTCASSY) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_JWM) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_DMMP) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_UMIP) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_UMIC) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_UMIB) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_XRAY) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_XRAY2) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_VIDEOCOM) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_MOTOR) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_COM3LAB) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_TELEPORT) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_NETWORKANALYSER) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_POWERCONTROL) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_MACHINETEST) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_MOSTANALYSER) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_MOSTANALYSER2) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_ABSESP) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_AUTODATABUS) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_MCT) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_HYBRID) }, { HID_USB_DEVICE(USB_VENDOR_ID_LD, USB_DEVICE_ID_LD_HEATCONTROL) }, { HID_USB_DEVICE(USB_VENDOR_ID_MADCATZ, USB_DEVICE_ID_MADCATZ_BEATPAD) }, { HID_USB_DEVICE(USB_VENDOR_ID_MCC, USB_DEVICE_ID_MCC_PMD1024LS) }, { HID_USB_DEVICE(USB_VENDOR_ID_MCC, USB_DEVICE_ID_MCC_PMD1208LS) }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROCHIP, USB_DEVICE_ID_PICKIT1) }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROCHIP, USB_DEVICE_ID_PICKIT2) }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROCHIP, USB_DEVICE_ID_PICK16F1454) }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROCHIP, USB_DEVICE_ID_PICK16F1454_V2) }, { HID_USB_DEVICE(USB_VENDOR_ID_NATIONAL_SEMICONDUCTOR, USB_DEVICE_ID_N_S_HARMONY) }, { HID_USB_DEVICE(USB_VENDOR_ID_ONTRAK, USB_DEVICE_ID_ONTRAK_ADU100) }, { HID_USB_DEVICE(USB_VENDOR_ID_ONTRAK, USB_DEVICE_ID_ONTRAK_ADU100 + 20) }, { HID_USB_DEVICE(USB_VENDOR_ID_ONTRAK, USB_DEVICE_ID_ONTRAK_ADU100 + 30) }, { HID_USB_DEVICE(USB_VENDOR_ID_ONTRAK, USB_DEVICE_ID_ONTRAK_ADU100 + 100) }, { HID_USB_DEVICE(USB_VENDOR_ID_ONTRAK, USB_DEVICE_ID_ONTRAK_ADU100 + 108) }, { HID_USB_DEVICE(USB_VENDOR_ID_ONTRAK, USB_DEVICE_ID_ONTRAK_ADU100 + 118) }, { HID_USB_DEVICE(USB_VENDOR_ID_ONTRAK, USB_DEVICE_ID_ONTRAK_ADU100 + 200) }, { HID_USB_DEVICE(USB_VENDOR_ID_ONTRAK, USB_DEVICE_ID_ONTRAK_ADU100 + 300) }, { HID_USB_DEVICE(USB_VENDOR_ID_ONTRAK, USB_DEVICE_ID_ONTRAK_ADU100 + 400) }, { HID_USB_DEVICE(USB_VENDOR_ID_ONTRAK, USB_DEVICE_ID_ONTRAK_ADU100 + 500) }, { HID_USB_DEVICE(USB_VENDOR_ID_PANJIT, 0x0001) }, { HID_USB_DEVICE(USB_VENDOR_ID_PANJIT, 0x0002) }, { HID_USB_DEVICE(USB_VENDOR_ID_PANJIT, 0x0003) }, { HID_USB_DEVICE(USB_VENDOR_ID_PANJIT, 0x0004) }, { HID_USB_DEVICE(USB_VENDOR_ID_PETZL, USB_DEVICE_ID_PETZL_HEADLAMP) }, { HID_USB_DEVICE(USB_VENDOR_ID_PHILIPS, USB_DEVICE_ID_PHILIPS_IEEE802154_DONGLE) }, { HID_USB_DEVICE(USB_VENDOR_ID_POWERCOM, USB_DEVICE_ID_POWERCOM_UPS) }, { HID_USB_DEVICE(USB_VENDOR_ID_SAI, USB_DEVICE_ID_CYPRESS_HIDCOM) }, #if IS_ENABLED(CONFIG_MOUSE_SYNAPTICS_USB) { HID_USB_DEVICE(USB_VENDOR_ID_SYNAPTICS, USB_DEVICE_ID_SYNAPTICS_TP) }, { HID_USB_DEVICE(USB_VENDOR_ID_SYNAPTICS, 
USB_DEVICE_ID_SYNAPTICS_INT_TP) }, { HID_USB_DEVICE(USB_VENDOR_ID_SYNAPTICS, USB_DEVICE_ID_SYNAPTICS_CPAD) }, { HID_USB_DEVICE(USB_VENDOR_ID_SYNAPTICS, USB_DEVICE_ID_SYNAPTICS_STICK) }, { HID_USB_DEVICE(USB_VENDOR_ID_SYNAPTICS, USB_DEVICE_ID_SYNAPTICS_WP) }, { HID_USB_DEVICE(USB_VENDOR_ID_SYNAPTICS, USB_DEVICE_ID_SYNAPTICS_COMP_TP) }, { HID_USB_DEVICE(USB_VENDOR_ID_SYNAPTICS, USB_DEVICE_ID_SYNAPTICS_WTP) }, { HID_USB_DEVICE(USB_VENDOR_ID_SYNAPTICS, USB_DEVICE_ID_SYNAPTICS_DPAD) }, #endif { HID_USB_DEVICE(USB_VENDOR_ID_YEALINK, USB_DEVICE_ID_YEALINK_P1K_P4K_B2K) }, { HID_USB_DEVICE(USB_VENDOR_ID_QUANTA, USB_DEVICE_ID_QUANTA_HP_5MP_CAMERA_5473) }, { } }; /* * hid_mouse_ignore_list - mouse devices which should not be handled by the hid layer * * There are composite devices for which we want to ignore only a certain * interface. This is a list of devices for which only the mouse interface will * be ignored. This allows a dedicated driver to take care of the interface. */ static const struct hid_device_id hid_mouse_ignore_list[] = { /* appletouch driver */ { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_FOUNTAIN_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_FOUNTAIN_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_GEYSER_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_GEYSER_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_GEYSER_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_GEYSER3_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_GEYSER3_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_GEYSER3_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_GEYSER4_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_GEYSER4_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_GEYSER4_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_GEYSER4_HF_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_GEYSER4_HF_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_GEYSER4_HF_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING2_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING2_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING2_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING3_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING3_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING3_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING4_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING4_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING4_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING4A_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING4A_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING4A_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING5_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING5_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING5_JIS) }, { 
HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING5A_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING5A_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING5A_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING6_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING6_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING6_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING6A_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING6A_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING6A_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING7_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING7_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING7_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING7A_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING7A_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING7A_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING8_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING8_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING8_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING9_ANSI) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING9_ISO) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRING9_JIS) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRINGT2_J140K) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRINGT2_J132) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRINGT2_J680) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRINGT2_J213) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRINGT2_J214K) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRINGT2_J223) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRINGT2_J230K) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_WELLSPRINGT2_J152F) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_FOUNTAIN_TP_ONLY) }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_GEYSER1_TP_ONLY) }, { } }; bool hid_ignore(struct hid_device *hdev) { int i; if (hdev->quirks & HID_QUIRK_NO_IGNORE) return false; if (hdev->quirks & HID_QUIRK_IGNORE) return true; switch (hdev->vendor) { case USB_VENDOR_ID_CODEMERCS: /* ignore all Code Mercenaries IOWarrior devices */ if (hdev->product >= USB_DEVICE_ID_CODEMERCS_IOW_FIRST && hdev->product <= USB_DEVICE_ID_CODEMERCS_IOW_LAST) return true; break; case USB_VENDOR_ID_LOGITECH: if (hdev->product >= USB_DEVICE_ID_LOGITECH_HARMONY_FIRST && hdev->product <= USB_DEVICE_ID_LOGITECH_HARMONY_LAST) return true; /* * The Keene FM transmitter USB device has the same USB ID as * the Logitech AudioHub Speaker, but it should ignore the hid. * Check if the name is that of the Keene device. * For reference: the name of the AudioHub is * "HOLTEK AudioHub Speaker". 
*/ if (hdev->product == USB_DEVICE_ID_LOGITECH_AUDIOHUB && !strcmp(hdev->name, "HOLTEK B-LINK USB Audio ")) return true; break; case USB_VENDOR_ID_SOUNDGRAPH: if (hdev->product >= USB_DEVICE_ID_SOUNDGRAPH_IMON_FIRST && hdev->product <= USB_DEVICE_ID_SOUNDGRAPH_IMON_LAST) return true; break; case USB_VENDOR_ID_HANWANG: if (hdev->product >= USB_DEVICE_ID_HANWANG_TABLET_FIRST && hdev->product <= USB_DEVICE_ID_HANWANG_TABLET_LAST) return true; break; case USB_VENDOR_ID_JESS: if (hdev->product == USB_DEVICE_ID_JESS_YUREX && hdev->type == HID_TYPE_USBNONE) return true; break; case USB_VENDOR_ID_VELLEMAN: /* These are not HID devices. They are handled by comedi. */ if ((hdev->product >= USB_DEVICE_ID_VELLEMAN_K8055_FIRST && hdev->product <= USB_DEVICE_ID_VELLEMAN_K8055_LAST) || (hdev->product >= USB_DEVICE_ID_VELLEMAN_K8061_FIRST && hdev->product <= USB_DEVICE_ID_VELLEMAN_K8061_LAST)) return true; break; case USB_VENDOR_ID_ATMEL_V_USB: /* Masterkit MA901 usb radio based on Atmel tiny85 chip and * it has the same USB ID as many Atmel V-USB devices. This * usb radio is handled by radio-ma901.c driver so we want * ignore the hid. Check the name, bus, product and ignore * if we have MA901 usb radio. */ if (hdev->product == USB_DEVICE_ID_ATMEL_V_USB && hdev->bus == BUS_USB && strncmp(hdev->name, "www.masterkit.ru MA901", 22) == 0) return true; break; case USB_VENDOR_ID_ELAN: /* * Blacklist of everything that gets handled by the elan_i2c * input driver. This avoids disabling valid touchpads and * other ELAN devices. */ if ((hdev->product == 0x0401 || hdev->product == 0x0400)) for (i = 0; strlen(elan_acpi_id[i].id); ++i) if (!strncmp(hdev->name, elan_acpi_id[i].id, strlen(elan_acpi_id[i].id))) return true; break; } if (hdev->type == HID_TYPE_USBMOUSE && hdev->quirks & HID_QUIRK_IGNORE_MOUSE) return true; return !!hid_match_id(hdev, hid_ignore_list); } EXPORT_SYMBOL_GPL(hid_ignore); /* Dynamic HID quirks list - specified at runtime */ struct quirks_list_struct { struct hid_device_id hid_bl_item; struct list_head node; }; static LIST_HEAD(dquirks_list); static DEFINE_MUTEX(dquirks_lock); /* Runtime ("dynamic") quirks manipulation functions */ /** * hid_exists_dquirk - find any dynamic quirks for a HID device * @hdev: the HID device to match * * Description: * Scans dquirks_list for a matching dynamic quirk and returns * the pointer to the relevant struct hid_device_id if found. * Must be called with a read lock held on dquirks_lock. * * Return: NULL if no quirk found, struct hid_device_id * if found. */ static struct hid_device_id *hid_exists_dquirk(const struct hid_device *hdev) { struct quirks_list_struct *q; struct hid_device_id *bl_entry = NULL; list_for_each_entry(q, &dquirks_list, node) { if (hid_match_one_id(hdev, &q->hid_bl_item)) { bl_entry = &q->hid_bl_item; break; } } if (bl_entry != NULL) dbg_hid("Found dynamic quirk 0x%lx for HID device 0x%04x:0x%04x\n", bl_entry->driver_data, bl_entry->vendor, bl_entry->product); return bl_entry; } /** * hid_modify_dquirk - add/replace a HID quirk * @id: the HID device to match * @quirks: the unsigned long quirks value to add/replace * * Description: * If an dynamic quirk exists in memory for this device, replace its * quirks value with what was provided. Otherwise, add the quirk * to the dynamic quirks list. * * Return: 0 OK, -error on failure. 
*/ static int hid_modify_dquirk(const struct hid_device_id *id, const unsigned long quirks) { struct hid_device *hdev; struct quirks_list_struct *q_new, *q; int list_edited = 0; int ret = 0; hdev = kzalloc(sizeof(*hdev), GFP_KERNEL); if (!hdev) return -ENOMEM; q_new = kmalloc(sizeof(struct quirks_list_struct), GFP_KERNEL); if (!q_new) { ret = -ENOMEM; goto out; } hdev->bus = q_new->hid_bl_item.bus = id->bus; hdev->group = q_new->hid_bl_item.group = id->group; hdev->vendor = q_new->hid_bl_item.vendor = id->vendor; hdev->product = q_new->hid_bl_item.product = id->product; q_new->hid_bl_item.driver_data = quirks; mutex_lock(&dquirks_lock); list_for_each_entry(q, &dquirks_list, node) { if (hid_match_one_id(hdev, &q->hid_bl_item)) { list_replace(&q->node, &q_new->node); kfree(q); list_edited = 1; break; } } if (!list_edited) list_add_tail(&q_new->node, &dquirks_list); mutex_unlock(&dquirks_lock); out: kfree(hdev); return ret; } /** * hid_remove_all_dquirks - remove all runtime HID quirks from memory * @bus: bus to match against. Use HID_BUS_ANY if all need to be removed. * * Description: * Free all memory associated with dynamic quirks - called before * module unload. * */ static void hid_remove_all_dquirks(__u16 bus) { struct quirks_list_struct *q, *temp; mutex_lock(&dquirks_lock); list_for_each_entry_safe(q, temp, &dquirks_list, node) { if (bus == HID_BUS_ANY || bus == q->hid_bl_item.bus) { list_del(&q->node); kfree(q); } } mutex_unlock(&dquirks_lock); } /** * hid_quirks_init - apply HID quirks specified at module load time * @quirks_param: array of quirks strings (vendor:product:quirks) * @bus: bus type * @count: number of quirks to check */ int hid_quirks_init(char **quirks_param, __u16 bus, int count) { struct hid_device_id id = { 0 }; int n = 0, m; unsigned short int vendor, product; u32 quirks; id.bus = bus; for (; n < count && quirks_param[n]; n++) { m = sscanf(quirks_param[n], "0x%hx:0x%hx:0x%x", &vendor, &product, &quirks); id.vendor = (__u16)vendor; id.product = (__u16)product; if (m != 3 || hid_modify_dquirk(&id, quirks) != 0) { pr_warn("Could not parse HID quirk module param %s\n", quirks_param[n]); } } return 0; } EXPORT_SYMBOL_GPL(hid_quirks_init); /** * hid_quirks_exit - release memory associated with dynamic_quirks * @bus: a bus to match against * * Description: * Release all memory associated with dynamic quirks for a given bus. * Called upon module unload. * Use HID_BUS_ANY to remove all dynamic quirks. * * Returns: nothing */ void hid_quirks_exit(__u16 bus) { hid_remove_all_dquirks(bus); } EXPORT_SYMBOL_GPL(hid_quirks_exit); /** * hid_gets_squirk - return any static quirks for a HID device * @hdev: the HID device to match * * Description: * Given a HID device, return a pointer to the quirked hid_device_id entry * associated with that device. * * Return: the quirks. 
*/ static unsigned long hid_gets_squirk(const struct hid_device *hdev) { const struct hid_device_id *bl_entry; unsigned long quirks = hdev->initial_quirks; if (hid_match_id(hdev, hid_ignore_list)) quirks |= HID_QUIRK_IGNORE; if (hid_match_id(hdev, hid_mouse_ignore_list)) quirks |= HID_QUIRK_IGNORE_MOUSE; if (hid_match_id(hdev, hid_have_special_driver)) quirks |= HID_QUIRK_HAVE_SPECIAL_DRIVER; bl_entry = hid_match_id(hdev, hid_quirks); if (bl_entry != NULL) quirks |= bl_entry->driver_data; if (quirks) dbg_hid("Found squirk 0x%lx for HID device 0x%04x:0x%04x\n", quirks, hdev->vendor, hdev->product); return quirks; } /** * hid_lookup_quirk - return any quirks associated with a HID device * @hdev: the HID device to look for * * Description: * Given a HID device, return any quirks associated with that device. * * Return: an unsigned long quirks value. */ unsigned long hid_lookup_quirk(const struct hid_device *hdev) { unsigned long quirks = 0; const struct hid_device_id *quirk_entry = NULL; /* NCR devices must not be queried for reports */ if (hdev->bus == BUS_USB && hdev->vendor == USB_VENDOR_ID_NCR && hdev->product >= USB_DEVICE_ID_NCR_FIRST && hdev->product <= USB_DEVICE_ID_NCR_LAST) return HID_QUIRK_NO_INIT_REPORTS; /* These devices must be ignored if version (bcdDevice) is too old */ if (hdev->bus == BUS_USB && hdev->vendor == USB_VENDOR_ID_JABRA) { switch (hdev->product) { case USB_DEVICE_ID_JABRA_SPEAK_410: if (hdev->version < 0x0111) return HID_QUIRK_IGNORE; break; case USB_DEVICE_ID_JABRA_SPEAK_510: if (hdev->version < 0x0214) return HID_QUIRK_IGNORE; break; } } mutex_lock(&dquirks_lock); quirk_entry = hid_exists_dquirk(hdev); if (quirk_entry) quirks = quirk_entry->driver_data; else quirks = hid_gets_squirk(hdev); mutex_unlock(&dquirks_lock); return quirks; } EXPORT_SYMBOL_GPL(hid_lookup_quirk);
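/*
 * Illustrative sketch (not part of the driver): hid_quirks_init() above
 * expects every quirks_param[] entry in the "0xVVVV:0xPPPP:0xQQQQQQQQ"
 * (vendor:product:quirks) form that it hands to sscanf("0x%hx:0x%hx:0x%x");
 * on the USB bus such strings are normally supplied through a module
 * parameter. The standalone userspace program below mimics only that
 * parsing step so the expected format is easy to see; the sample string,
 * and the vendor/product/quirks values in it, are hypothetical and chosen
 * purely for demonstration.
 */
#include <stdio.h>

int main(void)
{
	/* hypothetical parameter: vendor 0x046d, product 0xc077, quirks 0x400 */
	const char *param = "0x046d:0xc077:0x00000400";
	unsigned short vendor, product;
	unsigned int quirks;

	/* same format string hid_quirks_init() passes to sscanf() */
	if (sscanf(param, "0x%hx:0x%hx:0x%x", &vendor, &product, &quirks) != 3) {
		fprintf(stderr, "could not parse HID quirk string %s\n", param);
		return 1;
	}

	printf("vendor 0x%04x product 0x%04x quirks 0x%08x\n",
	       vendor, product, quirks);
	return 0;
}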
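hid_quirks_init() expects each quirks module parameter to be a hexadecimal vendor:product:quirks triple, parsed with sscanf() exactly as shown above. The following standalone sketch exercises that parsing step in userspace; the example string is made up for illustration.

#include <stdio.h>

int main(void)
{
	const char *param = "0x046d:0xc52b:0x00000400";	/* hypothetical vendor:product:quirks */
	unsigned short vendor, product;
	unsigned int quirks;

	/* same format string as hid_quirks_init(): all three fields must match */
	if (sscanf(param, "0x%hx:0x%hx:0x%x", &vendor, &product, &quirks) != 3) {
		fprintf(stderr, "could not parse HID quirk param %s\n", param);
		return 1;
	}
	printf("vendor=0x%04hx product=0x%04hx quirks=0x%08x\n", vendor, product, quirks);
	return 0;
}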
// SPDX-License-Identifier: GPL-2.0 /* * Copyright (C) 1991, 1992 Linus Torvalds * Copyright (C) 1994, Karl Keyte: Added support for disk statistics * Elevator latency, (C) 2000 Andrea Arcangeli <andrea@suse.de> SuSE * Queue request tables / lock, selectable elevator, Jens Axboe <axboe@suse.de> * kernel-doc documentation started by NeilBrown <neilb@cse.unsw.edu.au> * - July2000 * bio rewrite, highmem i/o, etc, Jens Axboe <axboe@suse.de> - may 2001 */ /* * This handles all read/write requests to block devices */ #include <linux/kernel.h> #include <linux/module.h> #include <linux/bio.h> #include <linux/blkdev.h> #include <linux/blk-pm.h> #include <linux/blk-integrity.h> #include <linux/highmem.h> #include <linux/mm.h> #include <linux/pagemap.h> #include <linux/kernel_stat.h> #include <linux/string.h> #include <linux/init.h> #include <linux/completion.h> #include <linux/slab.h> #include <linux/swap.h> #include <linux/writeback.h> #include <linux/task_io_accounting_ops.h> #include <linux/fault-inject.h> #include <linux/list_sort.h> #include <linux/delay.h> #include <linux/ratelimit.h> #include <linux/pm_runtime.h> #include <linux/t10-pi.h> #include <linux/debugfs.h> #include <linux/bpf.h> #include <linux/part_stat.h> #include <linux/sched/sysctl.h> #include <linux/blk-crypto.h> #define CREATE_TRACE_POINTS #include <trace/events/block.h> #include "blk.h" #include "blk-mq-sched.h" #include "blk-pm.h" #include "blk-cgroup.h" #include "blk-throttle.h" #include "blk-ioprio.h" struct dentry
*blk_debugfs_root; EXPORT_TRACEPOINT_SYMBOL_GPL(block_bio_remap); EXPORT_TRACEPOINT_SYMBOL_GPL(block_rq_remap); EXPORT_TRACEPOINT_SYMBOL_GPL(block_bio_complete); EXPORT_TRACEPOINT_SYMBOL_GPL(block_split); EXPORT_TRACEPOINT_SYMBOL_GPL(block_unplug); EXPORT_TRACEPOINT_SYMBOL_GPL(block_rq_insert); static DEFINE_IDA(blk_queue_ida); /* * For queue allocation */ static struct kmem_cache *blk_requestq_cachep; /* * Controlling structure to kblockd */ static struct workqueue_struct *kblockd_workqueue; /** * blk_queue_flag_set - atomically set a queue flag * @flag: flag to be set * @q: request queue */ void blk_queue_flag_set(unsigned int flag, struct request_queue *q) { set_bit(flag, &q->queue_flags); } EXPORT_SYMBOL(blk_queue_flag_set); /** * blk_queue_flag_clear - atomically clear a queue flag * @flag: flag to be cleared * @q: request queue */ void blk_queue_flag_clear(unsigned int flag, struct request_queue *q) { clear_bit(flag, &q->queue_flags); } EXPORT_SYMBOL(blk_queue_flag_clear); #define REQ_OP_NAME(name) [REQ_OP_##name] = #name static const char *const blk_op_name[] = { REQ_OP_NAME(READ), REQ_OP_NAME(WRITE), REQ_OP_NAME(FLUSH), REQ_OP_NAME(DISCARD), REQ_OP_NAME(SECURE_ERASE), REQ_OP_NAME(ZONE_RESET), REQ_OP_NAME(ZONE_RESET_ALL), REQ_OP_NAME(ZONE_OPEN), REQ_OP_NAME(ZONE_CLOSE), REQ_OP_NAME(ZONE_FINISH), REQ_OP_NAME(ZONE_APPEND), REQ_OP_NAME(WRITE_ZEROES), REQ_OP_NAME(DRV_IN), REQ_OP_NAME(DRV_OUT), }; #undef REQ_OP_NAME /** * blk_op_str - Return string XXX in the REQ_OP_XXX. * @op: REQ_OP_XXX. * * Description: Centralize block layer function to convert REQ_OP_XXX into * string format. Useful in the debugging and tracing bio or request. For * invalid REQ_OP_XXX it returns string "UNKNOWN". */ inline const char *blk_op_str(enum req_op op) { const char *op_str = "UNKNOWN"; if (op < ARRAY_SIZE(blk_op_name) && blk_op_name[op]) op_str = blk_op_name[op]; return op_str; } EXPORT_SYMBOL_GPL(blk_op_str); static const struct { int errno; const char *name; } blk_errors[] = { [BLK_STS_OK] = { 0, "" }, [BLK_STS_NOTSUPP] = { -EOPNOTSUPP, "operation not supported" }, [BLK_STS_TIMEOUT] = { -ETIMEDOUT, "timeout" }, [BLK_STS_NOSPC] = { -ENOSPC, "critical space allocation" }, [BLK_STS_TRANSPORT] = { -ENOLINK, "recoverable transport" }, [BLK_STS_TARGET] = { -EREMOTEIO, "critical target" }, [BLK_STS_RESV_CONFLICT] = { -EBADE, "reservation conflict" }, [BLK_STS_MEDIUM] = { -ENODATA, "critical medium" }, [BLK_STS_PROTECTION] = { -EILSEQ, "protection" }, [BLK_STS_RESOURCE] = { -ENOMEM, "kernel resource" }, [BLK_STS_DEV_RESOURCE] = { -EBUSY, "device resource" }, [BLK_STS_AGAIN] = { -EAGAIN, "nonblocking retry" }, [BLK_STS_OFFLINE] = { -ENODEV, "device offline" }, /* device mapper special case, should not leak out: */ [BLK_STS_DM_REQUEUE] = { -EREMCHG, "dm internal retry" }, /* zone device specific errors */ [BLK_STS_ZONE_OPEN_RESOURCE] = { -ETOOMANYREFS, "open zones exceeded" }, [BLK_STS_ZONE_ACTIVE_RESOURCE] = { -EOVERFLOW, "active zones exceeded" }, /* Command duration limit device-side timeout */ [BLK_STS_DURATION_LIMIT] = { -ETIME, "duration limit exceeded" }, [BLK_STS_INVAL] = { -EINVAL, "invalid" }, /* everything else not covered above: */ [BLK_STS_IOERR] = { -EIO, "I/O" }, }; blk_status_t errno_to_blk_status(int errno) { int i; for (i = 0; i < ARRAY_SIZE(blk_errors); i++) { if (blk_errors[i].errno == errno) return (__force blk_status_t)i; } return BLK_STS_IOERR; } EXPORT_SYMBOL_GPL(errno_to_blk_status); int blk_status_to_errno(blk_status_t status) { int idx = (__force int)status; if (WARN_ON_ONCE(idx >= 
ARRAY_SIZE(blk_errors))) return -EIO; return blk_errors[idx].errno; } EXPORT_SYMBOL_GPL(blk_status_to_errno); const char *blk_status_to_str(blk_status_t status) { int idx = (__force int)status; if (WARN_ON_ONCE(idx >= ARRAY_SIZE(blk_errors))) return "<null>"; return blk_errors[idx].name; } EXPORT_SYMBOL_GPL(blk_status_to_str); /** * blk_sync_queue - cancel any pending callbacks on a queue * @q: the queue * * Description: * The block layer may perform asynchronous callback activity * on a queue, such as calling the unplug function after a timeout. * A block device may call blk_sync_queue to ensure that any * such activity is cancelled, thus allowing it to release resources * that the callbacks might use. The caller must already have made sure * that its ->submit_bio will not re-add plugging prior to calling * this function. * * This function does not cancel any asynchronous activity arising * out of elevator or throttling code. That would require elevator_exit() * and blkcg_exit_queue() to be called with queue lock initialized. * */ void blk_sync_queue(struct request_queue *q) { timer_delete_sync(&q->timeout); cancel_work_sync(&q->timeout_work); } EXPORT_SYMBOL(blk_sync_queue); /** * blk_set_pm_only - increment pm_only counter * @q: request queue pointer */ void blk_set_pm_only(struct request_queue *q) { atomic_inc(&q->pm_only); } EXPORT_SYMBOL_GPL(blk_set_pm_only); void blk_clear_pm_only(struct request_queue *q) { int pm_only; pm_only = atomic_dec_return(&q->pm_only); WARN_ON_ONCE(pm_only < 0); if (pm_only == 0) wake_up_all(&q->mq_freeze_wq); } EXPORT_SYMBOL_GPL(blk_clear_pm_only); static void blk_free_queue_rcu(struct rcu_head *rcu_head) { struct request_queue *q = container_of(rcu_head, struct request_queue, rcu_head); percpu_ref_exit(&q->q_usage_counter); kmem_cache_free(blk_requestq_cachep, q); } static void blk_free_queue(struct request_queue *q) { blk_free_queue_stats(q->stats); if (queue_is_mq(q)) blk_mq_release(q); ida_free(&blk_queue_ida, q->id); lockdep_unregister_key(&q->io_lock_cls_key); lockdep_unregister_key(&q->q_lock_cls_key); call_rcu(&q->rcu_head, blk_free_queue_rcu); } /** * blk_put_queue - decrement the request_queue refcount * @q: the request_queue structure to decrement the refcount for * * Decrements the refcount of the request_queue and free it when the refcount * reaches 0. */ void blk_put_queue(struct request_queue *q) { if (refcount_dec_and_test(&q->refs)) blk_free_queue(q); } EXPORT_SYMBOL(blk_put_queue); bool blk_queue_start_drain(struct request_queue *q) { /* * When queue DYING flag is set, we need to block new req * entering queue, so we call blk_freeze_queue_start() to * prevent I/O from crossing blk_queue_enter(). */ bool freeze = __blk_freeze_queue_start(q, current); if (queue_is_mq(q)) blk_mq_wake_waiters(q); /* Make blk_queue_enter() reexamine the DYING flag. */ wake_up_all(&q->mq_freeze_wq); return freeze; } /** * blk_queue_enter() - try to increase q->q_usage_counter * @q: request queue pointer * @flags: BLK_MQ_REQ_NOWAIT and/or BLK_MQ_REQ_PM */ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags) { const bool pm = flags & BLK_MQ_REQ_PM; while (!blk_try_enter_queue(q, pm)) { if (flags & BLK_MQ_REQ_NOWAIT) return -EAGAIN; /* * read pair of barrier in blk_freeze_queue_start(), we need to * order reading __PERCPU_REF_DEAD flag of .q_usage_counter and * reading .mq_freeze_depth or queue dying flag, otherwise the * following wait may never return if the two reads are * reordered. 
*/ smp_rmb(); wait_event(q->mq_freeze_wq, (!q->mq_freeze_depth && blk_pm_resume_queue(pm, q)) || blk_queue_dying(q)); if (blk_queue_dying(q)) return -ENODEV; } rwsem_acquire_read(&q->q_lockdep_map, 0, 0, _RET_IP_); rwsem_release(&q->q_lockdep_map, _RET_IP_); return 0; } int __bio_queue_enter(struct request_queue *q, struct bio *bio) { while (!blk_try_enter_queue(q, false)) { struct gendisk *disk = bio->bi_bdev->bd_disk; if (bio->bi_opf & REQ_NOWAIT) { if (test_bit(GD_DEAD, &disk->state)) goto dead; bio_wouldblock_error(bio); return -EAGAIN; } /* * read pair of barrier in blk_freeze_queue_start(), we need to * order reading __PERCPU_REF_DEAD flag of .q_usage_counter and * reading .mq_freeze_depth or queue dying flag, otherwise the * following wait may never return if the two reads are * reordered. */ smp_rmb(); wait_event(q->mq_freeze_wq, (!q->mq_freeze_depth && blk_pm_resume_queue(false, q)) || test_bit(GD_DEAD, &disk->state)); if (test_bit(GD_DEAD, &disk->state)) goto dead; } rwsem_acquire_read(&q->io_lockdep_map, 0, 0, _RET_IP_); rwsem_release(&q->io_lockdep_map, _RET_IP_); return 0; dead: bio_io_error(bio); return -ENODEV; } void blk_queue_exit(struct request_queue *q) { percpu_ref_put(&q->q_usage_counter); } static void blk_queue_usage_counter_release(struct percpu_ref *ref) { struct request_queue *q = container_of(ref, struct request_queue, q_usage_counter); wake_up_all(&q->mq_freeze_wq); } static void blk_rq_timed_out_timer(struct timer_list *t) { struct request_queue *q = timer_container_of(q, t, timeout); kblockd_schedule_work(&q->timeout_work); } static void blk_timeout_work(struct work_struct *work) { } struct request_queue *blk_alloc_queue(struct queue_limits *lim, int node_id) { struct request_queue *q; int error; q = kmem_cache_alloc_node(blk_requestq_cachep, GFP_KERNEL | __GFP_ZERO, node_id); if (!q) return ERR_PTR(-ENOMEM); q->last_merge = NULL; q->id = ida_alloc(&blk_queue_ida, GFP_KERNEL); if (q->id < 0) { error = q->id; goto fail_q; } q->stats = blk_alloc_queue_stats(); if (!q->stats) { error = -ENOMEM; goto fail_id; } error = blk_set_default_limits(lim); if (error) goto fail_stats; q->limits = *lim; q->node = node_id; atomic_set(&q->nr_active_requests_shared_tags, 0); timer_setup(&q->timeout, blk_rq_timed_out_timer, 0); INIT_WORK(&q->timeout_work, blk_timeout_work); INIT_LIST_HEAD(&q->icq_list); refcount_set(&q->refs, 1); mutex_init(&q->debugfs_mutex); mutex_init(&q->elevator_lock); mutex_init(&q->sysfs_lock); mutex_init(&q->limits_lock); mutex_init(&q->rq_qos_mutex); spin_lock_init(&q->queue_lock); init_waitqueue_head(&q->mq_freeze_wq); mutex_init(&q->mq_freeze_lock); blkg_init_queue(q); /* * Init percpu_ref in atomic mode so that it's faster to shutdown. * See blk_register_queue() for details. */ error = percpu_ref_init(&q->q_usage_counter, blk_queue_usage_counter_release, PERCPU_REF_INIT_ATOMIC, GFP_KERNEL); if (error) goto fail_stats; lockdep_register_key(&q->io_lock_cls_key); lockdep_register_key(&q->q_lock_cls_key); lockdep_init_map(&q->io_lockdep_map, "&q->q_usage_counter(io)", &q->io_lock_cls_key, 0); lockdep_init_map(&q->q_lockdep_map, "&q->q_usage_counter(queue)", &q->q_lock_cls_key, 0); /* Teach lockdep about lock ordering (reclaim WRT queue freeze lock). 
*/ fs_reclaim_acquire(GFP_KERNEL); rwsem_acquire_read(&q->io_lockdep_map, 0, 0, _RET_IP_); rwsem_release(&q->io_lockdep_map, _RET_IP_); fs_reclaim_release(GFP_KERNEL); q->nr_requests = BLKDEV_DEFAULT_RQ; return q; fail_stats: blk_free_queue_stats(q->stats); fail_id: ida_free(&blk_queue_ida, q->id); fail_q: kmem_cache_free(blk_requestq_cachep, q); return ERR_PTR(error); } /** * blk_get_queue - increment the request_queue refcount * @q: the request_queue structure to increment the refcount for * * Increment the refcount of the request_queue kobject. * * Context: Any context. */ bool blk_get_queue(struct request_queue *q) { if (unlikely(blk_queue_dying(q))) return false; refcount_inc(&q->refs); return true; } EXPORT_SYMBOL(blk_get_queue); #ifdef CONFIG_FAIL_MAKE_REQUEST static DECLARE_FAULT_ATTR(fail_make_request); static int __init setup_fail_make_request(char *str) { return setup_fault_attr(&fail_make_request, str); } __setup("fail_make_request=", setup_fail_make_request); bool should_fail_request(struct block_device *part, unsigned int bytes) { return bdev_test_flag(part, BD_MAKE_IT_FAIL) && should_fail(&fail_make_request, bytes); } static int __init fail_make_request_debugfs(void) { struct dentry *dir = fault_create_debugfs_attr("fail_make_request", NULL, &fail_make_request); return PTR_ERR_OR_ZERO(dir); } late_initcall(fail_make_request_debugfs); #endif /* CONFIG_FAIL_MAKE_REQUEST */ static inline void bio_check_ro(struct bio *bio) { if (op_is_write(bio_op(bio)) && bdev_read_only(bio->bi_bdev)) { if (op_is_flush(bio->bi_opf) && !bio_sectors(bio)) return; if (bdev_test_flag(bio->bi_bdev, BD_RO_WARNED)) return; bdev_set_flag(bio->bi_bdev, BD_RO_WARNED); /* * Use ioctl to set underlying disk of raid/dm to read-only * will trigger this. */ pr_warn("Trying to write to read-only block-device %pg\n", bio->bi_bdev); } } static noinline int should_fail_bio(struct bio *bio) { if (should_fail_request(bdev_whole(bio->bi_bdev), bio->bi_iter.bi_size)) return -EIO; return 0; } ALLOW_ERROR_INJECTION(should_fail_bio, ERRNO); /* * Check whether this bio extends beyond the end of the device or partition. * This may well happen - the kernel calls bread() without checking the size of * the device, e.g., when mounting a file system. */ static inline int bio_check_eod(struct bio *bio) { sector_t maxsector = bdev_nr_sectors(bio->bi_bdev); unsigned int nr_sectors = bio_sectors(bio); if (nr_sectors && (nr_sectors > maxsector || bio->bi_iter.bi_sector > maxsector - nr_sectors)) { pr_info_ratelimited("%s: attempt to access beyond end of device\n" "%pg: rw=%d, sector=%llu, nr_sectors = %u limit=%llu\n", current->comm, bio->bi_bdev, bio->bi_opf, bio->bi_iter.bi_sector, nr_sectors, maxsector); return -EIO; } return 0; } /* * Remap block n of partition p to block n+start(p) of the disk. */ static int blk_partition_remap(struct bio *bio) { struct block_device *p = bio->bi_bdev; if (unlikely(should_fail_request(p, bio->bi_iter.bi_size))) return -EIO; if (bio_sectors(bio)) { bio->bi_iter.bi_sector += p->bd_start_sect; trace_block_bio_remap(bio, p->bd_dev, bio->bi_iter.bi_sector - p->bd_start_sect); } bio_set_flag(bio, BIO_REMAPPED); return 0; } /* * Check write append to a zoned block device. 
*/ static inline blk_status_t blk_check_zone_append(struct request_queue *q, struct bio *bio) { int nr_sectors = bio_sectors(bio); /* Only applicable to zoned block devices */ if (!bdev_is_zoned(bio->bi_bdev)) return BLK_STS_NOTSUPP; /* The bio sector must point to the start of a sequential zone */ if (!bdev_is_zone_start(bio->bi_bdev, bio->bi_iter.bi_sector)) return BLK_STS_IOERR; /* * Not allowed to cross zone boundaries. Otherwise, the BIO will be * split and could result in non-contiguous sectors being written in * different zones. */ if (nr_sectors > q->limits.chunk_sectors) return BLK_STS_IOERR; /* Make sure the BIO is small enough and will not get split */ if (nr_sectors > q->limits.max_zone_append_sectors) return BLK_STS_IOERR; bio->bi_opf |= REQ_NOMERGE; return BLK_STS_OK; } static void __submit_bio(struct bio *bio) { /* If plug is not used, add new plug here to cache nsecs time. */ struct blk_plug plug; if (unlikely(!blk_crypto_bio_prep(&bio))) return; blk_start_plug(&plug); if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO)) { blk_mq_submit_bio(bio); } else if (likely(bio_queue_enter(bio) == 0)) { struct gendisk *disk = bio->bi_bdev->bd_disk; if ((bio->bi_opf & REQ_POLLED) && !(disk->queue->limits.features & BLK_FEAT_POLL)) { bio->bi_status = BLK_STS_NOTSUPP; bio_endio(bio); } else { disk->fops->submit_bio(bio); } blk_queue_exit(disk->queue); } blk_finish_plug(&plug); } /* * The loop in this function may be a bit non-obvious, and so deserves some * explanation: * * - Before entering the loop, bio->bi_next is NULL (as all callers ensure * that), so we have a list with a single bio. * - We pretend that we have just taken it off a longer list, so we assign * bio_list to a pointer to the bio_list_on_stack, thus initialising the * bio_list of new bios to be added. ->submit_bio() may indeed add some more * bios through a recursive call to submit_bio_noacct. If it did, we find a * non-NULL value in bio_list and re-enter the loop from the top. * - In this case we really did just take the bio of the top of the list (no * pretending) and so remove it from bio_list, and call into ->submit_bio() * again. * * bio_list_on_stack[0] contains bios submitted by the current ->submit_bio. * bio_list_on_stack[1] contains bios that were submitted before the current * ->submit_bio, but that haven't been processed yet. */ static void __submit_bio_noacct(struct bio *bio) { struct bio_list bio_list_on_stack[2]; BUG_ON(bio->bi_next); bio_list_init(&bio_list_on_stack[0]); current->bio_list = bio_list_on_stack; do { struct request_queue *q = bdev_get_queue(bio->bi_bdev); struct bio_list lower, same; /* * Create a fresh bio_list for all subordinate requests. */ bio_list_on_stack[1] = bio_list_on_stack[0]; bio_list_init(&bio_list_on_stack[0]); __submit_bio(bio); /* * Sort new bios into those for a lower level and those for the * same level. */ bio_list_init(&lower); bio_list_init(&same); while ((bio = bio_list_pop(&bio_list_on_stack[0])) != NULL) if (q == bdev_get_queue(bio->bi_bdev)) bio_list_add(&same, bio); else bio_list_add(&lower, bio); /* * Now assemble so we handle the lowest level first. 
*/ bio_list_merge(&bio_list_on_stack[0], &lower); bio_list_merge(&bio_list_on_stack[0], &same); bio_list_merge(&bio_list_on_stack[0], &bio_list_on_stack[1]); } while ((bio = bio_list_pop(&bio_list_on_stack[0]))); current->bio_list = NULL; } static void __submit_bio_noacct_mq(struct bio *bio) { struct bio_list bio_list[2] = { }; current->bio_list = bio_list; do { __submit_bio(bio); } while ((bio = bio_list_pop(&bio_list[0]))); current->bio_list = NULL; } void submit_bio_noacct_nocheck(struct bio *bio) { blk_cgroup_bio_start(bio); blkcg_bio_issue_init(bio); if (!bio_flagged(bio, BIO_TRACE_COMPLETION)) { trace_block_bio_queue(bio); /* * Now that enqueuing has been traced, we need to trace * completion as well. */ bio_set_flag(bio, BIO_TRACE_COMPLETION); } /* * We only want one ->submit_bio to be active at a time, else stack * usage with stacked devices could be a problem. Use current->bio_list * to collect a list of requests submitted by a ->submit_bio method while * it is active, and then process them after it has returned. */ if (current->bio_list) bio_list_add(&current->bio_list[0], bio); else if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO)) __submit_bio_noacct_mq(bio); else __submit_bio_noacct(bio); } static blk_status_t blk_validate_atomic_write_op_size(struct request_queue *q, struct bio *bio) { if (bio->bi_iter.bi_size > queue_atomic_write_unit_max_bytes(q)) return BLK_STS_INVAL; if (bio->bi_iter.bi_size % queue_atomic_write_unit_min_bytes(q)) return BLK_STS_INVAL; return BLK_STS_OK; } /** * submit_bio_noacct - re-submit a bio to the block device layer for I/O * @bio: The bio describing the location in memory and on the device. * * This is a version of submit_bio() that shall only be used for I/O that is * resubmitted to lower level drivers by stacking block drivers. All file * systems and other upper level users of the block layer should use * submit_bio() instead. */ void submit_bio_noacct(struct bio *bio) { struct block_device *bdev = bio->bi_bdev; struct request_queue *q = bdev_get_queue(bdev); blk_status_t status = BLK_STS_IOERR; might_sleep(); /* * For a REQ_NOWAIT based request, return -EOPNOTSUPP * if queue does not support NOWAIT. */ if ((bio->bi_opf & REQ_NOWAIT) && !bdev_nowait(bdev)) goto not_supported; if (should_fail_bio(bio)) goto end_io; bio_check_ro(bio); if (!bio_flagged(bio, BIO_REMAPPED)) { if (unlikely(bio_check_eod(bio))) goto end_io; if (bdev_is_partition(bdev) && unlikely(blk_partition_remap(bio))) goto end_io; } /* * Filter flush bios early so that bio based drivers without flush * support don't have to worry about them. */ if (op_is_flush(bio->bi_opf)) { if (WARN_ON_ONCE(bio_op(bio) != REQ_OP_WRITE && bio_op(bio) != REQ_OP_ZONE_APPEND)) goto end_io; if (!bdev_write_cache(bdev)) { bio->bi_opf &= ~(REQ_PREFLUSH | REQ_FUA); if (!bio_sectors(bio)) { status = BLK_STS_OK; goto end_io; } } } switch (bio_op(bio)) { case REQ_OP_READ: break; case REQ_OP_WRITE: if (bio->bi_opf & REQ_ATOMIC) { status = blk_validate_atomic_write_op_size(q, bio); if (status != BLK_STS_OK) goto end_io; } break; case REQ_OP_FLUSH: /* * REQ_OP_FLUSH can't be submitted through bios, it is only * synthesized in struct request by the flush state machine.
*/ goto not_supported; case REQ_OP_DISCARD: if (!bdev_max_discard_sectors(bdev)) goto not_supported; break; case REQ_OP_SECURE_ERASE: if (!bdev_max_secure_erase_sectors(bdev)) goto not_supported; break; case REQ_OP_ZONE_APPEND: status = blk_check_zone_append(q, bio); if (status != BLK_STS_OK) goto end_io; break; case REQ_OP_WRITE_ZEROES: if (!q->limits.max_write_zeroes_sectors) goto not_supported; break; case REQ_OP_ZONE_RESET: case REQ_OP_ZONE_OPEN: case REQ_OP_ZONE_CLOSE: case REQ_OP_ZONE_FINISH: case REQ_OP_ZONE_RESET_ALL: if (!bdev_is_zoned(bio->bi_bdev)) goto not_supported; break; case REQ_OP_DRV_IN: case REQ_OP_DRV_OUT: /* * Driver private operations are only used with passthrough * requests. */ fallthrough; default: goto not_supported; } if (blk_throtl_bio(bio)) return; submit_bio_noacct_nocheck(bio); return; not_supported: status = BLK_STS_NOTSUPP; end_io: bio->bi_status = status; bio_endio(bio); } EXPORT_SYMBOL(submit_bio_noacct); static void bio_set_ioprio(struct bio *bio) { /* Nobody set ioprio so far? Initialize it based on task's nice value */ if (IOPRIO_PRIO_CLASS(bio->bi_ioprio) == IOPRIO_CLASS_NONE) bio->bi_ioprio = get_current_ioprio(); blkcg_set_ioprio(bio); } /** * submit_bio - submit a bio to the block device layer for I/O * @bio: The &struct bio which describes the I/O * * submit_bio() is used to submit I/O requests to block devices. It is passed a * fully set up &struct bio that describes the I/O that needs to be done. The * bio will be sent to the device described by the bi_bdev field. * * The success/failure status of the request, along with notification of * completion, is delivered asynchronously through the ->bi_end_io() callback * in @bio. The bio must NOT be touched by the caller until ->bi_end_io() has * been called. */ void submit_bio(struct bio *bio) { if (bio_op(bio) == REQ_OP_READ) { task_io_account_read(bio->bi_iter.bi_size); count_vm_events(PGPGIN, bio_sectors(bio)); } else if (bio_op(bio) == REQ_OP_WRITE) { count_vm_events(PGPGOUT, bio_sectors(bio)); } bio_set_ioprio(bio); submit_bio_noacct(bio); } EXPORT_SYMBOL(submit_bio); /** * bio_poll - poll for BIO completions * @bio: bio to poll for * @iob: batches of IO * @flags: BLK_POLL_* flags that control the behavior * * Poll for completions on the queue associated with the bio. Returns number of * completed entries found. * * Note: the caller must either be the context that submitted @bio, or * be in an RCU critical section to prevent freeing of @bio. */ int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags) { blk_qc_t cookie = READ_ONCE(bio->bi_cookie); struct block_device *bdev; struct request_queue *q; int ret = 0; bdev = READ_ONCE(bio->bi_bdev); if (!bdev) return 0; q = bdev_get_queue(bdev); if (cookie == BLK_QC_T_NONE) return 0; blk_flush_plug(current->plug, false); /* * We need to be able to enter a frozen queue, similar to how * timeouts also need to do that. If that is blocked, then we can * have pending IO when a queue freeze is started, and then the * wait for the freeze to finish will wait for polled requests to * timeout as the poller is prevented from entering the queue and * completing them. As long as we prevent new IO from being queued, * that should be all that matters.
*/ if (!percpu_ref_tryget(&q->q_usage_counter)) return 0; if (queue_is_mq(q)) { ret = blk_mq_poll(q, cookie, iob, flags); } else { struct gendisk *disk = q->disk; if ((q->limits.features & BLK_FEAT_POLL) && disk && disk->fops->poll_bio) ret = disk->fops->poll_bio(bio, iob, flags); } blk_queue_exit(q); return ret; } EXPORT_SYMBOL_GPL(bio_poll); /* * Helper to implement file_operations.iopoll. Requires the bio to be stored * in iocb->private, and cleared before freeing the bio. */ int iocb_bio_iopoll(struct kiocb *kiocb, struct io_comp_batch *iob, unsigned int flags) { struct bio *bio; int ret = 0; /* * Note: the bio cache only uses SLAB_TYPESAFE_BY_RCU, so bio can * point to a freshly allocated bio at this point. If that happens * we have a few cases to consider: * * 1) the bio is being initialized and bi_bdev is NULL. We can simply * do nothing in this case * 2) the bio points to a device that is not poll enabled. bio_poll() * will catch this and return 0 * 3) the bio points to a poll capable device, including but not * limited to the one that the original bio pointed to. In this * case we will call into the actual poll method and poll for I/O, * even if we don't need to, but it won't cause harm either. * * For cases 2) and 3) above the RCU grace period ensures that bi_bdev * is still allocated. Because partitions hold a reference to the whole * device bdev and thus disk, the disk is also still valid. Grabbing * a reference to the queue in bio_poll() ensures the hctxs and requests * are still valid as well. */ rcu_read_lock(); bio = READ_ONCE(kiocb->private); if (bio) ret = bio_poll(bio, iob, flags); rcu_read_unlock(); return ret; } EXPORT_SYMBOL_GPL(iocb_bio_iopoll); void update_io_ticks(struct block_device *part, unsigned long now, bool end) { unsigned long stamp; again: stamp = READ_ONCE(part->bd_stamp); if (unlikely(time_after(now, stamp)) && likely(try_cmpxchg(&part->bd_stamp, &stamp, now)) && (end || bdev_count_inflight(part))) __part_stat_add(part, io_ticks, now - stamp); if (bdev_is_partition(part)) { part = bdev_whole(part); goto again; } } unsigned long bdev_start_io_acct(struct block_device *bdev, enum req_op op, unsigned long start_time) { part_stat_lock(); update_io_ticks(bdev, start_time, false); part_stat_local_inc(bdev, in_flight[op_is_write(op)]); part_stat_unlock(); return start_time; } EXPORT_SYMBOL(bdev_start_io_acct); /** * bio_start_io_acct - start I/O accounting for bio based drivers * @bio: bio to start accounting for * * Returns the start time that should be passed back to bio_end_io_acct().
*/ unsigned long bio_start_io_acct(struct bio *bio) { return bdev_start_io_acct(bio->bi_bdev, bio_op(bio), jiffies); } EXPORT_SYMBOL_GPL(bio_start_io_acct); void bdev_end_io_acct(struct block_device *bdev, enum req_op op, unsigned int sectors, unsigned long start_time) { const int sgrp = op_stat_group(op); unsigned long now = READ_ONCE(jiffies); unsigned long duration = now - start_time; part_stat_lock(); update_io_ticks(bdev, now, true); part_stat_inc(bdev, ios[sgrp]); part_stat_add(bdev, sectors[sgrp], sectors); part_stat_add(bdev, nsecs[sgrp], jiffies_to_nsecs(duration)); part_stat_local_dec(bdev, in_flight[op_is_write(op)]); part_stat_unlock(); } EXPORT_SYMBOL(bdev_end_io_acct); void bio_end_io_acct_remapped(struct bio *bio, unsigned long start_time, struct block_device *orig_bdev) { bdev_end_io_acct(orig_bdev, bio_op(bio), bio_sectors(bio), start_time); } EXPORT_SYMBOL_GPL(bio_end_io_acct_remapped); /** * blk_lld_busy - Check if underlying low-level drivers of a device are busy * @q : the queue of the device being checked * * Description: * Check if underlying low-level drivers of a device are busy. * If the drivers want to export their busy state, they must set own * exporting function using blk_queue_lld_busy() first. * * Basically, this function is used only by request stacking drivers * to stop dispatching requests to underlying devices when underlying * devices are busy. This behavior helps more I/O merging on the queue * of the request stacking driver and prevents I/O throughput regression * on burst I/O load. * * Return: * 0 - Not busy (The request stacking driver should dispatch request) * 1 - Busy (The request stacking driver should stop dispatching request) */ int blk_lld_busy(struct request_queue *q) { if (queue_is_mq(q) && q->mq_ops->busy) return q->mq_ops->busy(q); return 0; } EXPORT_SYMBOL_GPL(blk_lld_busy); int kblockd_schedule_work(struct work_struct *work) { return queue_work(kblockd_workqueue, work); } EXPORT_SYMBOL(kblockd_schedule_work); int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork, unsigned long delay) { return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay); } EXPORT_SYMBOL(kblockd_mod_delayed_work_on); void blk_start_plug_nr_ios(struct blk_plug *plug, unsigned short nr_ios) { struct task_struct *tsk = current; /* * If this is a nested plug, don't actually assign it. */ if (tsk->plug) return; plug->cur_ktime = 0; rq_list_init(&plug->mq_list); rq_list_init(&plug->cached_rqs); plug->nr_ios = min_t(unsigned short, nr_ios, BLK_MAX_REQUEST_COUNT); plug->rq_count = 0; plug->multiple_queues = false; plug->has_elevator = false; INIT_LIST_HEAD(&plug->cb_list); /* * Store ordering should not be needed here, since a potential * preempt will imply a full memory barrier */ tsk->plug = plug; } /** * blk_start_plug - initialize blk_plug and track it inside the task_struct * @plug: The &struct blk_plug that needs to be initialized * * Description: * blk_start_plug() indicates to the block layer an intent by the caller * to submit multiple I/O requests in a batch. The block layer may use * this hint to defer submitting I/Os from the caller until blk_finish_plug() * is called. However, the block layer may choose to submit requests * before a call to blk_finish_plug() if the number of queued I/Os * exceeds %BLK_MAX_REQUEST_COUNT, or if the size of the I/O is larger than * %BLK_PLUG_FLUSH_SIZE. The queued I/Os may also be submitted early if * the task schedules (see below). 
* * Tracking blk_plug inside the task_struct will help with auto-flushing the * pending I/O should the task end up blocking between blk_start_plug() and * blk_finish_plug(). This is important from a performance perspective, but * also ensures that we don't deadlock. For instance, if the task is blocking * for a memory allocation, memory reclaim could end up wanting to free a * page belonging to that request that is currently residing in our private * plug. By flushing the pending I/O when the process goes to sleep, we avoid * this kind of deadlock. */ void blk_start_plug(struct blk_plug *plug) { blk_start_plug_nr_ios(plug, 1); } EXPORT_SYMBOL(blk_start_plug); static void flush_plug_callbacks(struct blk_plug *plug, bool from_schedule) { LIST_HEAD(callbacks); while (!list_empty(&plug->cb_list)) { list_splice_init(&plug->cb_list, &callbacks); while (!list_empty(&callbacks)) { struct blk_plug_cb *cb = list_first_entry(&callbacks, struct blk_plug_cb, list); list_del(&cb->list); cb->callback(cb, from_schedule); } } } struct blk_plug_cb *blk_check_plugged(blk_plug_cb_fn unplug, void *data, int size) { struct blk_plug *plug = current->plug; struct blk_plug_cb *cb; if (!plug) return NULL; list_for_each_entry(cb, &plug->cb_list, list) if (cb->callback == unplug && cb->data == data) return cb; /* Not currently on the callback list */ BUG_ON(size < sizeof(*cb)); cb = kzalloc(size, GFP_ATOMIC); if (cb) { cb->data = data; cb->callback = unplug; list_add(&cb->list, &plug->cb_list); } return cb; } EXPORT_SYMBOL(blk_check_plugged); void __blk_flush_plug(struct blk_plug *plug, bool from_schedule) { if (!list_empty(&plug->cb_list)) flush_plug_callbacks(plug, from_schedule); blk_mq_flush_plug_list(plug, from_schedule); /* * Unconditionally flush out cached requests, even if the unplug * event came from schedule. Since we know hold references to the * queue for cached requests, we don't want a blocked task holding * up a queue freeze/quiesce event. */ if (unlikely(!rq_list_empty(&plug->cached_rqs))) blk_mq_free_plug_rqs(plug); plug->cur_ktime = 0; current->flags &= ~PF_BLOCK_TS; } /** * blk_finish_plug - mark the end of a batch of submitted I/O * @plug: The &struct blk_plug passed to blk_start_plug() * * Description: * Indicate that a batch of I/O submissions is complete. This function * must be paired with an initial call to blk_start_plug(). The intent * is to allow the block layer to optimize I/O submission. See the * documentation for blk_start_plug() for more information. */ void blk_finish_plug(struct blk_plug *plug) { if (plug == current->plug) { __blk_flush_plug(plug, false); current->plug = NULL; } } EXPORT_SYMBOL(blk_finish_plug); void blk_io_schedule(void) { /* Prevent hang_check timer from firing at us during very long I/O */ unsigned long timeout = sysctl_hung_task_timeout_secs * HZ / 2; if (timeout) io_schedule_timeout(timeout); else io_schedule(); } EXPORT_SYMBOL_GPL(blk_io_schedule); int __init blk_dev_init(void) { BUILD_BUG_ON((__force u32)REQ_OP_LAST >= (1 << REQ_OP_BITS)); BUILD_BUG_ON(REQ_OP_BITS + REQ_FLAG_BITS > 8 * sizeof_field(struct request, cmd_flags)); BUILD_BUG_ON(REQ_OP_BITS + REQ_FLAG_BITS > 8 * sizeof_field(struct bio, bi_opf)); /* used for unplugging and affects IO latency/throughput - HIGHPRI */ kblockd_workqueue = alloc_workqueue("kblockd", WQ_MEM_RECLAIM | WQ_HIGHPRI, 0); if (!kblockd_workqueue) panic("Failed to create kblockd\n"); blk_requestq_cachep = KMEM_CACHE(request_queue, SLAB_PANIC); blk_debugfs_root = debugfs_create_dir("block", NULL); return 0; } |
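The REQ_OP_NAME table behind blk_op_str(), like the blk_errors table further up, relies on designated array initializers indexed by an enum, with a fallback for slots that are out of range or were never filled in. The following is a self-contained userspace sketch of that lookup pattern; the demo_* names are illustrative and not part of the block layer API.

#include <stdio.h>

enum demo_op { DEMO_READ, DEMO_WRITE, DEMO_FLUSH, DEMO_DISCARD, DEMO_OP_LAST };

#define OP_NAME(name) [DEMO_##name] = #name
static const char *const demo_op_name[] = {
	OP_NAME(READ),
	OP_NAME(WRITE),
	/* DEMO_FLUSH intentionally left unset: its slot stays NULL */
	OP_NAME(DISCARD),
};
#undef OP_NAME

static const char *demo_op_str(enum demo_op op)
{
	const char *op_str = "UNKNOWN";

	/* bounds check first, then the NULL-slot check, as blk_op_str() does */
	if ((size_t)op < sizeof(demo_op_name) / sizeof(demo_op_name[0]) &&
	    demo_op_name[op])
		op_str = demo_op_name[op];
	return op_str;
}

int main(void)
{
	/* prints: READ UNKNOWN UNKNOWN */
	printf("%s %s %s\n", demo_op_str(DEMO_READ),
	       demo_op_str(DEMO_FLUSH), demo_op_str(DEMO_OP_LAST));
	return 0;
}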
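errno_to_blk_status() and blk_status_to_errno() pair that same kind of table with a bounds-checked forward lookup and a linear reverse scan that falls back to a generic I/O error. Below is a minimal userspace sketch under the same assumptions; the demo_* names are made up for illustration.

#include <errno.h>
#include <stdio.h>

enum demo_status { DEMO_STS_OK, DEMO_STS_NOTSUPP, DEMO_STS_TIMEOUT, DEMO_STS_IOERR };

static const struct {
	int errno_val;
	const char *name;
} demo_errors[] = {
	[DEMO_STS_OK]      = { 0,           ""                        },
	[DEMO_STS_NOTSUPP] = { -EOPNOTSUPP, "operation not supported" },
	[DEMO_STS_TIMEOUT] = { -ETIMEDOUT,  "timeout"                 },
	[DEMO_STS_IOERR]   = { -EIO,        "I/O"                     },
};
#define NR_DEMO_ERRORS (sizeof(demo_errors) / sizeof(demo_errors[0]))

static enum demo_status errno_to_demo_status(int err)
{
	/* linear reverse scan; anything unrecognised maps to a plain I/O error */
	for (size_t i = 0; i < NR_DEMO_ERRORS; i++)
		if (demo_errors[i].errno_val == err)
			return (enum demo_status)i;
	return DEMO_STS_IOERR;
}

static int demo_status_to_errno(enum demo_status status)
{
	/* out-of-range status mirrors the WARN_ON_ONCE() path in the kernel */
	if ((size_t)status >= NR_DEMO_ERRORS)
		return -EIO;
	return demo_errors[status].errno_val;
}

int main(void)
{
	enum demo_status s = errno_to_demo_status(-ETIMEDOUT);

	printf("%d -> status %d (\"%s\") -> %d\n",
	       -ETIMEDOUT, s, demo_errors[s].name, demo_status_to_errno(s));
	return 0;
}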